LiamKhoaLe committed
Commit f3a5a1f · 1 Parent(s): ef1ba2b

Upd Deepseek agent task assignment
AGENT_ASNM.md ADDED
@@ -0,0 +1,143 @@
+ # Task Assignment Review - Corrected Model Hierarchy
+
+ ## Overview
+ This document summarizes the corrected task assignments to ensure proper model hierarchy:
+ - **Easy tasks** (immediate execution, simple) → **Llama** (NVIDIA small)
+ - **Medium tasks** (accurate, reasoning, not too time-consuming) → **DeepSeek**
+ - **Hard tasks** (complex analysis, synthesis, long-form) → **Gemini Pro**
+
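+ Concretely, each tier maps to the routing dict that `select_model()` in `utils/api/router.py` returns; a minimal sketch using that module's constants (the `ROUTING` table itself is illustrative, not code from the repo):
+
+ ```python
+ from utils.api.router import GEMINI_PRO, NVIDIA_MEDIUM, NVIDIA_SMALL
+
+ # Tier -> routing dict, mirroring select_model()'s return values
+ ROUTING = {
+     "easy":   {"provider": "nvidia",   "model": NVIDIA_SMALL},   # Llama
+     "medium": {"provider": "deepseek", "model": NVIDIA_MEDIUM},  # DeepSeek v3.1
+     "hard":   {"provider": "gemini",   "model": GEMINI_PRO},
+ }
+ ```
+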
+ ## Corrected Task Assignments
+
+ ### ✅ **Easy Tasks - Llama (NVIDIA Small)**
+ **Purpose**: Immediate execution, simple operations
+ **Current Assignments**:
+ - `llama_chat()` - Basic chat completion
+ - `llama_summarize()` - Simple text summarization
+ - `summarize_qa()` - Basic Q&A summarization
+ - `naive_fallback()` - Simple text processing fallback
+
+ ### ✅ **Medium Tasks - DeepSeek**
+ **Purpose**: Accurate reasoning, not too time-consuming
+ **Corrected Assignments**:
+
+ #### **Search Operations** (`routes/search.py`)
+ - `extract_search_keywords()` - Keyword extraction with reasoning
+ - `generate_search_strategies()` - Search strategy generation
+ - `extract_relevant_content()` - Content relevance filtering
+ - `assess_content_quality()` - Quality assessment with reasoning
+ - `cross_validate_information()` - Fact-checking and validation
+ - `generate_content_summary()` - Content summarization
+
+ #### **Memory Operations** (`memo/`)
+ - `files_relevance()` - File relevance classification
+ - `related_recent_context()` - Context selection with reasoning
+ - `_ai_intent_detection()` - User intent detection (CORRECTED)
+ - `_ai_select_qa_memories()` - Memory selection with reasoning (CORRECTED)
+ - `_should_enhance_with_context()` - Context enhancement decision (CORRECTED)
+ - `_enhance_question_with_context()` - Question enhancement (CORRECTED)
+ - `_enhance_instructions_with_context()` - Instruction enhancement (CORRECTED)
+ - `consolidate_similar_memories()` - Memory consolidation (CORRECTED)
+
+ #### **Content Processing** (`utils/service/summarizer.py`)
+ - `clean_chunk_text()` - Content cleaning with reasoning
+ - `deepseek_summarize()` - Medium complexity summarization
+
+ #### **Chat Operations** (`routes/chats.py`)
+ - `generate_query_variations()` - Query variation generation (CORRECTED)
+
+ ### ✅ **Hard Tasks - Gemini Pro**
+ **Purpose**: Complex analysis, synthesis, long-form content
+ **Current Assignments**:
+ - `generate_cot_plan()` - Chain of Thought report planning
+ - `analyze_subtask_comprehensive()` - Comprehensive analysis
+ - `synthesize_section_analysis()` - Complex synthesis
+ - `generate_final_report()` - Long-form report generation
+ - All complex report generation tasks
+
+ ## Key Corrections Made
+
+ ### 1. **Intent Detection** (`memo/plan/intent.py`)
+ - **Before**: Used Llama for simple classification
+ - **After**: Uses DeepSeek for better reasoning about user intent
+ - **Reason**: Requires understanding context and nuance
+
+ ### 2. **Memory Selection** (`memo/plan/execution.py`)
+ - **Before**: Used Llama for memory selection
+ - **After**: Uses DeepSeek for better reasoning about relevance
+ - **Reason**: Requires understanding context relationships
+
+ ### 3. **Context Enhancement** (`memo/retrieval.py`)
+ - **Before**: Used Llama for enhancement decisions
+ - **After**: Uses DeepSeek for better reasoning about context value
+ - **Reason**: Requires understanding question-context relationships
+
+ ### 4. **Question Enhancement** (`memo/retrieval.py`)
+ - **Before**: Used Llama for question enhancement
+ - **After**: Uses DeepSeek for better reasoning about enhancement
+ - **Reason**: Requires understanding conversation flow and context
+
+ ### 5. **Memory Consolidation** (`memo/consolidation.py`)
+ - **Before**: Used Llama for memory consolidation
+ - **After**: Uses DeepSeek for better reasoning about similarity
+ - **Reason**: Requires understanding content relationships
+
+ ### 6. **Query Variation Generation** (`routes/chats.py`)
+ - **Before**: Used Llama for query variations
+ - **After**: Uses DeepSeek for better reasoning about variations
+ - **Reason**: Requires understanding question intent and context
+
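+ All six corrections follow the same code pattern inside the affected async helpers; a representative before/after, excerpted from the diffs in this commit:
+
+ ```python
+ # Before: route through the generic model router with an explicit Llama selection
+ selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
+ response = await generate_answer_with_model(
+     selection=selection,
+     system_prompt=sys_prompt,
+     user_prompt=user_prompt,
+     gemini_rotator=None,
+     nvidia_rotator=nvidia_rotator,
+ )
+
+ # After: call DeepSeek directly for better reasoning on medium tasks
+ from utils.api.router import deepseek_chat_completion
+ response = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)
+ ```
+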
+ ## Enhanced Model Selection Logic
+
+ ### **Complexity Heuristics**
+ ```text
+ # Hard tasks (Gemini Pro)
+ - Keywords: "prove", "derivation", "complexity", "algorithm", "optimize", "theorem", "rigorous", "step-by-step", "policy critique", "ambiguity", "counterfactual", "comprehensive", "detailed analysis", "synthesis", "evaluation"
+ - Length: > 100 words or > 3000 context words
+ - Content: "comprehensive" or "detailed" in question
+
+ # Medium tasks (DeepSeek)
+ - Keywords: "analyze", "explain", "compare", "evaluate", "summarize", "extract", "classify", "identify", "describe", "discuss", "reasoning", "context", "enhance", "select", "consolidate"
+ - Length: 10-100 words or 200-3000 context words
+ - Content: "reasoning" or "context" in question
+
+ # Simple tasks (Llama)
+ - Keywords: "what", "how", "when", "where", "who", "yes", "no", "count", "list", "find"
+ - Length: ≤ 10 words or ≤ 200 context words
+ ```
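+
+ A quick sanity check of these heuristics, assuming `select_model()` as updated in `utils/api/router.py` (see the diff at the bottom of this commit):
+
+ ```python
+ from utils.api.router import select_model
+
+ print(select_model("What is RAG?", "")["provider"])
+ # -> "nvidia": short question, no reasoning keywords (Llama tier)
+ print(select_model("Compare the two retrieval strategies", "")["provider"])
+ # -> "deepseek": "compare" is a medium keyword
+ print(select_model("Give a comprehensive evaluation of the pipeline", "")["provider"])
+ # -> "gemini": "comprehensive" and "evaluation" are hard keywords
+ ```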
+
+ ## Benefits of Corrected Assignments
+
+ ### **Performance Improvements**
+ - **Better reasoning** for medium-complexity tasks with DeepSeek
+ - **Faster execution** for simple tasks with Llama
+ - **Higher quality** for complex tasks with Gemini Pro
+
+ ### **Cost Optimization**
+ - **Reduced Gemini usage** for tasks that don't need its full capabilities
+ - **Better task distribution** across model capabilities
+ - **Maintained efficiency** for simple tasks
+
+ ### **Quality Improvements**
+ - **Better intent detection** with DeepSeek's reasoning
+ - **Improved memory operations** with better context understanding
+ - **Enhanced search operations** with better relevance filtering
+ - **More accurate content processing** with reasoning capabilities
+
+ ## Verification Checklist
+
+ - ✅ All easy tasks use Llama (NVIDIA small)
+ - ✅ All medium tasks use DeepSeek
+ - ✅ All hard tasks use Gemini Pro
+ - ✅ Model selection logic properly categorizes tasks
+ - ✅ No linting errors in modified files
+ - ✅ All functions have proper fallback mechanisms (sketched below)
+ - ✅ Error handling is maintained for all changes
+
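+ A minimal sketch of the fallback pattern the checklist refers to. The wiring below is illustrative only; `llama_chat()` is listed above as an easy-tier helper, but its exact signature is assumed, not shown in this commit:
+
+ ```python
+ from utils.api.router import deepseek_chat_completion
+
+ async def deepseek_with_fallback(sys_prompt: str, user_prompt: str, nvidia_rotator) -> str:
+     try:
+         return await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)
+     except Exception:
+         # Assumed fallback: degrade to the simple Llama path rather than failing the request
+         return await llama_chat(sys_prompt, user_prompt, nvidia_rotator)  # hypothetical signature
+ ```
+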
+ ## Configuration
+
+ The system is ready to use with the environment variable:
+ ```bash
+ NVIDIA_MEDIUM=deepseek-ai/deepseek-v3.1
+ ```
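+
+ At import time `utils/api/router.py` reads this variable, falling back to the same model id by default:
+
+ ```python
+ import os
+
+ # Env override for the medium-tier (DeepSeek) model id
+ NVIDIA_MEDIUM = os.getenv("NVIDIA_MEDIUM", "deepseek-ai/deepseek-v3.1")
+ ```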
+
+ All changes maintain backward compatibility and include proper error handling.
README.md CHANGED
@@ -82,7 +82,7 @@ Open: `http://localhost:8000/static/` • Health: `GET /healthz`
 - PDF export renders code blocks with a dark IDE-like theme and lightweight syntax highlighting; control characters are stripped to avoid square artifacts.
 - CORS is open for the demo UI; restrict for production.

- ### Samples
+ ### Docs

 [Report Generation](https://huggingface.co/spaces/BinKhoaLe1812/EdSummariser/blob/main/report.pdf)

@@ -90,6 +90,9 @@ Open: `http://localhost:8000/static/` • Health: `GET /healthz`

 [Utils Dir](https://huggingface.co/spaces/BinKhoaLe1812/EdSummariser/blob/main/utils/README.md)

+ [Routes Dir](https://huggingface.co/spaces/BinKhoaLe1812/EdSummariser/blob/main/routes/README.md)
+
+ [Agent Assignment](https://huggingface.co/spaces/BinKhoaLe1812/EdSummariser/blob/main/AGENT_ASNM.md)

 ### License
memo/consolidation.py CHANGED
@@ -179,14 +179,9 @@ Return the consolidated content in the same format as the original memories."""

 Create a single consolidated memory:"""

-     selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
-     consolidated_content = await generate_answer_with_model(
-         selection=selection,
-         system_prompt=sys_prompt,
-         user_prompt=user_prompt,
-         gemini_rotator=None,
-         nvidia_rotator=nvidia_rotator
-     )
+     # Use DeepSeek for better memory consolidation reasoning
+     from utils.api.router import deepseek_chat_completion
+     consolidated_content = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)

      return {
          "content": consolidated_content.strip(),
memo/plan/__pycache__/execution.cpython-311.pyc DELETED
Binary file (20.8 kB)
 
memo/plan/__pycache__/intent.cpython-311.pyc DELETED
Binary file (7.57 kB)
 
memo/plan/__pycache__/strategy.cpython-311.pyc DELETED
Binary file (6.17 kB)
 
memo/plan/execution.py CHANGED
@@ -349,14 +349,9 @@ Available Q&A Memories:

 Select the most relevant Q&A memories:"""

-     selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
-     response = await generate_answer_with_model(
-         selection=selection,
-         system_prompt=sys_prompt,
-         user_prompt=user_prompt,
-         gemini_rotator=None,
-         nvidia_rotator=nvidia_rotator
-     )
+     # Use DeepSeek for better memory selection reasoning
+     from utils.api.router import deepseek_chat_completion
+     response = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)

      return response.strip()
memo/plan/intent.py CHANGED
@@ -110,14 +110,9 @@ Respond with only the intent name (e.g., "ENHANCEMENT")."""

      user_prompt = f"Question: {question}\n\nWhat is the user's intent?"

-     selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
-     response = await generate_answer_with_model(
-         selection=selection,
-         system_prompt=sys_prompt,
-         user_prompt=user_prompt,
-         gemini_rotator=None,
-         nvidia_rotator=nvidia_rotator
-     )
+     # Use DeepSeek for better intent detection reasoning
+     from utils.api.router import deepseek_chat_completion
+     response = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)

      # Parse response
      response_upper = response.strip().upper()
memo/retrieval.py CHANGED
@@ -221,14 +221,9 @@ Semantic: {semantic_context[:200]}...

 Should this question be enhanced with context?"""

-     selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
-     response = await generate_answer_with_model(
-         selection=selection,
-         system_prompt=sys_prompt,
-         user_prompt=user_prompt,
-         gemini_rotator=None,
-         nvidia_rotator=nvidia_rotator
-     )
+     # Use DeepSeek for better context enhancement reasoning
+     from utils.api.router import deepseek_chat_completion
+     response = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)

      return "YES" in response.upper()

@@ -272,14 +267,9 @@ RELEVANT CONTEXT:

 Create an enhanced version that incorporates this context naturally."""

-     selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
-     enhanced_question = await generate_answer_with_model(
-         selection=selection,
-         system_prompt=sys_prompt,
-         user_prompt=user_prompt,
-         gemini_rotator=None,
-         nvidia_rotator=nvidia_rotator
-     )
+     # Use DeepSeek for better question enhancement reasoning
+     from utils.api.router import deepseek_chat_completion
+     enhanced_question = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)

      return enhanced_question.strip(), True

@@ -316,14 +306,9 @@ RELEVANT CONTEXT:

 Create an enhanced version that incorporates this context naturally."""

-     selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
-     enhanced_instructions = await generate_answer_with_model(
-         selection=selection,
-         system_prompt=sys_prompt,
-         user_prompt=user_prompt,
-         gemini_rotator=None,
-         nvidia_rotator=nvidia_rotator
-     )
+     # Use DeepSeek for better instruction enhancement reasoning
+     from utils.api.router import deepseek_chat_completion
+     enhanced_instructions = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)

      return enhanced_instructions.strip(), True
routes/chats.py CHANGED
@@ -201,9 +201,9 @@ Return only the variations, one per line, no numbering or extra text."""

      user_prompt = f"Original question: {question}\n\nGenerate query variations:"

-     from utils.api.router import generate_answer_with_model
-     selection = {"provider": "nvidia", "model": "meta/llama-3.1-8b-instruct"}
-     response = await generate_answer_with_model(selection, sys_prompt, user_prompt, None, nvidia_rotator)
+     # Use DeepSeek for better query variation generation reasoning
+     from utils.api.router import deepseek_chat_completion
+     response = await deepseek_chat_completion(sys_prompt, user_prompt, nvidia_rotator)

      # Parse variations
      variations = [line.strip() for line in response.split('\n') if line.strip()]
utils/api/router.py CHANGED
@@ -17,27 +17,54 @@ NVIDIA_MEDIUM = os.getenv("NVIDIA_MEDIUM", "deepseek-ai/deepseek-v3.1") # DeepS

  def select_model(question: str, context: str) -> Dict[str, Any]:
      """
-     Enhanced complexity heuristic with DeepSeek integration:
-     - If very complex (hard keywords, long context) -> Gemini Pro
-     - If medium complexity (moderate length, some reasoning) -> DeepSeek
-     - If simple (short, basic) -> NVIDIA small
+     Enhanced complexity heuristic with proper model hierarchy:
+     - Easy tasks (immediate execution, simple) -> Llama (NVIDIA small)
+     - Medium tasks (accurate, reasoning, not too time-consuming) -> DeepSeek
+     - Hard tasks (complex analysis, synthesis, long-form) -> Gemini Pro
      """
      qlen = len(question.split())
      clen = len(context.split())
-     hard_keywords = ("prove", "derivation", "complexity", "algorithm", "optimize", "theorem", "rigorous", "step-by-step", "policy critique", "ambiguity", "counterfactual")
-     medium_keywords = ("analyze", "explain", "compare", "evaluate", "summarize", "extract", "classify", "identify", "describe", "discuss")

-     is_very_hard = any(k in question.lower() for k in hard_keywords) or qlen > 80 or clen > 2000
-     is_medium = any(k in question.lower() for k in medium_keywords) or qlen > 15 or clen > 500
+     # Hard task keywords - require complex reasoning and analysis
+     hard_keywords = ("prove", "derivation", "complexity", "algorithm", "optimize", "theorem", "rigorous", "step-by-step", "policy critique", "ambiguity", "counterfactual", "comprehensive", "detailed analysis", "synthesis", "evaluation")
+
+     # Medium task keywords - require reasoning but not too complex
+     medium_keywords = ("analyze", "explain", "compare", "evaluate", "summarize", "extract", "classify", "identify", "describe", "discuss", "reasoning", "context", "enhance", "select", "consolidate")
+
+     # Simple task keywords - immediate execution
+     simple_keywords = ("what", "how", "when", "where", "who", "yes", "no", "count", "list", "find")
+
+     # Determine complexity level
+     is_very_hard = (
+         any(k in question.lower() for k in hard_keywords) or
+         qlen > 100 or
+         clen > 3000 or
+         "comprehensive" in question.lower() or
+         "detailed" in question.lower()
+     )
+
+     is_medium = (
+         any(k in question.lower() for k in medium_keywords) or
+         (qlen > 10 and qlen <= 100) or
+         (clen > 200 and clen <= 3000) or
+         "reasoning" in question.lower() or
+         "context" in question.lower()
+     )
+
+     is_simple = (
+         any(k in question.lower() for k in simple_keywords) or
+         qlen <= 10 or
+         clen <= 200
+     )

      if is_very_hard:
-         # Use Gemini Pro for very complex tasks
+         # Use Gemini Pro for very complex tasks requiring advanced reasoning
          return {"provider": "gemini", "model": GEMINI_PRO}
      elif is_medium:
-         # Use DeepSeek for medium complexity tasks
+         # Use DeepSeek for medium complexity tasks requiring reasoning but not too time-consuming
          return {"provider": "deepseek", "model": NVIDIA_MEDIUM}
      else:
-         # Use NVIDIA small for simple tasks
+         # Use NVIDIA small (Llama) for simple tasks requiring immediate execution
          return {"provider": "nvidia", "model": NVIDIA_SMALL}