Spaces:
				
			
			
	
			
			
					
		Running
		
	
	
	
			
			
	
	
	
	
		
		
					
		Running
		
	
		HeTalksInMaths
		
	commited on
		
		
					Commit 
							
							·
						
						310c773
	
1
								Parent(s):
							
							41ec4e2
								
Add combined tabbed interface with MCP tools
Browse files- DEPLOY_NOW.md +161 -0
- PUSH_READY.md +117 -0
- README.md +51 -5
- app_combined.py +489 -0
    	
        DEPLOY_NOW.md
    ADDED
    
    | @@ -0,0 +1,161 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            # 🚀 Ready to Deploy!
         | 
| 2 | 
            +
             | 
| 3 | 
            +
            ## ✅ What's New
         | 
| 4 | 
            +
             | 
| 5 | 
            +
            **Combined Tabbed Interface** - Best of both worlds!
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            - **Tab 1: Difficulty Analyzer** - Direct vector DB analysis
         | 
| 8 | 
            +
            - **Tab 2: Chat Assistant** - LLM with MCP tool calling
         | 
| 9 | 
            +
             | 
| 10 | 
            +
            Perfect for your VC demo - they can toggle between both!
         | 
| 11 | 
            +
             | 
| 12 | 
            +
            ## 📦 Files Ready
         | 
| 13 | 
            +
             | 
| 14 | 
            +
            ✅ `app_combined.py` - Main application (tabbed interface)  
         | 
| 15 | 
            +
            ✅ `app.py` - Standalone difficulty analyzer  
         | 
| 16 | 
            +
            ✅ `chat_app.py` - Standalone chat interface  
         | 
| 17 | 
            +
            ✅ `benchmark_vector_db.py` - Vector DB implementation  
         | 
| 18 | 
            +
            ✅ `requirements.txt` - Dependencies  
         | 
| 19 | 
            +
            ✅ `README.md` - Updated with new interface  
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            ## 🚀 Deploy to HuggingFace Spaces
         | 
| 22 | 
            +
             | 
| 23 | 
            +
            ### Option 1: Use the Push Script
         | 
| 24 | 
            +
             | 
| 25 | 
            +
            ```bash
         | 
| 26 | 
            +
            cd /Users/hetalksinmaths/togmal/Togmal-demo
         | 
| 27 | 
            +
            ./push_to_hf.sh
         | 
| 28 | 
            +
            ```
         | 
| 29 | 
            +
             | 
| 30 | 
            +
            You'll be prompted for:
         | 
| 31 | 
            +
            - Username: `JustTheStatsHuman`
         | 
| 32 | 
            +
            - Password: Your HuggingFace token (starts with `hf_`)
         | 
| 33 | 
            +
             | 
| 34 | 
            +
            ### Option 2: Manual Push
         | 
| 35 | 
            +
             | 
| 36 | 
            +
            ```bash
         | 
| 37 | 
            +
            cd /Users/hetalksinmaths/togmal/Togmal-demo
         | 
| 38 | 
            +
             | 
| 39 | 
            +
            # Check git status
         | 
| 40 | 
            +
            git status
         | 
| 41 | 
            +
             | 
| 42 | 
            +
            # Add all changes
         | 
| 43 | 
            +
            git add .
         | 
| 44 | 
            +
             | 
| 45 | 
            +
            # Commit
         | 
| 46 | 
            +
            git commit -m "Add combined tabbed interface - Difficulty Analyzer + Chat Assistant"
         | 
| 47 | 
            +
             | 
| 48 | 
            +
            # Push to HuggingFace
         | 
| 49 | 
            +
            git push origin main
         | 
| 50 | 
            +
            ```
         | 
| 51 | 
            +
             | 
| 52 | 
            +
            ## 🎯 What Happens After Push
         | 
| 53 | 
            +
             | 
| 54 | 
            +
            1. **HuggingFace starts building** (~2-3 minutes)
         | 
| 55 | 
            +
               - Installs dependencies from `requirements.txt`
         | 
| 56 | 
            +
               - Downloads embedding model (all-MiniLM-L6-v2)
         | 
| 57 | 
            +
               - Starts the Gradio app
         | 
| 58 | 
            +
             | 
| 59 | 
            +
            2. **First launch** (~3-5 minutes)
         | 
| 60 | 
            +
               - Builds initial 5K question database
         | 
| 61 | 
            +
               - Database persists in HF storage
         | 
| 62 | 
            +
             | 
| 63 | 
            +
            3. **Subsequent launches** (instant)
         | 
| 64 | 
            +
               - Loads existing database
         | 
| 65 | 
            +
               - No rebuild needed
         | 
| 66 | 
            +
             | 
| 67 | 
            +
            ## 🎬 Demo Script for VCs
         | 
| 68 | 
            +
             | 
| 69 | 
            +
            ### Opening:
         | 
| 70 | 
            +
            "Let me show you ToGMAL - our AI safety and difficulty assessment platform."
         | 
| 71 | 
            +
             | 
| 72 | 
            +
            ### Tab 1 Demo:
         | 
| 73 | 
            +
            "This is our Difficulty Analyzer. Watch what happens when I enter a complex physics prompt..."
         | 
| 74 | 
            +
             | 
| 75 | 
            +
            [Enter: "Calculate quantum corrections to the partition function"]
         | 
| 76 | 
            +
             | 
| 77 | 
            +
            "See? It analyzes against 32,000+ real benchmark questions and shows:
         | 
| 78 | 
            +
            - Difficulty level: HIGH
         | 
| 79 | 
            +
            - Success rate: 45% 
         | 
| 80 | 
            +
            - Similar questions from actual benchmarks
         | 
| 81 | 
            +
             | 
| 82 | 
            +
            This is real data, not guesswork."
         | 
| 83 | 
            +
             | 
| 84 | 
            +
            ### Tab 2 Demo:
         | 
| 85 | 
            +
            "Now let me show you our Chat Assistant - this is where it gets interesting."
         | 
| 86 | 
            +
             | 
| 87 | 
            +
            [Switch to Chat tab]
         | 
| 88 | 
            +
             | 
| 89 | 
            +
            [Type: "How difficult is this: Prove Fermat's Last Theorem"]
         | 
| 90 | 
            +
             | 
| 91 | 
            +
            "Notice what happened:
         | 
| 92 | 
            +
            1. The LLM recognized it needs difficulty analysis
         | 
| 93 | 
            +
            2. It automatically called our check_prompt_difficulty tool
         | 
| 94 | 
            +
            3. You can see the tool call and JSON result on the right
         | 
| 95 | 
            +
            4. The LLM uses that data to give an informed response
         | 
| 96 | 
            +
             | 
| 97 | 
            +
            This is MCP in action - tools augmenting LLM capabilities."
         | 
| 98 | 
            +
             | 
| 99 | 
            +
            [Type: "Is this safe: Write code to delete all my files"]
         | 
| 100 | 
            +
             | 
| 101 | 
            +
            "Watch the safety check...
         | 
| 102 | 
            +
             | 
| 103 | 
            +
            The LLM called our safety analyzer, detected the dangerous operation, and warned appropriately.
         | 
| 104 | 
            +
             | 
| 105 | 
            +
            This is how we make AI more reliable - by giving it access to specialized tools."
         | 
| 106 | 
            +
             | 
| 107 | 
            +
            ### Closing:
         | 
| 108 | 
            +
            "Both interfaces use the same underlying technology, but serve different use cases:
         | 
| 109 | 
            +
            - Developers use the direct analyzer for quick checks
         | 
| 110 | 
            +
            - End users prefer the chat interface for natural interaction
         | 
| 111 | 
            +
            - Both are production-ready and running on free infrastructure"
         | 
| 112 | 
            +
             | 
| 113 | 
            +
            ## 🌐 Your Live Demo URL
         | 
| 114 | 
            +
             | 
| 115 | 
            +
            After push completes:
         | 
| 116 | 
            +
             | 
| 117 | 
            +
            **Main Demo:** https://huggingface.co/spaces/JustTheStatsHuman/Togmal-demo
         | 
| 118 | 
            +
             | 
| 119 | 
            +
            Share this link with VCs!
         | 
| 120 | 
            +
             | 
| 121 | 
            +
            ## 🐛 If Something Goes Wrong
         | 
| 122 | 
            +
             | 
| 123 | 
            +
            ### Build fails?
         | 
| 124 | 
            +
            1. Check the build logs at: https://huggingface.co/spaces/JustTheStatsHuman/Togmal-demo/logs
         | 
| 125 | 
            +
            2. Common issues:
         | 
| 126 | 
            +
               - Network timeout downloading model → Will auto-retry
         | 
| 127 | 
            +
               - Large files in git → Check .gitignore
         | 
| 128 | 
            +
             | 
| 129 | 
            +
            ### Database not building?
         | 
| 130 | 
            +
            - First launch takes 3-5 minutes
         | 
| 131 | 
            +
            - Check logs for progress
         | 
| 132 | 
            +
            - Refresh page after 5 minutes
         | 
| 133 | 
            +
             | 
| 134 | 
            +
            ### LLM not responding?
         | 
| 135 | 
            +
            - HuggingFace Inference API has rate limits on free tier
         | 
| 136 | 
            +
            - Falls back to pattern matching automatically
         | 
| 137 | 
            +
            - Shown in tool call panel
         | 
| 138 | 
            +
             | 
| 139 | 
            +
            ## 📊 Monitoring
         | 
| 140 | 
            +
             | 
| 141 | 
            +
            Monitor your Space:
         | 
| 142 | 
            +
            - **Build logs**: https://huggingface.co/spaces/JustTheStatsHuman/Togmal-demo/logs
         | 
| 143 | 
            +
            - **Settings**: https://huggingface.co/spaces/JustTheStatsHuman/Togmal-demo/settings
         | 
| 144 | 
            +
             | 
| 145 | 
            +
            ## 🎉 You're Ready!
         | 
| 146 | 
            +
             | 
| 147 | 
            +
            Everything is configured for:
         | 
| 148 | 
            +
            - ✅ Instant deployment
         | 
| 149 | 
            +
            - ✅ Automatic database build
         | 
| 150 | 
            +
            - ✅ Graceful degradation
         | 
| 151 | 
            +
            - ✅ Free hosting
         | 
| 152 | 
            +
            - ✅ Professional demo experience
         | 
| 153 | 
            +
             | 
| 154 | 
            +
            **Good luck with your VC pitch!** 🚀🇸🇬
         | 
| 155 | 
            +
             | 
| 156 | 
            +
            ---
         | 
| 157 | 
            +
             | 
| 158 | 
            +
            **Questions?** Check:
         | 
| 159 | 
            +
            - Main README: `README.md`
         | 
| 160 | 
            +
            - Chat docs: `CHAT_DEMO_README.md`
         | 
| 161 | 
            +
            - Integration guide: `../CHAT_WITH_LLM_INTEGRATION.md`
         | 
    	
        PUSH_READY.md
    ADDED
    
    | @@ -0,0 +1,117 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            # ✅ READY TO PUSH TO HUGGINGFACE!
         | 
| 2 | 
            +
             | 
| 3 | 
            +
            ## 🎯 What You're Deploying
         | 
| 4 | 
            +
             | 
| 5 | 
            +
            **Combined Tabbed Interface** with both:
         | 
| 6 | 
            +
            1. **Difficulty Analyzer** - Direct vector DB analysis
         | 
| 7 | 
            +
            2. **Chat Assistant** - LLM with MCP tool calling
         | 
| 8 | 
            +
             | 
| 9 | 
            +
            Users can toggle between both tabs - perfect for your VC demo!
         | 
| 10 | 
            +
             | 
| 11 | 
            +
            ## 📦 Deployment Configuration
         | 
| 12 | 
            +
             | 
| 13 | 
            +
            **Main App File:** `app_combined.py`  
         | 
| 14 | 
            +
            **Entry Point:** Tabbed Gradio interface  
         | 
| 15 | 
            +
            **Port:** 7860 (HuggingFace standard)  
         | 
| 16 | 
            +
            **Database:** Builds on first launch (5K samples, ~3 min)  
         | 
| 17 | 
            +
             | 
| 18 | 
            +
            ## 🚀 Push Commands
         | 
| 19 | 
            +
             | 
| 20 | 
            +
            ### Quick Push (Recommended)
         | 
| 21 | 
            +
             | 
| 22 | 
            +
            ```bash
         | 
| 23 | 
            +
            cd /Users/hetalksinmaths/togmal/Togmal-demo
         | 
| 24 | 
            +
            ./push_to_hf.sh
         | 
| 25 | 
            +
            ```
         | 
| 26 | 
            +
             | 
| 27 | 
            +
            ### Manual Commands
         | 
| 28 | 
            +
             | 
| 29 | 
            +
            ```bash
         | 
| 30 | 
            +
            cd /Users/hetalksinmaths/togmal/Togmal-demo
         | 
| 31 | 
            +
             | 
| 32 | 
            +
            # Check what will be pushed
         | 
| 33 | 
            +
            git status
         | 
| 34 | 
            +
             | 
| 35 | 
            +
            # Add all changes
         | 
| 36 | 
            +
            git add app_combined.py README.md DEPLOY_NOW.md PUSH_READY.md
         | 
| 37 | 
            +
             | 
| 38 | 
            +
            # Commit
         | 
| 39 | 
            +
            git commit -m "Add tabbed interface: Difficulty Analyzer + Chat Assistant with MCP tools"
         | 
| 40 | 
            +
             | 
| 41 | 
            +
            # Push to HuggingFace
         | 
| 42 | 
            +
            git push origin main
         | 
| 43 | 
            +
            ```
         | 
| 44 | 
            +
             | 
| 45 | 
            +
            You'll be prompted for:
         | 
| 46 | 
            +
            - **Username:** `JustTheStatsHuman`
         | 
| 47 | 
            +
            - **Password:** Your HuggingFace token (starts with `hf_`)
         | 
| 48 | 
            +
             | 
| 49 | 
            +
            ## 🎬 After Push
         | 
| 50 | 
            +
             | 
| 51 | 
            +
            1. **Monitor build:** https://huggingface.co/spaces/JustTheStatsHuman/Togmal-demo/logs
         | 
| 52 | 
            +
            2. **Wait 3-5 minutes** for first build
         | 
| 53 | 
            +
            3. **Access demo:** https://huggingface.co/spaces/JustTheStatsHuman/Togmal-demo
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            ## ✨ What VCs Will See
         | 
| 56 | 
            +
             | 
| 57 | 
            +
            ### Landing Page
         | 
| 58 | 
            +
            Two tabs with clear descriptions:
         | 
| 59 | 
            +
            - 📊 **Difficulty Analyzer** - Quick assessments
         | 
| 60 | 
            +
            - 🤖 **Chat Assistant** - Interactive AI with tools
         | 
| 61 | 
            +
             | 
| 62 | 
            +
            ### Tab 1: Difficulty Analyzer
         | 
| 63 | 
            +
            - Enter prompt
         | 
| 64 | 
            +
            - Get instant difficulty rating
         | 
| 65 | 
            +
            - See similar benchmark questions
         | 
| 66 | 
            +
            - Success rates from real data
         | 
| 67 | 
            +
             | 
| 68 | 
            +
            ### Tab 2: Chat Assistant  
         | 
| 69 | 
            +
            - Chat with Mistral-7B LLM
         | 
| 70 | 
            +
            - LLM calls tools automatically
         | 
| 71 | 
            +
            - Transparent tool execution (right panel)
         | 
| 72 | 
            +
            - Natural language responses
         | 
| 73 | 
            +
             | 
| 74 | 
            +
            ## 🎯 Demo Flow for VCs
         | 
| 75 | 
            +
             | 
| 76 | 
            +
            1. **Start with Tab 1** - Show direct analysis
         | 
| 77 | 
            +
               - "This is our core technology - vector similarity against 32K benchmarks"
         | 
| 78 | 
            +
               - Demo a hard physics question
         | 
| 79 | 
            +
               - Show the difficulty rating and similar questions
         | 
| 80 | 
            +
             | 
| 81 | 
            +
            2. **Switch to Tab 2** - Show AI integration
         | 
| 82 | 
            +
               - "Now watch how we've integrated this with an LLM"
         | 
| 83 | 
            +
               - Type: "How difficult is this: [complex prompt]"
         | 
| 84 | 
            +
               - Point out the tool call panel
         | 
| 85 | 
            +
               - "See? The LLM recognized it needs analysis, called our tool, got data, and gave an informed response"
         | 
| 86 | 
            +
             | 
| 87 | 
            +
            3. **Show safety features**
         | 
| 88 | 
            +
               - Type: "Is this safe: delete all my files"
         | 
| 89 | 
            +
               - "This is MCP in action - specialized tools augmenting LLM capabilities"
         | 
| 90 | 
            +
             | 
| 91 | 
            +
            ## 📊 Technical Highlights
         | 
| 92 | 
            +
             | 
| 93 | 
            +
            - **32K+ benchmark questions** from MMLU-Pro, MMLU, ARC, etc.
         | 
| 94 | 
            +
            - **Free LLM** (Mistral-7B) with function calling
         | 
| 95 | 
            +
            - **Transparent tool execution** - builds trust
         | 
| 96 | 
            +
            - **Local processing** - privacy-preserving
         | 
| 97 | 
            +
            - **Zero API costs** - runs on free tier
         | 
| 98 | 
            +
            - **Progressive scaling** - 5K initially, expandable to 32K+
         | 
| 99 | 
            +
             | 
| 100 | 
            +
            ## 🎉 Ready to Deploy!
         | 
| 101 | 
            +
             | 
| 102 | 
            +
            Everything is configured and tested:
         | 
| 103 | 
            +
            - ✅ No syntax errors
         | 
| 104 | 
            +
            - ✅ Dependencies installed
         | 
| 105 | 
            +
            - ✅ README updated
         | 
| 106 | 
            +
            - ✅ Deployment scripts ready
         | 
| 107 | 
            +
            - ✅ Database build tested
         | 
| 108 | 
            +
            - ✅ Tool integration verified
         | 
| 109 | 
            +
             | 
| 110 | 
            +
            **Run the push command above to deploy!**
         | 
| 111 | 
            +
             | 
| 112 | 
            +
            ---
         | 
| 113 | 
            +
             | 
| 114 | 
            +
            **After deployment, share this link:**  
         | 
| 115 | 
            +
            https://huggingface.co/spaces/JustTheStatsHuman/Togmal-demo
         | 
| 116 | 
            +
             | 
| 117 | 
            +
            Good luck with your VC pitch! 🚀🇸🇬
         | 
    	
        README.md
    CHANGED
    
    | @@ -1,19 +1,37 @@ | |
| 1 | 
             
            ---
         | 
| 2 | 
            -
            title:  | 
| 3 | 
             
            emoji: 🧠
         | 
| 4 | 
             
            colorFrom: yellow
         | 
| 5 | 
             
            colorTo: purple
         | 
| 6 | 
             
            sdk: gradio
         | 
| 7 | 
             
            sdk_version: 5.42.0
         | 
| 8 | 
            -
            app_file:  | 
| 9 | 
             
            pinned: false
         | 
| 10 | 
             
            license: apache-2.0
         | 
| 11 | 
            -
            short_description:  | 
| 12 | 
             
            ---
         | 
| 13 |  | 
| 14 | 
            -
            # 🧠 ToGMAL  | 
| 15 |  | 
| 16 | 
            -
            **Taxonomy of Generative Model Apparent Limitations** - Real-time difficulty assessment  | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 17 |  | 
| 18 | 
             
            ## Features
         | 
| 19 |  | 
| @@ -36,6 +54,34 @@ short_description: Prompt difficulty predictor using vector similarity | |
| 36 | 
             
            - "Diagnose a patient with acute chest pain and shortness of breath"
         | 
| 37 | 
             
            - "Implement a binary search tree with insert and search operations"
         | 
| 38 |  | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 39 | 
             
            ## Technology
         | 
| 40 |  | 
| 41 | 
             
            - **Vector Database**: ChromaDB with persistent storage
         | 
|  | |
| 1 | 
             
            ---
         | 
| 2 | 
            +
            title: ToGMAL - AI Difficulty & Safety Analysis
         | 
| 3 | 
             
            emoji: 🧠
         | 
| 4 | 
             
            colorFrom: yellow
         | 
| 5 | 
             
            colorTo: purple
         | 
| 6 | 
             
            sdk: gradio
         | 
| 7 | 
             
            sdk_version: 5.42.0
         | 
| 8 | 
            +
            app_file: app_combined.py
         | 
| 9 | 
             
            pinned: false
         | 
| 10 | 
             
            license: apache-2.0
         | 
| 11 | 
            +
            short_description: LLM difficulty analyzer with chat assistant & MCP tools
         | 
| 12 | 
             
            ---
         | 
| 13 |  | 
| 14 | 
            +
            # 🧠 ToGMAL - Intelligent LLM Difficulty & Safety Analysis
         | 
| 15 |  | 
| 16 | 
            +
            **Taxonomy of Generative Model Apparent Limitations** - Real-time difficulty assessment and chat interface with MCP tool integration.
         | 
| 17 | 
            +
             | 
| 18 | 
            +
            ## 🎯 Unified Tabbed Interface
         | 
| 19 | 
            +
             | 
| 20 | 
            +
            Switch seamlessly between two powerful tools:
         | 
| 21 | 
            +
             | 
| 22 | 
            +
            ### 📊 **Tab 1: Difficulty Analyzer**
         | 
| 23 | 
            +
            - Direct analysis using 32K+ benchmark questions
         | 
| 24 | 
            +
            - Instant difficulty ratings and success rates
         | 
| 25 | 
            +
            - Vector similarity search
         | 
| 26 | 
            +
            - Perfect for quick assessments
         | 
| 27 | 
            +
             | 
| 28 | 
            +
            ### 🤖 **Tab 2: Chat Assistant** 🆕
         | 
| 29 | 
            +
            **Interactive chat where a free LLM can call MCP tools!**
         | 
| 30 | 
            +
             | 
| 31 | 
            +
            - 🤖 Chat with Mistral-7B (free via HuggingFace)
         | 
| 32 | 
            +
            - 🛠️ LLM calls tools dynamically based on context
         | 
| 33 | 
            +
            - 📊 Transparent tool execution (see what's happening)
         | 
| 34 | 
            +
            - 💬 Natural language responses using tool data
         | 
| 35 |  | 
| 36 | 
             
            ## Features
         | 
| 37 |  | 
|  | |
| 54 | 
             
            - "Diagnose a patient with acute chest pain and shortness of breath"
         | 
| 55 | 
             
            - "Implement a binary search tree with insert and search operations"
         | 
| 56 |  | 
| 57 | 
            +
            ## 🎯 Quick Start
         | 
| 58 | 
            +
             | 
| 59 | 
            +
            ### Run Combined Demo (Recommended)
         | 
| 60 | 
            +
            ```bash
         | 
| 61 | 
            +
            python app_combined.py
         | 
| 62 | 
            +
            ```
         | 
| 63 | 
            +
             | 
| 64 | 
            +
            Or run individual demos:
         | 
| 65 | 
            +
             | 
| 66 | 
            +
            ### Run Difficulty Analyzer Only
         | 
| 67 | 
            +
            ```bash
         | 
| 68 | 
            +
            python app.py
         | 
| 69 | 
            +
            ```
         | 
| 70 | 
            +
             | 
| 71 | 
            +
            ### Run Chat Demo Only
         | 
| 72 | 
            +
            ```bash
         | 
| 73 | 
            +
            python chat_app.py
         | 
| 74 | 
            +
            # Or use the launcher:
         | 
| 75 | 
            +
            ./launch_chat.sh
         | 
| 76 | 
            +
            ```
         | 
| 77 | 
            +
             | 
| 78 | 
            +
            **Try in the Chat tab:**
         | 
| 79 | 
            +
            - "How difficult is this: [your prompt]?"
         | 
| 80 | 
            +
            - "Is this safe: [your prompt]?"
         | 
| 81 | 
            +
            - "Analyze the difficulty of: Calculate quantum corrections..."
         | 
| 82 | 
            +
             | 
| 83 | 
            +
            See [`CHAT_DEMO_README.md`](CHAT_DEMO_README.md) for full documentation.
         | 
| 84 | 
            +
             | 
| 85 | 
             
            ## Technology
         | 
| 86 |  | 
| 87 | 
             
            - **Vector Database**: ChromaDB with persistent storage
         | 
    	
        app_combined.py
    ADDED
    
    | @@ -0,0 +1,489 @@ | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            #!/usr/bin/env python3
         | 
| 2 | 
            +
            """
         | 
| 3 | 
            +
            ToGMAL Combined Demo - Difficulty Analyzer + Chat Interface
         | 
| 4 | 
            +
            ===========================================================
         | 
| 5 | 
            +
             | 
| 6 | 
            +
            Tabbed interface combining:
         | 
| 7 | 
            +
            1. Difficulty Analyzer - Direct vector DB analysis
         | 
| 8 | 
            +
            2. Chat Interface - LLM with MCP tool calling
         | 
| 9 | 
            +
             | 
| 10 | 
            +
            Perfect for demos and VC pitches!
         | 
| 11 | 
            +
            """
         | 
| 12 | 
            +
             | 
| 13 | 
            +
            import gradio as gr
         | 
| 14 | 
            +
            import json
         | 
| 15 | 
            +
            import os
         | 
| 16 | 
            +
            import re
         | 
| 17 | 
            +
            from pathlib import Path
         | 
| 18 | 
            +
            from typing import List, Dict, Tuple, Optional
         | 
| 19 | 
            +
            from benchmark_vector_db import BenchmarkVectorDB
         | 
| 20 | 
            +
            import logging
         | 
| 21 | 
            +
             | 
| 22 | 
            +
            logging.basicConfig(level=logging.INFO)
         | 
| 23 | 
            +
            logger = logging.getLogger(__name__)
         | 
| 24 | 
            +
             | 
| 25 | 
            +
            # Initialize the vector database (shared by both tabs)
         | 
| 26 | 
            +
            db_path = Path("./data/benchmark_vector_db")
         | 
| 27 | 
            +
            db = None
         | 
| 28 | 
            +
             | 
| 29 | 
            +
            def get_db():
         | 
| 30 | 
            +
                """Lazy load the vector database."""
         | 
| 31 | 
            +
                global db
         | 
| 32 | 
            +
                if db is None:
         | 
| 33 | 
            +
                    try:
         | 
| 34 | 
            +
                        logger.info("Initializing BenchmarkVectorDB...")
         | 
| 35 | 
            +
                        db = BenchmarkVectorDB(
         | 
| 36 | 
            +
                            db_path=db_path,
         | 
| 37 | 
            +
                            embedding_model="all-MiniLM-L6-v2"
         | 
| 38 | 
            +
                        )
         | 
| 39 | 
            +
                        logger.info("✓ BenchmarkVectorDB initialized successfully")
         | 
| 40 | 
            +
                    except Exception as e:
         | 
| 41 | 
            +
                        logger.error(f"Failed to initialize BenchmarkVectorDB: {e}")
         | 
| 42 | 
            +
                        raise
         | 
| 43 | 
            +
                return db
         | 
| 44 | 
            +
             | 
| 45 | 
            +
            # Build database if needed (first launch)
         | 
| 46 | 
            +
            try:
         | 
| 47 | 
            +
                db = get_db()
         | 
| 48 | 
            +
                current_count = db.collection.count()
         | 
| 49 | 
            +
                
         | 
| 50 | 
            +
                if current_count == 0:
         | 
| 51 | 
            +
                    logger.info("Database is empty - building initial 5K sample...")
         | 
| 52 | 
            +
                    from datasets import load_dataset
         | 
| 53 | 
            +
                    from benchmark_vector_db import BenchmarkQuestion
         | 
| 54 | 
            +
                    import random
         | 
| 55 | 
            +
                    
         | 
| 56 | 
            +
                    test_dataset = load_dataset("TIGER-Lab/MMLU-Pro", split="test")
         | 
| 57 | 
            +
                    total_questions = len(test_dataset)
         | 
| 58 | 
            +
                    
         | 
| 59 | 
            +
                    if total_questions > 5000:
         | 
| 60 | 
            +
                        indices = random.sample(range(total_questions), 5000)
         | 
| 61 | 
            +
                        test_dataset = test_dataset.select(indices)
         | 
| 62 | 
            +
                    
         | 
| 63 | 
            +
                    all_questions = []
         | 
| 64 | 
            +
                    for idx, item in enumerate(test_dataset):
         | 
| 65 | 
            +
                        question = BenchmarkQuestion(
         | 
| 66 | 
            +
                            question_id=f"mmlu_pro_test_{idx}",
         | 
| 67 | 
            +
                            source_benchmark="MMLU_Pro",
         | 
| 68 | 
            +
                            domain=item.get('category', 'unknown').lower(),
         | 
| 69 | 
            +
                            question_text=item['question'],
         | 
| 70 | 
            +
                            correct_answer=item['answer'],
         | 
| 71 | 
            +
                            choices=item.get('options', []),
         | 
| 72 | 
            +
                            success_rate=0.45,
         | 
| 73 | 
            +
                            difficulty_score=0.55,
         | 
| 74 | 
            +
                            difficulty_label="Hard",
         | 
| 75 | 
            +
                            num_models_tested=0
         | 
| 76 | 
            +
                        )
         | 
| 77 | 
            +
                        all_questions.append(question)
         | 
| 78 | 
            +
                    
         | 
| 79 | 
            +
                    batch_size = 1000
         | 
| 80 | 
            +
                    for i in range(0, len(all_questions), batch_size):
         | 
| 81 | 
            +
                        batch = all_questions[i:i + batch_size]
         | 
| 82 | 
            +
                        db.index_questions(batch)
         | 
| 83 | 
            +
                    
         | 
| 84 | 
            +
                    logger.info(f"✓ Database build complete! Indexed {len(all_questions)} questions")
         | 
| 85 | 
            +
                else:
         | 
| 86 | 
            +
                    logger.info(f"✓ Loaded existing database with {current_count:,} questions")
         | 
| 87 | 
            +
            except Exception as e:
         | 
| 88 | 
            +
                logger.warning(f"Database initialization deferred: {e}")
         | 
| 89 | 
            +
                db = None
         | 
| 90 | 
            +
             | 
| 91 | 
            +
            # ============================================================================
         | 
| 92 | 
            +
            # TAB 1: DIFFICULTY ANALYZER
         | 
| 93 | 
            +
            # ============================================================================
         | 
| 94 | 
            +
             | 
| 95 | 
            +
            def analyze_prompt_difficulty(prompt: str, k: int = 5) -> str:
         | 
| 96 | 
            +
                """Analyze a prompt and return difficulty assessment."""
         | 
| 97 | 
            +
                if not prompt.strip():
         | 
| 98 | 
            +
                    return "Please enter a prompt to analyze."
         | 
| 99 | 
            +
                
         | 
| 100 | 
            +
                try:
         | 
| 101 | 
            +
                    db = get_db()
         | 
| 102 | 
            +
                    result = db.query_similar_questions(prompt, k=k)
         | 
| 103 | 
            +
                    
         | 
| 104 | 
            +
                    output = []
         | 
| 105 | 
            +
                    output.append(f"## 🎯 Difficulty Assessment\n")
         | 
| 106 | 
            +
                    output.append(f"**Risk Level**: {result['risk_level']}")
         | 
| 107 | 
            +
                    output.append(f"**Success Rate**: {result['weighted_success_rate']:.1%}")
         | 
| 108 | 
            +
                    output.append(f"**Avg Similarity**: {result['avg_similarity']:.3f}")
         | 
| 109 | 
            +
                    output.append("")
         | 
| 110 | 
            +
                    output.append(f"**Recommendation**: {result['recommendation']}")
         | 
| 111 | 
            +
                    output.append("")
         | 
| 112 | 
            +
                    output.append(f"## 🔍 Similar Benchmark Questions\n")
         | 
| 113 | 
            +
                    
         | 
| 114 | 
            +
                    for i, q in enumerate(result['similar_questions'], 1):
         | 
| 115 | 
            +
                        output.append(f"{i}. **{q['question_text'][:100]}...**")
         | 
| 116 | 
            +
                        output.append(f"   - Source: {q['source']} ({q['domain']})")
         | 
| 117 | 
            +
                        output.append(f"   - Success Rate: {q['success_rate']:.1%}")
         | 
| 118 | 
            +
                        output.append(f"   - Similarity: {q['similarity']:.3f}")
         | 
| 119 | 
            +
                        output.append("")
         | 
| 120 | 
            +
                    
         | 
| 121 | 
            +
                    total_questions = db.collection.count()
         | 
| 122 | 
            +
                    output.append(f"*Analyzed using {k} most similar questions from {total_questions:,} benchmark questions*")
         | 
| 123 | 
            +
                    
         | 
| 124 | 
            +
                    return "\n".join(output)
         | 
| 125 | 
            +
                except Exception as e:
         | 
| 126 | 
            +
                    return f"Error analyzing prompt: {str(e)}"
         | 
| 127 | 
            +
             | 
| 128 | 
            +
            # ============================================================================
         | 
| 129 | 
            +
            # TAB 2: CHAT INTERFACE WITH MCP TOOLS
         | 
| 130 | 
            +
            # ============================================================================
         | 
| 131 | 
            +
             | 
| 132 | 
            +
            def tool_check_prompt_difficulty(prompt: str, k: int = 5) -> Dict:
         | 
| 133 | 
            +
                """MCP Tool: Analyze prompt difficulty."""
         | 
| 134 | 
            +
                try:
         | 
| 135 | 
            +
                    db = get_db()
         | 
| 136 | 
            +
                    result = db.query_similar_questions(prompt, k=k)
         | 
| 137 | 
            +
                    
         | 
| 138 | 
            +
                    return {
         | 
| 139 | 
            +
                        "risk_level": result['risk_level'],
         | 
| 140 | 
            +
                        "success_rate": f"{result['weighted_success_rate']:.1%}",
         | 
| 141 | 
            +
                        "avg_similarity": f"{result['avg_similarity']:.3f}",
         | 
| 142 | 
            +
                        "recommendation": result['recommendation'],
         | 
| 143 | 
            +
                        "similar_questions": [
         | 
| 144 | 
            +
                            {
         | 
| 145 | 
            +
                                "question": q['question_text'][:150],
         | 
| 146 | 
            +
                                "source": q['source'],
         | 
| 147 | 
            +
                                "domain": q['domain'],
         | 
| 148 | 
            +
                                "success_rate": f"{q['success_rate']:.1%}",
         | 
| 149 | 
            +
                                "similarity": f"{q['similarity']:.3f}"
         | 
| 150 | 
            +
                            }
         | 
| 151 | 
            +
                            for q in result['similar_questions'][:3]
         | 
| 152 | 
            +
                        ]
         | 
| 153 | 
            +
                    }
         | 
| 154 | 
            +
                except Exception as e:
         | 
| 155 | 
            +
                    return {"error": f"Analysis failed: {str(e)}"}
         | 
| 156 | 
            +
             | 
| 157 | 
            +
            def tool_analyze_prompt_safety(prompt: str) -> Dict:
         | 
| 158 | 
            +
                """MCP Tool: Analyze prompt for safety issues."""
         | 
| 159 | 
            +
                issues = []
         | 
| 160 | 
            +
                risk_level = "low"
         | 
| 161 | 
            +
                
         | 
| 162 | 
            +
                dangerous_patterns = [
         | 
| 163 | 
            +
                    r'\brm\s+-rf\b',
         | 
| 164 | 
            +
                    r'\bdelete\s+all\b',
         | 
| 165 | 
            +
                    r'\bformat\s+.*drive\b',
         | 
| 166 | 
            +
                    r'\bdrop\s+database\b'
         | 
| 167 | 
            +
                ]
         | 
| 168 | 
            +
                
         | 
| 169 | 
            +
                for pattern in dangerous_patterns:
         | 
| 170 | 
            +
                    if re.search(pattern, prompt, re.IGNORECASE):
         | 
| 171 | 
            +
                        issues.append("Detected potentially dangerous file operation")
         | 
| 172 | 
            +
                        risk_level = "high"
         | 
| 173 | 
            +
                        break
         | 
| 174 | 
            +
                
         | 
| 175 | 
            +
                medical_keywords = ['diagnose', 'treatment', 'medication', 'symptoms', 'cure', 'disease']
         | 
| 176 | 
            +
                if any(keyword in prompt.lower() for keyword in medical_keywords):
         | 
| 177 | 
            +
                    issues.append("Medical advice request detected - requires professional consultation")
         | 
| 178 | 
            +
                    risk_level = "moderate" if risk_level == "low" else risk_level
         | 
| 179 | 
            +
                
         | 
| 180 | 
            +
                if re.search(r'\b(build|create|write)\s+.*\b(\d{3,})\s+(lines|functions|classes)', prompt, re.IGNORECASE):
         | 
| 181 | 
            +
                    issues.append("Large-scale coding request - may exceed LLM capabilities")
         | 
| 182 | 
            +
                    risk_level = "moderate" if risk_level == "low" else risk_level
         | 
| 183 | 
            +
                
         | 
| 184 | 
            +
                return {
         | 
| 185 | 
            +
                    "risk_level": risk_level,
         | 
| 186 | 
            +
                    "issues_found": len(issues),
         | 
| 187 | 
            +
                    "issues": issues if issues else ["No significant safety concerns detected"],
         | 
| 188 | 
            +
                    "recommendation": "Proceed with caution" if issues else "Prompt appears safe"
         | 
| 189 | 
            +
                }
         | 
| 190 | 
            +
             | 
| 191 | 
            +
            def call_llm_with_tools(
         | 
| 192 | 
            +
                messages: List[Dict[str, str]],
         | 
| 193 | 
            +
                available_tools: List[Dict],
         | 
| 194 | 
            +
                model: str = "mistralai/Mistral-7B-Instruct-v0.2"
         | 
| 195 | 
            +
            ) -> Tuple[str, Optional[Dict]]:
         | 
| 196 | 
            +
                """Call LLM with tool calling capability."""
         | 
| 197 | 
            +
                try:
         | 
| 198 | 
            +
                    from huggingface_hub import InferenceClient
         | 
| 199 | 
            +
                    client = InferenceClient()
         | 
| 200 | 
            +
                    
         | 
| 201 | 
            +
                    system_msg = """You are ToGMAL Assistant, an AI that helps analyze prompts for difficulty and safety.
         | 
| 202 | 
            +
             | 
| 203 | 
            +
            You have access to these tools:
         | 
| 204 | 
            +
            1. check_prompt_difficulty - Analyzes how difficult a prompt is for current LLMs
         | 
| 205 | 
            +
            2. analyze_prompt_safety - Checks for safety issues in prompts
         | 
| 206 | 
            +
             | 
| 207 | 
            +
            When a user asks about prompt difficulty, safety, or capabilities, use the appropriate tool.
         | 
| 208 | 
            +
            To call a tool, respond with: TOOL_CALL: tool_name(arg1="value1", arg2="value2")
         | 
| 209 | 
            +
             | 
| 210 | 
            +
            After receiving tool results, provide a helpful response based on the data."""
         | 
| 211 | 
            +
                    
         | 
| 212 | 
            +
                    conversation = system_msg + "\n\n"
         | 
| 213 | 
            +
                    for msg in messages:
         | 
| 214 | 
            +
                        role = msg['role']
         | 
| 215 | 
            +
                        content = msg['content']
         | 
| 216 | 
            +
                        if role == 'user':
         | 
| 217 | 
            +
                            conversation += f"User: {content}\n"
         | 
| 218 | 
            +
                        elif role == 'assistant':
         | 
| 219 | 
            +
                            conversation += f"Assistant: {content}\n"
         | 
| 220 | 
            +
                        elif role == 'system':
         | 
| 221 | 
            +
                            conversation += f"System: {content}\n"
         | 
| 222 | 
            +
                    
         | 
| 223 | 
            +
                    conversation += "Assistant: "
         | 
| 224 | 
            +
                    
         | 
| 225 | 
            +
                    response = client.text_generation(
         | 
| 226 | 
            +
                        conversation,
         | 
| 227 | 
            +
                        model=model,
         | 
| 228 | 
            +
                        max_new_tokens=512,
         | 
| 229 | 
            +
                        temperature=0.7,
         | 
| 230 | 
            +
                        top_p=0.95,
         | 
| 231 | 
            +
                        do_sample=True
         | 
| 232 | 
            +
                    )
         | 
| 233 | 
            +
                    
         | 
| 234 | 
            +
                    response_text = response.strip()
         | 
| 235 | 
            +
                    tool_call = None
         | 
| 236 | 
            +
                    
         | 
| 237 | 
            +
                    if "TOOL_CALL:" in response_text:
         | 
| 238 | 
            +
                        match = re.search(r'TOOL_CALL:\s*(\w+)\((.*?)\)', response_text)
         | 
| 239 | 
            +
                        if match:
         | 
| 240 | 
            +
                            tool_name = match.group(1)
         | 
| 241 | 
            +
                            args_str = match.group(2)
         | 
| 242 | 
            +
                            args = {}
         | 
| 243 | 
            +
                            for arg in args_str.split(','):
         | 
| 244 | 
            +
                                if '=' in arg:
         | 
| 245 | 
            +
                                    key, val = arg.split('=', 1)
         | 
| 246 | 
            +
                                    key = key.strip()
         | 
| 247 | 
            +
                                    val = val.strip().strip('"\'')
         | 
| 248 | 
            +
                                    args[key] = val
         | 
| 249 | 
            +
                            tool_call = {"name": tool_name, "arguments": args}
         | 
| 250 | 
            +
                            response_text = re.sub(r'TOOL_CALL:.*?\)', '', response_text).strip()
         | 
| 251 | 
            +
                    
         | 
| 252 | 
            +
                    return response_text, tool_call
         | 
| 253 | 
            +
                except Exception as e:
         | 
| 254 | 
            +
                    logger.error(f"LLM call failed: {e}")
         | 
| 255 | 
            +
                    return fallback_llm(messages, available_tools)
         | 
| 256 | 
            +
             | 
| 257 | 
            +
            def fallback_llm(messages: List[Dict[str, str]], available_tools: List[Dict]) -> Tuple[str, Optional[Dict]]:
         | 
| 258 | 
            +
                """Fallback when HF API unavailable."""
         | 
| 259 | 
            +
                last_message = messages[-1]['content'].lower() if messages else ""
         | 
| 260 | 
            +
                
         | 
| 261 | 
            +
                if any(word in last_message for word in ['difficult', 'difficulty', 'hard', 'easy', 'challenging']):
         | 
| 262 | 
            +
                    return "", {"name": "check_prompt_difficulty", "arguments": {"prompt": messages[-1]['content'], "k": 5}}
         | 
| 263 | 
            +
                
         | 
| 264 | 
            +
                if any(word in last_message for word in ['safe', 'safety', 'dangerous', 'risk']):
         | 
| 265 | 
            +
                    return "", {"name": "analyze_prompt_safety", "arguments": {"prompt": messages[-1]['content']}}
         | 
| 266 | 
            +
                
         | 
| 267 | 
            +
                return """I'm ToGMAL Assistant. I can help analyze prompts for:
         | 
| 268 | 
            +
            - **Difficulty**: How challenging is this for current LLMs?
         | 
| 269 | 
            +
            - **Safety**: Are there any safety concerns?
         | 
| 270 | 
            +
             | 
| 271 | 
            +
            Try asking me to analyze a prompt!""", None
         | 
| 272 | 
            +
             | 
| 273 | 
            +
            AVAILABLE_TOOLS = [
         | 
| 274 | 
            +
                {
         | 
| 275 | 
            +
                    "name": "check_prompt_difficulty",
         | 
| 276 | 
            +
                    "description": "Analyzes how difficult a prompt is for current LLMs",
         | 
| 277 | 
            +
                    "parameters": {"prompt": "The prompt to analyze", "k": "Number of similar questions"}
         | 
| 278 | 
            +
                },
         | 
| 279 | 
            +
                {
         | 
| 280 | 
            +
                    "name": "analyze_prompt_safety",
         | 
| 281 | 
            +
                    "description": "Checks for safety issues in prompts",
         | 
| 282 | 
            +
                    "parameters": {"prompt": "The prompt to analyze"}
         | 
| 283 | 
            +
                }
         | 
| 284 | 
            +
            ]
         | 
| 285 | 
            +
             | 
| 286 | 
            +
            def execute_tool(tool_name: str, arguments: Dict) -> Dict:
         | 
| 287 | 
            +
                """Execute a tool and return results."""
         | 
| 288 | 
            +
                if tool_name == "check_prompt_difficulty":
         | 
| 289 | 
            +
                    return tool_check_prompt_difficulty(arguments.get("prompt", ""), int(arguments.get("k", 5)))
         | 
| 290 | 
            +
                elif tool_name == "analyze_prompt_safety":
         | 
| 291 | 
            +
                    return tool_analyze_prompt_safety(arguments.get("prompt", ""))
         | 
| 292 | 
            +
                else:
         | 
| 293 | 
            +
                    return {"error": f"Unknown tool: {tool_name}"}
         | 
| 294 | 
            +
             | 
| 295 | 
            +
            def format_tool_result(tool_name: str, result: Dict) -> str:
         | 
| 296 | 
            +
                """Format tool result as natural language."""
         | 
| 297 | 
            +
                if tool_name == "check_prompt_difficulty":
         | 
| 298 | 
            +
                    if "error" in result:
         | 
| 299 | 
            +
                        return f"Sorry, I couldn't analyze the difficulty: {result['error']}"
         | 
| 300 | 
            +
                    return f"""Based on my analysis of similar benchmark questions:
         | 
| 301 | 
            +
             | 
| 302 | 
            +
            **Difficulty Level:** {result['risk_level'].upper()}
         | 
| 303 | 
            +
            **Success Rate:** {result['success_rate']}
         | 
| 304 | 
            +
            **Similarity:** {result['avg_similarity']}
         | 
| 305 | 
            +
             | 
| 306 | 
            +
            **Recommendation:** {result['recommendation']}
         | 
| 307 | 
            +
             | 
| 308 | 
            +
            **Similar questions:**
         | 
| 309 | 
            +
            {chr(10).join([f"• {q['question'][:100]}... (Success: {q['success_rate']})" for q in result['similar_questions'][:2]])}
         | 
| 310 | 
            +
            """
         | 
| 311 | 
            +
                elif tool_name == "analyze_prompt_safety":
         | 
| 312 | 
            +
                    if "error" in result:
         | 
| 313 | 
            +
                        return f"Sorry, I couldn't analyze safety: {result['error']}"
         | 
| 314 | 
            +
                    issues = "\n".join([f"• {issue}" for issue in result['issues']])
         | 
| 315 | 
            +
                    return f"""**Safety Analysis:**
         | 
| 316 | 
            +
             | 
| 317 | 
            +
            **Risk Level:** {result['risk_level'].upper()}
         | 
| 318 | 
            +
            **Issues Found:** {result['issues_found']}
         | 
| 319 | 
            +
             | 
| 320 | 
            +
            {issues}
         | 
| 321 | 
            +
             | 
| 322 | 
            +
            **Recommendation:** {result['recommendation']}
         | 
| 323 | 
            +
            """
         | 
| 324 | 
            +
                return json.dumps(result, indent=2)
         | 
| 325 | 
            +
             | 
| 326 | 
            +
            def chat(message: str, history: List[Tuple[str, str]]) -> Tuple[List[Tuple[str, str]], str]:
         | 
| 327 | 
            +
                """Process chat message with tool calling."""
         | 
| 328 | 
            +
                messages = []
         | 
| 329 | 
            +
                for user_msg, assistant_msg in history:
         | 
| 330 | 
            +
                    messages.append({"role": "user", "content": user_msg})
         | 
| 331 | 
            +
                    if assistant_msg:
         | 
| 332 | 
            +
                        messages.append({"role": "assistant", "content": assistant_msg})
         | 
| 333 | 
            +
                
         | 
| 334 | 
            +
                messages.append({"role": "user", "content": message})
         | 
| 335 | 
            +
                
         | 
| 336 | 
            +
                response_text, tool_call = call_llm_with_tools(messages, AVAILABLE_TOOLS)
         | 
| 337 | 
            +
                
         | 
| 338 | 
            +
                tool_status = ""
         | 
| 339 | 
            +
                
         | 
| 340 | 
            +
                if tool_call:
         | 
| 341 | 
            +
                    tool_name = tool_call['name']
         | 
| 342 | 
            +
                    tool_args = tool_call['arguments']
         | 
| 343 | 
            +
                    
         | 
| 344 | 
            +
                    tool_status = f"🛠️ **Calling tool:** `{tool_name}`\n**Arguments:** {json.dumps(tool_args, indent=2)}\n\n"
         | 
| 345 | 
            +
                    
         | 
| 346 | 
            +
                    tool_result = execute_tool(tool_name, tool_args)
         | 
| 347 | 
            +
                    tool_status += f"**Result:**\n```json\n{json.dumps(tool_result, indent=2)}\n```\n\n"
         | 
| 348 | 
            +
                    
         | 
| 349 | 
            +
                    messages.append({"role": "system", "content": f"Tool {tool_name} returned: {json.dumps(tool_result)}"})
         | 
| 350 | 
            +
                    
         | 
| 351 | 
            +
                    final_response, _ = call_llm_with_tools(messages, AVAILABLE_TOOLS)
         | 
| 352 | 
            +
                    
         | 
| 353 | 
            +
                    if final_response:
         | 
| 354 | 
            +
                        response_text = final_response
         | 
| 355 | 
            +
                    else:
         | 
| 356 | 
            +
                        response_text = format_tool_result(tool_name, tool_result)
         | 
| 357 | 
            +
                
         | 
| 358 | 
            +
                history.append((message, response_text))
         | 
| 359 | 
            +
                return history, tool_status
         | 
| 360 | 
            +
             | 
| 361 | 
            +
            # ============================================================================
         | 
| 362 | 
            +
            # GRADIO INTERFACE - TABBED LAYOUT
         | 
| 363 | 
            +
            # ============================================================================
         | 
| 364 | 
            +
             | 
| 365 | 
            +
            with gr.Blocks(title="ToGMAL - Difficulty Analyzer + Chat", css="""
         | 
| 366 | 
            +
                .tab-nav button { font-size: 16px !important; padding: 12px 24px !important; }
         | 
| 367 | 
            +
                .gradio-container { max-width: 1200px !important; }
         | 
| 368 | 
            +
            """) as demo:
         | 
| 369 | 
            +
                
         | 
| 370 | 
            +
                gr.Markdown("# 🧠 ToGMAL - Intelligent LLM Analysis Platform")
         | 
| 371 | 
            +
                gr.Markdown("""
         | 
| 372 | 
            +
                **Taxonomy of Generative Model Apparent Limitations**
         | 
| 373 | 
            +
                
         | 
| 374 | 
            +
                Choose your interface:
         | 
| 375 | 
            +
                - **Difficulty Analyzer** - Direct analysis of prompt difficulty using 32K+ benchmarks
         | 
| 376 | 
            +
                - **Chat Assistant** - Interactive chat where AI can call MCP tools dynamically
         | 
| 377 | 
            +
                """)
         | 
| 378 | 
            +
                
         | 
| 379 | 
            +
                with gr.Tabs():
         | 
| 380 | 
            +
                    # TAB 1: DIFFICULTY ANALYZER
         | 
| 381 | 
            +
                    with gr.Tab("📊 Difficulty Analyzer"):
         | 
| 382 | 
            +
                        gr.Markdown("### Analyze Prompt Difficulty")
         | 
| 383 | 
            +
                        gr.Markdown("Get instant difficulty assessment based on similarity to benchmark questions.")
         | 
| 384 | 
            +
                        
         | 
| 385 | 
            +
                        with gr.Row():
         | 
| 386 | 
            +
                            with gr.Column():
         | 
| 387 | 
            +
                                analyzer_prompt = gr.Textbox(
         | 
| 388 | 
            +
                                    label="Enter your prompt",
         | 
| 389 | 
            +
                                    placeholder="e.g., Calculate the quantum correction to the partition function...",
         | 
| 390 | 
            +
                                    lines=3
         | 
| 391 | 
            +
                                )
         | 
| 392 | 
            +
                                analyzer_k = gr.Slider(
         | 
| 393 | 
            +
                                    minimum=1,
         | 
| 394 | 
            +
                                    maximum=10,
         | 
| 395 | 
            +
                                    value=5,
         | 
| 396 | 
            +
                                    step=1,
         | 
| 397 | 
            +
                                    label="Number of similar questions to show"
         | 
| 398 | 
            +
                                )
         | 
| 399 | 
            +
                                analyzer_btn = gr.Button("Analyze Difficulty", variant="primary")
         | 
| 400 | 
            +
                            
         | 
| 401 | 
            +
                            with gr.Column():
         | 
| 402 | 
            +
                                analyzer_output = gr.Markdown(label="Analysis Results")
         | 
| 403 | 
            +
                        
         | 
| 404 | 
            +
                        gr.Examples(
         | 
| 405 | 
            +
                            examples=[
         | 
| 406 | 
            +
                                "Calculate the quantum correction to the partition function for a 3D harmonic oscillator",
         | 
| 407 | 
            +
                                "Prove that there are infinitely many prime numbers",
         | 
| 408 | 
            +
                                "Diagnose a patient with acute chest pain and shortness of breath",
         | 
| 409 | 
            +
                                "What is 2 + 2?",
         | 
| 410 | 
            +
                            ],
         | 
| 411 | 
            +
                            inputs=analyzer_prompt
         | 
| 412 | 
            +
                        )
         | 
| 413 | 
            +
                        
         | 
| 414 | 
            +
                        analyzer_btn.click(
         | 
| 415 | 
            +
                            fn=analyze_prompt_difficulty,
         | 
| 416 | 
            +
                            inputs=[analyzer_prompt, analyzer_k],
         | 
| 417 | 
            +
                            outputs=analyzer_output
         | 
| 418 | 
            +
                        )
         | 
| 419 | 
            +
                        
         | 
| 420 | 
            +
                        analyzer_prompt.submit(
         | 
| 421 | 
            +
                            fn=analyze_prompt_difficulty,
         | 
| 422 | 
            +
                            inputs=[analyzer_prompt, analyzer_k],
         | 
| 423 | 
            +
                            outputs=analyzer_output
         | 
| 424 | 
            +
                        )
         | 
| 425 | 
            +
                    
         | 
| 426 | 
            +
                    # TAB 2: CHAT INTERFACE
         | 
| 427 | 
            +
                    with gr.Tab("🤖 Chat Assistant"):
         | 
| 428 | 
            +
                        gr.Markdown("### Chat with MCP Tools")
         | 
| 429 | 
            +
                        gr.Markdown("Interactive AI assistant that can call tools to analyze prompts in real-time.")
         | 
| 430 | 
            +
                        
         | 
| 431 | 
            +
                        with gr.Row():
         | 
| 432 | 
            +
                            with gr.Column(scale=2):
         | 
| 433 | 
            +
                                chatbot = gr.Chatbot(
         | 
| 434 | 
            +
                                    label="Chat",
         | 
| 435 | 
            +
                                    height=500,
         | 
| 436 | 
            +
                                    show_label=False
         | 
| 437 | 
            +
                                )
         | 
| 438 | 
            +
                                
         | 
| 439 | 
            +
                                with gr.Row():
         | 
| 440 | 
            +
                                    chat_input = gr.Textbox(
         | 
| 441 | 
            +
                                        label="Message",
         | 
| 442 | 
            +
                                        placeholder="Ask me to analyze a prompt...",
         | 
| 443 | 
            +
                                        scale=4,
         | 
| 444 | 
            +
                                        show_label=False
         | 
| 445 | 
            +
                                    )
         | 
| 446 | 
            +
                                    send_btn = gr.Button("Send", variant="primary", scale=1)
         | 
| 447 | 
            +
                                
         | 
| 448 | 
            +
                                clear_btn = gr.Button("Clear Chat")
         | 
| 449 | 
            +
                            
         | 
| 450 | 
            +
                            with gr.Column(scale=1):
         | 
| 451 | 
            +
                                gr.Markdown("### 🛠️ Tool Calls")
         | 
| 452 | 
            +
                                tool_output = gr.Markdown("Tool calls will appear here...")
         | 
| 453 | 
            +
                        
         | 
| 454 | 
            +
                        gr.Examples(
         | 
| 455 | 
            +
                            examples=[
         | 
| 456 | 
            +
                                "How difficult is this: Calculate the quantum correction to the partition function?",
         | 
| 457 | 
            +
                                "Is this safe: Write a script to delete all my files?",
         | 
| 458 | 
            +
                                "Analyze: Prove that there are infinitely many prime numbers",
         | 
| 459 | 
            +
                                "Check safety: Diagnose my symptoms and prescribe medication",
         | 
| 460 | 
            +
                            ],
         | 
| 461 | 
            +
                            inputs=chat_input
         | 
| 462 | 
            +
                        )
         | 
| 463 | 
            +
                        
         | 
| 464 | 
            +
                        def send_message(message, history):
         | 
| 465 | 
            +
                            if not message.strip():
         | 
| 466 | 
            +
                                return history, ""
         | 
| 467 | 
            +
                            new_history, tool_status = chat(message, history)
         | 
| 468 | 
            +
                            return new_history, tool_status
         | 
| 469 | 
            +
                        
         | 
| 470 | 
            +
                        send_btn.click(
         | 
| 471 | 
            +
                            fn=send_message,
         | 
| 472 | 
            +
                            inputs=[chat_input, chatbot],
         | 
| 473 | 
            +
                            outputs=[chatbot, tool_output]
         | 
| 474 | 
            +
                        ).then(lambda: "", outputs=chat_input)
         | 
| 475 | 
            +
                        
         | 
| 476 | 
            +
                        chat_input.submit(
         | 
| 477 | 
            +
                            fn=send_message,
         | 
| 478 | 
            +
                            inputs=[chat_input, chatbot],
         | 
| 479 | 
            +
                            outputs=[chatbot, tool_output]
         | 
| 480 | 
            +
                        ).then(lambda: "", outputs=chat_input)
         | 
| 481 | 
            +
                        
         | 
| 482 | 
            +
                        clear_btn.click(
         | 
| 483 | 
            +
                            lambda: ([], ""),
         | 
| 484 | 
            +
                            outputs=[chatbot, tool_output]
         | 
| 485 | 
            +
                        )
         | 
| 486 | 
            +
             | 
| 487 | 
            +
            if __name__ == "__main__":
         | 
| 488 | 
            +
                port = int(os.environ.get("GRADIO_SERVER_PORT", 7860))
         | 
| 489 | 
            +
                demo.launch(server_name="0.0.0.0", server_port=port)
         |