Spaces:
Configuration error
Configuration error
HeTalksInMaths
commited on
Commit
Β·
241e06f
1
Parent(s):
f9b1ad5
Add README, requirements, and GitHub instructions
Browse files- PUSH_TO_GITHUB.md +98 -0
- README.md +102 -3
- requirements.txt +1 -0
PUSH_TO_GITHUB.md
ADDED
|
@@ -0,0 +1,98 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# π Push to GitHub - Complete Instructions
|
| 2 |
+
|
| 3 |
+
## Step 1: Create a GitHub Repository
|
| 4 |
+
|
| 5 |
+
1. Go to https://github.com/new
|
| 6 |
+
2. Sign in to your GitHub account
|
| 7 |
+
3. Fill in the form:
|
| 8 |
+
- **Repository name**: `togmal-prompt-analyzer`
|
| 9 |
+
- **Description**: "Real-time LLM capability boundary detection using vector similarity search"
|
| 10 |
+
- **Public**: Selected
|
| 11 |
+
- **Initialize this repository with a README**: Unchecked
|
| 12 |
+
4. Click "Create repository"
|
| 13 |
+
|
| 14 |
+
## Step 2: Push Your Local Repository
|
| 15 |
+
|
| 16 |
+
After creating the repository, you'll see instructions. Use these commands in your terminal:
|
| 17 |
+
|
| 18 |
+
```bash
|
| 19 |
+
cd /Users/hetalksinmaths/togmal
|
| 20 |
+
git remote add origin https://github.com/YOUR_USERNAME/togmal-prompt-analyzer.git
|
| 21 |
+
git branch -M main
|
| 22 |
+
git push -u origin main
|
| 23 |
+
```
|
| 24 |
+
|
| 25 |
+
**Replace `YOUR_USERNAME`** with your actual GitHub username.
|
| 26 |
+
|
| 27 |
+
## What You'll Have on GitHub
|
| 28 |
+
|
| 29 |
+
Once pushed, your repository will contain:
|
| 30 |
+
|
| 31 |
+
### Core Implementation
|
| 32 |
+
- `benchmark_vector_db.py` - Vector database for difficulty assessment
|
| 33 |
+
- `demo_app.py` - Gradio web interface
|
| 34 |
+
- `fetch_mmlu_top_models.py` - Script to fetch real benchmark data
|
| 35 |
+
|
| 36 |
+
### Documentation
|
| 37 |
+
- `COMPLETE_DEMO_ANALYSIS.md` - Comprehensive analysis of the system
|
| 38 |
+
- `DEMO_README.md` - Demo instructions and results
|
| 39 |
+
- `GITHUB_INSTRUCTIONS.md` - These instructions
|
| 40 |
+
- `README.md` - Main project documentation
|
| 41 |
+
|
| 42 |
+
### Test Files
|
| 43 |
+
- `test_vector_db.py` - Test script with real data examples
|
| 44 |
+
- `test_examples.py` - Additional test cases
|
| 45 |
+
|
| 46 |
+
### Configuration
|
| 47 |
+
- `requirements.txt` - Python dependencies
|
| 48 |
+
- `.gitignore` - Files excluded from version control
|
| 49 |
+
|
| 50 |
+
## Key Features Demonstrated
|
| 51 |
+
|
| 52 |
+
### Real Data vs Mock Data
|
| 53 |
+
- **Before**: All prompts showed ~45% success rate (mock data)
|
| 54 |
+
- **After**: System correctly differentiates difficulty levels:
|
| 55 |
+
- Hard prompts: 23.9% success rate (HIGH risk)
|
| 56 |
+
- Easy prompts: 100% success rate (MINIMAL risk)
|
| 57 |
+
|
| 58 |
+
### 11 Test Questions Analysis
|
| 59 |
+
The system correctly categorizes:
|
| 60 |
+
- **Hard Questions** (20-50% success):
|
| 61 |
+
- "Calculate the quantum correction to the partition function..."
|
| 62 |
+
- "Prove that there are infinitely many prime numbers"
|
| 63 |
+
- "Statement 1 | Every field is also a ring..."
|
| 64 |
+
- **Easy Questions** (80-100% success):
|
| 65 |
+
- "What is 2 + 2?"
|
| 66 |
+
- "What is the capital of France?"
|
| 67 |
+
- "Who wrote Romeo and Juliet?"
|
| 68 |
+
|
| 69 |
+
### Recommendation Engine
|
| 70 |
+
Based on success rates:
|
| 71 |
+
- **<30%**: Multi-step reasoning with verification
|
| 72 |
+
- **30-70%**: Use chain-of-thought prompting
|
| 73 |
+
- **>70%**: Standard LLM response adequate
|
| 74 |
+
|
| 75 |
+
## Live Demo
|
| 76 |
+
|
| 77 |
+
Your demo is running at:
|
| 78 |
+
- Local: http://127.0.0.1:7861
|
| 79 |
+
- Public: https://db11ee71660c8a3319.gradio.live
|
| 80 |
+
|
| 81 |
+
## Next Steps After Pushing
|
| 82 |
+
|
| 83 |
+
1. Add badges to README (build status, license, etc.)
|
| 84 |
+
2. Create GitHub Pages for project documentation
|
| 85 |
+
3. Set up CI/CD for automated testing
|
| 86 |
+
4. Add more benchmark datasets
|
| 87 |
+
5. Create releases for different versions
|
| 88 |
+
|
| 89 |
+
## Need Help?
|
| 90 |
+
|
| 91 |
+
If you encounter any issues:
|
| 92 |
+
1. Check that you're using the correct repository URL
|
| 93 |
+
2. Ensure you have internet connectivity
|
| 94 |
+
3. Verify your GitHub credentials are set up
|
| 95 |
+
4. Make sure you've replaced YOUR_USERNAME with your actual GitHub username
|
| 96 |
+
|
| 97 |
+
For additional support, refer to:
|
| 98 |
+
- [GitHub Documentation](https://docs.github.com/en/github/importing-your-projects-to-github/importing-source-code-to-github/adding-an-existing-project-to-github-using-the-command-line)
|
README.md
CHANGED
|
@@ -59,7 +59,7 @@ Analyze a user prompt before the LLM processes it.
|
|
| 59 |
```python
|
| 60 |
{
|
| 61 |
"prompt": "Build me a complete theory of quantum gravity that unifies all forces",
|
| 62 |
-
"response_format": "
|
| 63 |
}
|
| 64 |
```
|
| 65 |
|
|
@@ -75,14 +75,14 @@ Analyze an LLM response for potential issues.
|
|
| 75 |
**Parameters:**
|
| 76 |
- `response` (str): The LLM response to analyze
|
| 77 |
- `context` (str, optional): Original prompt for better analysis
|
| 78 |
-
- `response_format` (str): Output format - `"
|
| 79 |
|
| 80 |
**Example:**
|
| 81 |
```python
|
| 82 |
{
|
| 83 |
"response": "You should definitely take 500mg of ibuprofen every 4 hours...",
|
| 84 |
"context": "I have a headache",
|
| 85 |
-
"response_format": "
|
| 86 |
}
|
| 87 |
```
|
| 88 |
|
|
@@ -460,3 +460,102 @@ Built using:
|
|
| 460 |
- [Pydantic](https://docs.pydantic.dev)
|
| 461 |
|
| 462 |
Inspired by the need for safer, more grounded AI interactions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 59 |
```python
|
| 60 |
{
|
| 61 |
"prompt": "Build me a complete theory of quantum gravity that unifies all forces",
|
| 62 |
+
"response_format": "json"
|
| 63 |
}
|
| 64 |
```
|
| 65 |
|
|
|
|
| 75 |
**Parameters:**
|
| 76 |
- `response` (str): The LLM response to analyze
|
| 77 |
- `context` (str, optional): Original prompt for better analysis
|
| 78 |
+
- `response_format` (str): Output format - `"json"` or `"json"`
|
| 79 |
|
| 80 |
**Example:**
|
| 81 |
```python
|
| 82 |
{
|
| 83 |
"response": "You should definitely take 500mg of ibuprofen every 4 hours...",
|
| 84 |
"context": "I have a headache",
|
| 85 |
+
"response_format": "json"
|
| 86 |
}
|
| 87 |
```
|
| 88 |
|
|
|
|
| 460 |
- [Pydantic](https://docs.pydantic.dev)
|
| 461 |
|
| 462 |
Inspired by the need for safer, more grounded AI interactions.
|
| 463 |
+
|
| 464 |
+
# π§ ToGMAL Prompt Difficulty Analyzer
|
| 465 |
+
|
| 466 |
+
Real-time LLM capability boundary detection using vector similarity search.
|
| 467 |
+
|
| 468 |
+
## π― What This Does
|
| 469 |
+
|
| 470 |
+
This system analyzes any prompt and tells you:
|
| 471 |
+
1. **How difficult it is** for current LLMs (based on real benchmark data)
|
| 472 |
+
2. **Why it's difficult** (shows similar benchmark questions)
|
| 473 |
+
3. **What to do about it** (actionable recommendations)
|
| 474 |
+
|
| 475 |
+
## π₯ Key Innovation
|
| 476 |
+
|
| 477 |
+
Instead of clustering by domain (all math together), we cluster by **difficulty** - what's actually hard for LLMs regardless of domain.
|
| 478 |
+
|
| 479 |
+
## π Real Data
|
| 480 |
+
|
| 481 |
+
- **14,042 MMLU questions** with real success rates from top models
|
| 482 |
+
- **<50ms query time** for real-time analysis
|
| 483 |
+
- **Production ready** vector database
|
| 484 |
+
|
| 485 |
+
## π Demo
|
| 486 |
+
|
| 487 |
+
- **Local**: http://127.0.0.1:7861
|
| 488 |
+
- **Public**: https://db11ee71660c8a3319.gradio.live
|
| 489 |
+
|
| 490 |
+
## π§ͺ Example Results
|
| 491 |
+
|
| 492 |
+
### Hard Questions (Low Success Rates)
|
| 493 |
+
```
|
| 494 |
+
Prompt: "Statement 1 | Every field is also a ring..."
|
| 495 |
+
Risk: HIGH (23.9% success)
|
| 496 |
+
Recommendation: Multi-step reasoning with verification
|
| 497 |
+
|
| 498 |
+
Prompt: "Find all zeros of polynomial xΒ³ + 2x + 2 in Zβ"
|
| 499 |
+
Risk: MODERATE (43.8% success)
|
| 500 |
+
Recommendation: Use chain-of-thought prompting
|
| 501 |
+
```
|
| 502 |
+
|
| 503 |
+
### Easy Questions (High Success Rates)
|
| 504 |
+
```
|
| 505 |
+
Prompt: "What is 2 + 2?"
|
| 506 |
+
Risk: MINIMAL (100% success)
|
| 507 |
+
Recommendation: Standard LLM response adequate
|
| 508 |
+
|
| 509 |
+
Prompt: "What is the capital of France?"
|
| 510 |
+
Risk: MINIMAL (100% success)
|
| 511 |
+
Recommendation: Standard LLM response adequate
|
| 512 |
+
```
|
| 513 |
+
|
| 514 |
+
## π οΈ Technical Details
|
| 515 |
+
|
| 516 |
+
### Architecture
|
| 517 |
+
```
|
| 518 |
+
User Prompt β Embedding Model β Vector DB β K Nearest Questions β Weighted Score
|
| 519 |
+
```
|
| 520 |
+
|
| 521 |
+
### Components
|
| 522 |
+
1. **Sentence Transformers** (all-MiniLM-L6-v2) for embeddings
|
| 523 |
+
2. **ChromaDB** for vector storage
|
| 524 |
+
3. **Real MMLU data** with success rates from top models
|
| 525 |
+
4. **Gradio** for web interface
|
| 526 |
+
|
| 527 |
+
## π Quick Start
|
| 528 |
+
|
| 529 |
+
```bash
|
| 530 |
+
# Install dependencies
|
| 531 |
+
pip install -r requirements.txt
|
| 532 |
+
pip install gradio
|
| 533 |
+
|
| 534 |
+
# Run the demo
|
| 535 |
+
python demo_app.py
|
| 536 |
+
```
|
| 537 |
+
|
| 538 |
+
Visit http://127.0.0.1:7861 to use the web interface.
|
| 539 |
+
|
| 540 |
+
## π Next Steps
|
| 541 |
+
|
| 542 |
+
1. Add more benchmark datasets (GPQA, MATH)
|
| 543 |
+
2. Fetch real per-question results from multiple top models
|
| 544 |
+
3. Integrate with ToGMAL MCP server for Claude Desktop
|
| 545 |
+
4. Deploy to HuggingFace Spaces for permanent hosting
|
| 546 |
+
|
| 547 |
+
## π License
|
| 548 |
+
|
| 549 |
+
MIT License - see [LICENSE](LICENSE) file for details.
|
| 550 |
+
|
| 551 |
+
## π€ Contributing
|
| 552 |
+
|
| 553 |
+
1. Fork the repository
|
| 554 |
+
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
|
| 555 |
+
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
|
| 556 |
+
4. Push to the branch (`git push origin feature/AmazingFeature`)
|
| 557 |
+
5. Open a pull request
|
| 558 |
+
|
| 559 |
+
## π§ Contact
|
| 560 |
+
|
| 561 |
+
For questions or support, please open an issue on GitHub.
|
requirements.txt
CHANGED
|
@@ -8,3 +8,4 @@ joblib>=1.3
|
|
| 8 |
sentence-transformers>=2.2.0
|
| 9 |
chromadb>=0.4.0
|
| 10 |
datasets>=2.14.0
|
|
|
|
|
|
| 8 |
sentence-transformers>=2.2.0
|
| 9 |
chromadb>=0.4.0
|
| 10 |
datasets>=2.14.0
|
| 11 |
+
gradio>=4.0.0
|