app.py
CHANGED
@@ -76,9 +76,12 @@ def main():
  To submit a model for evaluation, please follow these steps:
  1. **Evaluate your model**:
  - Follow the evaluation script provided here: [https://github.com/Anania-AI/Arm-LLM-Benchmark](https://github.com/Anania-AI/Arm-LLM-Benchmark)
+ - For more details about the evaluation process, read the README in the Arm-LLM-Benchmark GitHub repository.
+
  2. **Format your submission file**:
- - After evaluation, you will get a `
-
+ - After evaluation, you will get a `results.json` file. Ensure the file follows this format:
+
+ ```json
  {
  "mmlu_results": [
  {
@@ -95,18 +98,9 @@ def main():
  ...
  ]
  }
-
- 3. **
- -
- - The following categories must be included in the mmlu_results for the model to be considered valid:
- - ```["Biology", "Business", "Chemistry", "Computer Science", "Economics", "Engineering", "Health", "History", "Law", "Math", "Other", "Philosophy", "Physics", "Psychology", "Average"] ```
- - If any of these categories are missing, the model will not be added to the evaluation.
- - For **unified_exam_results**:
- - The following categories must be included in the unified_exam_results for the model to be considered valid:
- - ```["Average", "Armenian language and literature", "Armenian history", "Mathematics"] ```
- - If any of these categories are missing, the model will not be added to the evaluation.
- 4. **Submit your model**:
- - Add the `Arm-LLM-Bench` tag and the `result.json` file to your model card.
+ ```
+ 3. **Submit your model**:
+ - Add the `Arm-LLM-Bench` tag and the `results.json` file to your model card.
  - Click on the "Refresh Data" button in this app, and you will see your model's results.
  """
  )
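
The instruction text removed in this commit listed the categories that `mmlu_results` and `unified_exam_results` had to contain. For anyone who wants to check a `results.json` locally before tagging a model, a minimal sketch of that check might look like the following. The diff elides the fields of each result entry, so the `category` key used here is a hypothetical name, and `missing_categories` is an illustrative helper that does not come from app.py. Whether the app still enforces these lists after this commit is not shown in the diff.

```python
import json

# Category lists copied from the instruction text removed in this commit.
REQUIRED = {
    "mmlu_results": [
        "Biology", "Business", "Chemistry", "Computer Science", "Economics",
        "Engineering", "Health", "History", "Law", "Math", "Other",
        "Philosophy", "Physics", "Psychology", "Average",
    ],
    "unified_exam_results": [
        "Average", "Armenian language and literature", "Armenian history",
        "Mathematics",
    ],
}


def missing_categories(results, key="category"):
    """Return {section: [missing category names]} for a parsed results.json.

    `key` is an assumed field name; the diff does not show the entry fields.
    """
    missing = {}
    for section, required in REQUIRED.items():
        entries = results.get(section) or []
        # Collect the category names that are actually present in this section.
        present = {e.get(key) for e in entries if isinstance(e, dict)}
        absent = [name for name in required if name not in present]
        if absent:
            missing[section] = absent
    return missing


if __name__ == "__main__":
    # Tiny inline sample; a real file would be read with json.load(open(path)).
    sample = json.loads('{"mmlu_results": [{"category": "Biology"}]}')
    for section, names in missing_categories(sample).items():
        print(f"{section}: missing {len(names)} categories")
```

An empty dictionary from `missing_categories` means every listed category was found under the assumed key; adjust `key` to match the actual entry fields produced by the evaluation script.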