Update constants.py
constants.py +1 -1
constants.py
CHANGED
@@ -32,7 +32,7 @@ XLSX_DIR = "./file//results.xlsx"
 
 LEADERBOARD_INTRODUCTION = """# 🏆 S-Eval Leaderboard
 ## 🔔 Updates
-📣 [2025/10/09]:
+📣 [2025/10/09]: We update the evaluation for the latest LLMs in the new [🏆 LeaderBoard](https://s.alibaba.com/aigc-web#/), and further release [**Octopus**](https://github.com/Alibaba-AAIG/Octopus), an automated LLM safety evaluator, to meet the community’s need for accurate and reproducible safety assessment tools. You can download the model from [HuggingFace](https://huggingface.co/Alibaba-AAIG/Octopus-14B) or [ModelScope](https://modelscope.cn/models/Alibaba-AAIG/Octopus-14B/summary).
 
 📣 [2025/03/30]: 🎉 Our [paper](https://dl.acm.org/doi/abs/10.1145/3728971) has been accepted by ISSTA 2025. To meet evaluation needs under different budgets, we partition the benchmark into four scales: [Small](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/small) (1,000 Base and 10,000 Attack in each language), [Medium](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/medium) (3,000 Base and 30,000 Attack in each language), [Large](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/large) (5,000 Base and 50,000 Attack in each language) and [Full](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/full) (10,000 Base and 100,000 Attack in each language), comprehensively considering the balance and harmfulness of data.
 
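For reference, the four benchmark scales quoted in the 2025/03/30 entry can be written down as a small lookup table. This is only a sketch for illustration: the names `SCALES` and `total_per_language` are hypothetical and do not appear in constants.py; the counts are copied from the introduction string.

```python
# Per-language prompt counts for each S-Eval scale, as quoted in the
# LEADERBOARD_INTRODUCTION string. SCALES and total_per_language are
# hypothetical names used for illustration; constants.py does not define them.
SCALES = {
    "small": {"base": 1_000, "attack": 10_000},
    "medium": {"base": 3_000, "attack": 30_000},
    "large": {"base": 5_000, "attack": 50_000},
    "full": {"base": 10_000, "attack": 100_000},
}


def total_per_language(scale: str) -> int:
    """Return the number of Base plus Attack prompts per language for a scale."""
    counts = SCALES[scale.lower()]
    return counts["base"] + counts["attack"]


if __name__ == "__main__":
    for name in SCALES:
        print(f"{name}: {total_per_language(name):,} prompts per language")
```

In every scale the Attack set is ten times the size of the Base set, so the per-language total is always eleven times the Base count.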