Update app.py
add arxiv link
app.py
CHANGED
@@ -36,7 +36,7 @@ st.markdown(
 )
 
 st.markdown(
-"We are excited to share the BenchBench-Leaderboard, a crucial component of our comprehensive research on Benchmark Agreement Testing (BAT) [work](
+"We are excited to share the BenchBench-Leaderboard, a crucial component of our comprehensive research on Benchmark Agreement Testing (BAT) [work](https://arxiv.org/abs/2407.13696) -- Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation."
 "This leaderboard is a meta-benchmark that ranks benchmarks based on their agreement with the crowd harnessing many different references. "
 )
 
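
A note on the changed line: the two string literals passed to st.markdown( are adjacent, so Python joins them into one string with nothing in between. Since the added literal ends right after "Evaluation." with no trailing space, the rendered text would run straight into "This leaderboard". The sketch below illustrates that behavior with shortened strings; the names ARXIV_URL, as_committed, and with_space are hypothetical and not from app.py, and the trailing-space variant is only one possible fix, not necessarily what the author intends.

```python
# Illustrative only: shortened copies of the strings in the diff.
# ARXIV_URL, as_committed, and with_space are hypothetical names.
ARXIV_URL = "https://arxiv.org/abs/2407.13696"

# Same shape as the added line: the first literal ends right after the period.
as_committed = (
    f"our research on Benchmark Agreement Testing (BAT) [work]({ARXIV_URL}) "
    "-- Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation."
    "This leaderboard is a meta-benchmark that ranks benchmarks. "
)

# Assumed fix: end the literal with a space so the sentences stay separated.
with_space = (
    f"our research on Benchmark Agreement Testing (BAT) [work]({ARXIV_URL}) "
    "-- Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation. "
    "This leaderboard is a meta-benchmark that ranks benchmarks. "
)

# Adjacent literals are concatenated with no separator inserted.
print("Evaluation.This" in as_committed)
print("Evaluation. This" in with_space)
```

If the two sentences are meant to be separate paragraphs in the rendered Markdown, ending the first literal with "\n\n" instead of a space would be another option.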