Update strings

- pages/about.py +2 -1
- src/strings.py +9 -2
pages/about.py CHANGED

@@ -8,11 +8,12 @@ ABOUT_LEADERBOARD = """
 
 ### Resources
 - **Documentation**: [Official docs](https://autogluon.github.io/fev/latest/)
+- **Publication**: ["fev-bench: A Realistic Benchmark for Time Series Forecasting"](https://arxiv.org/abs/2509.26468)
 - **Source Code**: [GitHub repository](https://github.com/autogluon/fev)
 - **Issues & Questions**: [GitHub Issues](https://github.com/autogluon/fev/issues)
 
 ### Submit Your Model
-Ready to add your model to the leaderboard? Follow this [tutorial](https://autogluon.github.io/fev/latest/tutorials/
+Ready to add your model to the leaderboard? Follow this [tutorial](https://autogluon.github.io/fev/latest/tutorials/05-add-your-model/) to evaluate your model with fev and contribute your results.
 """
 st.set_page_config(layout="wide", page_title="About FEV", page_icon=":material/info:")
 st.markdown(ABOUT_LEADERBOARD)
src/strings.py CHANGED

@@ -14,9 +14,13 @@ Model names are colored by type: <span style='color: {COLORS["dl_text"]}; font-w
 
 The full matrix $E_{{rj}}$ with the error of each model $j$ on task $r$ is available at the bottom of the page.
 
-* **Avg. win rate (%)**: Fraction of all possible model pairs and tasks where this model achieves lower error than the competing model. For model $j$, defined as $W_j = \\frac{{1}}{{R(M-1)}} \\sum_{{r=1}}^{{R}} \\sum_{{k \\neq j}} (\\mathbf{{1}}(E_{{rj}} < E_{{rk}}) + 0.5 \\cdot \\mathbf{{1}}(E_{{rj}} = E_{{rk}}))$ where $R$ is number of tasks, $M$ is number of models. Ties count as half-wins.
+* **Avg. win rate (%)**: Fraction of all possible model pairs and tasks where this model achieves lower error than the competing model. For model $j$, defined as $W_j = \\frac{{1}}{{R(M-1)}} \\sum_{{r=1}}^{{R}} \\sum_{{k \\neq j}} (\\mathbf{{1}}(E_{{rj}} < E_{{rk}}) + 0.5 \\cdot \\mathbf{{1}}(E_{{rj}} = E_{{rk}}))$ where $R$ is number of tasks, $M$ is number of models. Ties count as half-wins.
 
-
+Ranges from 0% (worst) to 100% (best). Higher values are better. This value changes as new models are added to the benchmark.
+
+* **Skill score (%)**: Measures how much the model reduces forecasting error compared to the Seasonal Naive baseline. Computed as $S_j = 100 \\times (1 - \\sqrt[R]{{\\prod_{{r=1}}^{{R}} E_{{rj}}/E_{{r\\beta}}}})$, where $E_{{r\\beta}}$ is baseline error on task $r$. Relative errors are clipped between 0.01 and 100 before aggregation to avoid extreme outliers. Positive values indicate better-than-baseline performance, negative values indicate worse-than-baseline performance.
+
+Higher values are better. This value does not change as new models are added to the benchmark.
 
 * **Median runtime (s)**: Median end-to-end time (training + prediction across all evaluation windows) in seconds. Note that inference times depend on hardware, batch sizes, and implementation details, so these serve as a rough guide rather than definitive performance benchmarks.
 
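To make the win-rate definition added above concrete, here is a minimal sketch of how $W_j$ could be computed from an error matrix. This is an illustration, not fev's actual implementation; the `avg_win_rate` helper, the matrix `E`, and the sample values are all hypothetical.

```python
import numpy as np

def avg_win_rate(E: np.ndarray) -> np.ndarray:
    """Average win rate per model for an error matrix E of shape (R, M),
    where E[r, j] is the error of model j on task r.

    W_j = 1 / (R * (M - 1)) * sum over tasks r and opponents k != j of
    1(E_rj < E_rk) + 0.5 * 1(E_rj == E_rk); ties count as half-wins.
    """
    R, M = E.shape
    wins = np.zeros(M)
    for j in range(M):
        for k in range(M):
            if k != j:
                # Wins against opponent k across all tasks, ties as half-wins.
                wins[j] += np.sum(E[:, j] < E[:, k]) + 0.5 * np.sum(E[:, j] == E[:, k])
    return wins / (R * (M - 1))

# Hypothetical 3-task x 3-model error matrix.
E = np.array([
    [0.8, 1.0, 1.2],
    [0.5, 0.5, 0.9],
    [1.1, 0.7, 0.7],
])
print(avg_win_rate(E) * 100)  # [58.33..., 66.66..., 25.0], win rates in %
```

Because every model is scored against every other model in the pool, adding a new model shifts the win rates of existing entries, which is exactly why the added text notes that this value changes as the benchmark grows.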
@@ -57,6 +61,9 @@ CITATION_FEV = """
 title={{fev-bench}: A Realistic Benchmark for Time Series Forecasting},
 author={Shchur, Oleksandr and Ansari, Abdul Fatir and Turkmen, Caner and Stella, Lorenzo and Erickson, Nick and Guerron, Pablo and Bohlke-Schneider, Michael and Wang, Yuyang},
 year={2025},
+eprint={2509.26468},
+archivePrefix={arXiv},
+primaryClass={cs.LG}
 }
 ```
 """
|