Upload index.html

index.html (+4 −4)

@@ -148,8 +148,8 @@
 
       <!-- Figure 1 -->
       <div class="content has-text-centered">
-        <img src="./static/images/figure_1.png" alt="Reward-guided calibration accelerates binary search" style="
-        <p class="has-text-justified" style="
+        <img src="./static/images/figure_1.png" alt="Reward-guided calibration accelerates binary search" style="width: 100%; height: auto; margin: 20px 0; display: block;">
+        <p class="has-text-justified" style="margin: 0;">
         <strong>Figure 1. Reward-guided calibration accelerates binary search.</strong>
         Left: Increasing the per-step noisy reward (inverse-distance signal + noise) lowers the average number of search steps versus vanilla binary search.
         Right: An example run in which reward guidance converges early while vanilla search keeps oscillating.

@@ -184,8 +184,8 @@
 
       <!-- Figure 2 -->
       <div class="content has-text-centered">
-        <img src="./static/images/figure_2.png" alt="Test-time calibration framework and MATH-500 results" style="
-        <p class="has-text-justified" style="
+        <img src="./static/images/figure_2.png" alt="Test-time calibration framework and MATH-500 results" style="width: 100%; height: auto; margin: 20px 0; display: block;">
+        <p class="has-text-justified" style="margin: 0;">
         <strong>Figure 2. (a) Test-time calibration framework.</strong> With a rollout budget N = N<sub>1</sub> + N<sub>2</sub>, the model first explores by generating and scoring N<sub>1</sub> candidate responses. It then learns calibration parameters (δ, T) from the high-scoring responses and uses them to adjust the logits for the remaining N<sub>2</sub> generations. The final answer is selected from all N candidates.
         <strong>(b) MATH-500 Results.</strong> CarBoN improves weighted Best-of-N accuracy across four models. For all models, calibrated accuracy at N=64 (orange dashed line) matches or exceeds uncalibrated accuracy at N=256, corresponding to up to a 4× reduction in rollout budget. Notably, with Qwen2.5-Math-1.5B-Instruct at N=64, CarBoN surpasses GPT-4o (red dashed line), whereas uncalibrated Best-of-N does not even at N=256.
         </p>
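The Figure 2 caption describes a two-phase budget split: explore with N<sub>1</sub> uncalibrated rollouts, learn (δ, T) from the high-scoring ones, then spend the remaining N<sub>2</sub> rollouts on the calibrated distribution and pick the final answer by weighted Best-of-N. As a rough, hypothetical sketch of that data flow (not CarBoN's actual fitting procedure — the caption does not specify the objective for (δ, T), so the shift rule and the fixed temperature below are stand-ins, and `sample_answer`/`carbon_sketch` are invented names), a toy version over a discrete answer set might look like:

```python
import math
import random

def sample_answer(logits, delta, temp, rng):
    # Sample one answer index from softmax((logits + delta) / temp).
    z = [(l + d) / temp for l, d in zip(logits, delta)]
    m = max(z)
    weights = [math.exp(v - m) for v in z]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def carbon_sketch(logits, reward, N=64, N1=16, seed=0):
    # Phase 1 (explore): draw N1 samples from the uncalibrated model.
    rng = random.Random(seed)
    k = len(logits)
    no_shift = [0.0] * k
    candidates = [sample_answer(logits, no_shift, 1.0, rng) for _ in range(N1)]
    # "Learn" (delta, T) from the scored exploration samples.
    # Stand-in rule: shift each answer's logit by twice the best reward
    # observed on it, and sharpen with a fixed lower temperature; the real
    # method fits these parameters, this only illustrates the data flow.
    delta = [0.0] * k
    for a in candidates:
        delta[a] = max(delta[a], 2.0 * reward(a))
    T = 0.5
    # Phase 2 (exploit): draw the remaining N2 = N - N1 calibrated samples.
    candidates += [sample_answer(logits, delta, T, rng) for _ in range(N - N1)]
    # Weighted Best-of-N: return the answer with the largest total reward.
    totals = {}
    for a in candidates:
        totals[a] = totals.get(a, 0.0) + reward(a)
    return max(totals, key=totals.get)
```

For example, with three candidate answers whose logits slightly favor a wrong one and a 0/1 verifier reward on answer 1, the calibrated second phase concentrates sampling on the rewarded answer, so the weighted vote recovers it from a small budget.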