kumitang committed
Commit 785f414 · verified · 1 Parent(s): 8d50cf2

Upload index.html

Files changed (1): index.html (+4 −4)
index.html CHANGED
@@ -148,8 +148,8 @@
 
       <!-- Figure 1 -->
       <div class="content has-text-centered">
-        <img src="./static/images/figure_1.png" alt="Reward-guided calibration accelerates binary search" style="max-width: 800px; width: 100%; height: auto; margin: 20px auto; display: block;">
-        <p class="has-text-justified" style="max-width: 800px; margin: 0 auto;">
+        <img src="./static/images/figure_1.png" alt="Reward-guided calibration accelerates binary search" style="width: 100%; height: auto; margin: 20px 0; display: block;">
+        <p class="has-text-justified" style="margin: 0;">
          <strong>Figure 1. Reward-guided calibration accelerates binary search.</strong>
          Left: Increasing the per-step noisy reward (an inverse-distance signal plus noise) lowers the average number of search steps versus vanilla binary search.
          Right: An example run in which reward guidance converges early while vanilla search keeps oscillating.
@@ -184,8 +184,8 @@
 
       <!-- Figure 2 -->
       <div class="content has-text-centered">
-        <img src="./static/images/figure_2.png" alt="Test-time calibration framework and MATH-500 results" style="max-width: 800px; width: 100%; height: auto; margin: 20px auto; display: block;">
-        <p class="has-text-justified" style="max-width: 800px; margin: 0 auto;">
+        <img src="./static/images/figure_2.png" alt="Test-time calibration framework and MATH-500 results" style="width: 100%; height: auto; margin: 20px 0; display: block;">
+        <p class="has-text-justified" style="margin: 0;">
          <strong>Figure 2. (a) Test-time calibration framework.</strong> With a rollout budget N = N<sub>1</sub> + N<sub>2</sub>, the model first explores by generating and scoring N<sub>1</sub> candidate responses. It then learns calibration parameters (δ, T) from the high-scoring responses and uses them to adjust the logits for the remaining N<sub>2</sub> generations. The final answer is selected from all N candidates.
          <strong>(b) MATH-500 Results.</strong> CarBoN improves weighted Best-of-N accuracy across four models. For all models, calibrated accuracy at N=64 (orange dashed line) matches or exceeds uncalibrated accuracy at N=256, corresponding to up to a 4× reduction in rollout budget. Notably, with Qwen2.5-Math-1.5B-Instruct at N=64, CarBoN surpasses GPT-4o (red dashed line), while uncalibrated Best-of-N at N=256 does not.
         </p>
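
The Figure 1 caption describes a toy experiment: binary search assisted by a per-probe noisy inverse-distance reward. The sketch below reproduces that setup under stated assumptions: reward queries are treated as cheap and only comparison probes are counted as steps, and the guidance rule (average a few reward samples, then jump to the reward-implied location only when the signal clears the noise floor, otherwise bisect) is an illustrative choice, not the authors' exact procedure.

import random

def noisy_reward(guess, target, sigma):
    # Inverse-distance signal plus zero-mean Gaussian noise, as in the toy.
    return 1.0 / (1.0 + abs(guess - target)) + random.gauss(0.0, sigma)

def vanilla_steps(target, lo, hi):
    # Plain bisection with a three-way comparison oracle.
    steps = 0
    while True:
        steps += 1
        mid = (lo + hi) // 2
        if mid == target:
            return steps
        lo, hi = (mid + 1, hi) if mid < target else (lo, mid - 1)

def guided_steps(target, lo, hi, sigma, k=4):
    steps = 0
    probe = (lo + hi) // 2
    while True:
        steps += 1
        if probe == target:
            return steps
        # The comparison oracle shrinks the interval exactly as in vanilla.
        lo, hi = (probe + 1, hi) if probe < target else (lo, probe - 1)
        # Average k noisy rewards to cut the noise by sqrt(k); trust the
        # inverse-distance estimate only when it clearly beats the noise floor.
        r = sum(noisy_reward(probe, target, sigma) for _ in range(k)) / k
        if r > max(8.0 * sigma / k ** 0.5, 0.02):
            d = max(round(1.0 / r - 1.0), 1)   # invert r ~ 1/(1+d)
            nxt = probe + d if probe < target else probe - d
        else:
            nxt = (lo + hi) // 2               # fall back to plain bisection
        probe = max(lo, min(nxt, hi))          # target stays inside [lo, hi]

if __name__ == "__main__":
    random.seed(0)
    N, trials = 1_000_000, 500
    for sigma in (0.0, 0.01, 0.05):
        targets = [random.randrange(N) for _ in range(trials)]
        v = sum(vanilla_steps(t, 0, N - 1) for t in targets) / trials
        g = sum(guided_steps(t, 0, N - 1, sigma) for t in targets) / trials
        print(f"noise sigma={sigma}: vanilla {v:.1f} vs reward-guided {g:.1f} steps")

In this toy, the stronger the per-step signal relative to the noise, the earlier the jump triggers and the fewer probes are needed, which is the qualitative trend the left panel of Figure 1 reports.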
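
The Figure 2(a) caption specifies the control flow of the framework but not the calibration objective itself. The schematic below mirrors only that flow, with a toy categorical model and a toy reward in place of an LLM and a reward model; the concrete rule used here to fit (δ, T), and every name and constant in the code, are placeholder assumptions rather than the paper's method.

import math
import random
from collections import defaultdict

ANSWERS = ["A", "B", "C", "D"]
BASE_LOGITS = [1.2, 0.4, 0.9, 0.1]                      # toy stand-in for model logits
TRUE_REWARD = {"A": 0.3, "B": 0.9, "C": 0.5, "D": 0.2}  # toy stand-in for a reward model

def sample(logits):
    # Draw one answer from softmax(logits).
    z = max(logits)
    probs = [math.exp(l - z) for l in logits]
    r = random.random() * sum(probs)
    for answer, p in zip(ANSWERS, probs):
        r -= p
        if r <= 0:
            return answer
    return ANSWERS[-1]

def rollout(logits, n):
    # Generate and score n candidate responses (score = noisy toy reward).
    out = []
    for _ in range(n):
        a = sample(logits)
        out.append((a, TRUE_REWARD[a] + random.gauss(0.0, 0.1)))
    return out

def carbon_best_of_n(n1=16, n2=48, top_k=4, temp=0.7):
    # Phase 1: explore with the uncalibrated model (N1 rollouts).
    explored = rollout(BASE_LOGITS, n1)
    # Fit (delta, T) from high-scoring responses. The rule below -- boost
    # answers that appear among the top-k rollouts and sharpen with a fixed
    # temperature -- is a placeholder, not the paper's objective.
    top = sorted(explored, key=lambda x: -x[1])[:top_k]
    counts = defaultdict(int)
    for a, _ in top:
        counts[a] += 1
    delta = [counts[a] / top_k for a in ANSWERS]
    calibrated = [(l + d) / temp for l, d in zip(BASE_LOGITS, delta)]
    # Phase 2: spend the remaining N2 rollouts with calibrated logits.
    exploited = rollout(calibrated, n2)
    # Weighted Best-of-N over all N = N1 + N2 candidates: each distinct
    # answer is weighted by the sum of its candidates' reward scores.
    weight = defaultdict(float)
    for a, s in explored + exploited:
        weight[a] += s
    return max(weight, key=weight.get)

if __name__ == "__main__":
    random.seed(0)
    wins = sum(carbon_best_of_n() == "B" for _ in range(200))
    print(f"picked 'B' (highest toy reward) in {wins}/200 runs")

The design point the caption emphasizes survives the simplification: δ shifts the logits toward answers the reward model favored during exploration, while T < 1 sharpens the calibrated distribution, so the N<sub>2</sub> exploitation rollouts concentrate on high-reward candidates before the weighted Best-of-N vote over all N candidates.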