---
license: mit
datasets:
- Leo-Dai/dapo-math-17k_dedup
---
# 🧠 Parallel-R1-Unseen_Step_200

> **Mid-Training Checkpoint of Parallel-R1: Towards Parallel Thinking via Reinforcement Learning**  
> Stage: **After 200 RL steps with alternating rewards**. This checkpoint shows adaptive parallel reasoning ability and serves as a structure-exploration stage.

This checkpoint is intended to help you reproduce the experimental results in Section 4.5 of the paper, "Extra Bonus: Parallel Thinking as a Mid-Training Exploration Strategy for RL Training."
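
As a starting point for reproduction, here is a minimal loading sketch. It assumes the checkpoint is a causal language model that loads with the `transformers` Auto classes, which this card does not confirm. The repository path `your-org/Parallel-R1-Unseen_Step_200` is a placeholder, and the prompt format in `build_prompt` is purely illustrative, not the format used in training.

```python
# Placeholder repo path (assumption): replace with the actual Hub path of this checkpoint.
REPO_ID = "your-org/Parallel-R1-Unseen_Step_200"


def build_prompt(question: str) -> str:
    """Wrap a math question in a plain prompt (illustrative format, not the training template)."""
    return (
        "Solve the following math problem step by step.\n\n"
        f"Problem: {question.strip()}\n\nSolution:"
    )


def generate(question: str, repo_id: str = REPO_ID, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and generate a completion for one question."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # Assumes a causal-LM architecture; torch_dtype="auto" uses the dtype stored in the config.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is the sum of the first 10 positive integers?")` downloads the weights on first use. For faithful reproduction, use the prompt template and decoding settings from the paper rather than the illustrative ones here.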