DeepSeek-R1-Distill-Llama-8B
  SpectralPO/DeepSeek-R1-Distill-Llama-8B-SPO 8B • Updated May 18, 2025 • 3
  SpectralPO/DeepSeek-R1-Distill-Llama-8B-GRPO 8B • Updated May 18, 2025 • 7

Qwen2.5-32B-Instruct
  SpectralPO/Qwen2.5-32B-Instruct-SPO 33B • Updated May 13, 2025 • 4
  SpectralPO/Qwen2.5-32B-Instruct-GRPO 33B • Updated May 13, 2025 • 5

DeepSeek-R1-Distill-Qwen-7B
  SpectralPO/DeepSeek-R1-Distill-Qwen-7B-GRPO 8B • Updated May 2, 2025 • 5
  SpectralPO/DeepSeek-R1-Distill-Qwen-7B-SPO 8B • Updated May 2, 2025 • 5

Offline RL with Neg Samples
  SpectralPO/s1K-7B-RSPO-neg 8B • Updated Apr 12, 2025 • 5
  SpectralPO/Qwen2.5-32B-Instruct-neg 33B • Updated Apr 13, 2025 • 5
  SpectralPO/Qwen2.5-14B-Instruct-neg 15B • Updated Apr 18, 2025 • 6

DeepSeek-R1-Distill-Qwen-32B
  SpectralPO/DeepSeek-R1-Distill-Qwen-32B-SPO Updated Jul 19, 2025
  SpectralPO/DeepSeek-R1-Distill-Qwen-32B-GRPO Updated Jul 19, 2025

Qwen2.5-14B-Instruct
  SpectralPO/Qwen2.5-14B-Instruct-GRPO 15B • Updated May 9, 2025 • 4
  SpectralPO/Qwen2.5-14B-Instruct-SPO 15B • Updated May 9, 2025 • 4

Qwen2.5-7B-Instruct
  SpectralPO/Qwen2.5-7B-Instruct-GRPO 8B • Updated Apr 27, 2025 • 7
  SpectralPO/Qwen2.5-7B-Instruct-SPO 8B • Updated Apr 27, 2025 • 4