JameSand/Llama-3.2-3B-Instruct-muon-2e-2-muonadamlr1e-6-muonadjustlrNone-iter_0000200 Text Generation • 3B • Updated 1 day ago • 9
JameSand/Llama-3.2-3B-Instruct-muon-2e-2-muonadamlr1e-6-muonadjustlrrms_norm-iter_0000200 Text Generation • 3B • Updated 1 day ago • 9
Hi Seungyoun! Thank you for the nice blog. I am also looking forward to your training scripts. I am also having problems reproducing the results of Search-R1. Best, James
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following Paper • 2511.21662 • Published Nov 26, 2025 • 11
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models Paper • 2504.04718 • Published Apr 7, 2025 • 42 • 3
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 176