jadohu/Qwen3-8B-GRPO

Reinforcement Learning


README.md exists but content is empty.

Downloads last month: 7

Safetensors
Model size: 8B params
Tensor type: BF16


Model tree for jadohu/Qwen3-8B-GRPO

Base model: Qwen/Qwen3-8B-Base
Finetuned (255): this model
Quantizations: 1 model

Dataset used to train jadohu/Qwen3-8B-GRPO

Collection including jadohu/Qwen3-8B-GRPO

MASA: Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning • 5 items • Updated 2 days ago