Stabilizing Reinforcement Learning with LLMs: Formulation and Practices • Paper • arXiv:2512.01374
Qwen2.5 • Collection • Qwen2.5 language models, with pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B • 46 items • Updated Jul 21
Supercharge Edge AI With High‑Accuracy Reasoning Using NVIDIA Nemotron Nano 2 9B • Article • Aug 18
InternVL3.5-Core • Collection • Only the InternVL3.5 checkpoints that have completed the full training pipeline (Pretraining, SFT, MPO, Cascade RL) • 30 items • Updated Sep 28
Nemotron-Pre-Training-Datasets • Collection • Large-scale pre-training datasets used in the Nemotron family of models • 11 items
Inference Optimized Checkpoints (with Model Optimizer) • Collection • Generative models quantized and optimized for inference with Model Optimizer • 46 items
BroRL: Scaling Reinforcement Learning via Broadened Exploration • Paper • arXiv:2510.01180 • Published Oct 1
Pre-Trained Policy Discriminators are General Reward Models • Paper • arXiv:2507.05197 • Published Jul 7
Efficient LLM Pretraining: Packed Sequences and Masked Attention • Article • Oct 7, 2024
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures • Paper • arXiv:2505.09343 • Published May 14