Xiaofan Zhu
Augusteinia
AI & ML interests
VLM, RL, Robotics
Organizations
None yet
VLM
- BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
  Paper • 2505.09568 • Published • 98
- Qwen3 Technical Report
  Paper • 2505.09388 • Published • 321
- GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
  Paper • 2505.11049 • Published • 60
- Emerging Properties in Unified Multimodal Pretraining
  Paper • 2505.14683 • Published • 133
RL thinking
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
  Paper • 2505.10320 • Published • 24
- Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
  Paper • 2505.09343 • Published • 74
- Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
  Paper • 2505.10554 • Published • 120
- Scaling Reasoning can Improve Factuality in Large Language Models
  Paper • 2505.11140 • Published • 7
Math
- MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
  Paper • 2505.10557 • Published • 47
- AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
  Paper • 2505.16400 • Published • 35
- PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
  Paper • 2505.15929 • Published • 49
- VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
  Paper • 2506.05349 • Published • 24
3DV
Paradigm
- Parallel Scaling Law for Language Models
  Paper • 2505.10475 • Published • 83
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
  Paper • 2505.15045 • Published • 54
- Scaling Diffusion Transformers Efficiently via μP
  Paper • 2505.15270 • Published • 35
- Vision Transformers Don't Need Trained Registers
  Paper • 2506.08010 • Published • 22