Following up on LLaDA 2.0, the paper is now out on Daily Papers 🔥 It has sparked a lot of discussion in the community for showing how discrete diffusion LLMs can scale to 100B and run faster than traditional AR models. LLaDA2.0: Scaling Up Diffusion Language Models to 100B (2512.15745)
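To make the speed argument concrete, here is a minimal sketch of why a masked-diffusion sampler can need fewer forward passes than autoregressive decoding: the AR loop commits one token per pass, while the diffusion loop starts from an all-[MASK] tail and commits several high-confidence tokens per step. The `toy_logits` stand-in, vocabulary size, and confidence-based unmasking schedule are illustrative assumptions, not LLaDA 2.0's actual model or sampler.

```python
import numpy as np

VOCAB = 1000
MASK_ID = VOCAB          # sentinel outside the argmax range [0, VOCAB)
rng = np.random.default_rng(0)

def toy_logits(tokens):
    # Random stand-in for a model forward pass: logits for every position.
    # (A real diffusion LLM would condition these on the partially masked sequence.)
    return rng.normal(size=(len(tokens), VOCAB))

def ar_decode(prompt, n_new):
    """Autoregressive baseline: one forward pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        logits = toy_logits(seq)
        seq.append(int(logits[-1].argmax()))   # only the last position is used
    return seq

def diffusion_decode(prompt, n_new, steps=4):
    """Masked-diffusion style: start fully masked, then at each of a small,
    fixed number of steps commit the most confident predictions in parallel."""
    seq = list(prompt) + [MASK_ID] * n_new
    per_step = n_new // steps
    for _ in range(steps):
        logits = toy_logits(seq)
        masked = [i for i, t in enumerate(seq) if t == MASK_ID]
        conf = {i: logits[i].max() for i in masked}
        for i in sorted(masked, key=conf.get, reverse=True)[:per_step]:
            seq[i] = int(logits[i].argmax())
    return seq

print(ar_decode([1, 2, 3], n_new=8))         # 8 forward passes
print(diffusion_decode([1, 2, 3], n_new=8))  # 4 forward passes for the same 8 tokens
```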
✨ Built from real enterprise data (Enron + financial institutions), not synthetic tasks
✨ Tests end-to-end finance workflows
✨ Multimodal & cross-file reasoning
✨ Expert annotated (700+ hours) and genuinely hard
✨ Any-to-Any & World-Model: one step closer to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D
Aligning with the "world model" trend globally
✨ Audio & Speech + Video & Visual: releases from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by Meituan (delivery platform)
- xiabs DreamOmni 2
✨ 48B total / 3B active - MIT license
✨ Up to 1M context
✨ 84.3 on RULER (128k) with 3.98× speedup
✨ Hybrid KDA + MLA architecture for peak throughput & quality (see the sketch below)
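A rough picture of what a hybrid linear + full attention stack looks like, as a minimal PyTorch sketch: most layers use a linear-attention block (standing in for KDA, whose cost grows linearly with sequence length and needs no growing KV cache), and every few layers a full-attention block (standing in for MLA) restores global recall. The block implementations, layer count, and the 3 linear : 1 full interleaving ratio are assumptions for illustration, not the release's actual KDA/MLA code.

```python
import torch
import torch.nn as nn

class LinearAttnBlock(nn.Module):
    """Toy linear-attention layer (KDA stand-in): accumulates a single
    (d x d) state with einsum instead of a growing key/value cache."""
    def __init__(self, d):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.v = nn.Linear(d, d)

    def forward(self, x):                              # x: (batch, seq, d)
        q, k, v = self.q(x), self.k(x), self.v(x)
        q, k = q.softmax(-1), k.softmax(-1)             # simple feature map
        state = torch.einsum("bsd,bse->bde", k, v)      # K^T V, computed once
        return torch.einsum("bsd,bde->bse", q, state)

class FullAttnBlock(nn.Module):
    """Full-attention layer (MLA stand-in) that keeps global recall."""
    def __init__(self, d, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):
        return self.attn(x, x, x, need_weights=False)[0]

def build_hybrid_stack(n_layers=24, d=512, ratio=4):
    """Every `ratio`-th layer is full attention, the rest are linear
    (i.e. 3 linear : 1 full here) -- an assumed ratio for illustration."""
    return nn.ModuleList(
        [FullAttnBlock(d) if (i + 1) % ratio == 0 else LinearAttnBlock(d)
         for i in range(n_layers)]
    )

x = torch.randn(2, 128, 512)
for layer in build_hybrid_stack():
    x = x + layer(x)            # residual connection around each block
print(x.shape)                  # torch.Size([2, 128, 512])
```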
✨ Compresses long sequences visually to bypass token limits (see the sketch below)
✨ Reduces computational and memory costs
✨ Preserves meaning through multimodal encoding
✨ Built on GLM-4.1V-9B-Base
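A minimal sketch of the visual-compression idea: render a long document into a handful of fixed-size page images, so a vision-language model reads a few thousand visual tokens instead of tens of thousands of text tokens. The page size, font, wrapping, and the per-page visual-token estimate below are assumptions for illustration, not the model's actual rendering pipeline.

```python
from PIL import Image, ImageDraw, ImageFont
import textwrap

def render_text_to_pages(text, page_px=896, chars_per_line=110, line_h=18):
    """Render long text into fixed-size images a VLM can read as visual tokens.
    (Assumed page size, font, and wrapping -- purely illustrative.)"""
    font = ImageFont.load_default()
    lines = textwrap.wrap(text, width=chars_per_line)
    lines_per_page = page_px // line_h
    pages = []
    for start in range(0, len(lines), lines_per_page):
        img = Image.new("RGB", (page_px, page_px), "white")
        draw = ImageDraw.Draw(img)
        for row, line in enumerate(lines[start:start + lines_per_page]):
            draw.text((8, row * line_h), line, fill="black", font=font)
        pages.append(img)
    return pages

doc = "lorem ipsum " * 10_000              # stand-in for a very long document
pages = render_text_to_pages(doc)
text_tokens = len(doc) // 4                # rough 4-chars-per-token estimate
image_tokens = len(pages) * 576            # assumed ~576 visual tokens per page
print(f"{text_tokens} text tokens -> {len(pages)} pages (~{image_tokens} visual tokens)")
# The rendered pages would then be passed, together with the question,
# to a vision-language model (e.g. a GLM-4.1V-based checkpoint).
```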