deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text Generation • 33B • Updated Feb 24, 2025 • 2.74M • 1.48k
Focused Transformer: Contrastive Training for Context Scaling Paper • 2307.03170 • Published Jul 6, 2023 • 11