DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 11 days ago • 192
COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence Paper • 2512.04563 • Published 25 days ago • 14
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published Nov 24 • 28
Post: Excited to share our new work on Unified Multimodal Models, Reconstruction Alignment (RecA)! 🚀
Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! Outperforms FLUX-Kontext in image editing!
📄 Paper: https://alphaxiv.org/abs/2509.07295
💻 Code: https://github.com/HorizonWind2004/reconstruction-alignment
🤗 HF Models: sanaka87/reca-68ad2176380355a3dcedc068
✍️ Demo: sanaka87/BAGEL-RecA
🌐 Project Page: https://reconstruction-alignment.github.io
🔥 X: https://x.com/XDWang101/status/1965908302581420204
📰 Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
🤗 HF Daily Paper: Reconstruction Alignment Improves Unified Multimodal Models (2509.07295)
⚡ Fewer than 10k images and 27 GPU hours, with no architecture changes → SOTA, surpassing much larger open-source and private models:
📊 GenEval: 0.73 → 0.90 | 📊 DPGBench: 80.93 → 88.15
🖼️ ImgEdit: 3.38 → 3.75 | 🖌️ GEdit: 6.94 → 7.25
✅ RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation 🎨 and editing ✂️.
CoVT: Chain-of-Visual-Thought Collection Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25 • 6
RecA Collection Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning! • 8 items • Updated Sep 22 • 14