YM Qin

Wakals

https://wakals.github.io/

AI & ML interests

Computer Vision, Vision-language Model, Generative Model

Organizations

None yet

upvoted 2 papers 5 days ago

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published 11 days ago • 192

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Paper • 2512.04563 • Published 25 days ago • 14

updated 5 models 24 days ago

Wakals/CoVT-LLaVA-13B-depth

13B • Updated 24 days ago • 10 • 2

Wakals/CoVT-7B-seg

8B • Updated 24 days ago • 40 • 1

Wakals/CoVT-7B-depth

8B • Updated 24 days ago • 15 • 2

Wakals/CoVT-7B-seg_depth_dino_edge

8B • Updated 24 days ago • 368 • 2

Wakals/CoVT-7B-seg_depth_dino

8B • Updated 24 days ago • 480 • 2

updated a dataset 24 days ago

Wakals/CoVT-Dataset

Viewer • Updated 24 days ago • 1.17M • 4.7k • 9

authored a paper 29 days ago

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24 • 28

reacted to sanaka87's post with 🔥 about 1 month ago

Excited to share our new work on Unified Multimodal Models: Reconstruction Alignment (RecA)! 🚀 Just 6 × 80GB A100s for 4.5 hours is enough to boost BAGEL performance across all tasks, and it outperforms FLUX-Kontext in image editing!

📄 Paper: https://alphaxiv.org/abs/2509.07295
💻 Code: https://github.com/HorizonWind2004/reconstruction-alignment
🤗 HF Models: sanaka87/reca-68ad2176380355a3dcedc068
✍️ DEMO: sanaka87/BAGEL-RecA
🌐 Project Page: https://reconstruction-alignment.github.io
🔥 X: https://x.com/XDWang101/status/1965908302581420204
📰 Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
🤗 HF Daily Paper: Reconstruction Alignment Improves Unified Multimodal Models (2509.07295)

⚡ <10k images & 27 GPU hours (no-arch-changes) → SOTA, surpassing much larger open-source & private models:

📊 GenEval: 0.73 → 0.90 | 📊 DPGBench: 80.93 → 88.15
🖼️ ImgEdit: 3.38 → 3.75 | 🖌️ GEdit: 6.94 → 7.25

✅ RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation 🎨 & editing ✂️.

updated a collection about 1 month ago

CoVT: Chain-of-Visual-Thought

Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated Nov 25 • 6

upvoted a paper about 1 month ago

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24 • 28

liked 3 models about 1 month ago

Wakals/CoVT-7B-depth

8B • Updated 24 days ago • 15 • 2

Wakals/CoVT-7B-seg_depth_dino_edge

8B • Updated 24 days ago • 368 • 2

Wakals/CoVT-7B-seg_depth_dino

8B • Updated 24 days ago • 480 • 2

liked a dataset about 1 month ago

Wakals/CoVT-Dataset

Viewer • Updated 24 days ago • 1.17M • 4.7k • 9

published a dataset about 1 month ago

Wakals/CoVT-Dataset

Viewer • Updated 24 days ago • 1.17M • 4.7k • 9

upvoted a collection about 1 month ago

RecA

Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning! • 8 items • Updated Sep 22 • 14
