ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries Paper • 2511.14349 • Published Nov 18, 2025 • 17
Nemotron-Pre-Training-Datasets Collection Large-scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 10 days ago • 85
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22, 2025 • 29
AudioStory: Generating Long-Form Narrative Audio with Large Language Models Paper • 2508.20088 • Published Aug 27, 2025 • 21
Caption Anything: Interactive Image Description with Diverse Multimodal Controls Paper • 2305.02677 • Published May 4, 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning Paper • 2307.16525 • Published Jul 31, 2023
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos Paper • 2411.19772 • Published Nov 29, 2024
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? Paper • 2506.02161 • Published Jun 2, 2025 • 13
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank Paper • 2505.14460 • Published May 20, 2025 • 32
Post • Mamba is now available in transformers. Thanks to @tridao and @albertgu for this brilliant model 🚀 and the amazing mamba-ssm kernels powering it! Check out the collection here: state-spaces/transformers-compatible-mamba-65e7b40ab87e5297e45ae406