When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published 23 days ago • 108
HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds Paper • 2508.12782 • Published Aug 18 • 25
Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Article • Published Jul 29 • 190
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 131
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Paper • 2507.12142 • Published Jul 16 • 36
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8 • 118
Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback Paper • 2507.02321 • Published Jul 3 • 39
Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models Paper • 2506.19103 • Published Jun 23 • 42
Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments Paper • 2407.09287 • Published Jul 12, 2024
Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning Paper • 2310.01207 • Published Oct 2, 2023
Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding Paper • 2312.15908 • Published Dec 26, 2023
CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World Paper • 2505.11962 • Published May 17 • 1
MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale Paper • 2409.00134 • Published Aug 29, 2024 • 2
IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents Paper • 2407.08898 • Published Jul 12, 2024
Gradual Optimization Learning for Conformational Energy Minimization Paper • 2311.06295 • Published Nov 5, 2023