Article Building a Modular Image Processing Server with Gradio and MCP By liangwen12year • May 19 • 3
Article LLMGameHub: How We Won the Gradio Agents & MCP Hackathon 2025 By kikikita and 1 other • Jul 28 • 18
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning Paper • 2505.10557 • Published May 15 • 47
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 76
Article mem-agent: Equipping LLM Agents with Memory Using RL By driaforall and 1 other • 20 days ago • 32
Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment By NormalUhr • Feb 11 • 79
Article Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs By davidberenstein1957 and 1 other • May 7 • 41
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20 • 42
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management Paper • 2508.04664 • Published Aug 6 • 13
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Jul 29 • 190
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 131
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents Paper • 2504.13128 • Published Apr 17 • 7
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Paper • 2505.24760 • Published May 30 • 73