WebWizard

community


AI & ML interests

None defined yet.

Recent Activity

yuexiang96

authored 4 papers 8 days ago

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28 • 27

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29 • 45

Simulating Environments with Reasoning Models for Agent Training

Paper • 2511.01824 • Published Nov 3 • 2

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 16 days ago • 35

jeepliu

authored a paper 2 months ago

DocReward: A Document Reward Model for Structuring and Stylizing

Paper • 2510.11391 • Published Oct 13 • 27

jeepliu

authored a paper 5 months ago

Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published Jul 28 • 31

yuexiang96

authored 10 papers 6 months ago

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

Paper • 2504.12329 • Published Apr 12

Overtrained Language Models Are Harder to Fine-Tune

Paper • 2503.19206 • Published Mar 24 • 2

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Paper • 2505.10185 • Published May 15 • 26

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Paper • 2506.03930 • Published Jun 4 • 26

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1 • 78

gneubig

authored a paper 7 months ago

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Paper • 2505.10185 • Published May 15 • 26

Solaris99

authored 3 papers 8 months ago

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

Paper • 2404.05955 • Published Apr 9, 2024

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15, 2024 • 24

AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories

Paper • 2410.07706 • Published Oct 10, 2024

Team members: 6
