---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B
tags:
- code
---

# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[![Paper](https://img.shields.io/badge/Paper-arXiv:2510.04081-B31B1B)](https://arxiv.org/abs/2510.04081)
[![Conference](https://img.shields.io/badge/NeurIPS-2025-1E90FF)](https://neurips.cc/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

**Caco-CodeGen** is a code-driven reasoning generation model trained under the Caco framework. It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

---

## 🚀 Overview

Traditional Chain-of-Thought (CoT) data often lacks **verifiability** and **diversity**. **Caco** addresses this by grounding reasoning in *executable programs*, enabling automatic correctness checks and scalable reasoning synthesis.

| Property               | Description                                                        |
| ---------------------- | ------------------------------------------------------------------ |
| **Model Type**         | Code LLM (Code-Aware Generator)                                    |
| **Base Model**         | Qwen2.5-Coder-7B                                                   |
| **Training Objective** | Next-token prediction on executable reasoning traces               |
| **Training Data**      | Code CoTs extracted and unified from math and algorithmic datasets |
| **Output Type**        | Python-like executable reasoning steps (`code_cot`)                |
| **Verification**       | Code execution + output consistency filter                         |

---

## 🧠 Methodology

*Figure: Caco framework overview.*

Caco constructs reasoning data through **three scalable stages**:

### 1. Unifying Code CoT

Collect diverse **seed reasoning traces** (mathematical and algorithmic) and normalize them into a unified executable format.

### 2. Scaling Code CoT

Train a **Code Generator** to expand reasoning traces via **Pattern-level Augmentation**: restructuring the logic (e.g., decomposition, reformulation, alternative solution paths).

### 3. Instruction Reversing

Back-translate executable reasoning into **natural language problems and solutions**, and apply **dual correctness verification** (a minimal sketch of this check appears at the end of this card).

---

## ⚙️ Usage

### Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Prompting with only a system turn and an open user turn lets the generator
# free-generate a new Code CoT sample from there.
prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Example Use Cases

* Fine-tuning reasoning LLMs (math, logic, or code tasks)
* Verifiable reasoning data augmentation
* Program-based RL reward modeling (RLVR)
* Cross-domain reasoning transfer experiments

---

## 📈 Benchmarks (Caco Models)

| Model                | MATH     | Olympiad | Theorem-QA |
| -------------------- | -------- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2     | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | **82.4** | **46.5** | **46.0**   |
| Llama3-8B-Caco       | 70.6     | 34.1     | 31.0       |

Models trained on Caco data show **consistent improvements** across multiple reasoning benchmarks and domains.

---

## 🔬 Citation

If you use **Caco** in your research, please cite:

```bibtex
@article{caco,
  title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```

---

## 📜 License

Apache 2.0. Free for academic and commercial use, with attribution.

---

## 🌱 Related Resources

* [🧠 Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
* [🧩 Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

---

## 💡 Future Directions

* **Raising Difficulty:** integrate harder datasets (AM-Thinking-distill, DAPO)
* **Expanding Diversity:** add science, proofs, procedural planning
* **RL with Verifiable Rewards (RLVR):** use code execution as a low-noise reward signal (see the reward sketch below)
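
---

## 🧪 Verification Sketch

The overview table and Stage 3 above describe a **code execution + output consistency filter**: a candidate Code CoT is kept only if it runs successfully and its output agrees with the reference answer. The exact filtering pipeline is described in the paper; the snippet below is only a minimal sketch of that idea. The helper names (`run_code_cot`, `is_consistent`) and the convention that a Code CoT prints its final answer on its last line are illustrative assumptions, not part of the released pipeline.

```python
import subprocess
import sys


def run_code_cot(code: str, timeout: float = 5.0) -> str | None:
    """Execute a generated Code CoT in a fresh interpreter and return its stdout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # reject non-terminating programs
    if result.returncode != 0:
        return None  # reject programs that raise or exit with an error
    return result.stdout.strip()


def is_consistent(code: str, reference_answer: str) -> bool:
    """Keep a sample only if its executed output matches the reference answer."""
    output = run_code_cot(code)
    if not output:
        return False
    # Assumed convention: the Code CoT prints its final answer on its last line.
    return output.splitlines()[-1].strip() == reference_answer.strip()


# Toy Code CoT that computes 3 * (4 + 5) step by step.
sample = "intermediate = 4 + 5\nanswer = 3 * intermediate\nprint(answer)"
print(is_consistent(sample, "27"))  # True
```

Because the check is mechanical (the program either terminates with the expected answer or it does not), it can filter large batches of generated samples without human review, which is what makes the synthesis verifiable at scale.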
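
The same check can also serve as the verifiable reward mentioned under **Future Directions**. Below is a toy wiring of `is_consistent` (from the sketch above) into a binary reward function; the actual RL setup, reward shaping, and training stack are not specified by this card, so treat this purely as an illustration of the idea.

```python
def execution_reward(generated_code: str, reference_answer: str) -> float:
    """Binary, execution-grounded reward: 1.0 if the generated program runs
    and reproduces the reference answer, otherwise 0.0."""
    return 1.0 if is_consistent(generated_code, reference_answer) else 0.0


# Plug in as the per-sample scalar reward of an RLVR-style training loop.
print(execution_reward("print(2 ** 10)", "1024"))  # 1.0
print(execution_reward("print(2 ** 10)", "1000"))  # 0.0
```

Because the signal comes from running code rather than from a learned judge, it is deterministic and cheap to compute, which is the "low-noise" property the RLVR direction refers to.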