# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
Caco-CodeGen is a code-driven reasoning generation model trained under the Caco framework. It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.
## Overview
Traditional Chain-of-Thought (CoT) data often lacks verifiability and diversity. Caco addresses this by grounding reasoning in executable programs, enabling automatic correctness checks and scalable reasoning synthesis.
| Property | Description |
|---|---|
| Model Type | Code LLM (Code-Aware Generator) |
| Base Model | Qwen2.5-Coder-7B |
| Training Objective | Next-token prediction on executable reasoning traces |
| Training Data | Code CoTs extracted and unified from math and algorithmic datasets |
| Output Type | Python-like executable reasoning steps (code_cot) |
| Verification | Code execution + output consistency filter |
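To make the table's `code_cot` output type and execution-based verification concrete, here is a minimal, hypothetical sketch: the reasoning trace is an ordinary Python program, so its final answer can be re-derived by running it and compared against a reference. The problem, trace, and reference answer below are illustrative and not drawn from the Caco data.

```python
# Minimal sketch of execution-based verification for a Code CoT.
# The problem, reasoning trace, and reference answer are illustrative only.

code_cot = """
# Problem: A train travels 120 km in 1.5 hours. What is its average speed in km/h?
distance_km = 120
time_h = 1.5
speed_kmh = distance_km / time_h  # step: speed = distance / time
answer = speed_kmh
"""

reference_answer = 80.0

# Execute the reasoning trace and read back its final answer.
namespace = {}
exec(code_cot, namespace)

# The trace is kept only if the executed result matches the reference.
is_verified = abs(namespace["answer"] - reference_answer) < 1e-6
print("verified:", is_verified)  # verified: True
```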
## Methodology
Caco constructs reasoning data through three scalable stages:
1. **Unifying Code CoT**: collect diverse seed reasoning traces (mathematical and algorithmic) and normalize them into a unified executable format.
2. **Scaling Code CoT**: train a Code Generator to expand reasoning traces via pattern-level augmentation, i.e., restructuring the logic (e.g., decomposition, reformulation, alternative solution paths).
3. **Instruction Reversing**: back-translate executable reasoning into natural-language problems and solutions, then apply dual correctness verification (see the sketch after this list).
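A rough sketch of what the dual correctness check in stage 3 could look like, under assumed helper names (`run_code_cot`, `extract_final_answer`) and an assumed sample layout; this illustrates the idea, not the Caco implementation. A back-translated (problem, solution) pair is kept only when the natural-language answer agrees with the result of executing the underlying Code CoT.

```python
# Hypothetical sketch of the dual correctness filter in stage 3 (Instruction Reversing).
# Names and data layout are illustrative, not taken from the Caco codebase.
import re

def run_code_cot(code_cot: str) -> float:
    """Execute an executable reasoning trace and return its final `answer` variable."""
    namespace = {}
    exec(code_cot, namespace)
    return float(namespace["answer"])

def extract_final_answer(nl_solution: str) -> float:
    """Naively pull the last number out of a natural-language solution (placeholder parser)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", nl_solution)
    if not numbers:
        raise ValueError("no numeric answer found")
    return float(numbers[-1])

def passes_dual_verification(sample: dict, tol: float = 1e-6) -> bool:
    """Keep a back-translated (problem, solution) pair only if the NL answer
    matches the result of executing the underlying Code CoT."""
    executed = run_code_cot(sample["code_cot"])
    claimed = extract_final_answer(sample["nl_solution"])
    return abs(executed - claimed) < tol

# Toy example
sample = {
    "code_cot": "answer = 120 / 1.5",
    "nl_solution": "The train covers 120 km in 1.5 hours, so its speed is 80 km/h.",
}
print(passes_dual_verification(sample))  # True
```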
## Usage
### Example Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Build a ChatML-style prompt; the user question below is only a placeholder.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSolve step by step with Python: what is 15% of 240?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
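Continuing from the snippet above, if the tokenizer ships a chat template (as Qwen2.5-based tokenizers typically do), the prompt can also be built with `apply_chat_template` instead of hand-writing the special tokens; the user question is again only a placeholder.

```python
# Alternative prompt construction via the tokenizer's chat template, if one is provided.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python solution that computes 12% of 250 step by step."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```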
### Example Use Cases
- Fine-tuning reasoning LLMs (math, logic, or code tasks)
- Verifiable reasoning data augmentation
- Program-based RL reward modeling (RLVR); see the reward sketch after this list
- Cross-domain reasoning transfer experiments
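For the RLVR use case above, a minimal sketch of a program-based reward, assuming a binary reward on answer match and the convention that the generated program stores its result in an `answer` variable; names and conventions here are illustrative, not a prescribed interface.

```python
# Hypothetical sketch: code execution as a verifiable reward signal (RLVR).
# The reward is 1.0 when the generated program reproduces the reference answer, else 0.0.

def execution_reward(generated_program: str, reference_answer: float, tol: float = 1e-6) -> float:
    namespace = {}
    try:
        exec(generated_program, namespace)      # run the model's generated code CoT
        predicted = float(namespace["answer"])  # convention: result stored in `answer`
    except Exception:
        return 0.0                              # non-executable code earns no reward
    return 1.0 if abs(predicted - reference_answer) < tol else 0.0

print(execution_reward("answer = sum(range(1, 11))", 55.0))  # 1.0
```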
## Benchmarks (Caco Models)
| Model | MATH | OlympiadBench | TheoremQA |
|---|---|---|---|
| DeepSeekMath-7B-Caco | 68.2 | 29.5 | 33.8 |
| Qwen2.5-7B-Caco | 82.4 | 46.5 | 46.0 |
| Llama3-8B-Caco | 70.6 | 34.1 | 31.0 |
Models trained on Caco data show consistent improvements across multiple reasoning benchmarks and domains.
## Citation
If you use Caco in your research, please cite:
```bibtex
@article{caco,
  title   = {Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author  = {Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal = {arXiv preprint arXiv:2510.04081},
  year    = {2025}
}
```
## License
Apache 2.0: free for academic and commercial use, with attribution.
## Related Resources
## Future Directions
- Raising difficulty: integrate harder datasets (AM-Thinking-distill, DAPO)
- Expanding diversity: add science, formal proofs, and procedural planning domains
- RL with verifiable rewards (RLVR): use code execution as a low-noise reward signal