# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
Caco-CodeGen is a code-driven reasoning generation model trained under the Caco framework. It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.
## Overview
Traditional Chain-of-Thought (CoT) data often lacks verifiability and diversity. Caco addresses this by grounding reasoning in executable programs, enabling automatic correctness checks and scalable reasoning synthesis.
| Property | Description |
|---|---|
| Model Type | Code LLM (Code-Aware Generator) |
| Base Model | Qwen2.5-Coder-7B |
| Training Objective | Next-token prediction on executable reasoning traces |
| Training Data | Code CoTs extracted and unified from math and algorithmic datasets |
| Output Type | Python-like executable reasoning steps (code_cot) |
| Verification | Code execution + output consistency filter |
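To make the table's `code_cot` output type and execution-based verification concrete, here is a minimal, hypothetical sketch: the reasoning trace is an ordinary Python program, so its final answer can be re-derived by running it and compared against a reference. The problem, trace, and reference answer below are illustrative and not drawn from the Caco data.

```python
# Minimal sketch of execution-based verification for a Code CoT.
# The problem, reasoning trace, and reference answer are illustrative only.

code_cot = """
# Problem: A train travels 120 km in 1.5 hours. What is its average speed in km/h?
distance_km = 120
time_h = 1.5
speed_kmh = distance_km / time_h  # step: speed = distance / time
answer = speed_kmh
"""

reference_answer = 80.0

# Execute the reasoning trace and read back its final answer.
namespace = {}
exec(code_cot, namespace)

# The trace is kept only if the executed result matches the reference.
is_verified = abs(namespace["answer"] - reference_answer) < 1e-6
print("verified:", is_verified)  # verified: True
```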
## Methodology
Caco constructs reasoning data through three scalable stages:
1. **Unifying Code CoT**: collect diverse seed reasoning traces (mathematical and algorithmic) and normalize them into a unified executable format.
2. **Scaling Code CoT**: train a Code Generator to expand reasoning traces via pattern-level augmentation, i.e., restructuring the logic (e.g., decomposition, reformulation, alternative solution paths).
3. **Instruction Reversing**: back-translate executable reasoning into natural-language problems and solutions, then apply dual correctness verification (see the sketch after this list).
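A rough sketch of what the dual correctness check in stage 3 could look like, under assumed helper names (`run_code_cot`, `extract_final_answer`) and an assumed sample layout; this illustrates the idea, not the Caco implementation. A back-translated (problem, solution) pair is kept only when the natural-language answer agrees with the result of executing the underlying Code CoT.

```python
# Hypothetical sketch of the dual correctness filter in stage 3 (Instruction Reversing).
# Names and data layout are illustrative, not taken from the Caco codebase.
import re

def run_code_cot(code_cot: str) -> float:
    """Execute an executable reasoning trace and return its final `answer` variable."""
    namespace = {}
    exec(code_cot, namespace)
    return float(namespace["answer"])

def extract_final_answer(nl_solution: str) -> float:
    """Naively pull the last number out of a natural-language solution (placeholder parser)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", nl_solution)
    if not numbers:
        raise ValueError("no numeric answer found")
    return float(numbers[-1])

def passes_dual_verification(sample: dict, tol: float = 1e-6) -> bool:
    """Keep a back-translated (problem, solution) pair only if the NL answer
    matches the result of executing the underlying Code CoT."""
    executed = run_code_cot(sample["code_cot"])
    claimed = extract_final_answer(sample["nl_solution"])
    return abs(executed - claimed) < tol

# Toy example
sample = {
    "code_cot": "answer = 120 / 1.5",
    "nl_solution": "The train covers 120 km in 1.5 hours, so its speed is 80 km/h.",
}
print(passes_dual_verification(sample))  # True
```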
## Usage
### Example Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Build a ChatML-style prompt; the user question below is only a placeholder.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSolve step by step with Python: what is 15% of 240?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
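Continuing from the snippet above, if the tokenizer ships a chat template (as Qwen2.5-based tokenizers typically do), the prompt can also be built with `apply_chat_template` instead of hand-writing the special tokens; the user question is again only a placeholder.

```python
# Alternative prompt construction via the tokenizer's chat template, if one is provided.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python solution that computes 12% of 250 step by step."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```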
### Example Use Cases
- Fine-tuning reasoning LLMs (math, logic, or code tasks)
- Verifiable reasoning data augmentation
- Program-based RL reward modeling (RLVR); see the reward sketch after this list
- Cross-domain reasoning transfer experiments
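For the RLVR use case above, a minimal sketch of a program-based reward, assuming a binary reward on answer match and the convention that the generated program stores its result in an `answer` variable; names and conventions here are illustrative, not a prescribed interface.

```python
# Hypothetical sketch: code execution as a verifiable reward signal (RLVR).
# The reward is 1.0 when the generated program reproduces the reference answer, else 0.0.

def execution_reward(generated_program: str, reference_answer: float, tol: float = 1e-6) -> float:
    namespace = {}
    try:
        exec(generated_program, namespace)      # run the model's generated code CoT
        predicted = float(namespace["answer"])  # convention: result stored in `answer`
    except Exception:
        return 0.0                              # non-executable code earns no reward
    return 1.0 if abs(predicted - reference_answer) < tol else 0.0

print(execution_reward("answer = sum(range(1, 11))", 55.0))  # 1.0
```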
## Benchmarks (Caco Models)
| Model | MATH | OlympiadBench | TheoremQA |
|---|---|---|---|
| DeepSeekMath-7B-Caco | 68.2 | 29.5 | 33.8 |
| Qwen2.5-7B-Caco | 82.4 | 46.5 | 46.0 |
| Llama3-8B-Caco | 70.6 | 34.1 | 31.0 |
Models trained on Caco data show consistent improvements across multiple reasoning benchmarks and domains.
## Citation
If you use Caco in your research, please cite:
```bibtex
@article{caco,
  title   = {Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author  = {Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal = {arXiv preprint arXiv:2510.04081},
  year    = {2025}
}
```
## License
Apache 2.0: free for academic and commercial use, with attribution.
## Related Resources
## Future Directions
- Raising difficulty: integrate harder datasets (AM-Thinking-distill, DAPO)
- Expanding diversity: add science, formal proofs, and procedural planning domains
- RL with verifiable rewards (RLVR): use code execution as a low-noise reward signal