Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning


Caco-CodeGen is a code-driven reasoning generation model trained under the Caco framework. It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.


🚀 Overview

Traditional Chain-of-Thought (CoT) data often lacks verifiability and diversity. Caco addresses this by grounding reasoning in executable programs, enabling automatic correctness checks and scalable reasoning synthesis.
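
For intuition, here is a minimal, hypothetical example of what an executable Code CoT looks like: each reasoning step is an ordinary Python statement, so running the program checks the answer automatically. The problem and variable names are illustrative, not taken from the Caco training data.

```python
# Hypothetical Code CoT for: "A store sells pens at $3 each. Alice buys 4 pens
# and pays with a $20 bill. How much change does she get?"
# The final assert is the automatic correctness check that plain
# natural-language CoT cannot provide.

price_per_pen = 3                        # Step 1: cost of one pen
num_pens = 4                             # Step 2: quantity purchased
total_cost = price_per_pen * num_pens    # Step 3: total spend = 3 * 4 = 12
change = 20 - total_cost                 # Step 4: change from $20 = 20 - 12 = 8

assert change == 8                       # executing the trace verifies the answer
print(change)
```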

| Property | Description |
|---|---|
| Model Type | Code LLM (Code-Aware Generator) |
| Base Model | Qwen2.5-Coder-7B |
| Training Objective | Next-token prediction on executable reasoning traces |
| Training Data | Code CoTs extracted and unified from math and algorithmic datasets |
| Output Type | Python-like executable reasoning steps (`code_cot`) |
| Verification | Code execution + output-consistency filter |

🧠 Methodology

*Figure: Overview of the Caco framework.*

Caco constructs reasoning data through three scalable stages:

1. Unifying Code CoT

Collect diverse seed reasoning traces (mathematical and algorithmic) and normalize them into a unified executable format.
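
The card does not publish the exact schema, but a unified trace can be pictured as a small record holding the problem, the ordered code steps, and the expected output. Everything below, field names included, is an assumed illustration.

```python
from dataclasses import dataclass, field

@dataclass
class CodeCoT:
    """Assumed unified record for one executable reasoning trace."""
    problem: str                                     # natural-language problem
    steps: list[str] = field(default_factory=list)   # ordered Python statements
    expected_output: str = ""                        # ground truth for verification

    def to_program(self) -> str:
        # Concatenate the steps into a single runnable script.
        return "\n".join(self.steps)

# Example: a math seed and an algorithmic seed end up with the same shape.
math_seed = CodeCoT(
    problem="What is the sum of the first 10 positive integers?",
    steps=["total = sum(range(1, 11))", "print(total)"],
    expected_output="55",
)
```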

2. Scaling Code CoT

Train a Code Generator to expand reasoning traces via pattern-level augmentation: restructuring the logic itself (e.g., decomposition, reformulation, alternative solution paths), as sketched below.
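
The exact augmentation prompts are not given in this card, so the following is only a sketch under assumed names: a template that asks the trained generator to restructure a trace's logic, with `generate` standing in for any text-completion callable (e.g., a wrapper around Caco-CodeGen's `model.generate`).

```python
# Assumed prompt template for pattern-level augmentation: the generator is
# asked to rewrite the *structure* of a trace, not just its surface values.
AUGMENT_PROMPT = """Rewrite the following code chain-of-thought using a
different solution pattern (e.g., decompose it into helper functions,
reformulate the quantity being computed, or take an alternative solution
path), while keeping the final printed answer identical.

Original trace:
{trace}

Rewritten trace:"""

def augment(trace: str, generate) -> str:
    """`generate` is any prompt-to-text callable; its signature here is an
    assumption, not part of the Caco release."""
    return generate(AUGMENT_PROMPT.format(trace=trace))
```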

3. Instruction Reversing

Back-translate executable reasoning into natural language problems and solutions, and apply dual correctness verification.
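
Below is a minimal sketch of the execution half of that check, assuming each trace prints its final answer; sandboxing details and the complementary natural-language consistency check are elided.

```python
import subprocess

def passes_execution_check(program: str, expected_output: str) -> bool:
    """Run a candidate Code CoT and keep it only if it executes cleanly
    and its printed result matches the expected answer."""
    try:
        result = subprocess.run(
            ["python", "-c", program],
            capture_output=True, text=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() == expected_output

# A trace that crashes, loops, or prints a different answer is filtered out,
# which is what makes the synthesized data verifiable at scale.
```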


βš™οΈ Usage

Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# ChatML-style prompt; the user turn carries the request (the question below
# is an illustrative placeholder; substitute your own seed problem).
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWrite an executable code chain-of-thought that solves: "
    "how many positive integers below 100 are divisible by 7?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Example use cases

  • Fine-tuning reasoning LLMs (math, logic, or code tasks)
  • Verifiable reasoning data augmentation
  • Program-based RL reward modeling (RLVR; see the toy reward sketch after this list)
  • Cross-domain reasoning transfer experiments
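
For the RLVR use case, the premise is that code execution yields a binary, low-noise reward. A toy reward function under that assumption might look like the following; the function name, timeout, and reward scale are all illustrative.

```python
import subprocess

def rlvr_reward(program: str, expected_output: str) -> float:
    """Toy verifiable reward: 1.0 iff the generated program executes cleanly
    and prints the reference answer, else 0.0. Plug into any RL loop that
    consumes a scalar reward per sample."""
    try:
        result = subprocess.run(
            ["python", "-c", program],
            capture_output=True, text=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    ok = result.returncode == 0 and result.stdout.strip() == expected_output
    return 1.0 if ok else 0.0
```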

📈 Benchmarks (Caco Models)

| Model | MATH | Olympiad | Theorem-QA |
|---|---|---|---|
| DeepSeekMath-7B-Caco | 68.2 | 29.5 | 33.8 |
| Qwen2.5-7B-Caco | 82.4 | 46.5 | 46.0 |
| Llama3-8B-Caco | 70.6 | 34.1 | 31.0 |

Models trained on Caco data show consistent improvements across reasoning benchmarks and domains.


🔬 Citation

If you use Caco in your research, please cite:

@article{caco,
  title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}

📜 License

Apache 2.0: free for academic and commercial use, with attribution.


💡 Future Directions

  • Raising Difficulty: integrate harder datasets (AM-Thinking-distill, DAPO)
  • Expanding Diversity: add science, proofs, procedural planning
  • RL with Verifiable Rewards (RLVR): use code execution as low-noise reward signal