- sections:
  - local: index
    title: TRL
  - local: installation
    title: Installation
  - local: quickstart
    title: Quickstart
  title: Getting started
- sections:
  - local: dataset_formats
    title: Dataset Formats
  - local: how_to_train
    title: Training FAQ
  - local: logging
    title: Understanding Logs
  title: Conceptual Guides
- sections:
  - local: clis
    title: Command Line Interface (CLI)
  - local: customization
    title: Customizing the Training
  - local: reducing_memory_usage
    title: Reducing Memory Usage
  - local: speeding_up_training
    title: Speeding Up Training
  - local: distributing_training
    title: Distributing Training
  - local: use_model
    title: Using Trained Models
  title: How-to guides
- sections:
  - local: deepspeed_integration
    title: DeepSpeed
  - local: liger_kernel_integration
    title: Liger Kernel
  - local: peft_integration
    title: PEFT
  - local: unsloth_integration
    title: Unsloth
  - local: vllm_integration
    title: vLLM
  title: Integrations
- sections:
  - local: example_overview
    title: Example Overview
  - local: community_tutorials
    title: Community Tutorials
  - local: sentiment_tuning
    title: Sentiment Tuning
  - local: using_llama_models
    title: Training StackLlama
  - local: detoxifying_a_lm
    title: Detoxifying a Language Model
  - local: multi_adapter_rl
    title: Multi Adapter RLHF
  - local: training_vlm_sft
    title: Fine-tuning a Multimodal Model Using SFT (Single or Multi-Image Dataset)
  title: Examples
- sections:
  - sections: # Sorted alphabetically
    - local: alignprop_trainer
      title: AlignProp
    - local: bco_trainer
      title: BCO
    - local: cpo_trainer
      title: CPO
    - local: ddpo_trainer
      title: DDPO
    - local: dpo_trainer
      title: DPO
    - local: online_dpo_trainer
      title: Online DPO
    - local: gkd_trainer
      title: GKD
    - local: grpo_trainer
      title: GRPO
    - local: kto_trainer
      title: KTO
    - local: nash_md_trainer
      title: Nash-MD
    - local: orpo_trainer
      title: ORPO
    - local: ppo_trainer
      title: PPO
    - local: prm_trainer
      title: PRM
    - local: reward_trainer
      title: Reward
    - local: rloo_trainer
      title: RLOO
    - local: sft_trainer
      title: SFT
    - local: iterative_sft_trainer
      title: Iterative SFT
    - local: xpo_trainer
      title: XPO
    title: Trainers
  - local: models
    title: Model Classes
  - local: model_utils
    title: Model Utilities
  - local: best_of_n
    title: Best of N Sampling
  - local: judges
    title: Judges
  - local: callbacks
    title: Callbacks
  - local: data_utils
    title: Data Utilities
  - local: rewards
    title: Reward Functions
  - local: script_utils
    title: Script Utilities
  - local: others
    title: Others
  title: API