# Supervised Fine-tuning
SFT trains a model to mimic examples. No reward functions needed—just input-output pairs.
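As a rough sketch of what this means under the hood (using Hugging Face transformers and PyTorch purely for illustration; rnow's internals may differ), each example is rendered with the model's chat template and the model is optimized with plain next-token cross-entropy on it:

```python
# Sketch of the SFT objective, assuming Hugging Face transformers + PyTorch.
# The model is fit with next-token cross-entropy on the example conversation;
# no reward function is involved.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "user", "content": "What is ML?"},
    {"role": "assistant", "content": "Machine learning enables computers to learn from data."},
]

# Render the conversation with the chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Labels are the input ids themselves; the model shifts them internally.
# (Real SFT pipelines usually mask the user tokens out of the loss.)
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
```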
## Quick Start

```bash
rnow init --template sft
```

## Required Files
`train.jsonl` - Conversation examples:

```json
{"messages": [{"role": "user", "content": "What is ML?"}, {"role": "assistant", "content": "Machine learning enables computers to learn from data."}]}
```
`config.yml` - Set `dataset_type: sft`:

```yaml
dataset_type: sft
data:
  train_file: train.jsonl
  batch_size: 4
model:
  path: Qwen/Qwen3-8B
trainer:
  num_epochs: 3
```
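Before launching, you can sanity-check the config with a few lines of Python (a hypothetical convenience, not an rnow feature; requires PyYAML):

```python
# Hypothetical pre-flight check for config.yml; rnow does its own validation,
# this just surfaces typos before a run starts.
import yaml

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

assert cfg["dataset_type"] == "sft", "this recipe expects dataset_type: sft"
assert "train_file" in cfg["data"], "data.train_file is required"
print(f"Training {cfg['model']['path']} for {cfg['trainer']['num_epochs']} epoch(s)")
```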
## Run Training

```bash
rnow run
```

## SFT vs RL
| Aspect | SFT | RL |
|---|---|---|
| Data | Input-output pairs | Prompts + rewards |
| Best for | Format, style | Reasoning, tools |
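To make the data difference concrete, here is a schematic comparison (shapes are illustrative only, not rnow's exact RL schema): an SFT example ships the target answer, while an RL example ships only a prompt plus a reward function that scores whatever the model generates.

```python
# Schematic only, not rnow's exact schemas.
sft_example = {"messages": [
    {"role": "user", "content": "What is ML?"},
    {"role": "assistant", "content": "Machine learning enables computers to learn from data."},
]}

rl_example = {"prompt": "What is 17 * 24?"}

def reward(prompt: str, completion: str) -> float:
    # RL scores the model's own output instead of imitating a reference answer.
    return 1.0 if "408" in completion else 0.0
```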