ReinforceNow

Supervised Fine-tuning

SFT trains a model to imitate example completions. No reward functions are needed, just input-output pairs.

Quick Start

rnow init --template sft

Required Files

train.jsonl - Conversation examples, one JSON object per line:

{"messages": [{"role": "user", "content": "What is ML?"}, {"role": "assistant", "content": "Machine learning enables computers to learn from data."}]}
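A quick way to sanity-check your dataset before training is to parse each line and verify the messages schema. The helper below is an illustrative sketch (not part of the rnow CLI), using only the structure shown above:

```python
import json

# One training example in the messages format shown above.
example = {
    "messages": [
        {"role": "user", "content": "What is ML?"},
        {"role": "assistant", "content": "Machine learning enables computers to learn from data."},
    ]
}

def validate_example(line: str) -> dict:
    """Parse one train.jsonl line and check the messages schema."""
    record = json.loads(line)
    assert "messages" in record, "each line needs a 'messages' key"
    for msg in record["messages"]:
        assert msg["role"] in {"system", "user", "assistant"}, f"unexpected role: {msg['role']}"
        assert isinstance(msg["content"], str), "content must be a string"
    return record

line = json.dumps(example)
record = validate_example(line)
```

Running this over every line of train.jsonl catches malformed records early, before a training job fails mid-run.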

config.yml - Set dataset_type: sft:

dataset_type: sft

data:
  train_file: train.jsonl
  batch_size: 4

model:
  path: Qwen/Qwen3-8B

trainer:
  num_epochs: 3
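Conceptually, SFT computes the loss only on the assistant turns: prompt tokens are masked out so the model is trained to produce the target response, not to regenerate the input. The sketch below illustrates that label-masking idea with a hypothetical whitespace tokenizer and the common -100 ignore convention; it is not ReinforceNow's actual implementation:

```python
# Label masking for SFT: train on assistant tokens, mask everything else.
# tokenize() is a stand-in whitespace splitter, not a real subword tokenizer.
IGNORE_INDEX = -100  # common convention for "no loss on this position"

def tokenize(text):
    return text.split()

def build_labels(messages):
    input_ids, labels = [], []
    for msg in messages:
        tokens = tokenize(msg["content"])
        input_ids.extend(tokens)
        if msg["role"] == "assistant":
            labels.extend(tokens)  # loss is computed on these positions
        else:
            labels.extend([IGNORE_INDEX] * len(tokens))  # prompt is masked
    return input_ids, labels

messages = [
    {"role": "user", "content": "What is ML?"},
    {"role": "assistant", "content": "Machine learning enables computers to learn from data."},
]
input_ids, labels = build_labels(messages)
masked = sum(1 for label in labels if label == IGNORE_INDEX)
```

The masked positions correspond exactly to the user turn, so gradient updates only reward reproducing the assistant's answer.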

Run Training

rnow run

SFT vs RL

| Aspect   | SFT                | RL                |
|----------|--------------------|-------------------|
| Data     | Input-output pairs | Prompts + rewards |
| Best for | Format, style      | Reasoning, tools  |