ReinforceNow

Supervised Fine-tuning

SFT trains a model to imitate example completions. No reward functions are needed, just input-output pairs.

Quick Start

rnow init --template sft

Required Files

train.jsonl - Conversation examples, one JSON object per line:

{"messages": [{"role": "user", "content": "What is ML?"}, {"role": "assistant", "content": "Machine learning enables computers to learn from data."}]}
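A quick way to sanity-check your dataset before training is to parse each line and verify the messages schema. The helper below is an illustrative sketch (not part of the rnow CLI), using only the structure shown above:

```python
import json

# One training example in the messages format shown above.
example = {
    "messages": [
        {"role": "user", "content": "What is ML?"},
        {"role": "assistant", "content": "Machine learning enables computers to learn from data."},
    ]
}

def validate_example(line: str) -> dict:
    """Parse one train.jsonl line and check the messages schema."""
    record = json.loads(line)
    assert "messages" in record, "each line needs a 'messages' key"
    for msg in record["messages"]:
        assert msg["role"] in {"system", "user", "assistant"}, f"unexpected role: {msg['role']}"
        assert isinstance(msg["content"], str), "content must be a string"
    return record

line = json.dumps(example)
record = validate_example(line)
```

Running this over every line of train.jsonl catches malformed records early, before a training job fails mid-run.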

config.yml - Set dataset_type: sft:

dataset_type: sft

data:
  train_file: train.jsonl
  batch_size: 4

model:
  path: Qwen/Qwen3-8B

trainer:
  num_epochs: 3
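Conceptually, SFT computes the loss only on the assistant turns: prompt tokens are masked out so the model is trained to produce the target response, not to regenerate the input. The sketch below illustrates that label-masking idea with a hypothetical whitespace tokenizer and the common -100 ignore convention; it is not ReinforceNow's actual implementation:

```python
# Label masking for SFT: train on assistant tokens, mask everything else.
# tokenize() is a stand-in whitespace splitter, not a real subword tokenizer.
IGNORE_INDEX = -100  # common convention for "no loss on this position"

def tokenize(text):
    return text.split()

def build_labels(messages):
    input_ids, labels = [], []
    for msg in messages:
        tokens = tokenize(msg["content"])
        input_ids.extend(tokens)
        if msg["role"] == "assistant":
            labels.extend(tokens)  # loss is computed on these positions
        else:
            labels.extend([IGNORE_INDEX] * len(tokens))  # prompt is masked
    return input_ids, labels

messages = [
    {"role": "user", "content": "What is ML?"},
    {"role": "assistant", "content": "Machine learning enables computers to learn from data."},
]
input_ids, labels = build_labels(messages)
masked = sum(1 for label in labels if label == IGNORE_INDEX)
```

The masked positions correspond exactly to the user turn, so gradient updates only reward reproducing the assistant's answer.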

Run Training

rnow run

SFT vs RL

| Aspect   | SFT                | RL                |
|----------|--------------------|-------------------|
| Data     | Input-output pairs | Prompts + rewards |
| Best for | Format, style      | Reasoning, tools  |