Quickstart

We will finetune a Qwen/Qwen3-8B using supervised finetuning (SFT) on example conversations. SFT trains the model to mimic high-quality responses from your dataset.

Option A: Claude Code (Recommended)

After completing the installation, paste this prompt into Claude Code:

Claude Code Prompt

Finetune Qwen3-8B with RL on this math reasoning dataset:
https://huggingface.co/datasets/ReinforceNow/rl-single-math-reasoning

Option B: Manual Setup

Step 1: Create a new folder and fetch the template:

rnow init --template sft

Step 2: Start the training run:

rnow run

Your terminal should look similar to this:

Step 3: View metrics and traces:

Navigate to your experiment link to view your model's performance. You can share your traces by clicking the share button, like this one: https://www.reinforcenow.ai/shared/runs/cmixqdocj000004l8cbkg2bkp

You can share your experiment traces by clicking the share button.

Next steps

Create Your First Reward

Write your first reward function to train a model

Train Your First Agent

Train an agent that can use tools with RL

CLI Reference

Complete reference for all CLI commands