ReinforceNow

Quickstart

In this quickstart, we will finetune Qwen/Qwen3-8B with supervised finetuning (SFT) on example conversations. SFT trains the model to mimic the high-quality responses in your dataset.
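To make "mimic" concrete, the sketch below shows the objective that SFT commonly minimizes: the average negative log-likelihood of the response tokens, with prompt tokens masked out of the loss. This is a generic illustration in plain Python, not ReinforceNow's implementation; the names `softmax` and `sft_loss`, the masking convention, and the toy numbers are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_loss(logits_per_position, target_ids, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1.

    Hypothetical helper: positions with mask 0 (e.g. the user prompt)
    contribute nothing, so only the response tokens are imitated.
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(logits_per_position, target_ids, loss_mask):
        if not keep:
            continue
        probs = softmax(logits)
        total += -math.log(probs[target])
        count += 1
    return total / count if count else 0.0


# Toy example: a 3-token vocabulary and 4 positions.
# Positions 0-1 stand in for the prompt (masked), 2-3 for the response.
logits = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],  # confident and correct: small loss
    [1.0, 1.0, 1.0],  # uniform guess: larger loss
]
targets = [0, 1, 2, 0]
mask = [0, 0, 1, 1]

loss = sft_loss(logits, targets, mask)
print(f"SFT loss: {loss:.4f}")
```

Lowering this loss means the model assigns higher probability to the exact response tokens in the dataset, which is why the quality of the example conversations matters so much.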

Option A: Claude Code

After completing the installation, paste this prompt into Claude Code:

Claude Code Prompt
Finetune Qwen3-8B with RL on this math reasoning dataset:
https://huggingface.co/datasets/ReinforceNow/rl-single-math-reasoning

Option B: Manual Setup

Step 1: Create a new folder and fetch the template:

rnow init --template sft

Step 2: Start the training run:

rnow run

Your terminal should look similar to this:

[Screenshot: Terminal Run]

Step 3: View metrics and traces:

Open your experiment link to view your model's performance. To share your traces, click the share button. Here is an example of a shared run: https://www.reinforcenow.ai/shared/runs/cmixqdocj000004l8cbkg2bkp

[Screenshots: Quickstart Graphs, Quickstart Traces]


Next steps