ReinforceNow

train.jsonl

Training data in JSON Lines format. Each line is one training example.
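A minimal sketch of reading this format, using only Python's standard `json` module. The helper name `parse_jsonl` is hypothetical and not part of ReinforceNow. Skipping blank lines is an assumption; a stricter loader might reject them.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text into a list of dicts, one per non-empty line."""
    examples = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated
        try:
            examples.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report the 1-based line number so the bad example is easy to find.
            raise ValueError(f"line {line_no}: invalid JSON ({err.msg})") from err
    return examples


sample = (
    '{"messages": [{"role": "user", "content": "What is 2+2?"}], "rewards": ["accuracy"], "metadata": {"answer": "4"}}\n'
    '{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there!"}]}\n'
)
examples = parse_jsonl(sample)
print(len(examples))
```

To read a file instead, pass `open("train.jsonl", encoding="utf-8").read()` to the same function.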

Fields

messages: Array of conversation messages, each with a role and content. Required.

rewards: Reward function names to evaluate. Required for RL.

metadata: Custom data passed to reward functions via args.metadata.

tools: Names of the tools available for this task. If omitted, all tools are available.

variables: Template substitution using $variable syntax.
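As an illustration of `$variable` substitution, the sketch below uses Python's `string.Template`, which happens to use the same `$name` syntax. It rests on three assumptions the text above does not confirm: that `variables` is a mapping of names to values, that substitution applies to message content, and that unknown placeholders are left untouched. The helper `render_messages` is hypothetical; ReinforceNow's actual substitution rules may differ.

```python
from string import Template


def render_messages(example):
    """Return a copy of example["messages"] with $variables filled in."""
    variables = example.get("variables", {})  # assumption: a name -> value mapping
    rendered = []
    for message in example["messages"]:
        # safe_substitute leaves unknown $placeholders as-is instead of raising.
        content = Template(message["content"]).safe_substitute(variables)
        rendered.append({**message, "content": content})
    return rendered


example = {
    "messages": [{"role": "user", "content": "Translate '$text' into $language."}],
    "variables": {"text": "good morning", "language": "French"},
}
rendered = render_messages(example)
print(rendered[0]["content"])
```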

Message Roles

system: System instructions (optional, appears first)

user: User message (at least one required)

assistant: Assistant response (for multi-turn context)
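The role rules above can be checked before training with a few lines of Python. This is a sketch, not ReinforceNow's own validation: the function name `check_messages` is hypothetical, and treating any role other than these three as an error is an assumption.

```python
ALLOWED_ROLES = ("system", "user", "assistant")


def check_messages(messages):
    """Return a list of problems; an empty list means the checks passed."""
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role") if isinstance(msg, dict) else None
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        elif role == "system" and i != 0:
            # The system message is optional, but if present it appears first.
            problems.append(f"message {i}: system message must appear first")
    if not any(isinstance(m, dict) and m.get("role") == "user" for m in messages):
        problems.append("at least one user message is required")
    return problems


ok = check_messages([
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hello"},
])
print(ok)
```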

Examples

RL

{"messages": [{"role": "user", "content": "What is 2+2?"}], "rewards": ["accuracy"], "metadata": {"answer": "4"}}
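To show how `metadata` might reach a reward function, here is a hypothetical `accuracy` function. The only detail taken from this page is that custom data arrives as `args.metadata`. The function signature, the `response_text` parameter, and the 0.0 to 1.0 score range are assumptions for illustration. A `SimpleNamespace` stands in for the real `args` object.

```python
from types import SimpleNamespace


def accuracy(args, response_text):
    """Hypothetical reward: 1.0 if the expected answer appears in the response."""
    expected = str(args.metadata["answer"])
    # Substring matching is crude ("14" would also match "4"); a real reward
    # function would likely parse the final answer out of the response first.
    return 1.0 if expected in response_text else 0.0


# Stand-in for the object a trainer would pass; only .metadata is modeled here.
args = SimpleNamespace(metadata={"answer": "4"})
score = accuracy(args, "2 + 2 = 4")
print(score)
```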

SFT

{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there!"}]}
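SFT lines like the one above can be generated from prompt and response pairs. The sketch below relies on `json.dumps`, which escapes newlines inside strings, so each example stays on a single line. The helper name `to_sft_line` and the sample pairs are illustrative only.

```python
import json


def to_sft_line(user_text, assistant_text):
    """Serialize one user/assistant exchange as a single JSON Lines record."""
    example = {
        "messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]
    }
    # ensure_ascii=False keeps non-ASCII text readable in the output file.
    return json.dumps(example, ensure_ascii=False)


pairs = [("Hello", "Hi there!"), ("Thanks", "You're welcome!")]
jsonl_text = "\n".join(to_sft_line(u, a) for u, a in pairs) + "\n"
print(jsonl_text, end="")
```

Writing `jsonl_text` to `train.jsonl` with UTF-8 encoding gives a file with one example per line.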

With Tools

{"messages": [{"role": "user", "content": "Search for AI news"}], "rewards": ["quality"], "tools": ["search"]}
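The filtering rule for `tools` (a listed subset, or every tool when the field is omitted) can be sketched as follows. The function `resolve_tools` and the tool name `calculator` are hypothetical. Rejecting unknown names and treating an empty list as "no tools" are assumptions, not documented behavior.

```python
def resolve_tools(example, all_tools):
    """Return the tool names available to one training example."""
    requested = example.get("tools")
    if requested is None:
        return list(all_tools)  # field omitted: all tools are available
    unknown = [name for name in requested if name not in all_tools]
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")  # assumption: fail loudly
    # Keep the registry's order so results are deterministic.
    return [name for name in all_tools if name in requested]


all_tools = ["search", "calculator"]  # hypothetical registry of defined tools
with_filter = resolve_tools({"messages": [], "tools": ["search"]}, all_tools)
without_filter = resolve_tools({"messages": []}, all_tools)
print(with_filter, without_filter)
```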