train.jsonl
Training data in JSON Lines format. Each line is one training example.
Fields
messages: Array of conversation messages, each with a role and content. Required.
rewards: Names of the reward functions to evaluate. Required for RL.
metadata: Custom data passed to reward functions via args.metadata.
tools: Names of the tools to make available for this task. If omitted, all tools are available.
variables: Template substitution using $variable syntax.
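The field rules above can be checked before training starts. The following is a minimal sketch in Python, not part of any official tooling: the function name check_fields and the rl flag are hypothetical, and the type checks (rewards and tools as arrays, metadata as an object) are assumptions inferred from the examples on this page.

```python
import json

def check_fields(record, rl=True):
    """Return a list of problems found in one parsed train.jsonl record.

    Hypothetical helper: the expected types are inferred from the
    examples on this page, not from a published schema.
    """
    problems = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages: required, non-empty array")
    if rl:
        # rewards is only mandatory for RL examples
        rewards = record.get("rewards")
        if not isinstance(rewards, list) or not rewards:
            problems.append("rewards: required for RL")
    if "tools" in record and not isinstance(record["tools"], list):
        problems.append("tools: expected an array of tool names")
    if "metadata" in record and not isinstance(record["metadata"], dict):
        problems.append("metadata: expected an object")
    return problems

line = '{"messages": [{"role": "user", "content": "What is 2+2?"}], "rewards": ["accuracy"], "metadata": {"answer": "4"}}'
print(check_fields(json.loads(line)))  # prints []
```

Passing rl=False skips the rewards check, which matches SFT examples that carry only messages.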
Message Roles
system: System instructions (optional, appears first)
user: User message (at least one required)
assistant: Assistant response (for multi-turn context)
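The role rules above could be enforced with a small check like the one below. This is a sketch under stated assumptions: check_roles is a hypothetical name, "appears first" is read as "must be the first message", and any role outside the three listed is treated as an error.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}

def check_roles(messages):
    """Return a list of problems with the roles in a messages array.

    Hypothetical helper; the strictness here is an assumption.
    """
    problems = []
    roles = [m.get("role") for m in messages]
    for i, role in enumerate(roles):
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if role == "system" and i != 0:
            problems.append(f"message {i}: system message must come first")
    if "user" not in roles:
        problems.append("at least one user message is required")
    return problems

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]
print(check_roles(conversation))  # prints []
```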
Examples
RL
{"messages": [{"role": "user", "content": "What is 2+2?"}], "rewards": ["accuracy"], "metadata": {"answer": "4"}}
SFT
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there!"}]}
With Tools
{"messages": [{"role": "user", "content": "Search for AI news"}], "rewards": ["quality"], "tools": ["search"]}
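Because each line must be a complete JSON object, it is usually safer to generate the file programmatically than to edit it by hand. The sketch below writes the three examples above to a train.jsonl file and reads them back using only the Python standard library. The helper names write_jsonl and read_jsonl are hypothetical, and the temporary directory is used only to keep the example self-contained.

```python
import json
import os
import tempfile

examples = [
    {"messages": [{"role": "user", "content": "What is 2+2?"}],
     "rewards": ["accuracy"], "metadata": {"answer": "4"}},
    {"messages": [{"role": "user", "content": "Hello"},
                  {"role": "assistant", "content": "Hi there!"}]},
    {"messages": [{"role": "user", "content": "Search for AI news"}],
     "rewards": ["quality"], "tools": ["search"]},
]

def write_jsonl(path, records):
    """Write one JSON object per line; json.dumps emits no raw newlines."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Parse each non-blank line as its own JSON document."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(path, examples)
loaded = read_jsonl(path)
print(len(loaded))  # prints 3
```

Serializing with json.dumps escapes any newlines inside message content, so a multi-line prompt still occupies exactly one line of the file.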