train.jsonl
Training data in JSON Lines format. Each line is one training example.
Fields
messages: Conversation array. Required.
rewards: Reward function names to evaluate. Required for RL.
metadata: Custom data passed to reward functions via args.metadata.
tools: Filter available tools for this task. If omitted, all tools are available.
variables: Template substitution using $variable syntax.
docker: Docker image for isolated sandbox execution. Required if using sandbox=True tools/rewards.
docker_env: Environment variables for the sandbox container (object).
Message Roles
system: System instructions (optional, appears first)
user: User message (at least one required)
assistant: Assistant response (for multi-turn context)
Examples
RL
{"messages": [{"role": "user", "content": "What is 2+2?"}], "rewards": ["accuracy"], "metadata": {"answer": "4"}}SFT
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there!"}]}With Tools
{"messages": [{"role": "user", "content": "Search for AI news"}], "rewards": ["quality"], "tools": ["search"]}With Sandbox
For entries using sandbox=True tools or rewards:
{"messages": [{"role": "user", "content": "Write a Python script that creates hello.txt"}], "rewards": ["file_created"], "tools": ["run_python"], "docker": "python:3.11-slim"}With Environment Variables
Use docker_env to pass configuration to the container:
{
"messages": [{"role": "user", "content": "Query the database"}],
"rewards": ["correct_result"],
"docker": "myorg/db-tools:latest",
"docker_env": {"DB_HOST": "localhost", "DEBUG": "true"}
}Custom Docker images must be built for linux/amd64. Sandboxes run on x86_64 Linux servers. Images built on ARM Macs without the platform flag will fail:
# Correct
docker build --platform linux/amd64 -t myorg/myimage:latest .
# Wrong (will fail)
docker build -t myorg/myimage:latest .