Agent Harness
An agent harness is the framework that connects a language model to tools and manages the conversation flow during training.
How ReinforceNow Handles Agents
ReinforceNow uses an append-only strategy for agents. This means:
- Messages are only added to the conversation, never modified or removed
- The agent invokes tools and receives their results in a loop.
- The process repeats until the termination policy is met

Termination Policies
ReinforceNow supports two termination policies:
1) Last Tool (last_tool)
The episode ends when the agent responds without a tool call. This is the default behavior and works well for agents that should decide when they have enough information.
[user] What's the population of France and its capital? [assistant] calls search_tool("population of France") [tool] France has a population of approximately 68 million people. [assistant] calls search_tool("capital of France") [tool] The capital of France is Paris. [assistant] calls search_tool("Paris population") [tool] Paris has a population of about 2.1 million people. [assistant] France has a population of 68 million. Its capital is Paris with 2.1 million people. ↳ Episode ends (last assistant message has no tool call)2) Max Turns (max_turns)
The episode ends after a fixed number of turns, regardless of whether the agent is still calling tools. Use this when you want to limit the agent's exploration.
# max_turns: 3 [user] Research the history of AI and summarize key milestones. [assistant] calls search_tool("history of artificial intelligence") [tool] AI history dates back to 1956... [assistant] calls search_tool("AI milestones timeline") [tool] Key milestones: 1997 Deep Blue, 2011 Watson... [assistant] calls search_tool("neural networks breakthrough") [tool] Neural networks: 2012 AlexNet, 2017 Transformers... ↳ Episode ends (max turns reached)Notes
- A turn is one model generation. Each assistant tool call + tool result counts as one turn.
- Pre existing assistant messages in train.jsonl do not count toward max_turns.