Agent Harness

An agent harness is the framework that connects a language model to tools and manages the conversation flow during training.

How ReinforceNow Handles Agents

ReinforceNow uses an append-only strategy for agents. This means:

Messages are only added to the conversation, never modified or removed
The agent invokes tools and receives their results in a loop.
The process repeats until the termination policy is met

ReinforceNow Agent Harness — ReinforceNow basic agent structure uses an append-only strategy and terminates the turn when the agent outputs the final assistant message.

Termination Policies

ReinforceNow supports two termination policies:

1) Last Tool (`last_tool`)

The episode ends when the agent responds without a tool call. This is the default behavior and works well for agents that should decide when they have enough information.

[user] What's the population of France and its capital? [assistant] calls search_tool("population of France") [tool] France has a population of approximately 68 million people. [assistant] calls search_tool("capital of France") [tool] The capital of France is Paris. [assistant] calls search_tool("Paris population") [tool] Paris has a population of about 2.1 million people. [assistant] France has a population of 68 million. Its capital is Paris with 2.1 million people. ↳ Episode ends (last assistant message has no tool call)

2) Max Turns (`max_turns`)

The episode ends after a fixed number of turns, regardless of whether the agent is still calling tools. Use this when you want to limit the agent's exploration.

# max_turns: 3 [user] Research the history of AI and summarize key milestones. [assistant] calls search_tool("history of artificial intelligence") [tool] AI history dates back to 1956... [assistant] calls search_tool("AI milestones timeline") [tool] Key milestones: 1997 Deep Blue, 2011 Watson... [assistant] calls search_tool("neural networks breakthrough") [tool] Neural networks: 2012 AlexNet, 2017 Transformers... ↳ Episode ends (max turns reached)

Notes

A turn is one model generation. Each assistant tool call + tool result counts as one turn.
Pre existing assistant messages in train.jsonl do not count toward max_turns.

Agent Harness

How ReinforceNow Handles Agents

Termination Policies

1) Last Tool (`last_tool`)

2) Max Turns (`max_turns`)

Notes

Learn More

Train Your First Agent

tools.py

On this page

Agent Harness

How ReinforceNow Handles Agents

Termination Policies

1) Last Tool (last_tool)

2) Max Turns (max_turns)

Notes

Learn More

Train Your First Agent

tools.py

On this page

1) Last Tool (`last_tool`)

2) Max Turns (`max_turns`)