ReinforceNowReinforceNow

Agent Harness

An agent harness is the framework that connects a language model to tools and manages the conversation flow during training.

How ReinforceNow Handles Agents

ReinforceNow uses an append-only strategy for agents. This means:

  1. Messages are only added to the conversation, never modified or removed
  2. The agent invokes tools and receives their results in a loop.
  3. The process repeats until the termination policy is met
ReinforceNow Agent Harness
ReinforceNow basic agent structure uses an append-only strategy and terminates the turn when the agent outputs the final assistant message.

Termination Policies

ReinforceNow supports two termination policies:

1) Last Tool (last_tool)

The episode ends when the agent responds without a tool call. This is the default behavior and works well for agents that should decide when they have enough information.

[user] What's the population of France and its capital? [assistant] calls search_tool("population of France") [tool] France has a population of approximately 68 million people. [assistant] calls search_tool("capital of France") [tool] The capital of France is Paris. [assistant] calls search_tool("Paris population") [tool] Paris has a population of about 2.1 million people. [assistant] France has a population of 68 million. Its capital is Paris with 2.1 million people. ↳ Episode ends (last assistant message has no tool call)

2) Max Turns (max_turns)

The episode ends after a fixed number of turns, regardless of whether the agent is still calling tools. Use this when you want to limit the agent's exploration.

# max_turns: 3 [user] Research the history of AI and summarize key milestones. [assistant] calls search_tool("history of artificial intelligence") [tool] AI history dates back to 1956... [assistant] calls search_tool("AI milestones timeline") [tool] Key milestones: 1997 Deep Blue, 2011 Watson... [assistant] calls search_tool("neural networks breakthrough") [tool] Neural networks: 2012 AlexNet, 2017 Transformers... ↳ Episode ends (max turns reached)

Notes

  • A turn is one model generation. Each assistant tool call + tool result counts as one turn.
  • Pre existing assistant messages in train.jsonl do not count toward max_turns.

Learn More