ReinforceNowReinforceNow

rewards.py

Reward functions compute the reward signal for RL training. Define them in rewards.py using the @reward decorator.

Usage

from rnow.core import reward, RewardArgs

@reward
def accuracy(args: RewardArgs, messages: list) -> float:
    """Check if response matches expected answer."""
    response = messages[-1]["content"]
    expected = args.metadata["answer"]
    return 1.0 if expected in response else 0.0

Automatic Optimization

ReinforceNow automatically analyzes your code and determines the optimal execution strategy. Just write your function - we handle the rest.

Requirements

args: RewardArgs: First parameter with metadata/variables from train.jsonl

messages: list: Second parameter with full conversation history

-> float: Return value between 0.0 and 1.0

RewardArgs

args.metadata: Dict from train.jsonl metadata field

args.variables: Dict from train.jsonl variables field

Precondition Rewards

Use @reward(precondition=True) to create gate rewards. If it returns 0, total reward is 0:

@reward(precondition=True)
def has_answer(args: RewardArgs, messages: list) -> float:
    """Gate: response must contain an answer."""
    return 1.0 if "Answer:" in messages[-1]["content"] else 0.0

Examples

Math Verification

from math_verify import parse, verify
from rnow.core import reward, RewardArgs

@reward
def accuracy(args: RewardArgs, messages: list) -> float:
    """Math verification."""
    gold = parse(args.metadata["answer"])
    pred = parse(messages[-1]["content"])
    return 1.0 if verify(gold, pred) else 0.0

External API

import requests
from rnow.core import reward, RewardArgs

@reward
def api_check(args: RewardArgs, messages: list) -> float:
    """Call external API."""
    response = requests.post("https://api.example.com/verify",
        json={"text": messages[-1]["content"]})
    return response.json().get("score", 0.0)

Reference in train.jsonl:

{"messages": [...], "rewards": ["accuracy"], "metadata": {"answer": "42"}}