rewards.py
Reward functions compute the reward signal for RL training. Define them in `rewards.py` using the `@reward` decorator.
Usage
```python
from rnow.core import reward, RewardArgs

@reward
def accuracy(args: RewardArgs, messages: list) -> float:
    """Check if response matches expected answer."""
    response = messages[-1]["content"]
    expected = args.metadata["answer"]
    return 1.0 if expected in response else 0.0
```
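Exact substring checks are sensitive to case and stray whitespace. A slightly more forgiving variant (a sketch, not part of the API; the function name is illustrative) normalizes both sides before comparing:

```python
from rnow.core import reward, RewardArgs

@reward
def accuracy_normalized(args: RewardArgs, messages: list) -> float:
    """Case- and whitespace-insensitive variant of the check above."""
    # Collapse runs of whitespace and lowercase both strings.
    response = " ".join(messages[-1]["content"].split()).lower()
    expected = " ".join(args.metadata["answer"].split()).lower()
    return 1.0 if expected in response else 0.0
```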
Automatic Optimization
ReinforceNow automatically analyzes your code and determines the optimal execution strategy. Just write your function; we handle the rest.
Requirements
- `args: RewardArgs`: first parameter, with metadata/variables from train.jsonl
- `messages: list`: second parameter, with the full conversation history
- `-> float`: return value between 0.0 and 1.0
RewardArgs
- `args.metadata`: dict from the train.jsonl `metadata` field
- `args.variables`: dict from the train.jsonl `variables` field
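For instance, a sample that supplies both fields surfaces them on `args`. The snippet below is a sketch: the reward name, the `keyword` variable, and the field contents are all illustrative, not prescribed by the API.

```python
from rnow.core import reward, RewardArgs

@reward
def keyword_hit(args: RewardArgs, messages: list) -> float:
    """Illustrative: combine metadata and variables in one reward."""
    expected = args.metadata["answer"]           # train.jsonl "metadata" field
    keyword = args.variables.get("keyword", "")  # train.jsonl "variables" field
    text = messages[-1]["content"]
    return 1.0 if expected in text and keyword in text else 0.0
```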
Precondition Rewards
Use `@reward(precondition=True)` to create gate rewards. If a precondition returns 0.0, the total reward for that sample is 0.0:
```python
@reward(precondition=True)
def has_answer(args: RewardArgs, messages: list) -> float:
    """Gate: response must contain an answer."""
    return 1.0 if "Answer:" in messages[-1]["content"] else 0.0
```
Examples
Math Verification
```python
from math_verify import parse, verify
from rnow.core import reward, RewardArgs

@reward
def accuracy(args: RewardArgs, messages: list) -> float:
    """Math verification."""
    gold = parse(args.metadata["answer"])
    pred = parse(messages[-1]["content"])
    return 1.0 if verify(gold, pred) else 0.0
```
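Model output is free-form, and `parse` may raise on malformed expressions. A defensive variant (a sketch; the function name and the blanket `except` are illustrative, not library guidance) scores unparseable answers as 0.0:

```python
from math_verify import parse, verify
from rnow.core import reward, RewardArgs

@reward
def accuracy_safe(args: RewardArgs, messages: list) -> float:
    """Math verification that tolerates unparseable output."""
    try:
        gold = parse(args.metadata["answer"])
        pred = parse(messages[-1]["content"])
        return 1.0 if verify(gold, pred) else 0.0
    except Exception:
        # If either side fails to parse, count the answer as wrong.
        return 0.0
```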
External API
```python
import requests
from rnow.core import reward, RewardArgs

@reward
def api_check(args: RewardArgs, messages: list) -> float:
    """Call external API."""
    response = requests.post(
        "https://api.example.com/verify",
        json={"text": messages[-1]["content"]},
    )
    return response.json().get("score", 0.0)
```
Reference in train.jsonl:
{"messages": [...], "rewards": ["accuracy"], "metadata": {"answer": "42"}}