Reasoning Mode

Some models can reason step by step inside <think>...</think> tags before answering. For example, a completion might look like this:

<think>
3 pages × 2 friends = 6 pages per session
6 pages × 2 times per week = 12 pages per week
12 pages × 52 weeks = 624 pages per year
</think>

624
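Downstream code usually needs the reasoning trace and the final answer separately. The following is a minimal sketch of one way to split them, assuming the completion arrives as a plain string with at most one <think> block; the helper name split_reasoning is hypothetical and not part of any documented API.

```python
import re

# Matches one <think>...</think> block; DOTALL lets the trace span several lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string (hypothetical helper)."""
    match = THINK_RE.search(completion)
    if match is None:
        # No trace found, e.g. a non-reasoning model: the whole text is the answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer


completion = """<think>
3 pages × 2 friends = 6 pages per session
6 pages × 2 times per week = 12 pages per week
12 pages × 52 weeks = 624 pages per year
</think>

624"""

reasoning, answer = split_reasoning(completion)
print(answer)  # prints "624"
```

Returning an empty reasoning string when no tags are present lets the same code path handle reasoning and non-reasoning models.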

Reasoning is automatically enabled for supported models. Configure it in config.yml.

Supported Models

Reasoning (Hybrid) models:

Qwen/Qwen3-8B, Qwen/Qwen3-8B-Base, Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B, Qwen/Qwen3-30B-A3B-Base
deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3.1-Base
openai/gpt-oss-120b, openai/gpt-oss-20b ¹

Non-reasoning models:

Qwen/Qwen3-4B-Instruct-2507, Qwen/Qwen3-30B-A3B-Instruct-2507, Qwen/Qwen3-235B-A22B-Instruct-2507
meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-70B, meta-llama/Llama-3.3-70B-Instruct
meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-3B

OpenAI GPT-OSS

The openai/gpt-oss models are hybrid: they can operate in both reasoning and non-reasoning modes. When reasoning_mode is set to disabled, they behave like standard instruct models. With any other mode (easy, medium, or hard), they produce reasoning traces inside <think> tags before answering.
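The mode behavior described above can be sketched as a small check. The four mode names come from this page; the constant and the function expects_reasoning_trace are hypothetical illustrations, not part of the library.

```python
# Mode names as listed on this page for the openai/gpt-oss models.
GPT_OSS_REASONING_MODES = ("disabled", "easy", "medium", "hard")


def expects_reasoning_trace(reasoning_mode: str) -> bool:
    """Return True if a gpt-oss model in this mode should emit a <think> trace.

    Hypothetical helper: it only encodes the rule stated in the text.
    """
    if reasoning_mode not in GPT_OSS_REASONING_MODES:
        raise ValueError(
            f"unknown reasoning_mode {reasoning_mode!r}; "
            f"expected one of {GPT_OSS_REASONING_MODES}"
        )
    # Every mode except "disabled" produces a reasoning trace.
    return reasoning_mode != "disabled"


for mode in GPT_OSS_REASONING_MODES:
    print(mode, expects_reasoning_trace(mode))
```

A check like this can help decide whether a reward function should expect <think> tags in the completions it scores.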

Notes

Reasoning traces can consume many tokens. Consider increasing max_context_window to 8192 or 16384 so that the trace and the final answer both fit.
It's common practice to instruct models to output final answers in \boxed{}. This makes it easy to extract the final answer without accidentally matching intermediate reasoning steps.
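As an illustration of the \boxed{} convention, here is a minimal sketch of an extractor. It assumes the completion is a plain string, looks only at the text after the closing </think> tag, and matches braces by hand so that nested LaTeX such as \boxed{\frac{1}{2}} survives. Both function names are hypothetical.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


def final_answer(completion: str) -> Optional[str]:
    """Extract the boxed answer, ignoring anything inside the reasoning trace."""
    # rpartition returns the whole string as the tail when </think> is missing.
    _, _, tail = completion.rpartition("</think>")
    return extract_boxed(tail)


completion = (
    "<think>maybe \\boxed{600}? no, 12 × 52 = 624</think>\n\n"
    "The answer is \\boxed{624}."
)
print(final_answer(completion))  # prints "624"
```

Searching only the text after </think> is what keeps an intermediate guess in the trace from being mistaken for the final answer.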

Learn More

Create Your First Reward: Write your first reward function to train a model.

Train Your First Agent: Train an agent that can use tools with RL.

¹ OpenAI GPT-OSS models support easy, medium, and hard reasoning modes in addition to disabled.
