Reasoning Mode
Some models can reason step-by-step inside <think>...</think> tags before answering.
<think>
3 pages × 2 friends = 6 pages per session
6 pages × 2 times per week = 12 pages per week
12 pages × 52 weeks = 624 pages per year
</think>
624
Reasoning is automatically enabled for supported models. Configure it in config.yml.
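When post-processing a completion, the reasoning trace usually has to be separated from the final answer. The following is a minimal sketch, assuming the raw completion arrives as a single string containing at most one <think>...</think> block; the helper name split_reasoning is hypothetical and not part of any library.

```python
import re
from typing import Tuple

# Matches one <think>...</think> block; DOTALL lets "." span newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string.

    Hypothetical helper: if no <think> block is present, the reasoning
    part is empty and the whole text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


completion = """<think>
3 pages × 2 friends = 6 pages per session
6 pages × 2 times per week = 12 pages per week
12 pages × 52 weeks = 624 pages per year
</think>
624"""

reasoning, answer = split_reasoning(completion)
print(answer)  # 624
```

Everything after the closing tag is treated as the answer, so a model that writes extra text after </think> will have that text included as well.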
Supported Models
Reasoning (Hybrid) models:
- Qwen/Qwen3-8B, Qwen/Qwen3-8B-Base, Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B, Qwen/Qwen3-30B-A3B-Base
- deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3.1-Base
- openai/gpt-oss-120b, openai/gpt-oss-20b [1]
Non-reasoning models:
- Qwen/Qwen3-4B-Instruct-2507, Qwen/Qwen3-30B-A3B-Instruct-2507, Qwen/Qwen3-235B-A22B-Instruct-2507
- meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-70B, meta-llama/Llama-3.3-70B-Instruct
- meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-3B
OpenAI GPT-OSS
The openai/gpt-oss models are hybrid: they can operate in both reasoning and non-reasoning modes. When reasoning_mode is set to disabled, they behave like standard instruct models. With any other mode (easy, medium, or hard), they produce a reasoning trace inside <think> tags before answering.
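The rule above can be written down as a small check, which is handy when deciding whether downstream code should expect a <think> block. This is only a sketch: the mode names come from the text, while the constant and function names are hypothetical and do not reflect how config.yml is actually parsed.

```python
# Mode names taken from the description above; the helper is hypothetical.
REASONING_MODES = ("disabled", "easy", "medium", "hard")


def emits_think_block(reasoning_mode: str) -> bool:
    """Return True if a gpt-oss model in this mode is expected to emit <think> tags."""
    if reasoning_mode not in REASONING_MODES:
        raise ValueError(
            f"unknown reasoning_mode {reasoning_mode!r}; expected one of {REASONING_MODES}"
        )
    # Only "disabled" turns reasoning off; every other mode produces a trace.
    return reasoning_mode != "disabled"


print(emits_think_block("disabled"))  # False
print(emits_think_block("medium"))    # True
```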
Notes
- Reasoning models consume many tokens for reasoning. Consider increasing max_tokens to 8192 or 16384.
- It's common practice to instruct models to output final answers in \boxed{}. This makes it easy to extract the final answer without accidentally matching intermediate reasoning steps.
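To illustrate the \boxed{} convention, here is a minimal sketch of an extractor, assuming the final answer is the last \boxed{...} in the response. The function name extract_boxed is hypothetical. It counts braces rather than using a simple regular expression so that nested LaTeX such as \boxed{\frac{1}{2}} is returned whole.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(content_start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # braces never balanced, e.g. the output was truncated


response = r"12 pages × 52 weeks = 624 pages per year, so the answer is \boxed{624}."
print(extract_boxed(response))  # 624
```

Returning None on unbalanced braces makes truncated outputs (for example, when max_tokens is too low) easy to detect instead of silently yielding a partial answer.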
Learn More
Create Your First Reward
Write your first reward function to train a model
Train Your First Agent
Train an agent that can use tools with RL
Footnotes
[1] OpenAI GPT-OSS models support easy, medium, and hard reasoning modes in addition to disabled.