ReinforceNow

Reasoning Mode

Some models can reason step by step inside <think>...</think> tags before giving a final answer. For example:

<think>
3 pages × 2 friends = 6 pages per session
6 pages × 2 times per week = 12 pages per week
12 pages × 52 weeks = 624 pages per year
</think>

624
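To make the output format above concrete, here is a minimal Python sketch that separates the reasoning trace from the final answer. The helper name split_reasoning is hypothetical and is not part of ReinforceNow; the sketch assumes at most one <think> block at the start of the completion.

```python
import re
from typing import Tuple

# Matches one <think>...</think> block; DOTALL lets the trace span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(completion: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string.

    Hypothetical helper for illustration only.
    """
    match = THINK_RE.search(completion)
    if match is None:
        # No trace present (e.g. a non-reasoning model): treat everything as the answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer


completion = """<think>
3 pages × 2 friends = 6 pages per session
6 pages × 2 times per week = 12 pages per week
12 pages × 52 weeks = 624 pages per year
</think>

624"""

reasoning, answer = split_reasoning(completion)
print(answer)
```

Keeping the two parts separate means a reward or grading function can look only at the text after the closing tag.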

Reasoning is automatically enabled for supported models. Configure it in config.yml.

Supported Models

Reasoning (Hybrid) models:

  • Qwen/Qwen3-8B, Qwen/Qwen3-8B-Base, Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B, Qwen/Qwen3-30B-A3B-Base
  • deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3.1-Base
  • openai/gpt-oss-120b, openai/gpt-oss-20b [1]

Non-reasoning models:

  • Qwen/Qwen3-4B-Instruct-2507, Qwen/Qwen3-30B-A3B-Instruct-2507, Qwen/Qwen3-235B-A22B-Instruct-2507
  • meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-70B, meta-llama/Llama-3.3-70B-Instruct
  • meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-3B

OpenAI GPT-OSS

The openai/gpt-oss models are hybrid: they can operate in both reasoning and non-reasoning modes. When reasoning_mode is set to disabled, they behave like standard instruct models. In any other mode (easy, medium, or hard), they produce a reasoning trace inside <think> tags before answering.
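The mode rule above can be sketched in Python as follows. The four mode names come from this page; the function expects_think_trace is a hypothetical helper, not a ReinforceNow API, and it covers only the gpt-oss models described here.

```python
# Mode names are taken from the text above; everything else is illustrative.
GPT_OSS_MODES = ("disabled", "easy", "medium", "hard")


def expects_think_trace(reasoning_mode: str) -> bool:
    """Return True if a gpt-oss model in this mode should emit a <think> trace."""
    if reasoning_mode not in GPT_OSS_MODES:
        raise ValueError(f"unknown reasoning_mode: {reasoning_mode!r}")
    # Only "disabled" turns the trace off; easy, medium, and hard all produce one.
    return reasoning_mode != "disabled"
```

A check like this can be useful when deciding whether downstream parsing should look for <think> tags at all.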

Notes

  • Reasoning traces can consume many tokens. Consider increasing max_tokens to 8192 or 16384 so the final answer is not cut off.
  • It's common practice to instruct models to output final answers in \boxed{}. This makes it easy to extract the final answer without accidentally matching intermediate reasoning steps.
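As an illustration of the \boxed{} convention, the sketch below pulls out the contents of the last \boxed{...} in a completion. The function extract_boxed is a hypothetical helper, not something ReinforceNow provides; it counts braces by hand so that nested LaTeX such as \frac{1}{2} is handled.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected so far is the answer.
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced, e.g. the output was truncated


print(extract_boxed(r"12 × 52 = 624, so the answer is \boxed{624}"))
```

Taking the last occurrence is a design choice in this sketch: if a model boxes an intermediate value and then a final one, the final one wins.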

Footnotes

  1. OpenAI GPT-OSS models support easy, medium, and hard reasoning modes in addition to disabled.