Model quality evolves quickly, and we tune the CLI defaults as the ecosystem shifts. Use this guide as a snapshot of how the major options compare today, and expect to revisit it as we publish updates. This guide was last updated on Wednesday, February 25th, 2026.

1 · Current stack rank (February 2026)

| Rank | Model | Why we reach for it |
| --- | --- | --- |
| 1 | Claude Opus 4.6 | Latest Anthropic flagship with Max reasoning; best depth and safety for complex work. |
| 2 | Claude Opus 4.6 Fast | Opus 4.6 tuned for faster response times; 12× multiplier. |
| 3 | Claude Opus 4.5 | Proven quality-and-safety balance; strong default for TUI and exec. |
| 4 | GPT-5.1-Codex-Max | Fast coding loops with support up to Extra High reasoning; great for heavy implementation and debugging. |
| 5 | Claude Sonnet 4.5 | Strong daily driver with balanced cost/quality; great general-purpose choice when you don’t need Opus-level depth. |
| 6 | GPT-5.3-Codex | Newest OpenAI coding model with Extra High reasoning and verbosity support; strong for implementation-heavy tasks. |
| 7 | GPT-5.2-Codex | Proven OpenAI coding model with Extra High reasoning; solid for implementation-heavy tasks. |
| 8 | GPT-5.1-Codex | Quick iteration with solid code quality at lower cost; bump reasoning when you need more depth. |
| 9 | GPT-5.1 | Good generalist, especially when you want OpenAI ergonomics with flexible reasoning effort. |
| 10 | GPT-5.2 | Advanced OpenAI model with verbosity support and reasoning up to Extra High. |
| 11 | Claude Haiku 4.5 | Fast, cost-efficient choice for routine tasks and high-volume automation. |
| 12 | Gemini 3.1 Pro | Newer Gemini Pro generation with strong structured outputs and mixed reasoning controls for research-heavy tasks. |
| 13 | Gemini 3 Pro | Strong at mixed reasoning with Low/High settings; helpful for researchy flows with structured outputs. |
| 14 | Gemini 3 Flash | Fast and cheap (0.2× multiplier) with full reasoning support; great for high-volume tasks where speed matters. |
| 15 | Droid Core (MiniMax M2.5) | Open-source, 0.12× multiplier with reasoning support (Low/Medium/High); cheapest model available. No image support. |
| 16 | Droid Core (GLM-5) | Open-source, 0.4× multiplier with updated GLM capabilities for bulk automation and air-gapped environments; no image support. |
| 17 | Droid Core (GLM-4.7) | Open-source, 0.25× multiplier; stable choice for bulk automation or air-gapped environments. No image support. |
| 18 | Droid Core (Kimi K2.5) | Open-source, 0.25× multiplier with image support; good for cost-sensitive work. |
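The multipliers above scale a shared baseline rate, so comparing models' costs is simple arithmetic. A minimal sketch (the `relative_cost` helper and the one-unit baseline are illustrative; only the multipliers come from the table above):

```python
# Cost multipliers as listed in the stack rank above (subset shown).
MULTIPLIERS = {
    "Claude Opus 4.6 Fast": 12.0,
    "Gemini 3 Flash": 0.2,
    "Droid Core (GLM-5)": 0.4,
    "Droid Core (GLM-4.7)": 0.25,
    "Droid Core (Kimi K2.5)": 0.25,
    "Droid Core (MiniMax M2.5)": 0.12,
}

def relative_cost(model: str, baseline_units: float = 1.0) -> float:
    """Cost of a job in baseline units, scaled by the model's multiplier."""
    return MULTIPLIERS[model] * baseline_units

# The same job on Opus 4.6 Fast vs. MiniMax M2.5 differs by a factor of 100:
ratio = relative_cost("Claude Opus 4.6 Fast") / relative_cost("Droid Core (MiniMax M2.5)")
print(f"{ratio:.0f}x")  # → 100x
```

The spread is wide enough that routing routine work to the cheap end of the table meaningfully changes your bill.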
We ship model updates regularly. When a new release overtakes the list above, we update this page and the CLI defaults.

2 · Match the model to the job

| Scenario | Recommended model |
| --- | --- |
| Deep planning, architecture reviews, ambiguous product specs | Start with Opus 4.6 for the best depth and safety, or Opus 4.6 Fast for faster turnaround. Use Sonnet 4.5 when you want balanced cost/quality, or Codex/Codex-Max for faster iteration with reasoning. |
| Full-feature development, large refactors | Opus 4.6 or Opus 4.5 for depth and safety. GPT-5.3-Codex, GPT-5.2-Codex, or GPT-5.1-Codex-Max when you need speed plus Extra High reasoning; Sonnet 4.5 for balanced loops. |
| Repeatable edits, summarization, boilerplate generation | Haiku 4.5 or Droid Core (including MiniMax M2.5 at 0.12×) for speed and cost. GPT-5.1 / GPT-5.1-Codex when you need higher quality or structured outputs. |
| CI/CD or automation loops | Favor Haiku 4.5 or Droid Core for predictable, low-cost throughput. Use Codex or Codex-Max when automation needs stronger reasoning. |
| High-volume automation, frequent quick turns | Haiku 4.5 for speedy feedback. Droid Core (especially MiniMax M2.5 at 0.12× with reasoning) when cost is critical or you need air-gapped deployment. |
Claude Opus 4.6 is the top-tier option for extremely complex architecture decisions or critical work where you need maximum reasoning capability. Opus 4.6 Fast is tuned for faster responses at a higher cost. Most tasks don’t require Opus-level power—start with Sonnet 4.5 and escalate only if needed.
Tip: you can swap models mid-session with /model or by toggling in the settings panel (Shift+Tab → Settings).

3 · Switching models mid-session

  • Use /model (or Shift+Tab → Settings → Model) to swap without losing your chat history.
  • If you change providers (e.g. Anthropic to OpenAI), the CLI converts the session transcript between Anthropic and OpenAI formats. The translation is lossy—provider-specific metadata is dropped—but we have not seen accuracy regressions in practice.
  • For the best context continuity, switch models at natural milestones: after a commit, once a PR lands, or when you abandon a failed approach and reset the plan.
  • If you flip back and forth rapidly, expect the assistant to spend a turn re-grounding itself; consider summarizing recent progress when you switch.
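When you do switch, a short hand-off note helps the incoming model re-ground itself without spending a turn on it. An illustrative sketch (the wording and project details here are entirely hypothetical):

```text
Switching models here. Quick recap before we continue:
- Done: refactored the retry queue; unit tests pass; changes committed.
- Next: wire up metrics and update the docs.
- Abandoned: the polling approach (too slow); plan reset to event-driven.
```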

4 · Reasoning effort settings

  • Opus 4.6 / Opus 4.6 Fast: Off / Low / Medium / High / Max (default: High)
  • Opus 4.5 / Sonnet 4.5 / Haiku 4.5: Off / Low / Medium / High (default: Off)
  • GPT-5.1: None / Low / Medium / High (default: None)
  • GPT-5.1-Codex: Low / Medium / High (default: Medium)
  • GPT-5.1-Codex-Max: Low / Medium / High / Extra High (default: Medium)
  • GPT-5.2: Off / Low / Medium / High / Extra High (default: Low)
  • GPT-5.2-Codex: None / Low / Medium / High / Extra High (default: Medium)
  • GPT-5.3-Codex: None / Low / Medium / High / Extra High (default: Medium)
  • Gemini 3.1 Pro: Low / Medium / High (default: High)
  • Gemini 3 Pro: None / Low / Medium / High (default: High)
  • Gemini 3 Flash: Minimal / Low / Medium / High (default: High)
  • Droid Core (GLM-5): None only (default: None; no image support)
  • Droid Core (GLM-4.7): None only (default: None; no image support)
  • Droid Core (Kimi K2.5): None only (default: None)
  • Droid Core (MiniMax M2.5): Low / Medium / High (default: High)
Reasoning effort increases latency and cost, so start low for simple work and escalate as needed. Max is available on Claude Opus 4.6 and Opus 4.6 Fast. Extra High is available on GPT-5.1-Codex-Max, GPT-5.2, GPT-5.2-Codex, and GPT-5.3-Codex.
Change reasoning effort from /model → Reasoning effort, or via the settings menu.
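If you drive the CLI from scripts, it can help to validate a reasoning-effort setting against the list above before kicking off a run. A small sketch (`supports_effort` is our own helper, not a CLI API; the levels come straight from the list above, abbreviated here):

```python
# Supported reasoning-effort levels per model, from the list above (subset).
EFFORT_LEVELS = {
    "Claude Opus 4.6": ["Off", "Low", "Medium", "High", "Max"],
    "GPT-5.1-Codex-Max": ["Low", "Medium", "High", "Extra High"],
    "Gemini 3 Flash": ["Minimal", "Low", "Medium", "High"],
    "Droid Core (GLM-4.7)": ["None"],
    # ...remaining models follow the same pattern.
}

def supports_effort(model: str, effort: str) -> bool:
    """True if the model accepts the given reasoning-effort setting."""
    return effort in EFFORT_LEVELS.get(model, [])

print(supports_effort("GPT-5.1-Codex-Max", "Extra High"))  # → True
print(supports_effort("Droid Core (GLM-4.7)", "High"))     # → False
```

Failing fast on an unsupported level is cheaper than discovering it mid-run.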

5 · Bring Your Own Keys (BYOK)

Factory ships with managed Anthropic and OpenAI access. If you prefer to run against your own accounts, BYOK is opt-in—see Bring Your Own Keys for setup steps, supported providers, and billing notes.

Open-source models

Droid Core (GLM-5), Droid Core (GLM-4.7), Droid Core (Kimi K2.5), and Droid Core (MiniMax M2.5) are open-source alternatives available in the CLI. They’re useful for:
  • Air-gapped environments where external API calls aren’t allowed
  • Cost-sensitive projects needing unlimited local inference
  • Privacy requirements where code cannot leave your infrastructure
  • Experimentation with open-source model capabilities
Note: GLM-5, GLM-4.7, and MiniMax M2.5 do not support image attachments; Kimi K2.5 does. MiniMax M2.5 is the cheapest model available (0.12× multiplier) and is the only Droid Core model with reasoning support (Low/Medium/High). For image-based workflows, use Claude, GPT, or Kimi models. To use open-source models, configure them via BYOK with a local inference server (such as Ollama) or a hosted provider; see the BYOK documentation for setup instructions.

6 · Keep notes on what works

  • Track high-impact workflows (e.g., spec generation vs. quick edits) and which combinations of model + reasoning effort feel best.
  • Ping the community or your Factory contact when you notice a model regression so we can benchmark and update this guidance quickly.