Token usage isn’t just about cost; it’s about feedback loop speed and context window limits. This guide shows you how to get more done with fewer tokens through project optimization, smart model selection, and workflow patterns.
Using Factory App? These strategies apply to both CLI and Factory App. You can view your project’s readiness score in the Agent Readiness Dashboard.

Understanding Token Usage

Tokens are consumed in three main areas:
┌──────────────────────────────────────────────────────────┐
│                    Token Consumption                     │
├────────────────┬────────────────┬────────────────────────┤
│   Context      │   Tool Calls   │   Model Output         │
│   (Input)      │   (Overhead)   │   (Response)           │
├────────────────┼────────────────┼────────────────────────┤
│ • AGENTS.md    │ • Read files   │ • Explanations         │
│ • Memories     │ • Grep/search  │ • Generated code       │
│ • Conversation │ • Execute cmds │ • Analysis             │
│ • File content │ • Each retry   │ • Thinking tokens      │
└────────────────┴────────────────┴────────────────────────┘
High token usage often means:
  • Too much exploration (unclear instructions)
  • Multiple attempts (missing context or failing tests)
  • Verbose output (no format constraints)

Project Setup for Efficiency

The biggest token savings come from project configuration that prevents wasted cycles.

1. Fast, Reliable Tests

Slow or flaky tests are the #1 cause of wasted tokens. Each retry costs a full response cycle.
| Test Characteristic | Impact on Tokens |
|---|---|
| Fast tests (< 30s) | Droid verifies changes immediately |
| Slow tests (> 2 min) | Droid may skip verification or waste context waiting |
| Flaky tests | False failures cause debugging cycles |
| No tests | Droid can’t verify changes, more back-and-forth |
Action items:
In your `AGENTS.md`:

```md
## Testing
- Run single file: `npm test -- path/to/file.test.ts`
- Run fast smoke tests: `npm test -- --testPathPattern=smoke`
- Full suite takes ~3 minutes; use `--bail` for early exit on failure
```

2. Linting and Type Checking

When Droid can catch errors immediately, it fixes them in the same turn instead of waiting for you to report them.
In your `AGENTS.md`:

```md
## Validation Commands
- Lint (auto-fix): `npm run lint:fix`
- Type check: `npm run typecheck`
- Full validation: `npm run validate` (lint + typecheck + test)

Always run `npm run lint:fix` after making changes.
```
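A `validate` command like the one above is typically just a fail-fast chain of the individual steps. A minimal sketch; the three functions are placeholders standing in for your project’s real commands (`npm run lint:fix`, `npm run typecheck`, `npm test`):

```shell
#!/bin/sh
# Fail-fast validation chain: each step must exit 0 before the next runs,
# so the first broken stage is reported immediately instead of at the end.
set -e

# Placeholders for real project commands.
lint()      { echo "lint: ok"; }
typecheck() { echo "typecheck: ok"; }
run_tests() { echo "tests: ok"; }

lint
typecheck
run_tests
echo "validate: ok"
```

With `set -e`, a failing lint stops the chain before the slower test step runs, which is exactly the behavior that saves retry cycles.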

3. Clear Project Structure

Document your file organization so Droid doesn’t waste tokens exploring:
In your `AGENTS.md`:

```md
## Project Structure
- `src/components/` - React components (one per file)
- `src/hooks/` - Custom React hooks
- `src/services/` - API and business logic
- `src/types/` - TypeScript type definitions
- `tests/` - Test files mirror src/ structure

When adding a new component:
1. Create component in `src/components/ComponentName/`
2. Add index.ts for exports
3. Add ComponentName.test.tsx in same directory
```

Agent Readiness Checklist

The Agent Readiness Report evaluates your project against criteria that directly impact token efficiency.

High-Impact Criteria

| Criterion | Token Impact | Why It Matters |
|---|---|---|
| Linter Configuration | 🟢 High | Catches errors immediately, no debugging cycles |
| Type Checker | 🟢 High | Prevents runtime errors, clearer code |
| Unit Tests Runnable | 🟢 High | Verification in same turn |
| AGENTS.md | 🟢 High | Context upfront, less exploration |
| Build Command Documentation | 🟡 Medium | No guessing, fewer failed attempts |
| Dependencies Pinned | 🟡 Medium | Reproducible builds |
| Pre-commit Hooks | 🟡 Medium | Automatic quality enforcement |
Run the readiness report to identify gaps:
```
droid
> /readiness-report
```

Model Selection Strategy

Different models have different cost multipliers and capabilities. Match the model to the task:

Cost Multipliers

| Model | Multiplier | Best For |
|---|---|---|
| GLM-4.6 (Droid Core) | 0.25× | Bulk automation, simple tasks |
| Claude Haiku 4.5 | 0.4× | Quick edits, routine work |
| GPT-5.1 / GPT-5.1-Codex | 0.5× | Implementation, debugging |
| Gemini 3 Pro | 0.8× | Research, analysis |
| Claude Sonnet 4.5 | 1.2× | Balanced quality/cost |
| Claude Opus 4.5 | 2× | Complex reasoning, architecture |
| Claude Opus 4.1 | 6× | Maximum capability (use sparingly) |
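The multipliers compose linearly with raw token counts, so the spread between models is large. A quick sketch of the effective cost of the same 40k-token task on two models from the table (relative units, not dollars; the 40k figure is illustrative):

```shell
# Effective usage = raw tokens x model multiplier.
# Integer math: 0.25x is computed as tokens * 25 / 100.
tokens=40000

opus=$(( tokens * 2 ))           # Claude Opus 4.5 at 2x
core=$(( tokens * 25 / 100 ))    # GLM-4.6 (Droid Core) at 0.25x

echo "Opus 4.5:   $opus"         # 80000
echo "Droid Core: $core"         # 10000
```

An 8× difference on identical work is why matching the model to the task matters more than any single prompt optimization.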

Task-Based Model Selection

Simple edit, formatting      → Haiku 4.5 (0.4×)
Implement feature from spec  → GPT-5.1-Codex (0.5×)
Debug complex issue          → Sonnet 4.5 (1.2×)
Architecture planning        → Opus 4.5 (2×)
Bulk file processing         → Droid Core (0.25×)

Reasoning Effort Impact

Higher reasoning = more “thinking” tokens but often fewer retries.

| Reasoning | When to Use | Token Trade-off |
|---|---|---|
| Off/None | Simple, clear tasks | Lowest per-turn, may need more turns |
| Low | Standard implementation | Good balance |
| Medium | Complex logic, debugging | Higher per-turn, fewer retries |
| High | Architecture, analysis | Highest per-turn, best first-attempt quality |
Rule of thumb: Use higher reasoning for tasks where a wrong first attempt would be expensive to fix.
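The trade-off can be made concrete: higher reasoning costs more per turn, but it pays off if it saves even one retry. A hypothetical comparison (the per-turn figures are illustrative, not measured):

```shell
# Hypothetical: low reasoning at 10k tokens/turn needing 3 turns to converge,
# vs medium reasoning at 18k tokens/turn landing on the first attempt.
low_total=$(( 10000 * 3 ))
med_total=$(( 18000 * 1 ))

echo "low reasoning, 3 turns:   $low_total"
echo "medium reasoning, 1 turn: $med_total"
```

Under these assumed numbers, the “cheaper” setting costs roughly 1.7× more overall, which is the shape of the trade-off the table above describes.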
Configure mixed models to automatically use different models for planning vs implementation. See Mixed Models for setup.

Workflow Patterns for Efficiency

Pattern 1: Spec Mode for Complex Work

Use Specification Mode (Shift+Tab or /spec) to plan before implementing.

Without Spec Mode:
Turn 1: Start implementing → wrong approach → wasted tokens
Turn 2: Undo and try different approach → more tokens
Turn 3: Finally get it right
Total: 3 turns of implementation tokens

With Spec Mode:
Turn 1: Plan with exploration → correct approach identified
Turn 2: Implement correctly
Total: 1 turn of planning + 1 turn of implementation
Use Spec Mode for any task that:
  • Touches more than 2 files
  • Requires understanding existing patterns
  • Has unclear requirements
  • Is security-sensitive

Pattern 2: IDE Plugin for Context

Without IDE plugin, Droid must read files to understand context:
Read file A → Read file B → Read file C → Understand context → Work
(4 tool calls before actual work)
With IDE plugin, context is immediate:
Work (IDE provides open files, errors, selection)
(0 extra tool calls for context)

Pattern 3: Specific Over General

Expensive prompt:
"Fix the bug in the auth module"
→ Droid reads multiple files to find the bug, explores different possibilities

Efficient prompt:
"Fix the timeout bug in src/auth/session.ts line 45 where the session expires after 5 minutes instead of 24 hours"
→ Droid goes directly to the issue

Pattern 4: Batch Similar Work

Expensive:
Turn 1: "Add logging to userService"
Turn 2: "Add logging to orderService"
Turn 3: "Add logging to paymentService"
(3 turns, context rebuilt each time)
Efficient:
Turn 1: "Add structured logging to all services in src/services/. Use the pattern from src/lib/logger.ts. Services: user, order, payment."
(1 turn, pattern established once)
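The savings come from paying the shared context once instead of once per turn. A rough model with illustrative numbers (assume 8k tokens of context rebuilt each turn and 4k tokens per service edit):

```shell
# Context is re-sent every turn; the edit work itself is the same either way.
context=8000
edit=4000

separate=$(( 3 * (context + edit) ))   # three one-service turns
batched=$(( context + 3 * edit ))      # one turn covering all three services

echo "separate turns: $separate"   # 36000
echo "batched:        $batched"    # 20000
```

The more turns you batch together, the closer the per-edit cost gets to the edit itself, since the context term is amortized.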

Reducing Token Waste

Common Waste Patterns

| Pattern | Cause | Fix |
|---|---|---|
| Multiple exploration cycles | Unclear requirements | Be specific upfront |
| Repeated file reads | Missing IDE context | Install IDE plugin |
| Failed attempts | No tests/linting | Add validation tools |
| Verbose explanations | No format constraint | Ask for concise output |
| Wrong architecture | Missing context | Use Spec Mode |

Format Constraints

Ask for specific output formats to reduce verbosity:
"Add the feature. Return only the changed code, no explanations unless something is unclear."
"Review this code. Format: bullet list of issues only, no preamble."
"Debug this test failure. Show me the fix, then explain in 2-3 sentences."

Monitoring Your Usage

Check Current Session Cost

```
droid
> /cost
```
This shows token usage for the current session.

Track Over Time

Review your usage patterns:
  1. After each session, note the /cost output
  2. Identify expensive sessions: What made them expensive?
  3. Refine approach: More context? Different model? Better prompts?
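One lightweight way to do step 1 is a plain CSV of session costs you append to by hand from the `/cost` output. A sketch; the file name, columns, dates, and 100k threshold are all made up for illustration:

```shell
# Append one line per session: date, task label, total tokens.
log=$(mktemp)
echo "2025-01-06,feature-auth,42000" >> "$log"
echo "2025-01-07,bulk-rename,160000" >> "$log"
echo "2025-01-08,quick-edit,9000"    >> "$log"

# Flag anything over an arbitrary 100k-token threshold for review.
awk -F, '$3 > 100000 { print "review:", $2, "(" $3 " tokens)" }' "$log"
```

Even a crude log like this makes the expensive sessions obvious, which is the input you need for step 3.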

Usage Red Flags

Watch for these patterns:
  • 🚩 High read count: Droid is exploring too much (add AGENTS.md context)
  • 🚩 Multiple grep/search calls: Unclear what to look for (be more specific)
  • 🚩 Repeated similar edits: Failed attempts (check tests/linting)
  • 🚩 Very long conversations: Scope creep (break into smaller tasks)

Quick Wins Checklist

Implement these for immediate token savings:
  • Install IDE plugin - Eliminates context-gathering tool calls
  • Create AGENTS.md - Droid knows build/test commands upfront
  • Configure linting - Errors caught immediately
  • Fast test command - Verification in same turn
  • Use Spec Mode - Prevents expensive false starts
  • Be specific - Reduces exploration cycles
  • Match model to task - Don’t use Opus for simple edits

Token Budget Guidelines

Rough guidelines for common tasks:
| Task Type | Typical Token Range | Notes |
|---|---|---|
| Quick edit | 5k-15k | Simple, specific changes |
| Feature implementation | 30k-80k | With Spec Mode planning |
| Complex debugging | 50k-150k | May need multiple attempts |
| Architecture planning | 20k-50k | High-reasoning model |
| Code review | 30k-60k | Depends on PR size |
| Bulk refactoring | 50k-200k | Many files, use efficient model |
If you’re significantly exceeding these ranges, review the waste patterns above.
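To apply the ranges mechanically, compare a session’s `/cost` total against the band for its task type. A sketch with the feature-implementation band from the table hard-coded; the 95k session figure is an example:

```shell
# Check a feature-implementation session against its 30k-80k budget band.
actual=95000
low=30000
high=80000

if [ "$actual" -gt "$high" ]; then
  echo "over budget by $(( actual - high )) tokens; review waste patterns"
elif [ "$actual" -lt "$low" ]; then
  echo "under the typical range"
else
  echo "within the typical range"
fi
```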

Summary: The Token-Efficient Workflow

1. Set up your project
   └─ AGENTS.md with commands
   └─ Fast tests
   └─ Linting configured
   └─ IDE plugin installed

2. Start each task right
   └─ Use Spec Mode for complex work
   └─ Be specific about the goal
   └─ Reference existing patterns

3. Choose the right model
   └─ Simple → Haiku/Droid Core
   └─ Standard → Codex/Sonnet
   └─ Complex → Opus (with reasoning)

4. Monitor and adjust
   └─ Check /cost periodically
   └─ Identify expensive patterns
   └─ Refine your approach

Next Steps