Token usage isn't just about cost; it's about feedback loop speed and context window limits. This guide shows you how to get more done with fewer tokens through project optimization, smart model selection, and workflow patterns.
Understanding Token Usage
Tokens are consumed in three main areas:
| Context (Input) | Tool Calls (Overhead) | Model Output (Response) |
|---|---|---|
| AGENTS.md | Read files | Explanations |
| Memories | Grep/search | Generated code |
| Conversation | Execute commands | Analysis |
| File content | Each retry | Thinking tokens |
High token usage often means:
- Too much exploration (unclear instructions)
- Multiple attempts (missing context or failing tests)
- Verbose output (no format constraints)
Project Setup for Efficiency
The biggest token savings come from project configuration that prevents wasted cycles.
1. Fast, Reliable Tests
Slow or flaky tests are the #1 cause of wasted tokens. Each retry costs a full response cycle.
| Test Characteristic | Impact on Tokens |
|---|---|
| Fast tests (< 30s) | Droid verifies changes immediately |
| Slow tests (> 2min) | Droid may skip verification or waste context waiting |
| Flaky tests | False failures cause debugging cycles |
| No tests | Droid can't verify changes, more back-and-forth |
Action items:
## In your AGENTS.md
## Testing
- Run single file: `npm test -- path/to/file.test.ts`
- Run fast smoke tests: `npm test -- --testPathPattern=smoke`
- Full suite takes ~3 minutes; use `--bail` to exit early on failure
2. Linting and Type Checking
When Droid can catch errors immediately, it fixes them in the same turn instead of waiting for you to report them.
## In your AGENTS.md
## Validation Commands
- Lint (auto-fix): `npm run lint:fix`
- Type check: `npm run typecheck`
- Full validation: `npm run validate` (lint + typecheck + test)
Always run `npm run lint:fix` after making changes.
3. Clear Project Structure
Document your file organization so Droid doesn't waste tokens exploring:
## In your AGENTS.md
## Project Structure
- `src/components/` - React components (one per file)
- `src/hooks/` - Custom React hooks
- `src/services/` - API and business logic
- `src/types/` - TypeScript type definitions
- `tests/` - Test files mirror src/ structure
When adding a new component:
1. Create component in `src/components/ComponentName/`
2. Add index.ts for exports
3. Add ComponentName.test.tsx in same directory
Agent Readiness Checklist
The Agent Readiness Report evaluates your project against criteria that directly impact token efficiency.
High-Impact Criteria
| Criterion | Token Impact | Why It Matters |
|---|---|---|
| Linter Configuration | 🟢 High | Catches errors immediately, no debugging cycles |
| Type Checker | 🟢 High | Prevents runtime errors, clearer code |
| Unit Tests Runnable | 🟢 High | Verification in same turn |
| AGENTS.md | 🟢 High | Context upfront, less exploration |
| Build Command Documentation | 🟡 Medium | No guessing, fewer failed attempts |
| Dependencies Pinned | 🟡 Medium | Reproducible builds |
| Pre-commit Hooks | 🟡 Medium | Automatic quality enforcement |
Run the readiness report to identify gaps:
droid
> /readiness-report
Model Selection Strategy
Different models have different cost multipliers and capabilities. Match the model to the task:
Cost Multipliers
| Model | Multiplier | Best For |
|---|---|---|
| GLM-4.6 (Droid Core) | 0.25× | Bulk automation, simple tasks |
| Claude Haiku 4.5 | 0.4× | Quick edits, routine work |
| GPT-5.1 / GPT-5.1-Codex | 0.5× | Implementation, debugging |
| Gemini 3 Pro | 0.8× | Research, analysis |
| Claude Sonnet 4.5 | 1.2× | Balanced quality/cost |
| Claude Opus 4.5 | 2× | Complex reasoning, architecture |
| Claude Opus 4.1 | 6× | Maximum capability (use sparingly) |
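To make the multipliers concrete, here is a minimal sketch that converts a raw token count into multiplier-weighted usage. It assumes the multiplier scales raw tokens linearly, which this guide does not spell out. The function name `weighted_tokens` and the 40,000-token task are illustrative, not part of any Droid API.

```python
# Multipliers copied from the table above (GPT-5.1 and GPT-5.1-Codex share 0.5).
MULTIPLIERS = {
    "GLM-4.6 (Droid Core)": 0.25,
    "Claude Haiku 4.5": 0.4,
    "GPT-5.1-Codex": 0.5,
    "Gemini 3 Pro": 0.8,
    "Claude Sonnet 4.5": 1.2,
    "Claude Opus 4.5": 2.0,
    "Claude Opus 4.1": 6.0,
}


def weighted_tokens(raw_tokens: int, model: str) -> float:
    """Scale raw tokens by the model's multiplier (assumed to be linear)."""
    return raw_tokens * MULTIPLIERS[model]


if __name__ == "__main__":
    # Hypothetical task that consumes 40,000 raw tokens on every model.
    for model in MULTIPLIERS:
        print(f"{model:22} {weighted_tokens(40_000, model):>10,.0f}")
```

Under that assumption, the same task weighs 24 times more on Claude Opus 4.1 than on Droid Core, which is why the table suggests reserving the top tier for work that needs it.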
Task-Based Model Selection
Simple edit, formatting → Haiku 4.5 (0.4×)
Implement feature from spec → GPT-5.1-Codex (0.5×)
Debug complex issue → Sonnet 4.5 (1.2×)
Architecture planning → Opus 4.5 (2×)
Bulk file processing → Droid Core (0.25×)
Reasoning Effort Impact
Higher reasoning = more "thinking" tokens but often fewer retries.
| Reasoning | When to Use | Token Trade-off |
|---|---|---|
| Off/None | Simple, clear tasks | Lowest per-turn, may need more turns |
| Low | Standard implementation | Good balance |
| Medium | Complex logic, debugging | Higher per-turn, fewer retries |
| High | Architecture, analysis | Highest per-turn, best first-attempt |
Rule of thumb: Use higher reasoning for tasks where a wrong first attempt would be expensive to fix.
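The rule of thumb can be read as a break-even calculation: higher reasoning pays off once the cheaper setting needs enough extra turns to cancel its per-turn savings. The sketch below shows that arithmetic. The per-turn figures are hypothetical placeholders, not measurements, and both function names are illustrative.

```python
def total_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total tokens for a task that takes `turns` turns."""
    return tokens_per_turn * turns


def break_even_turns(low_per_turn: int, high_per_turn: int, high_turns: int = 1) -> float:
    """Number of low-reasoning turns that cost the same as the high-reasoning run."""
    return total_tokens(high_per_turn, high_turns) / low_per_turn


if __name__ == "__main__":
    # Hypothetical per-turn costs for the same task at two reasoning levels.
    low, high = 8_000, 20_000
    print(break_even_turns(low, high))  # 2.5
    print(total_tokens(low, 3), "vs", total_tokens(high, 1))  # 24000 vs 20000
```

With these placeholder numbers, low reasoning is cheaper only if it succeeds within two turns; at three turns it has already spent more than one high-reasoning attempt.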
Configure mixed models to automatically use different models for planning vs implementation. See Mixed Models for setup.
Workflow Patterns for Efficiency
Pattern 1: Spec Mode for Complex Work
Use Specification Mode (Shift+Tab or /spec) to plan before implementing.
Without Spec Mode:
Turn 1: Start implementing → wrong approach → wasted tokens
Turn 2: Undo and try different approach → more tokens
Turn 3: Finally get it right
Total: 3 turns of implementation tokens
With Spec Mode:
Turn 1: Plan with exploration → correct approach identified
Turn 2: Implement correctly
Total: 1 turn of planning + 1 turn of implementation
Use Spec Mode for any task that:
- Touches more than 2 files
- Requires understanding existing patterns
- Has unclear requirements
- Is security-sensitive
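The four criteria above can be written as a simple predicate, which may be handy as a pre-task checklist. This is only a sketch: the `Task` fields and the function `should_use_spec_mode` are hypothetical names, and Droid does not expose such a check.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a task, one field per criterion."""
    files_touched: int
    needs_existing_patterns: bool = False
    unclear_requirements: bool = False
    security_sensitive: bool = False


def should_use_spec_mode(task: Task) -> bool:
    """Return True if any one of the four criteria applies."""
    return (
        task.files_touched > 2
        or task.needs_existing_patterns
        or task.unclear_requirements
        or task.security_sensitive
    )


if __name__ == "__main__":
    print(should_use_spec_mode(Task(files_touched=1)))  # False
    print(should_use_spec_mode(Task(files_touched=1, security_sensitive=True)))  # True
```

Any single criterion is enough to trigger planning, so a one-file change still qualifies when it is security-sensitive.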
Pattern 2: IDE Plugin for Context
Without IDE plugin, Droid must read files to understand context:
Read file A → Read file B → Read file C → Understand context → Work
(3 file reads before any actual work)
With IDE plugin, context is immediate:
Work (IDE provides open files, errors, selection)
(0 extra tool calls for context)
Pattern 3: Specific Over General
Expensive prompt:
"Fix the bug in the auth module"
→ Droid reads multiple files to find the bug, explores different possibilities
Efficient prompt:
"Fix the timeout bug in src/auth/session.ts line 45 where the session expires after 5 minutes instead of 24 hours"
→ Droid goes directly to the issue
Pattern 4: Batch Similar Work
Expensive:
Turn 1: "Add logging to userService"
Turn 2: "Add logging to orderService"
Turn 3: "Add logging to paymentService"
(3 turns, context rebuilt each time)
Efficient:
Turn 1: "Add structured logging to all services in src/services/. Use the pattern from src/lib/logger.ts. Services: user, order, payment."
(1 turn, pattern established once)
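If you batch work often, the efficient prompt above follows a repeatable shape: one action, one directory, one reference pattern, and a list of targets. The sketch below assembles that shape; `batched_prompt` is a hypothetical helper, not a Droid feature.

```python
def batched_prompt(action: str, directory: str, pattern_file: str, targets: list[str]) -> str:
    """Combine several similar requests into one prompt that names the shared pattern once."""
    names = ", ".join(targets)
    return (
        f"{action} to all services in {directory}. "
        f"Use the pattern from {pattern_file}. "
        f"Services: {names}."
    )


if __name__ == "__main__":
    print(
        batched_prompt(
            "Add structured logging",
            "src/services/",
            "src/lib/logger.ts",
            ["user", "order", "payment"],
        )
    )
```

Run as written, this reproduces the efficient prompt from the example, so the reference pattern is stated once instead of being rediscovered on every turn.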
Reducing Token Waste
Common Waste Patterns
| Pattern | Cause | Fix |
|---|---|---|
| Multiple exploration cycles | Unclear requirements | Be specific upfront |
| Repeated file reads | Missing IDE context | Install IDE plugin |
| Failed attempts | No tests/linting | Add validation tools |
| Verbose explanations | No format constraint | Ask for concise output |
| Wrong architecture | Missing context | Use Spec Mode |
Ask for specific output formats to reduce verbosity:
"Add the feature. Return only the changed code, no explanations unless something is unclear."
"Review this code. Format: bullet list of issues only, no preamble."
"Debug this test failure. Show me the fix, then explain in 2-3 sentences."
Monitoring Your Usage
Check Current Session Cost
Run `/cost` to see token usage for the current session.
Track Over Time
Review your usage patterns:
- After each session, note the `/cost` output
- Identify expensive sessions: What made them expensive?
- Refine approach: More context? Different model? Better prompts?
Usage Red Flags
Watch for these patterns:
- 🚩 High read count: Droid is exploring too much (add AGENTS.md context)
- 🚩 Multiple grep/search calls: Unclear what to look for (be more specific)
- 🚩 Repeated similar edits: Failed attempts (check tests/linting)
- 🚩 Very long conversations: Scope creep (break into smaller tasks)
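If you keep your own notes on each session, the red flags above can be checked mechanically. The sketch below is one way to do that. The `SessionStats` fields are hypothetical (this guide does not document a machine-readable usage format), and every threshold is a placeholder to tune against your own sessions.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Hand-recorded counts for one session (hypothetical fields)."""
    file_reads: int
    search_calls: int
    repeated_edits: int
    turns: int


# Placeholder thresholds, not recommendations from this guide.
THRESHOLDS = {"file_reads": 20, "search_calls": 10, "repeated_edits": 3, "turns": 30}

# Red flag and suggested fix, taken from the list above.
ADVICE = {
    "file_reads": "High read count: add AGENTS.md context",
    "search_calls": "Multiple grep/search calls: be more specific",
    "repeated_edits": "Repeated similar edits: check tests/linting",
    "turns": "Very long conversation: break into smaller tasks",
}


def red_flags(stats: SessionStats) -> list[str]:
    """Return advice for every metric that exceeds its threshold."""
    return [
        ADVICE[name]
        for name, limit in THRESHOLDS.items()
        if getattr(stats, name) > limit
    ]


if __name__ == "__main__":
    for flag in red_flags(SessionStats(file_reads=25, search_calls=2, repeated_edits=0, turns=5)):
        print(flag)
```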
Quick Wins Checklist
Implement these for immediate token savings:
Token Budget Guidelines
Rough guidelines for common tasks:
| Task Type | Typical Token Range | Notes |
|---|---|---|
| Quick edit | 5k-15k | Simple, specific changes |
| Feature implementation | 30k-80k | With Spec Mode planning |
| Complex debugging | 50k-150k | May need multiple attempts |
| Architecture planning | 20k-50k | High-reasoning model |
| Code review | 30k-60k | Depends on PR size |
| Bulk refactoring | 50k-200k | Many files, use efficient model |
If you're significantly exceeding these ranges, review the waste patterns above.
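One way to apply these ranges is to compare a session's token count against the row for its task type. The sketch below does that. The ranges come from the table, but the 1.5× cutoff for "significantly" is an assumption of this example, and `check_budget` is a hypothetical helper.

```python
# Typical token ranges (low, high) copied from the table above.
BUDGETS = {
    "Quick edit": (5_000, 15_000),
    "Feature implementation": (30_000, 80_000),
    "Complex debugging": (50_000, 150_000),
    "Architecture planning": (20_000, 50_000),
    "Code review": (30_000, 60_000),
    "Bulk refactoring": (50_000, 200_000),
}


def check_budget(task_type: str, tokens_used: int, tolerance: float = 1.5) -> str:
    """Classify usage against the upper bound of the typical range.

    `tolerance` is an assumed factor for what counts as "significantly" over.
    """
    _, high = BUDGETS[task_type]
    if tokens_used <= high:
        return "within"
    if tokens_used <= high * tolerance:
        return "above"
    return "significantly above"


if __name__ == "__main__":
    print(check_budget("Quick edit", 10_000))  # within
    print(check_budget("Quick edit", 20_000))  # above
    print(check_budget("Quick edit", 30_000))  # significantly above
```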
Summary: The Token-Efficient Workflow
1. Set up your project
   ├─ AGENTS.md with commands
   ├─ Fast tests
   ├─ Linting configured
   └─ IDE plugin installed
2. Start each task right
   ├─ Use Spec Mode for complex work
   ├─ Be specific about the goal
   └─ Reference existing patterns
3. Choose the right model
   ├─ Simple → Haiku/Droid Core
   ├─ Standard → Codex/Sonnet
   └─ Complex → Opus (with reasoning)
4. Monitor and adjust
   ├─ Check /cost periodically
   ├─ Identify expensive patterns
   └─ Refine your approach
Next Steps