> ## Documentation Index
> Fetch the complete documentation index at: https://docs.factory.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Terminal Bench

> Terminal Bench results and methodology for AI coding agents.

export const BarChart = ({data, valueKey, labelKey = "name", valueLabel = "Score", maxValue}) => {
  const values = data.map(d => d[valueKey]);
  const topValue = values[0];
  const minValue = Math.min(...values);
  const baselineOffset = topValue - (topValue - minValue) / 0.8 * 1;
  return <div className="space-y-3 my-6 not-prose">
      {data.map((item, idx) => {
    const value = item[valueKey];
    const percentage = (value - baselineOffset) / (topValue - baselineOffset) * 80;
    const isDroid = item[labelKey].toLowerCase().includes('droid') || item[labelKey].toLowerCase().includes('factory');
    return <div key={idx}>
            <div className="flex items-center gap-2 mb-1.5">
              <span className="w-6 text-sm font-mono text-zinc-400 dark:text-zinc-500 text-right">
                {idx + 1}
              </span>
              <span className="text-sm font-medium text-zinc-900 dark:text-zinc-100">
                {item[labelKey]}
              </span>
            </div>
            <div className="flex items-center gap-3">
              <div className="w-6" />
              <div className="flex-1 h-7 relative flex items-center">
                <div className="h-full rounded-sm transition-all duration-500" style={{
      width: `${percentage}%`,
      background: isDroid ? 'linear-gradient(to right, #f97316, #fb923c)' : 'linear-gradient(to right, #a1a1aa, #d4d4d8)'
    }} />
                <span className="ml-2 text-xs font-mono text-zinc-600 dark:text-zinc-400">
                  {typeof value === 'number' && value % 1 !== 0 ? value.toFixed(1) : value}{valueLabel.includes('%') ? '%' : ''}
                </span>
              </div>
            </div>
          </div>;
  })}
    </div>;
};

export const terminalBenchData = [{
  name: "Factory Droid",
  model: "Claude Opus 4.5",
  accuracy: 63.1
}, {
  name: "OpenAI Codex CLI",
  model: "GPT-5.1-Codex-Max",
  accuracy: 60.4
}, {
  name: "Warp",
  model: "Claude Opus 4.5",
  accuracy: 59.1
}, {
  name: "OpenHands",
  model: "Gemini 3 Pro",
  accuracy: 43.8
}, {
  name: "Anthropic Claude Code",
  model: "Gemini 3 Pro",
  accuracy: 40.1
}];

Benchmark from [tbench.ai](https://www.tbench.ai) evaluating AI coding agents on real-world software engineering tasks using terminal-based interfaces. Measures how effectively agents can navigate codebases, execute commands, and implement solutions through command-line interactions.

### Results

<BarChart data={terminalBenchData} valueKey="accuracy" valueLabel="%" maxValue={100} />

*Last updated: December 2025*

### Methodology

| Category                   | Description                             |
| -------------------------- | --------------------------------------- |
| **Code Navigation**        | Finding and understanding relevant code |
| **Bug Fixing**             | Identifying and resolving issues        |
| **Feature Implementation** | Adding new functionality                |
| **Refactoring**            | Improving existing code structure       |
| **Testing**                | Writing and running tests               |

Tasks are scored on **correctness**, **efficiency**, and **code quality**.

<Card title="Terminal Bench Leaderboard" icon="trophy" href="https://www.tbench.ai/leaderboard/terminal-bench/2.0">
  View live rankings and submit your agent
</Card>
