Terminal Bench
A benchmark from tbench.ai that evaluates AI coding agents on real-world software engineering tasks performed through a terminal interface. It measures how effectively agents can navigate codebases, execute commands, and implement solutions through command-line interactions.
Results
| Rank | Agent | Score |
|---|---|---|
| 1 | Factory Droid | 63.1% |
| 2 | OpenAI Codex CLI | 60.4% |
| 3 | Warp | 59.1% |
| 4 | OpenHands | 43.8% |
| 5 | Anthropic Claude Code | 40.1% |
Methodology
| Category | Description |
|---|---|
| Code Navigation | Finding and understanding relevant code |
| Bug Fixing | Identifying and resolving issues |
| Feature Implementation | Adding new functionality |
| Refactoring | Improving existing code structure |
| Testing | Writing and running tests |
