Terminal Bench
Terminal Bench is a benchmark from tbench.ai that evaluates AI coding agents on real-world software engineering tasks performed through a terminal. It measures how effectively agents navigate codebases, execute commands, and implement solutions through command-line interactions.

Results
Last updated: December 2025

Methodology

| Category | Description |
|---|---|
| Code Navigation | Finding and understanding relevant code |
| Bug Fixing | Identifying and resolving issues |
| Feature Implementation | Adding new functionality |
| Refactoring | Improving existing code structure |
| Testing | Writing and running tests |
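This page does not describe how individual tasks are scored. As a hedged illustration only, the sketch below assumes a common pattern for terminal-based agent evaluation. The agent's shell commands run in an isolated working directory. A separate check command then decides pass or fail by its exit status, and outcomes are averaged per category. The `Task` record, `run_task`, and `pass_rates` are hypothetical names. They are not Terminal Bench's actual harness or task format.

```python
import subprocess
import tempfile
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; NOT the actual Terminal Bench task format."""
    name: str
    category: str               # e.g. one of the categories in the table above
    agent_commands: list[str]   # shell commands the agent chose to run
    check_command: str          # verification command; exit status 0 means "solved"


def run_task(task: Task, workdir: str, timeout: float = 30.0) -> bool:
    """Run the agent's commands in a shell, then score the task by the check's exit status."""
    for cmd in task.agent_commands:
        # A failing command does not end the episode; the loop simply moves on.
        subprocess.run(cmd, shell=True, cwd=workdir,
                       capture_output=True, text=True, timeout=timeout)
    check = subprocess.run(task.check_command, shell=True, cwd=workdir,
                           capture_output=True, text=True, timeout=timeout)
    return check.returncode == 0


def pass_rates(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Aggregate (category, passed) pairs into a per-category pass rate."""
    total: dict[str, int] = defaultdict(int)
    solved: dict[str, int] = defaultdict(int)
    for category, passed in outcomes:
        total[category] += 1
        solved[category] += int(passed)
    return {category: solved[category] / total[category] for category in total}


if __name__ == "__main__":
    tasks = [
        Task("create-file", "Feature Implementation",
             ["echo 'print(42)' > main.py"], "test -f main.py"),
        Task("missing-test", "Testing", ["true"], "test -f test_main.py"),
    ]
    outcomes = []
    for task in tasks:
        # Each task gets a fresh scratch directory standing in for an isolated environment.
        with tempfile.TemporaryDirectory() as workdir:
            outcomes.append((task.category, run_task(task, workdir)))
    print(pass_rates(outcomes))
```

Scoring by the check command's exit status keeps the harness agnostic to how the agent reached its final state. Only the end state is verified, which suits open-ended categories such as refactoring or feature implementation.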
