Agent-Driven Development: How to Build Software with AI
When used correctly, AI agents provide massive leverage for software development. Many developers acknowledge this reality and want to adapt their workflows. Yet one question has been left unanswered: how do you actually build with agents the right way?

Standard software development practices have been built in a very human‑centric way. In the pre-AI era, the largest bottleneck was writing lines of code. With AI, that is quickly changing. But without best practices, a new, similarly large bottleneck will emerge at later stages of the software development lifecycle: review and validation.

Agent‑Driven Development (ADD) is a disciplined approach to building software with autonomous agents. It allows you to dramatically increase your coding output while addressing common failure patterns through thorough planning and clear verification procedures. Development is agent‑driven when the agent runs the full inner loop: gather context, plan, implement, run validation, and submit a reviewable change. Along the way, it leaves a clear trace of its reasoning and decisions. Humans still make architectural decisions, set guardrails, and intervene when an agent makes mistakes.

Core Foundations

- Make the request precise enough such that success can be demonstrated
- Keep the task small enough such that any wrong assumption gets caught before it compounds
- Create an environment that lends itself to automatic, objective verification over manual review
Specification is Everything
A guiding principle when working with agents is that you get out what you put in. The more precise your guidance – or “specification” – for the task at hand, the more likely the agent is to accomplish that task to your liking. Ask yourself, “what guidance would I need to provide a capable new hire such that they would not need to interrupt me more than once?” Use that as a gut check for precision.

Generally, it is clear when an ask is underspecified: “fix the bug preventing users from signing in” or “improve performance in our query API” or “add SSO with Google to the app.” There’s no anchor. No boundary. No measurable success. The agent will dutifully explore and guess, but will inevitably violate assumptions you held but never stated. Let’s compare a loose specification with a precise specification:
Loose: “Token refresh is broken because it’s dropping the new token. Can you please fix this?”
Precise: “I think concurrent refresh calls sometimes drop the newly issued token. This logic is handled in packages/session/refresh.ts. I’ve been reproducing this by running yarn test session_refresh_race. I would like all tests to pass, with no reduction in existing coverage, no new dependencies, and only files inside packages/session/ touched.”
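To ground the precise spec, here is a minimal sketch of the kind of reproduction test it references. It assumes a Jest-style runner and hypothetical refreshSession and getStoredToken helpers in packages/session/refresh.ts; the real module may look different.

```typescript
// session_refresh_race.test.ts — illustrative sketch only.
// refreshSession and getStoredToken are hypothetical helpers assumed to be
// exported from packages/session/refresh.ts.
import { refreshSession, getStoredToken } from "./refresh";

test("concurrent refreshes keep the newly issued token", async () => {
  // Fire two refreshes at once to trigger the suspected race.
  const [first, second] = await Promise.all([refreshSession(), refreshSession()]);

  // Whichever call resolved last, the stored token must be one of the issued
  // tokens rather than being silently dropped.
  expect([first.token, second.token]).toContain(getStoredToken());
});
```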
A quick self‑check: Can you point to (1) where the agent should start reading and (2) the artifact that proves completion? If either answer is mushy, you’re still brainstorming. You cannot delegate what you cannot define.
Practical Workflows That Work

Explore → Plan → Code → Verify
Begin with a loop that mirrors a thoughtful human workflow:
- Explore – Ask the agent to explore the specific parts of the codebase you know are relevant.
- Plan – Ask the agent to come up with a plan to implement the task you have in mind. Review the agent’s proposed steps. Tighten the scope or boundaries before any edits occur.
- Code – Approve small, well-scoped changes. Encourage checkpoint commits so you can bisect if necessary.
- Verify – Require objective proof: tests, lint, type check, and a diff confined to agreed paths (a sketch of such a gate follows this list). A commit message should explain intent, not just “fix stuff.”
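The Verify step is easiest to enforce when it is a single command that both the agent and CI can run. Below is a minimal sketch in TypeScript, assuming a yarn-based project and a hypothetical agreed scope of packages/session/; adapt the commands and path to your repo.

```typescript
// verify.ts — hypothetical verification gate: fails if lint, types, or tests
// fail, or if the diff touches files outside the agreed paths.
import { execSync } from "node:child_process";

const ALLOWED_PREFIX = "packages/session/"; // agreed scope for this task

// Objective checks: lint, type check, tests. Any non-zero exit throws and aborts.
for (const cmd of ["yarn lint", "yarn tsc --noEmit", "yarn test"]) {
  console.log(`$ ${cmd}`);
  execSync(cmd, { stdio: "inherit" });
}

// Diff confinement: every changed file must live under the agreed path.
const changed = execSync("git diff --name-only main...HEAD", { encoding: "utf8" })
  .trim()
  .split("\n")
  .filter(Boolean);
const outOfScope = changed.filter((file) => !file.startsWith(ALLOWED_PREFIX));
if (outOfScope.length > 0) {
  console.error(`Files changed outside ${ALLOWED_PREFIX}:\n${outOfScope.join("\n")}`);
  process.exit(1);
}
console.log("All checks passed.");
```

Pointing the agent at this gate before it opens a PR, and running the same script in CI, keeps the two sources of truth in sync.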


Setting Up an Environment For Success
Keep a short AGENTS.md file at the repo root that answers three questions (a sample sketch follows the list):
- How do I build, test, and lint?
- How is the codebase organized at a high level?
- What evidence must accompany a pull request?
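What that file contains will vary by repo. Here is a minimal sketch, assuming a yarn-based monorepo; the commands, paths, and package names are placeholders.

```markdown
# AGENTS.md (example sketch)

## Build, test, lint
- Install: `yarn install`
- Test: `yarn test` (narrow to a package, e.g. `yarn test session`)
- Lint & types: `yarn lint && yarn tsc --noEmit`

## Layout
- `packages/` — feature packages (e.g. `packages/session/` for auth/session logic)
- `apps/` — deployable frontends and services

## Pull request evidence
- Passing test, lint, and type-check output
- A failing test added first for bug fixes
- Diff confined to the paths named in the task
```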
Boundaries and Permissions
Consider setting your risk profile based on the types of commands you’re willing to run automatically (a policy sketch follows the list):
- Low risk – File edits, formatting, local test runs
- Medium risk – Commits, dependency bumps, schema dry-runs
- High risk – Destructive scripts, production data, sensitive directories
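Writing the tiers down keeps them from being re-negotiated on every task. The sketch below is hypothetical and not tied to any particular agent tool; the command patterns are illustrative only.

```typescript
// approval-policy.ts — a hypothetical encoding of the risk tiers above.
type Risk = "low" | "medium" | "high";
type Action = "auto" | "ask" | "deny";

const POLICY: Record<Risk, { action: Action; examples: RegExp[] }> = {
  low:    { action: "auto", examples: [/^yarn (test|lint)\b/, /^prettier /] },
  medium: { action: "ask",  examples: [/^git commit\b/, /^yarn add\b/] },
  high:   { action: "deny", examples: [/^rm -rf\b/, /prod/i] },
};

// Classify a proposed command; anything unrecognized defaults to "ask".
export function decide(cmd: string): Action {
  for (const risk of ["high", "medium", "low"] as Risk[]) {
    if (POLICY[risk].examples.some((re) => re.test(cmd))) return POLICY[risk].action;
  }
  return "ask";
}
```

Defaulting unrecognized commands to “ask” keeps the policy conservative without blocking routine work.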
Deterministic & Verifiable Environments
Agents thrive when every run is verifiable and repeatable. You should have the following (a sample CI configuration follows the list):
- Highly opinionated format and lint checks to catch style drift
- Unit tests and/or component tests to confirm and preserve local behavior
- Where applicable, a newly added failing test that proves you reproduced the bug or requirement
- Static analysis and/or security scans to ensure you did not ship a known vulnerability
- A review-oriented agent in CI that flags more complex design and maintainability concerns
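Most of these checks can run on every pull request. A minimal sketch, assuming GitHub Actions and a yarn 1 project; the script names are placeholders for whatever your repo actually defines.

```yaml
# .github/workflows/verify.yml — illustrative sketch only.
name: verify
on: [pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: yarn install --frozen-lockfile
      - run: yarn lint            # opinionated format + lint
      - run: yarn tsc --noEmit    # type check
      - run: yarn test            # unit / component tests
      - run: yarn audit           # dependency scan; swap in your preferred scanner
      # A review-oriented agent would run as an additional job or app here.
```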
Real-World Examples
Bug Fix: Issue to PR
Write a precise spec that names the starting file, the reproduction command, and the proof of success. The agent adds a failing test, watches it fail, patches only the allowed path, re-runs the narrow test set, then the full suite. A one-paragraph commit message explains the root cause and the fix.
Feature Development in 2–6 Hours
Draft a mini-PRD in the prompt: user story, affected components, constraints, and definition of done. Ask for a plan first. Break execution into checkpoints: backend API, frontend stub, polish. After each checkpoint, run targeted tests and one manual verification check before green-lighting the next stage.
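A mini-PRD does not need to be long. Here is a sketch for a hypothetical “export results to CSV” feature; every name in it is a placeholder.

```markdown
## Mini-PRD: Export search results to CSV (hypothetical example)
- User story: as an analyst, I can download the current result set as a CSV.
- Affected components: `apps/web` results toolbar, `packages/export` (new).
- Constraints: no new runtime dependencies; stream rows rather than buffering the full set.
- Done: unit tests for the CSV encoder pass, a manual download of 10k rows succeeds,
  and the diff stays inside the two components above.
```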
Maintenance and Upkeep
Delegate drudge work that already has clear rails: formatting bursts inside one package, changelog-aware library bumps, or doc updates after a merge. Still gate each PR on the same objective checks; automation should never bypass quality bars.
When Things Go Wrong
There are warning signs of agent drift; if you see them, interrupt and intervene:
- Plans that rewrite themselves mid-execution
- Edits outside the declared paths
- Fixes claimed without failing tests to prove they work
- Diffs bloated with unrelated changes
Recovery Playbook

- Tighten the spec – Narrow the directories or tests the agent may touch.
- Salvage the good – Keep valid artifacts such as a failing test; revert noisy edits.
- Restart clean – Launch a fresh session with improved instructions.
- Take over – When you can tell the agent is not going to succeed, pair program the final changes.