Getting Started
There are four ways to interact with Autonomy Maturity:
- CLI: Run the /readiness-report slash command to evaluate a repository’s maturity level
- Web Dashboard: View your organization’s maturity scores in the Autonomy Maturity dashboard
- API: Programmatically access maturity reports via the Maturity Reports API (see the sketch after this list)
- Remediation (Coming Soon): Automatically fix failing criteria directly from the CLI or dashboard
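The CLI and dashboard are interactive; the API suits automation. The request shape isn’t documented here, so the endpoint URL, auth scheme, and response fields in this Python sketch are all assumptions:

```python
import json
import urllib.request

# Everything below is illustrative: the real Maturity Reports API's URL,
# auth header, and response fields may differ.
API_URL = "https://api.example.com/v1/maturity-reports"  # hypothetical endpoint

def fetch_report(repo: str, token: str) -> dict:
    """Fetch a repository's maturity report (assumed request/response shape)."""
    req = urllib.request.Request(
        f"{API_URL}?repo={repo}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

report = fetch_report("my-org/my-repo", token="YOUR_TOKEN")
print(report.get("level"), report.get("score"))  # field names assumed
```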
What Autonomous Development Looks Like
What does it look like when an organization reaches high autonomy maturity? Here are concrete examples of workflows that become possible when the technical foundation is in place.
Code from Conversation
A developer describes what they need built, and the system executes through deployment. Input: “Refactor the authentication module to support OAuth2 with PKCE, maintaining backward compatibility.” The system:
- Generates idiomatic code following established patterns
- Validates against linters, type checkers, and test suites
- Handles the pull request and code review process
- Updates documentation and notifies stakeholders
- Deploys and monitors for issues
Design to Implementation
A designer shares a mockup, and the system implements it without handoffs. Input: Figma mockup of a new dashboard. The system:
- Interprets the visual specification and design system
- Implements UI components with proper styling and responsiveness
- Connects to existing APIs or creates new endpoints
- Adds loading states, error handling, and accessibility
- Generates tests for user interactions
- Deploys to staging for design review
Bug to Deployed Fix
A customer reports an issue, and the system diagnoses, fixes, and deploys autonomously. Input: Bug report through support system. The system:
- Triages based on error logs and impact
- Creates a ticket with reproduction steps and diagnostic context
- Identifies the root cause from code analysis
- Generates a fix and comprehensive tests
- Opens a PR and assigns a developer for review
- Notifies support when the fix is deployed
The 5 Maturity Levels
Repositories progress through five distinct levels, each representing a qualitative shift in how autonomous agents can operate within your codebase.
| Level | Name | Description | Example Criteria |
|---|---|---|---|
| 1 | Functional | Code runs, but requires manual setup and lacks automated validation. Basic tooling that every repository should have. | README, linter, type checker, unit tests |
| 2 | Documented | Basic documentation and process exist. Workflows are written down and some automation is in place. | AGENTS.md, devcontainer, pre-commit hooks, branch protection |
| 3 | Standardized | Clear processes are defined, documented, and enforced through automation. Development is standardized across the organization. | Integration tests, secret scanning, distributed tracing, metrics |
| 4 | Optimized | Fast feedback loops and data-driven improvement. Systems are designed for productivity and measured continuously. | Fast CI feedback, regular deployment frequency, flaky test detection |
| 5 | Autonomous | Systems are self-improving with sophisticated orchestration. Complex requirements decompose automatically into parallelized execution. | Self-improving systems |
How Scoring Works
Level Progression
To unlock a level, you must pass 80% of the criteria from the previous level. This creates a gated progression system (see the sketch after this list):
- All repositories start at Level 1
- Pass 80% of Level 1 criteria → Unlock Level 2
- Pass 80% of Level 2 criteria → Unlock Level 3
- And so on…
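The gate is simple to express in code. A minimal Python sketch, assuming per-level pass rates (0.0 to 1.0) have already been computed:

```python
# Minimal sketch of the gated progression rule: a level unlocks only when
# at least 80% of the previous level's criteria pass. Data shape is assumed.
PASS_THRESHOLD = 0.80
MAX_LEVEL = 5

def unlocked_level(pass_rates: dict[int, float]) -> int:
    """Return the highest unlocked level, given per-level pass rates."""
    level = 1  # all repositories start at Level 1
    while level < MAX_LEVEL and pass_rates.get(level, 0.0) >= PASS_THRESHOLD:
        level += 1
    return level

# Level 1 fully passes and Level 2 passes 85%, so Level 3 is unlocked;
# Level 3 itself is only at 50%, so Level 4 stays locked.
print(unlocked_level({1: 1.0, 2: 0.85, 3: 0.50}))  # -> 3
```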
Evaluation Scopes
Criteria are evaluated at two different scopes:
- Repository Scope: Evaluated once for the entire repository (e.g., CODEOWNERS file exists, branch protection enabled)
- Application Scope: Evaluated per application in monorepos (e.g., linter configured, unit tests exist for each app)
For application-scoped criteria, results are reported as a fraction, e.g., 3 / 4 (3 out of 4 apps pass).
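A minimal Python sketch of how the two scopes might report results; the data shapes are assumptions:

```python
# Sketch of the two evaluation scopes (data shapes assumed). Repository-scope
# criteria yield one result for the whole repo; application-scope criteria
# yield one result per app in a monorepo, reported as a fraction.
def repository_scope(passed: bool) -> str:
    return "pass" if passed else "fail"

def application_scope(results: dict[str, bool]) -> str:
    return f"{sum(results.values())} / {len(results)}"

print(repository_scope(True))  # e.g., CODEOWNERS file exists
print(application_scope(
    {"web": True, "api": True, "worker": True, "cli": False}
))  # -> 3 / 4 (3 out of 4 apps pass)
```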
The Technical Pillars
The Autonomy Maturity Model organizes criteria into nine technical pillars that form the foundation for autonomous operation.
Style & Validation
Linters, type checkers, and formatters catch obvious errors instantly. Agents avoid wasting cycles on syntax errors, style inconsistencies, and type mismatches.
- Example criteria: Linter configuration, type checker, code formatter, pre-commit hooks
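Pre-commit hooks are one way these checks run automatically. A minimal Python sketch of a hook script; the specific tools (ruff, mypy, black) are assumed choices, so substitute your own stack:

```python
#!/usr/bin/env python3
"""Minimal pre-commit hook sketch (e.g., saved as .git/hooks/pre-commit and
made executable). The tools listed are assumptions; swap in your own stack."""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],     # linter (assumed choice)
    ["mypy", "."],              # type checker (assumed choice)
    ["black", "--check", "."],  # formatter (assumed choice)
]

for cmd in CHECKS:
    # A nonzero exit code from any check blocks the commit.
    if subprocess.run(cmd).returncode != 0:
        print(f"pre-commit check failed: {' '.join(cmd)}", file=sys.stderr)
        sys.exit(1)
```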
Build System
Clear, deterministic build commands let agents verify their changes compile and run before committing. No guessing about which commands to run or which flags to pass.
- Example criteria: Build command documented, dependencies pinned, VCS CLI tools
Testing
Fast unit and integration tests create tight feedback loops. Agents learn whether their changes work correctly by running tests, seeing failures, and iterating.
- Example criteria: Unit tests exist, integration tests exist, tests runnable locally
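As a baseline, the kind of unit test that runs locally in milliseconds; the function under test here is hypothetical:

```python
# Minimal sketch of a fast, locally runnable unit test (pytest-style).
# The slugify function is a hypothetical example, not part of the model.
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")

def test_slugify_normalizes_spacing_and_case():
    assert slugify("  Hello World ") == "hello-world"
```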
Documentation
Explicit instructions house the tribal knowledge that “everyone just knows”: how to set up the environment, run tests, deploy changes, and debug issues. Agents need written instructions that are discoverable, accurate, and maintained.
- Example criteria: AGENTS.md, README, documentation freshness
Development Environment
Reproducible environments ensure consistency. When developers and agents work in identical environments, entire classes of problems disappear. No more “works on my machine.”
- Example criteria: Devcontainer, environment template, local services setup
Debugging & Observability
Structured logging, tracing, and metrics give agents runtime visibility into what code actually does. Good observability turns “it failed” into “it failed because X was null when calling Y after receiving Z.”
- Example criteria: Structured logging, distributed tracing, metrics collection
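A minimal sketch of structured logging using only Python’s standard library; the field names and the `context` convention are assumptions:

```python
# Minimal sketch of structured (JSON) logging with the standard library.
# Field names and the "context" convention are illustrative assumptions.
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Merge structured context passed via the `extra` keyword.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("payments")
log.addHandler(handler)
log.setLevel(logging.INFO)

# "It failed because X was null when calling Y after receiving Z" becomes
# a queryable event rather than an opaque string.
log.error("charge failed", extra={"context": {"x": None, "call": "Y", "input": "Z"}})
```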
Security
Branch protection, secret scanning, and code owners prevent agents from introducing security issues or bypassing required reviews. Agents move fast; automated guardrails ensure they move fast safely.
- Example criteria: Branch protection, secret scanning, CODEOWNERS
Task Discovery
Infrastructure for agents to find and scope work autonomously. Well-structured issues and templates help agents understand what needs to be done.
- Example criteria: Issue templates, issue labeling system, PR templates
Product & Experimentation
Tools for measuring impact, running experiments, and understanding user behavior. Agents can see whether features are actually used and measure the impact of their changes.
- Example criteria: Product analytics instrumentation, experiment infrastructure
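A minimal sketch of instrumentation; the `track` client below is hypothetical, standing in for a vendor or in-house analytics SDK:

```python
# Minimal sketch of product analytics instrumentation. The track() function
# is a hypothetical stand-in for a real analytics client.
import time

def track(event: str, properties: dict) -> None:
    """Stand-in for an analytics client's event call (assumed interface)."""
    print({"event": event, "ts": time.time(), **properties})

# Emit an event when a feature is used, so agents (and humans) can see
# whether a change actually moved usage.
track("dashboard_viewed", {"user_id": "u_123", "variant": "new-layout"})
```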
