There is a persistent problem in how engineering organizations measure delivery performance. Most teams default to comparing themselves against DORA’s industry benchmarks (elite, high, medium, low) as if a five-person startup and a regulated financial services team should be held to the same standard. The result is a classification that looks authoritative but tells you almost nothing about whether your team is actually doing well.
Intent-based metrics offer a different approach. Instead of asking “how does our team compare to the industry?” they ask a far more useful question: “is our team delivering what it said it would?”
The Problem with External Benchmarks
The original DORA research, published by Nicole Forsgren, Jez Humble, and Gene Kim, was groundbreaking. It established causal links between software delivery performance and organizational outcomes. The four key metrics (deployment frequency, lead time for changes, change failure rate, and mean time to recovery) gave engineering leaders a shared vocabulary for talking about delivery.
But something happened as the research gained popularity. The benchmark categories (elite, high, medium, low) became targets. Teams started chasing “elite” status without asking whether that classification was appropriate for their context. A team managing a regulated medical device shouldn’t deploy as frequently as a consumer web app. A team with a contractual obligation to release monthly shouldn’t feel bad about not deploying daily.
The benchmarks became a ceiling for some teams and a source of false anxiety for others. Worse, they created perverse incentives: teams gaming deployment frequency by splitting deploys or inflating numbers rather than actually improving their delivery process.
What Intent-Based Metrics Actually Are
Intent-based metrics start from a fundamentally different premise. Rather than evaluating your team against an external benchmark, they evaluate your team against its own stated delivery intent.
Here’s how it works in practice:
- Define your delivery model. Your team explicitly chooses how it intends to deliver software. This might be continuous deployment, scheduled releases on a cadence, or event-driven delivery triggered by specific business needs.
- Set your delivery profile. Within your chosen model, you define what “good” looks like. For a CD team, that might mean deploying at least three times per week. For a scheduled release team, that might mean hitting every planned release window with less than a 5% change failure rate.
- Measure against that intent. The system then evaluates your actual delivery against your stated profile, producing a performance assessment that reflects your team’s specific goals.
This approach preserves the value of the four DORA metrics while eliminating the context mismatch that makes external benchmarks unreliable.
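To make that concrete, here is a minimal sketch of what a delivery profile could look like as data. It is not tied to any specific tool; the names (DeliveryModel, MetricTarget, DeliveryProfile, attainment) and the metric naming convention are illustrative assumptions.

```python
# A minimal sketch of a delivery profile as data. All names here
# (DeliveryModel, MetricTarget, DeliveryProfile, attainment) are
# illustrative assumptions, not tied to any specific tool.
from dataclasses import dataclass, field
from enum import Enum


class DeliveryModel(Enum):
    CONTINUOUS_DEPLOYMENT = "continuous_deployment"
    SCHEDULED_RELEASE = "scheduled_release"
    EVENT_DRIVEN = "event_driven"


@dataclass
class MetricTarget:
    name: str                      # e.g. "deployments_per_week"
    target: float                  # the value the team committed to
    higher_is_better: bool = True  # frequency: True; lead time, CFR, MTTR: False


@dataclass
class DeliveryProfile:
    team: str
    model: DeliveryModel
    targets: list[MetricTarget] = field(default_factory=list)


def attainment(target: MetricTarget, actual: float) -> float:
    """Ratio of actual delivery to stated intent; 1.0 means exactly on target."""
    if target.higher_is_better:
        return actual / target.target
    return target.target / actual if actual else float("inf")
```

The point of the sketch is the shape of the data, not the specific fields: the team picks a model, commits to targets, and everything downstream compares actuals to those targets rather than to an industry table.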
Delivery Profiles in Practice
A delivery profile is the configuration that captures your team’s intent. It includes the expected values for each DORA metric, calibrated to the team’s delivery model.
Continuous Deployment Team
Consider a team building a consumer SaaS product. Their delivery profile might look like this:
- Deployment Frequency: 5+ deployments per week
- Lead Time for Changes: Under 4 hours from commit to production
- Change Failure Rate: Below 5%
- Mean Time to Recovery: Under 1 hour
This team values speed and small batch sizes. Their profile reflects that intent. If they deploy seven times in a week, they’re exceeding their goal. If they drop to two deploys, that’s a meaningful signal: something is impeding their intended workflow.
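Encoded with the illustrative dataclasses from the earlier sketch, that profile might look something like this (the metric names are assumptions, not a real schema):

```python
# The continuous deployment profile above, encoded with the illustrative
# dataclasses from the earlier sketch (metric names are assumptions).
cd_profile = DeliveryProfile(
    team="consumer-saas",
    model=DeliveryModel.CONTINUOUS_DEPLOYMENT,
    targets=[
        MetricTarget("deployments_per_week", 5, higher_is_better=True),
        MetricTarget("lead_time_hours", 4, higher_is_better=False),
        MetricTarget("change_failure_rate_pct", 5, higher_is_better=False),
        MetricTarget("mttr_hours", 1, higher_is_better=False),
    ],
)
```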
Scheduled Release Team
Now consider a team building enterprise software with contractual SLAs and a compliance process. Their profile looks very different:
- Release Cadence: Biweekly releases
- Release Hit Rate: 90% on-time delivery
- Lead Time for Changes: Under 5 days
- Change Failure Rate: Below 3%
- Mean Time to Recovery: Under 4 hours
For this team, the primary metric isn’t deployment frequency. It’s release hit rate. Are they consistently delivering on their planned schedule? Are the releases stable? This team deploying twice a month isn’t a problem; missing a planned release window is.
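A sketch of this profile, again reusing the illustrative dataclasses (with the hypothetical release_hit_rate_pct metric standing in for release hit rate):

```python
# The scheduled release profile, again using the illustrative dataclasses;
# "release_hit_rate_pct" is a hypothetical metric name for release hit rate.
scheduled_profile = DeliveryProfile(
    team="enterprise-suite",
    model=DeliveryModel.SCHEDULED_RELEASE,
    targets=[
        MetricTarget("release_hit_rate_pct", 90, higher_is_better=True),
        MetricTarget("lead_time_days", 5, higher_is_better=False),
        MetricTarget("change_failure_rate_pct", 3, higher_is_better=False),
        MetricTarget("mttr_hours", 4, higher_is_better=False),
    ],
)
```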
Event-Driven Team
Some teams don’t follow a fixed cadence at all. An infrastructure team that deploys in response to capacity needs, security patches, or upstream changes operates on an event-driven model. Their profile emphasizes responsiveness:
- Lead Time for Changes: Under 2 hours for critical patches
- Change Failure Rate: Below 2%
- Mean Time to Recovery: Under 30 minutes
For this team, measuring deployment frequency as a primary metric would be misleading. They might deploy once in a quiet week and twelve times during an infrastructure migration. Intent-based metrics handle this by weighting lead time and recovery metrics more heavily than raw frequency.
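One way to express that weighting, continuing the earlier sketch, is a weighted mean of per-metric attainment. The weights and metric names below are assumptions chosen to illustrate the idea, not a prescribed scheme:

```python
# A sketch of weighting: the overall score is a weighted mean of per-metric
# attainment, with responsiveness weighted above raw frequency. Weights and
# metric names are illustrative assumptions; reuses DeliveryProfile,
# MetricTarget, and attainment from the earlier sketch.
def weighted_score(profile: DeliveryProfile, actuals: dict[str, float],
                   weights: dict[str, float]) -> float:
    total, weight_sum = 0.0, 0.0
    for t in profile.targets:
        w = weights.get(t.name, 1.0)
        total += w * min(attainment(t, actuals[t.name]), 2.0)  # cap runaway ratios
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


event_driven_weights = {
    "lead_time_hours": 3.0,        # responsiveness matters most here
    "mttr_minutes": 3.0,
    "change_failure_rate_pct": 2.0,
    "deployments_per_week": 0.5,   # frequency is mostly informational
}
```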
The Four Performance Levels
Once a team’s actual delivery is measured against its profile, the result is expressed as one of four performance levels. These levels provide a clear, actionable signal without the false precision of trying to rank teams on a numerical scale.
Exceeding: The team is consistently outperforming its own stated goals. This is worth celebrating, but it’s also a signal to re-evaluate. If a team is always exceeding, the profile may be too conservative, or the team may be ready to take on a more ambitious delivery model.
OnTrack: The team is delivering in line with its intent. This is the target state. It means the team’s execution matches its commitments, and the delivery process is working as designed.
AtRisk: The team is falling short of its goals in ways that suggest a developing problem. Maybe deployment frequency has dropped for two consecutive weeks, or change failure rate has crept above the target. AtRisk is an early warning, not an alarm. It’s a signal to investigate and course-correct before the situation worsens.
Missing: The team is significantly underperforming against its stated intent. This requires attention. It could indicate technical debt, process breakdowns, understaffing, or a delivery profile that no longer matches the team’s reality. Missing is not a judgment. It’s a diagnostic signal that triggers a conversation.
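As a rough illustration of how measured delivery could map onto these levels, the sketch below classifies a single attainment score (actual performance relative to stated intent, where 1.0 means exactly on target). The cut-off values are assumptions made for the example, not fixed definitions:

```python
# An illustrative mapping from an attainment score (actual delivery relative
# to stated intent, 1.0 == exactly on target) to the four levels. The
# cut-off values are assumptions for this sketch, not fixed definitions.
def performance_level(score: float) -> str:
    if score >= 1.15:
        return "Exceeding"  # consistently beating the stated goals
    if score >= 0.9:
        return "OnTrack"    # delivering roughly in line with intent
    if score >= 0.7:
        return "AtRisk"     # drifting below intent; investigate early
    return "Missing"        # significantly under intent; start a conversation


print(performance_level(0.8))  # a team hitting ~80% of its targets -> "AtRisk"
```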
Why This Approach Works Better
Intent-based metrics address several problems that plague traditional benchmark-based measurement:
Context sensitivity. A team deploying twice a month isn’t inherently worse than a team deploying daily. What matters is whether each team is meeting its own commitments. Intent-based metrics encode this context directly.
Honest goal-setting. When teams are measured against external benchmarks, there’s a strong incentive to either sandbag (set easy targets they’ll always hit) or ignore the metrics entirely (because the benchmarks feel irrelevant). When teams define their own intent and are measured against it, the incentive shifts toward honest assessment of what’s achievable and what matters.
Meaningful conversations. An “AtRisk” signal tied to a team’s own goals generates a fundamentally different conversation than a generic “your deployment frequency is medium.” The team already knows what they intended to do. The metric shows them the gap between intent and reality, which naturally leads to asking “why?” and “what’s blocking us?”
Progressive improvement. Teams can evolve their delivery profiles over time. A team that starts with monthly releases can set a goal to move to biweekly, then weekly, at their own pace. Each transition is a deliberate decision, not a reaction to an industry benchmark that may not apply.
Intent-based metrics don’t lower the bar. They move it to where it belongs. Instead of measuring against industry benchmarks that ignore your team’s context, delivery model, and constraints, you measure against the commitments your team deliberately chose. The result is metrics that are harder to game and easier to act on.
Getting Started with Intent-Based Measurement
If your organization is currently using standard DORA benchmarks, transitioning to intent-based metrics doesn’t require throwing everything away. Start with these steps:
Audit your current delivery models. Look at how each team actually delivers software today. Don’t assume everyone follows the same model. You’ll likely find a mix of continuous deployment, scheduled releases, and ad-hoc delivery.
Have the intent conversation. Sit down with each team and ask: “How do you intend to deliver software, and what does ‘good’ look like for you?” The answers will be more nuanced and honest than any benchmark comparison.
Define initial profiles. Set delivery profiles based on current reality, not aspirational goals. Start by measuring against what the team is actually doing today, then iterate from there.
Review and adjust quarterly. Delivery profiles should evolve. As teams improve, their profiles should reflect higher expectations. As priorities shift, profiles should adapt. The quarterly review cadence keeps profiles honest without creating constant churn.
In CompassHQ, delivery profiles are configured per service, with support for all three delivery models and per-metric aggregation across service groups. The platform evaluates actual delivery data from your CI/CD pipeline against these profiles automatically, surfacing the four performance levels in a portfolio view so engineering leaders can see at a glance which teams are on track and which need attention.
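To be clear, the following is not CompassHQ’s API; it’s a hypothetical sketch of what rolling individual service scores up into a portfolio view could look like, reusing the performance_level function from the sketch above:

```python
# Not CompassHQ's actual API: a hypothetical sketch of rolling per-service
# scores up into a portfolio view, reusing performance_level from the
# sketch above.
from collections import Counter


def portfolio_view(service_scores: dict[str, float]) -> dict[str, int]:
    """Count how many services land in each performance level."""
    return dict(Counter(performance_level(s) for s in service_scores.values()))


print(portfolio_view({"payments": 1.2, "search": 0.95, "billing": 0.6}))
# -> {'Exceeding': 1, 'OnTrack': 1, 'Missing': 1}
```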
The goal isn’t to remove accountability. It’s to ground accountability in something meaningful: the commitments your team made to itself.