Most IT operations measure something.
But they do not always measure the right things.
You review dashboards, alerts, tickets…
but when someone asks:
are we operating better or not?
the answer is not always clear.
Because good operational management is not measured by the amount of activity.
It is measured by impact.
In simple terms
Value metrics are those that let you answer:
👉 how quickly you detect, how well you react, and how much impact you avoid.
It is not about measuring more.
It’s about measuring what really matters.
The most common mistake
Many operations focus on metrics such as:
- number of alerts
- number of tickets
- number of incidents
And while they serve as context, they do not indicate actual performance.
Because you can have:
- many tickets and good operation
- few tickets and poor management
👉 volume does not reflect quality
The metrics that really matter
Here are the metrics that do help you evaluate your operation in a concrete and measurable way.
1. MTTA (Mean Time To Acknowledge)
What it measures:
time from when the alert occurs until someone picks it up
Why it matters:
👉 it defines how quickly the response starts.
Reference values:
- high performance: < 5 minutes
- acceptable: 5-15 minutes
- critical: > 15 minutes
If this number is high, the problem is not technical.
It is a matter of visibility, notification, or ownership.
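As a rough illustration, MTTA can be computed from two timestamps per alert and compared against the reference bands above. This is a minimal sketch: the sample data and the names `mtta_minutes` and `classify_mtta` are assumptions made for the example, not part of any specific tool.

```python
from datetime import datetime
from statistics import mean

# Hypothetical alert records: (time the alert fired, time someone acknowledged it).
alerts = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 3)),
    (datetime(2024, 5, 1, 14, 30), datetime(2024, 5, 1, 14, 42)),
    (datetime(2024, 5, 2, 2, 15), datetime(2024, 5, 2, 2, 21)),
]


def mtta_minutes(records):
    """Mean time from alert to acknowledgement, in minutes."""
    return mean((acked - fired).total_seconds() / 60 for fired, acked in records)


def classify_mtta(minutes):
    """Map an MTTA value to the reference bands (< 5, 5-15, > 15 minutes)."""
    if minutes < 5:
        return "high performance"
    if minutes <= 15:
        return "acceptable"
    return "critical"


mtta = mtta_minutes(alerts)
print(f"MTTA: {mtta:.1f} min ({classify_mtta(mtta)})")
```

With this sample data the acknowledgement times are 3, 12 and 6 minutes, so the mean falls in the "acceptable" band.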
2. MTTR (Mean Time To Resolve)
What it measures:
total time from incident occurrence to incident resolution
Why it matters:
👉 it reflects the real impact on the business.
MTTR consists of several stages:
- detection
- notification
- analysis
- resolution
👉 improving just one part does not always improve the whole.
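To see which stage dominates, MTTR can be broken down by stage. The sketch below assumes each incident carries one timestamp per stage boundary; the field names (`occurred`, `detected`, `notified`, `diagnosed`, `resolved`) and the data are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incidents: one timestamp per stage boundary.
# The field names are assumptions, not a schema from any specific tool.
incidents = [
    {
        "occurred": datetime(2024, 5, 1, 9, 0),
        "detected": datetime(2024, 5, 1, 9, 10),
        "notified": datetime(2024, 5, 1, 9, 12),
        "diagnosed": datetime(2024, 5, 1, 9, 40),
        "resolved": datetime(2024, 5, 1, 10, 0),
    },
    {
        "occurred": datetime(2024, 5, 3, 14, 0),
        "detected": datetime(2024, 5, 3, 14, 30),
        "notified": datetime(2024, 5, 3, 14, 34),
        "diagnosed": datetime(2024, 5, 3, 14, 50),
        "resolved": datetime(2024, 5, 3, 15, 0),
    },
]

# Each stage is the gap between two consecutive timestamps.
STAGES = [
    ("detection", "occurred", "detected"),
    ("notification", "detected", "notified"),
    ("analysis", "notified", "diagnosed"),
    ("resolution", "diagnosed", "resolved"),
]


def minutes_between(start, end):
    return (end - start).total_seconds() / 60


def stage_means(records):
    """Mean duration of each stage, in minutes."""
    return {
        name: mean(minutes_between(r[start], r[end]) for r in records)
        for name, start, end in STAGES
    }


breakdown = stage_means(incidents)
mttr = mean(minutes_between(r["occurred"], r["resolved"]) for r in incidents)

for name, value in breakdown.items():
    print(f"{name:<12} {value:5.1f} min ({value / mttr:.0%} of MTTR)")
print(f"{'MTTR':<12} {mttr:5.1f} min")
```

In this made-up data, detection and analysis account for most of the total, so speeding up notification alone would barely move MTTR.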
3. Detection time
What it measures:
how much time elapses between the actual failure and its detection
Why it matters:
👉 every undetected minute adds to the impact.
In many organizations, this time is longer than they think, especially when they first learn of a failure from users.
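One way to make this gap visible is to compare detection time by who noticed the failure first. The sketch below uses invented numbers and assumes each incident records a detection delay in minutes and a source label (`"monitoring"` or `"user"`); both are illustrative choices.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (minutes between failure start and detection, who detected it).
detections = [
    (4, "monitoring"),
    (6, "monitoring"),
    (35, "user"),
    (55, "user"),
    (5, "monitoring"),
]


def mean_detection_by_source(records):
    """Mean detection delay in minutes, grouped by detection source."""
    grouped = defaultdict(list)
    for delay, source in records:
        grouped[source].append(delay)
    return {source: mean(delays) for source, delays in grouped.items()}


by_source = mean_detection_by_source(detections)
overall = mean(delay for delay, _ in detections)

print(f"overall: {overall} min")
for source, value in by_source.items():
    print(f"{source}: {value} min")
```

Here the overall average hides the fact that failures first reported by users took far longer to surface than those caught by monitoring.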
4. Rate of actionable alerts
What it measures:
percentage of alerts that actually require action
Simple formula:
(actionable alerts / total alerts) * 100
Why it matters:
👉 measures the noise level
Reference:
- ideal: > 70% actionable
- low: < 50%
If this number is low:
- alert fatigue sets in
- attention drops
- the risk of ignoring what is important increases
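The formula above translates directly into a small helper. This is a sketch with invented counts; the label for the 50-70% range is an assumption, since only the "ideal" and "low" bands are named here.

```python
def actionable_rate(actionable, total):
    """Percentage of alerts that actually required action."""
    if total <= 0:
        raise ValueError("total must be greater than zero")
    return 100 * actionable / total


def classify_actionable_rate(rate):
    """Apply the reference bands: > 70% is ideal, < 50% is low."""
    if rate > 70:
        return "ideal"
    if rate < 50:
        return "low"
    return "in between"  # assumed label; this band is not named in the text


# Hypothetical month: 400 alerts fired, 120 of them needed someone to act.
rate = actionable_rate(actionable=120, total=400)
print(f"{rate:.0f}% actionable ({classify_actionable_rate(rate)})")
```

In this example, 280 of the 400 alerts required no action, which is exactly the kind of noise that wears down attention.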
5. SLA compliance rate
What it measures:
percentage of incidents resolved within the committed timeframe
Why it matters:
👉 it reflects the ability to deliver on commitments to the business.
But beware:
SLA compliance does not always mean good operation if:
- SLAs are poorly defined
- the team prioritizes meeting the metric over resolving incidents well
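A minimal sketch of the calculation, assuming each incident has a priority and a resolution time in minutes. The targets in `SLA_MINUTES` are hypothetical placeholders, not recommended values.

```python
# Hypothetical SLA targets, in minutes, per priority level.
SLA_MINUTES = {"P1": 60, "P2": 240}

# Hypothetical incidents: (priority, minutes taken to resolve).
incidents = [
    ("P1", 45),
    ("P1", 90),
    ("P2", 200),
    ("P2", 230),
]


def sla_compliance(records, targets):
    """Percentage of incidents resolved within their committed time."""
    if not records:
        raise ValueError("no incidents to evaluate")
    met = sum(1 for priority, minutes in records if minutes <= targets[priority])
    return 100 * met / len(records)


compliance = sla_compliance(incidents, SLA_MINUTES)
print(f"SLA compliance: {compliance:.0f}%")
```

Note that the result depends entirely on the targets: loosening `SLA_MINUTES` raises the percentage without improving anything in the operation, which is the caveat described above.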
6. Effective escalation rate
What it measures:
percentage of incidents that are escalated successfully, neither too early nor too late
Why it matters:
👉 indicates management maturity
Typical problems:
- escalating too early → noise
- escalating too late → impact
👉 balance is key
7. Incidents detected before the user
What it measures:
percentage of incidents detected internally before they are reported by the user.
Why it matters:
👉 it is one of the most valuable indicators of the actual user experience.
Reference:
- high level: > 80%
- average: 50-80%
- low: < 50%
If this number is low, your monitoring or management is not meeting its objective.
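This percentage can be derived by comparing, for each incident, when monitoring raised an alert and when a user first reported the problem. The sketch below assumes both times are recorded as minutes after the failure started, with `None` meaning the event never happened; the field names and data are hypothetical.

```python
# Hypothetical incidents. Times are minutes after the failure started;
# None means that event never happened.
incidents = [
    {"id": "INC-1", "alert_at": 3, "user_report_at": None},
    {"id": "INC-2", "alert_at": 5, "user_report_at": 20},
    {"id": "INC-3", "alert_at": 40, "user_report_at": 12},
    {"id": "INC-4", "alert_at": None, "user_report_at": 30},
    {"id": "INC-5", "alert_at": 2, "user_report_at": 45},
]


def detected_before_user(incident):
    """True if monitoring alerted before any user reported the problem."""
    alert, report = incident["alert_at"], incident["user_report_at"]
    if alert is None:
        return False
    return report is None or alert < report


def proactive_detection_rate(records):
    """Percentage of incidents detected internally before the user."""
    if not records:
        raise ValueError("no incidents to evaluate")
    hits = sum(detected_before_user(r) for r in records)
    return 100 * hits / len(records)


def classify_detection(rate):
    """Apply the reference bands: > 80% high, 50-80% average, < 50% low."""
    if rate > 80:
        return "high"
    if rate >= 50:
        return "average"
    return "low"


rate = proactive_detection_rate(incidents)
print(f"{rate:.0f}% detected before the user ({classify_detection(rate)})")
```

In this sample, three of the five incidents were caught internally first, which lands in the "average" band.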
8. Coordination time
What it measures:
how much time is lost deciding who does what
It is not always measured directly, but can be estimated through:
- action start times
- time between escalation and response
Why it matters:
👉 it is one of the biggest hidden costs.
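As one possible proxy, the time between escalation and response can be averaged across incidents. This is only an estimate under the assumption that both timestamps are logged; the data and the name `response_gaps_minutes` are invented for the example.

```python
from datetime import datetime
from statistics import mean

# Hypothetical escalations: (time escalated, time the escalated party first responded).
escalations = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 8)),
    (datetime(2024, 5, 1, 16, 20), datetime(2024, 5, 1, 16, 42)),
    (datetime(2024, 5, 2, 3, 5), datetime(2024, 5, 2, 3, 20)),
]


def response_gaps_minutes(records):
    """Minutes between each escalation and its first response."""
    return [
        (responded - escalated).total_seconds() / 60
        for escalated, responded in records
    ]


gaps = response_gaps_minutes(escalations)
print(f"mean gap: {mean(gaps):.1f} min, worst gap: {max(gaps):.1f} min")
```

Tracking the worst gap alongside the mean helps surface the individual incidents where nobody was clearly responsible.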
A key point
Not all metrics have the same value.
You can have dashboards full of data…
but if you can’t make decisions with them:
👉 they are of no use
Value metrics have to allow you to:
- identify bottlenecks
- prioritize improvements
- measure actual impact
How to use them correctly
It is not a question of measuring everything at the same time.
You can start with:
- MTTA
- MTTR
- rate of actionable alerts
And then move on to:
- early detection
- escalation
- coordination
👉 improvement is progressive, but measurable.
A simple example
Operation without clear metrics
- “we think we’re doing well”
- decisions based on perception
- repeated problems
Operation with value metrics
- bottlenecks are identified
- processes are adjusted
- the impact of each improvement is measured
Result: actual performance
What really matters
You don’t improve what you can’t measure.
But you don’t get better by measuring just anything either.
The key is to focus on metrics that connect directly to:
- speed
- quality
- impact
👉 the operation ceases to be reactive when it becomes measurable.
If you have data today but struggle to tell whether you are really improving, you probably lack focus on the right metrics.
👉 24Cevent allows you to measure response, confirmation, escalation and resolution times, providing clear visibility into operational performance and helping to identify concrete opportunities for improvement.