Defining escalation rules seems simple.
But in practice, it is one of the main reasons why incidents:
- are delayed
- bounce between teams
- or escalate too late
And almost always the problem is not the tool.
👉 It’s how the rules are defined.
In simple terms
A good escalation rule should:
- notify the correct person
- ensure that someone responds
- escalate if there is no response
- do it in a timely manner
- avoid unnecessary noise
Not before, not after. At the right time.
The most common mistake
Many rules are defined as follows:
👉 “if no one responds in X minutes, escalate.”
And although it sounds logical, it doesn’t always work well.
Because not all alerts are the same.
Result:
- critical alerts escalate late
- minor alerts escalate unnecessarily
- the team loses focus
👉 More noise is generated than value.
Understanding the levels: N1, N2, N3
To define escalation properly, you first need to understand this:
🔹 N1 (first level)
- reviews the alert
- validates whether it is real
- attempts to solve simple problems
👉 it is the first filter
🔹 N2
- analyzes in more depth
- solves more complex technical problems
🔹 N3
- specialists
- critical or structural problems
The goal of escalation is clear: resolve the problem as quickly as possible, at the correct level.
So what happens in practice?
Many times:
- N1 is overloaded
- takes too long to analyze
- or simply passes everything to N2
👉 Result: N2 and N3 collapse
Or the other way around:
- N1 holds on too long to something it cannot resolve
👉 Result: the problem escalates late.
This is where time is gained (or lost).
Efficiency does not lie in escalating faster.
It lies in:
👉 escalating better
That means:
- knowing when to escalate
- knowing who to escalate to
- and avoiding unnecessary escalations
What if N1 were not manual?
Here is something interesting.
Much of N1’s work is:
- review alerts
- search for context
- identify probable cause
- perform simple actions
👉 repetitive tasks
Today this can be automated.
With tools such as 24Brains (24Cevent add-on), N1 can be automated:
- analyzes the alert
- queries systems
- identifies the root cause
- executes simple actions if applicable
- decides whether to escalate (and with what context)
👉 It’s like having an N1 working in seconds.
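To make the idea concrete, here is a minimal sketch of what an automated N1 pass could look like. It is not how 24Brains works internally: the `Alert` and `TriageResult` types, the `SIMPLE_FIXES` runbook and the `triage` function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    name: str
    severity: str  # assumed values: "critical", "high", "medium", "low"
    details: dict = field(default_factory=dict)


@dataclass
class TriageResult:
    escalate: bool
    probable_cause: str
    actions_taken: list


# Hypothetical runbook: known causes mapped to one simple automated action.
SIMPLE_FIXES = {
    "disk_full": "rotate_logs",
    "service_down": "restart_service",
}


def triage(alert: Alert) -> TriageResult:
    """Mimic the N1 steps: read the alert, pick a probable cause,
    try a simple fix if one is known, then decide whether to escalate."""
    cause = alert.details.get("cause", "unknown")
    actions = []
    fix = SIMPLE_FIXES.get(cause)
    if fix is not None:
        actions.append(fix)
    # Escalate when no simple fix is known, or when the alert is critical
    # and a person should look at it regardless.
    escalate = fix is None or alert.severity == "critical"
    return TriageResult(escalate, cause, actions)
```

In this sketch a medium alert with a known cause is handled without escalating, while anything critical or unexplained still reaches a person, together with what was already tried.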
What does this change?
A lot.
Before
- N1 takes time
- filtering is manual
- escalation happens with little information
After
- immediate analysis
- fewer false positives
- escalation with clear context
- less load for N2 and N3
👉 Escalation stops being reactive. It becomes intelligent.
How to define good escalation rules?
1. Classify your alerts well
Not all alerts should escalate equally.
Define:
- critical
- high
- medium
- low
👉 each with its own logic.
2. Define realistic times
Don’t use one time for everything.
Example:
- critical → escalate in minutes
- medium → more time
- low → maybe don't escalate at all
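As a rough sketch, per-severity times can live in a small lookup table instead of a single global timeout. The severity names and the specific durations below are assumptions chosen for illustration, not recommended values.

```python
from datetime import timedelta
from typing import Optional

# Illustrative values only; tune them to your own operation.
ESCALATION_TIMEOUTS = {
    "critical": timedelta(minutes=5),
    "high": timedelta(minutes=15),
    "medium": timedelta(hours=1),
    "low": None,  # never escalates automatically
}


def escalation_timeout(severity: str) -> Optional[timedelta]:
    """Return how long to wait before escalating, or None for no escalation."""
    if severity not in ESCALATION_TIMEOUTS:
        raise ValueError(f"unknown severity: {severity!r}")
    return ESCALATION_TIMEOUTS[severity]
```

Rejecting unknown severities keeps an unclassified alert from silently falling into the "never escalate" bucket.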
3. Ensure confirmation
Notification is not enough.
👉 someone has to accept the alert.
If there is no confirmation:
automatic escalation
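A minimal sketch of that check, assuming the three levels described earlier and hypothetical function names (`should_escalate`, `next_level`): the alert moves up only when nobody has accepted it and the waiting time has run out.

```python
from datetime import datetime, timedelta
from typing import Optional

LEVELS = ["N1", "N2", "N3"]


def next_level(current: str) -> str:
    """Return the level above `current`; N3 is the top and stays N3."""
    i = LEVELS.index(current)
    return LEVELS[min(i + 1, len(LEVELS) - 1)]


def should_escalate(
    notified_at: datetime,
    acknowledged_at: Optional[datetime],
    timeout: timedelta,
    now: datetime,
) -> bool:
    """Escalate only if nobody acknowledged and the timeout has passed."""
    if acknowledged_at is not None:
        return False
    return now - notified_at >= timeout
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test.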
4. Escalate with context
One of the biggest mistakes:
escalating without information
Each escalation should include:
- what happened
- what was attempted
- what was found
👉 This greatly reduces resolution times.
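One way to enforce this is to make the context a required part of every escalation. The `EscalationContext` class below is a hypothetical sketch with invented field names; it simply mirrors the three items above and reports which ones are still empty.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationContext:
    what_happened: str
    what_was_attempted: list = field(default_factory=list)
    what_was_found: str = ""

    def missing_fields(self) -> list:
        """List the parts of the context that are still empty."""
        missing = []
        if not self.what_happened.strip():
            missing.append("what_happened")
        if not self.what_was_attempted:
            missing.append("what_was_attempted")
        if not self.what_was_found.strip():
            missing.append("what_was_found")
        return missing

    def summary(self) -> str:
        """Render the context as a short message for the next level."""
        attempted = ", ".join(self.what_was_attempted) or "nothing yet"
        return (
            f"What happened: {self.what_happened}\n"
            f"What was attempted: {attempted}\n"
            f"What was found: {self.what_was_found or 'unknown'}"
        )
```

A rule could then refuse to escalate, or at least flag the escalation, whenever `missing_fields()` is not empty.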
5. Avoid escalating for escalation's sake
More escalation ≠ better operation
Too much escalation leads to fatigue
The objective is:
👉 fewer but better escalations
So, what is good escalation?
It is not just passing the alert to another level.
It is:
- filtering correctly
- making an informed decision
- providing context
- ensuring a response
👉 and doing it fast
Escalation rules are not a technical detail.
They are one of the most important parts of the operation.
And when combined with automation in the N1:
👉 the impact is immediate
- less noise
- less downtime
- less overhead
- better decisions
If your team feels today that alerts are escalating late, poorly, or with little context, the problem is probably not the tools but how the rules are defined.
24Cevent allows you to set up flexible escalations and ensure a response, and together with 24Brains it automates the initial analysis, helping each alert reach the right level at the right time.