Chapter 05 / 06
SLOs & Error Budgets
Reliability breach in context
Step 1 — Find the SLO in breach
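One or more SLO cards show a red Breaching or Exhausted status badge. Each card compares current compliance with the target. A compliance figure below target means the service is failing more often than the SLO allows, so it is spending its error budget faster than the window can sustain.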
Executive view: understand what '99.5% reliability' means in practice, and what it costs when you miss it.
Are we meeting our reliability promises?
SLOs breaching: 3
SLOs healthy: 1
Peak burn rate: 47.3×
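An SLO (Service Level Objective) is a reliability promise, for example "99.9% of payment requests must succeed over a 30-day rolling window." The error budget is the allowed margin of failure: with a 99.9% target, up to 0.1% of requests in the window may fail. The burn rate is the observed error rate divided by that allowed error rate, so a burn rate of 1× spends the budget exactly over the full window. At 47.3×, a full 30-day (720-hour) budget would be gone in about 720 / 47.3 ≈ 15 hours if the current failure rate holds. A burn rate this high triggers escalation before the SLO is formally breached.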
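The arithmetic behind the 15-hour figure, and behind what a target like 99.5% allows, can be sketched in a few lines. This is a minimal sketch with three assumptions. It treats the SLO as request-based. It uses a 30-day (720-hour) rolling window. It applies the 47.3× figure to an SLO with a 99.9% target, as in the example above. The function names are hypothetical and are not part of any dashboard API.

```python
# Minimal sketch: error budget and burn-rate arithmetic for a request-based SLO.
# Assumes a 30-day (720-hour) rolling window; all function names are hypothetical.

WINDOW_HOURS = 30 * 24  # 720 hours in a 30-day rolling window


def allowed_error_rate(target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 0.999 target."""
    return 1.0 - target


def burn_rate(observed_error_rate: float, target: float) -> float:
    """How many times faster than the sustainable pace the budget is being spent."""
    return observed_error_rate / allowed_error_rate(target)


def hours_to_exhaustion(rate: float, budget_remaining: float = 1.0,
                        window_hours: float = WINDOW_HOURS) -> float:
    """Hours until the remaining budget is gone if the burn rate holds steady.

    At a burn rate of 1.0, a full budget (1.0) lasts exactly one window.
    """
    return budget_remaining * window_hours / rate


def allowed_bad_hours(target: float, window_hours: float = WINDOW_HOURS) -> float:
    """Hours of total outage (100% failures, steady traffic) the budget tolerates."""
    return allowed_error_rate(target) * window_hours


if __name__ == "__main__":
    # Assumption: 47.3x on a 99.9% target means about 4.73% of requests are failing.
    rate = burn_rate(observed_error_rate=0.0473, target=0.999)
    print(f"burn rate: {rate:.1f}x")
    print(f"full budget gone in: {hours_to_exhaustion(rate):.1f} h")
    print(f"99.5% allows {allowed_bad_hours(0.995):.1f} h of total outage per 30 days")
    print(f"99.9% allows {allowed_bad_hours(0.999) * 60:.1f} min of total outage per 30 days")
```

Passing a smaller `budget_remaining` shows how quickly a partly spent budget runs out. For example, a card with 25% remaining would last a quarter as long at the same burn rate.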
SLO status
Checkout Availability (checkout-service): Breaching. Compliance 98.2% vs. target 99.5%; error budget remaining 25%.
Checkout Latency, P95 < 1s (checkout-service): Breaching. Compliance 87.1% vs. target 95.0%; error budget remaining 14%.
Payment Availability (payment-service): Exhausted. Compliance 82.0% vs. target 99.9%; error budget exhausted.
Inventory Availability (inventory-api): Healthy. Compliance 99.9% vs. target 99.9%; error budget remaining 89%.
Shop Frontend Availability (shop-frontend): Warning. Compliance 99.9% vs. target 99.9%; error budget remaining 77%.
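Step 1 can be expressed as a simple filter over the card values listed above. This sketch assumes each card exposes a name, service, status badge, current compliance and target. The `SloCard` type and `find_breaching` function are hypothetical, and the numbers are copied from the cards. The filter finds three SLOs below target. That matches the "3 SLOs breaching" tile, which also equals the number of Breaching or Exhausted badges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloCard:
    """One SLO card as shown on the dashboard (hypothetical data model)."""
    name: str
    service: str
    status: str        # badge text: Healthy, Warning, Breaching, Exhausted
    compliance: float  # current compliance, in percent
    target: float      # SLO target, in percent


# Values copied from the SLO status cards.
CARDS = [
    SloCard("Checkout Availability", "checkout-service", "Breaching", 98.2, 99.5),
    SloCard("Checkout Latency (P95 < 1s)", "checkout-service", "Breaching", 87.1, 95.0),
    SloCard("Payment Availability", "payment-service", "Exhausted", 82.0, 99.9),
    SloCard("Inventory Availability", "inventory-api", "Healthy", 99.9, 99.9),
    SloCard("Shop Frontend Availability", "shop-frontend", "Warning", 99.9, 99.9),
]


def find_breaching(cards: list[SloCard]) -> list[SloCard]:
    """Return cards whose compliance is below target, largest shortfall first."""
    below = [c for c in cards if c.compliance < c.target]
    # compliance - target is negative here, so ascending order puts the worst first.
    return sorted(below, key=lambda c: c.compliance - c.target)


if __name__ == "__main__":
    for card in find_breaching(CARDS):
        gap = card.target - card.compliance
        print(f"{card.name} ({card.service}): {card.status}, {gap:.1f} points below target")
```

Sorting by shortfall puts Payment Availability first, which is consistent with it being the only card whose budget is already exhausted.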
Error budget burn rate
[Chart: Payment Availability burn rate over time, T-30 to T+90. Bands: Normal (<2×), Elevated (2–14×), Alert threshold (14.4×+).]
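The 14.4× alert threshold matches a widely used fast-burn rule for 30-day SLOs: alert when 2% of the monthly budget is spent within one hour, since 0.02 × 720 h / 1 h = 14.4. The sketch below derives that number and maps a burn rate onto the chart's three bands. It assumes the dashboard follows that convention. The legend's 2–14× and 14.4×+ ranges leave 14–14.4× unlabeled, so the sketch treats that gap as Elevated. `fast_burn_threshold` and `classify_burn_rate` are hypothetical helpers.

```python
WINDOW_HOURS = 30 * 24  # 30-day rolling window, in hours


def fast_burn_threshold(budget_fraction: float, lookback_hours: float,
                        window_hours: float = WINDOW_HOURS) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent in `lookback_hours`."""
    return budget_fraction * window_hours / lookback_hours


ALERT_THRESHOLD = fast_burn_threshold(0.02, 1.0)  # 2% of budget in 1 hour -> 14.4x
ELEVATED_THRESHOLD = 2.0                          # lower edge of the Elevated band


def classify_burn_rate(rate: float) -> str:
    """Map a burn rate onto the chart legend's bands.

    Assumption: rates in the legend's unlabeled 14-14.4x gap count as Elevated.
    """
    if rate >= ALERT_THRESHOLD:
        return "Alert"
    if rate >= ELEVATED_THRESHOLD:
        return "Elevated"
    return "Normal"


if __name__ == "__main__":
    print(f"alert threshold: {ALERT_THRESHOLD:.1f}x")
    for rate in (0.8, 5.0, 47.3):
        print(f"{rate:>5.1f}x -> {classify_burn_rate(rate)}")
```

Under this rule the 47.3× peak sits far above the alert line. At that rate, the 2% of budget that triggers the alert is consumed in under two minutes.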