🎯 SLA & Availability Targets

Define SLOs, calculate error budgets, and assess reliability under incident scenarios

Traffic volume: 10.0M requests/day

Peak: 30.0M requests/day

API Availability (type: availability)

Percentage of requests that return non-5xx responses

API Latency (p95) (type: latency)

95th percentile response time under 200ms
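
As a rough illustration of the two SLIs above, the sketch below computes availability as the share of non-5xx responses and a p95 latency using the nearest-rank method. The function names and the choice of nearest-rank are assumptions made for this example; monitoring systems often estimate percentiles differently (for instance from histograms).

```python
def availability_sli(status_codes):
    """Fraction of requests whose HTTP status is not 5xx."""
    if not status_codes:
        raise ValueError("no requests observed")
    good = sum(1 for code in status_codes if not 500 <= code <= 599)
    return good / len(status_codes)


def p95_latency_ms(latencies_ms):
    """95th percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latencies observed")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical sample: 1000 requests, two of them server errors (a 404 still counts as good)
codes = [200] * 997 + [500, 503, 404]
print(availability_sli(codes))               # 0.998
print(p95_latency_ms(list(range(1, 101))))   # 95
```

In this sample the availability of 99.8% would miss a 99.9% target, while a p95 of 95 ms would sit comfortably under the 200ms latency objective.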

How to define SLOs

Start with user-facing critical paths. Pick 2-3 SLIs that directly impact user experience. Set realistic targets based on current performance, then gradually improve.

Error Budget

Your allowed failure budget for a 99.9% SLO target

Budget Consumed: 0.0%

HEALTHY: Sufficient error budget remaining

Continue normal operations and deployments

Request-Based Budget

Total Requests: 300.00M
Allowed Failures: 300.00K
Consumed Failures: 0
Remaining Failures: 300.00K
Formula: allowed_failures = total_requests × (1 - SLO)
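
The request-based formula can be sketched in a few lines. This is a minimal example, not the calculator's own implementation: the function name and the returned field names are assumptions, and the SLO is passed as a string so that `Fraction` can represent 0.999 exactly instead of as a rounded float.

```python
from fractions import Fraction


def request_error_budget(total_requests, slo, failed_requests=0):
    """Apply allowed_failures = total_requests x (1 - SLO)."""
    slo = Fraction(slo)  # e.g. "0.999" becomes exactly 999/1000
    if not 0 < slo < 1:
        raise ValueError("SLO must be strictly between 0 and 1")
    allowed = int(total_requests * (1 - slo))  # round down to whole requests
    if allowed == 0:
        raise ValueError("traffic is too low to allow any failures at this SLO")
    return {
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed_requests,
        "consumed_pct": float(100 * Fraction(failed_requests, allowed)),
    }


# 10M requests/day over 30 days at a 99.9% SLO, with no failures consumed
print(request_error_budget(300_000_000, "0.999"))
```

With the figures above (300.00M requests, 99.9% SLO, zero failures) this yields 300,000 allowed failures, all of them remaining.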

Time-Based Budget

Total Period: 720h 0m
Allowed Downtime: 43.2 minutes
Consumed Downtime: 0.0 seconds
Remaining Downtime: 43.2 minutes
Formula: allowed_downtime = period_minutes × (1 - SLO)

Reference: Allowed Downtime (30 days)

99% SLO: 7h 12m
99.9% SLO: 43.2 min
99.95% SLO: 21.6 min
99.99% SLO: 4.32 min
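
The reference values above follow directly from the time-based formula. The sketch below reproduces them for a 30-day period; the helper names are illustrative assumptions, and SLOs are passed as strings so the arithmetic stays exact.

```python
from fractions import Fraction


def allowed_downtime_minutes(period_days, slo):
    """Apply allowed_downtime = period_minutes x (1 - SLO)."""
    period_minutes = period_days * 24 * 60
    return float(period_minutes * (1 - Fraction(slo)))


def format_minutes(minutes):
    """Render long budgets as hours and minutes, short ones as minutes."""
    if minutes >= 60:
        hours, rest = divmod(round(minutes), 60)
        return f"{hours}h {rest}m"
    return f"{minutes:g} min"


for slo in ("0.99", "0.999", "0.9995", "0.9999"):
    print(slo, format_minutes(allowed_downtime_minutes(30, slo)))
```

Each additional nine cuts the budget by a factor of ten, which is why a 99.99% target leaves only a few minutes per month for incidents and risky deployments.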

Incident Simulator

Test how different incidents affect your error budget

No incidents added yet.

Understanding Burn Rate

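Burn rate shows how fast you're consuming your error budget, relative to the pace that would use it up exactly at the end of the measurement period. A burn rate of 10x means you'll exhaust your budget in 10% of the measurement period (3 days of a 30-day period). Use multi-window alerts (1h, 6h, 24h) to detect issues early.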
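
A small sketch of the burn-rate arithmetic, assuming the common definition of burn rate as the observed error rate divided by the error rate the SLO allows. The function names, the alert threshold, and the rule that every window must exceed it are hypothetical choices for illustration, not settings taken from this assessment.

```python
from fractions import Fraction


def burn_rate(failed_requests, total_requests, slo):
    """Observed error rate divided by the error rate the SLO allows."""
    allowed_error_rate = 1 - Fraction(slo)  # "0.999" gives 1/1000
    observed_error_rate = Fraction(failed_requests, total_requests)
    return float(observed_error_rate / allowed_error_rate)


def hours_until_exhausted(period_hours, rate):
    """Time for a full budget to run out at a constant burn rate."""
    if rate <= 0:
        return float("inf")
    return period_hours / rate


def should_alert(rates_by_window, threshold):
    """Fire only when every window burns faster than the threshold."""
    return all(rate > threshold for rate in rates_by_window.values())


# A 1% error rate against a 99.9% SLO is a 10x burn rate
rate = burn_rate(1_000, 100_000, "0.999")
print(rate, hours_until_exhausted(720, rate))  # 10.0 72.0
```

Requiring agreement across the 1h, 6h, and 24h windows is one way to avoid paging on a brief spike while still catching a sustained burn; other combinations of windows and thresholds are equally valid.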

Assessment Summary

SLO readiness evaluation for My Service

Score: 100/100

Risk: Low

✓ Can meet 99.9% SLO target

The system can meet the 99.9% SLO target with the current design. Simulated incidents consume 0.0% of the error budget, leaving a 100.0% margin for unexpected issues.

Recommendations (5)

1. 30-day measurement window smooths out short incidents but delays feedback.

2. Set up multi-window burn-rate alerts (1h, 6h, 24h) to catch issues early.

3. Create SLO dashboard with error budget remaining and burn-rate trends.

4. Document incident response procedures and practice with game days.

5. Define error budget policy: what actions to take at 50%, 75%, and 90% consumption thresholds.
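
One way to make the error budget policy from the last recommendation concrete is a simple threshold lookup. In the sketch below, the 50%, 75%, and 90% thresholds come from the recommendation, but the actions attached to them are hypothetical placeholders that a team would replace with its own agreed responses.

```python
# Thresholds from the recommendation above; the actions are hypothetical examples.
POLICY = [
    (90, "freeze feature deployments and focus on reliability work"),
    (75, "require extra review for risky changes"),
    (50, "review recent incidents and burn-rate trends"),
]


def policy_action(consumed_pct):
    """Return the action for the highest threshold that has been reached."""
    for threshold, action in POLICY:  # ordered from highest to lowest
        if consumed_pct >= threshold:
            return action
    return "continue normal operations and deployments"


for pct in (0.0, 55.0, 80.0, 95.0):
    print(f"{pct:5.1f}% consumed -> {policy_action(pct)}")
```

Keeping the policy as data rather than scattered conditionals makes it easy to review alongside the SLO itself and to adjust during regular SLO reviews.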

SLO Configuration Summary

Target SLO: 99.9%
Measurement Period: 30d
Traffic Volume: 10.00M/day
Allowed Downtime: 43.2 minutes
Allowed Failures: 300.00K
Active SLIs: 2
Simulated Incidents: 0

Next Steps

  • Export this assessment for review with your team
  • Implement recommended monitoring and alerting
  • Document SLO policy and error budget thresholds
  • Schedule regular SLO reviews and adjustments
  • Practice incident response with game days