Why Dashboards Are Not Enough
Dashboards show what happened. Anomaly detection tells you something unexpected is happening now.
Real cases: Stripe webhook broke (revenue -40% for 6 hours). Deploy doubled latency (conversion -25%). Campaign went viral (server costs +800%).
Detection Pipeline
Metrics --> TimescaleDB --> Detection Engine --> Alerts
Pick Critical Metrics
Focus on revenue-impacting KPIs: hourly revenue, conversion, error rate, DAU.
Detection Methods
| Method | Best For | Complexity |
|---|
| Z-score | Stable metrics | Low |
| Moving average | Trending | Medium |
| Isolation Forest | Multi-dim | Medium |
| Prophet | Seasonal | High |
Smart Alerts
Reduce alert fatigue with severity tiers, cooldowns, and AI root cause summaries.
Quick Start: Z-Score
import numpy as np
def detect_anomaly(values, new_value, threshold=3):
mean = np.mean(values[-30:])
std = np.std(values[-30:])
z = (new_value - mean) / std if std > 0 else 0
return abs(z) > threshold
Z-score catches 80% of anomalies with zero ML complexity.