Anomaly Detection Architecture
Technical architecture documentation for the anomaly detection system in Baselinr.
Overview
The anomaly detection system automatically identifies outliers and seasonal anomalies in profiling metrics using learned expectations as baselines. It supports multiple detection methods including IQR, MAD, EWMA, trend/seasonality decomposition, and regime shift detection.
System Design
High-Level Flow
Profiling Run Complete
↓
ResultWriter.write_results()
↓
_learn_expectations() [if enabled]
↓
_detect_anomalies() [if enabled]
↓
For each column + numeric metric:
↓
AnomalyDetector.detect_anomalies()
↓
Retrieve learned expectation from ExpectationStorage
↓
If expectation exists:
↓
Run enabled detection methods:
- Control limits check (from expectation)
- IQR detection (from historical data)
- MAD detection (from historical data)
- EWMA detection (from expectation)
- Trend/seasonality detection (from historical series)
- Regime shift detection (from historical data)
↓
Aggregate results
↓
Categorize anomalies by type
↓
Emit AnomalyDetected events via EventBus
↓
Store in baselinr_events table (via SQLEventHook)
Components
1. AnomalyDetector (baselinr/anomaly/detector.py)
Main orchestrator that coordinates multiple detection methods.
Key Methods:
- `detect_anomalies()` - Main entry point, orchestrates detection
- `_check_control_limits()` - Checks against control limits from expectations
- `_get_historical_metrics()` - Queries historical metric values for IQR/MAD
- `_get_historical_series()` - Queries time-series data for trend/seasonality
- `_categorize_anomaly()` - Maps anomalies to specific types (row_count_spike, etc.)
- `emit_anomaly_events()` - Emits events via EventBus
Design Decisions:
- Requires learned expectations to exist (returns early if not found)
- Runs multiple detection methods in parallel (where possible)
- Aggregates results from all methods
- Uses expectations as baselines (control limits, EWMA values)
- Falls back to raw historical data for methods that need distributions (IQR, MAD)
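To make the orchestration concrete, here is a minimal standalone sketch of the aggregate-over-methods loop. The function, its parameters, and the `detect()` interface are illustrative stand-ins, not the actual Baselinr API:

```python
import logging

logger = logging.getLogger(__name__)


def detect_anomalies(expectation, current_value, detectors):
    """Run every enabled detector and aggregate the results.

    `expectation` is the learned baseline (or None) and `detectors` is a
    list of objects exposing detect(value, expectation) -- both names are
    illustrative, not the real Baselinr API.
    """
    if expectation is None:
        return []  # design decision: learned expectations are a prerequisite

    anomalies = []
    for detector in detectors:  # IQR, MAD, EWMA, ...
        try:
            result = detector.detect(current_value, expectation)
            if result is not None:
                anomalies.append(result)
        except Exception as exc:  # calculation errors degrade gracefully
            logger.warning("Detector %r failed: %s", detector, exc)
    return anomalies
```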
2. Detection Methods (baselinr/anomaly/detection_methods.py)
Individual detection algorithms implemented as separate classes.
IQRDetector:
- Calculates Q1 (25th percentile) and Q3 (75th percentile) from historical values
- Computes IQR = Q3 - Q1
- Flags values outside [Q1 - threshold×IQR, Q3 + threshold×IQR]
- Best for: Non-normal distributions, robust outlier detection
MADDetector:
- Calculates median and MAD (Median Absolute Deviation) from historical values
- Computes modified z-score = 0.6745 × (value - median) / MAD
- Flags values with |modified_z_score| > threshold
- Best for: Non-normal distributions, metrics with outliers in history
EWMADetector:
- Uses EWMA value from `LearnedExpectation`
- Compares current value to EWMA-based prediction
- Uses `expected_stddev` for threshold calculation
- Flags if deviation > threshold × stddev
- Best for: Metrics with trends, detecting gradual shifts
TrendSeasonalityDetector:
- Extracts trend using simple moving average (configurable window)
- Detects weekly/monthly seasonal patterns
- Removes trend and seasonality to get residuals
- Flags if detrended/deseasonalized value exceeds threshold
- Lightweight heuristic-based (no optimization routines)
- Best for: Metrics with strong seasonal patterns
RegimeShiftDetector:
- Compares recent window (last N runs) vs historical baseline
- Options:
- Statistical test: Two-sample t-test approximation (Welch's)
- Simple comparison: Mean shift > threshold × stddev
- Flags if significant shift detected
- Best for: Detecting sudden behavioral changes
3. AnomalyResult (baselinr/anomaly/detector.py)
Dataclass representing a detected anomaly.
Fields:
- `anomaly_type`: Enum (IQR_DEVIATION, CONTROL_LIMIT_BREACH, etc.)
- `table_name`, `column_name`, `metric_name`: Identity
- `expected_value`, `actual_value`: Comparison values
- `deviation_score`: Normalized score (0-1)
- `severity`: "low", "medium", "high"
- `detection_method`: Which method detected it
- `metadata`: Additional context (e.g., Q1/Q3, trend info)
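For illustration, the dataclass shape implied by these fields might look like the following. The type annotations and enum values here are assumptions; the actual definition lives in baselinr/anomaly/detector.py:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class AnomalyType(Enum):  # values are illustrative, not exhaustive
    IQR_DEVIATION = "iqr_deviation"
    CONTROL_LIMIT_BREACH = "control_limit_breach"


@dataclass
class AnomalyResult:
    anomaly_type: AnomalyType
    table_name: str
    column_name: str
    metric_name: str
    expected_value: Optional[float]
    actual_value: float
    deviation_score: float                       # normalized to [0, 1]
    severity: str                                # "low" | "medium" | "high"
    detection_method: str
    metadata: Dict[str, Any] = field(default_factory=dict)
```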
4. Event System Integration
Anomalies are emitted as AnomalyDetected events via the EventBus:
AnomalyDetected(
    event_type="AnomalyDetected",
    timestamp=datetime.utcnow(),
    table="users",
    column="age",
    metric="mean",
    anomaly_type="control_limit_breach",
    expected_value=30.0,
    actual_value=50.0,
    severity="high",
    detection_method="control_limits",
    metadata={...}
)
Events are automatically stored in the baselinr_events table via the existing hooks (SQLEventHook, SnowflakeEventHook).
5. ResultWriter Integration (baselinr/storage/writer.py)
Integration point where anomaly detection is triggered after profiling.
Integration:
- Called after `_learn_expectations()` completes
- Only executes if `config.enable_anomaly_detection` is True
- Iterates through columns and numeric metrics
- Handles errors gracefully (logs warning, continues)
- Doesn't block profiling completion if detection fails
Code Flow:
def _detect_anomalies(self, result: ProfilingResult):
    detector = AnomalyDetector(...)
    for column_data in result.columns:
        for metric_name in numeric_metrics:
            anomalies = detector.detect_anomalies(...)
            if anomalies:
                detector.emit_anomaly_events(anomalies)
Detection Algorithms
Control Limits (Shewhart)
Algorithm:
- Retrieve control limits from `LearnedExpectation`
- Check if `current_value < LCL` or `current_value > UCL`
- Calculate deviation in stddevs: |value - mean| / stddev
- Determine severity based on deviation magnitude
Severity Mapping:
- Deviation > 3 stddevs: "high"
- Deviation > 2 stddevs: "medium"
- Otherwise: "low"
Complexity: O(1) - Direct lookup from expectations
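A minimal standalone sketch of this check and the severity mapping (function name and signature are hypothetical, not the real `_check_control_limits()`):

```python
def check_control_limits(value, mean, stddev, lcl, ucl):
    """Shewhart-style check: flag values outside the learned control limits.

    Returns (is_anomaly, severity); a sketch, not the Baselinr source.
    """
    if lcl <= value <= ucl:
        return False, None
    deviation = abs(value - mean) / stddev if stddev else float("inf")
    if deviation > 3:
        severity = "high"
    elif deviation > 2:
        severity = "medium"
    else:
        severity = "low"
    return True, severity


print(check_control_limits(50.0, mean=30.0, stddev=5.0, lcl=15.0, ucl=45.0))
# (True, 'high') -- the value sits 4 stddevs above the mean
```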
IQR (Interquartile Range)
Algorithm:
- Fetch historical metric values from `baselinr_results`
- Sort values
- Calculate Q1 (25th percentile) and Q3 (75th percentile)
- Compute IQR = Q3 - Q1
- Calculate bounds: [Q1 - threshold×IQR, Q3 + threshold×IQR]
- Flag if `current_value` is outside bounds
Percentile Calculation:
- Uses linear interpolation for fractional indices
- Handles edge cases (zero IQR, insufficient data)
Complexity: O(n log n) - Sorting historical values
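The percentile interpolation and bounds calculation can be sketched as follows; `percentile()` and `iqr_bounds()` are illustrative helpers, not the IQRDetector source:

```python
def percentile(sorted_values, p):
    """Percentile with linear interpolation at fractional indices."""
    idx = (len(sorted_values) - 1) * p
    lo, frac = int(idx), idx - int(idx)
    if frac == 0:
        return sorted_values[lo]
    return sorted_values[lo] + frac * (sorted_values[lo + 1] - sorted_values[lo])


def iqr_bounds(history, threshold=1.5):
    values = sorted(history)  # the O(n log n) step
    q1, q3 = percentile(values, 0.25), percentile(values, 0.75)
    iqr = q3 - q1  # zero IQR yields degenerate (equal) bounds
    return q1 - threshold * iqr, q3 + threshold * iqr


lower, upper = iqr_bounds([10, 12, 11, 13, 12, 11, 40])
print(lower <= 40 <= upper)  # False: 40 falls outside the bounds
```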
MAD (Median Absolute Deviation)
Algorithm:
- Fetch historical metric values
- Calculate median
- Calculate MAD = median(|x_i - median|)
- Compute modified z-score = 0.6745 × (value - median) / MAD
- Flag if |modified_z_score| > threshold
Why Modified Z-Score:
- 0.6745 constant makes MAD comparable to stddev for normal distributions
- More robust to outliers than standard z-score
Complexity: O(n) - Median and MAD calculation
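A minimal sketch of the modified z-score computation, returning None for the zero-MAD edge case (function name hypothetical):

```python
from statistics import median


def mad_zscore(history, value):
    """Modified z-score: 0.6745 * (value - median) / MAD."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        return None  # constant history: treat as insufficient data
    return 0.6745 * (value - med) / mad


score = mad_zscore([100, 102, 98, 101, 99], 130)
print(score is not None and abs(score) > 3.0)  # True: 130 is flagged
```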
EWMA (Exponentially Weighted Moving Average)
Algorithm:
- Retrieve `ewma_value` and `expected_stddev` from `LearnedExpectation`
- Calculate deviation = `current_value - ewma_value`
- Calculate deviation_stddevs = |deviation| / stddev
- Flag if deviation_stddevs > threshold
Fallback:
- If no stddev available, uses 5% of mean as threshold
Complexity: O(1) - Direct lookup from expectations
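The check itself is a constant-time comparison; a minimal sketch that also mirrors the 5%-of-mean fallback (signature hypothetical):

```python
def ewma_deviation_check(value, ewma_value, expected_stddev, threshold=2.0):
    """Flag a value more than `threshold` stddevs away from the EWMA."""
    if expected_stddev and expected_stddev > 0:
        scale = expected_stddev
    else:
        scale = abs(ewma_value) * 0.05 or 1.0  # fallback: 5% of the mean
    deviation_stddevs = abs(value - ewma_value) / scale
    return deviation_stddevs > threshold


print(ewma_deviation_check(50.0, ewma_value=30.0, expected_stddev=5.0))  # True
```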
Trend/Seasonality Detection
Algorithm:
- Fetch historical time-series (timestamp, value pairs)
- Trend Extraction: Apply simple moving average
- Window size: configurable (default: 7)
- Trend = mean of windowed values
- Deseasonalize: Calculate residuals = values - trend
- Seasonality Detection:
- Extract day-of-week for weekly seasonality
- Group residuals by day-of-week
- Calculate mean/stddev per day
- Expected residual = mean for current day-of-week
- Anomaly Detection:
- Current residual = current_value - current_trend
- Deviation = current_residual - expected_residual
- Flag if deviation > threshold × residual_stddev
Why Lightweight:
- No optimization routines (unlike full Prophet)
- Uses simple moving average instead of exponential smoothing
- Heuristic-based seasonality detection
Complexity: O(n) - Single pass through historical series
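A compressed sketch of the heuristic: a moving-average trend plus a day-of-week residual baseline. The function name, grouping details, and fallbacks are illustrative simplifications of the steps above:

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def seasonal_residual_check(series, current, window=7, threshold=3.0):
    """series: list of (datetime, value); current: (datetime, value)."""
    values = [v for _, v in series]
    trend = mean(values[-window:])            # simple moving-average trend
    residuals = [v - trend for v in values]   # detrended values

    # Group residuals by day-of-week to form a weekly seasonal baseline.
    by_dow = {}
    for (ts, _), r in zip(series, residuals):
        by_dow.setdefault(ts.weekday(), []).append(r)

    ts, value = current
    day = by_dow.get(ts.weekday(), residuals)
    expected_residual = mean(day)
    spread = pstdev(day) or pstdev(residuals) or 1.0
    return abs((value - trend) - expected_residual) > threshold * spread


# Example: two weeks of daily points with a Monday bump, then an odd Monday.
base = datetime(2024, 1, 1)  # a Monday
series = [(base + timedelta(days=i), 100 + (5 if i % 7 == 0 else 0))
          for i in range(14)]
print(seasonal_residual_check(series, (base + timedelta(days=14), 160)))  # True
```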
Regime Shift Detection
Algorithm:
- Fetch historical metric values
- Split into recent window (last N runs) and baseline (remaining)
- Statistical Test Option:
- Calculate means: recent_mean, baseline_mean
- Calculate variances: recent_var, baseline_var
- Standard error (Welch, no pooling) = sqrt(recent_var/n1 + baseline_var/n2)
- t-statistic = |recent_mean - baseline_mean| / standard_error
- Critical t-value based on sensitivity (p-value threshold)
- Flag if t-stat > critical_t
- Simple Comparison Option:
- Mean shift = |recent_mean - baseline_mean|
- Threshold = 2.0 × baseline_stddev
- Flag if mean_shift > threshold
Statistical Test:
- Uses Welch's t-test approximation (two-sample, unequal variances)
- Normal approximation for critical t-values
Complexity: O(n) - Mean and variance calculation
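A minimal sketch of the statistical-test option; under the normal approximation, the default sensitivity of 0.05 corresponds to a critical value of roughly 1.96 (function name and defaults are illustrative):

```python
from math import sqrt
from statistics import mean, variance


def regime_shift(history, window=3, critical_t=1.96):
    """Welch-style comparison of the recent window against the baseline."""
    recent, baseline = history[-window:], history[:-window]
    if len(recent) < 2 or len(baseline) < 2:
        return False  # insufficient data for a variance estimate
    se = sqrt(variance(recent) / len(recent) +
              variance(baseline) / len(baseline))
    if se == 0:
        return False
    t_stat = abs(mean(recent) - mean(baseline)) / se
    return t_stat > critical_t


print(regime_shift([10, 11, 10, 12, 11, 10, 30, 31, 29]))  # True: shift to ~30
```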
Integration Points
With Expectation Learning
Anomaly detection depends on expectation learning:
- Prerequisite: Learned expectations must exist
- Baseline Usage: Uses expectations for control limits and EWMA
- Complementary: Anomalies are detected after expectations are learned/updated
With Event System
Anomalies are integrated into the existing event infrastructure:
- Event Emission: `AnomalyDetected` events emitted via EventBus
- Storage: Events stored in the `baselinr_events` table (via hooks)
- Consistency: Same pattern as drift detection events
With Drift Detection
Anomaly detection complements drift detection:
- Different Purpose: Anomalies detect outliers, drift detects changes
- Different Baselines: Anomalies use expectations, drift uses baselines
- Complementary: Both can detect issues, but from different perspectives
Performance Considerations
Database Queries
Historical Data Fetching:
- IQR/MAD: Fetches all historical values for metric (window: 90 days default)
- Trend/Seasonality: Fetches time-series with timestamps
- Regime Shift: Fetches historical values
Optimization Strategies:
- Indexes on `baselinr_results` (dataset_name, column_name, metric_name, profiled_at)
- Window limiting (90 days default) reduces query size
- Batch processing for multiple columns/metrics
Computational Complexity
- Control Limits: O(1) - Direct lookup
- EWMA: O(1) - Direct lookup
- IQR: O(n log n) - Sorting
- MAD: O(n) - Median/MAD calculation
- Trend/Seasonality: O(n) - Moving average
- Regime Shift: O(n) - Mean/variance calculation
Where n = number of historical runs (typically 10-30).
Caching
Currently no caching implemented. Considerations:
- Expectations are cached by ExpectationStorage (per-request)
- Historical data queries could be cached (TTL: 1 hour)
- Detection results could be cached (TTL: 5 minutes)
Error Handling
Graceful Degradation
All detection methods handle errors gracefully:
- Missing Expectations: Returns empty list (logs debug)
- Insufficient Data: Returns empty result with reason in metadata
- Calculation Errors: Catches exceptions, logs warning, returns safe default
- Database Errors: Catches SQL errors, logs warning, continues
Logging
- Debug: Normal operations (no expectation found, insufficient data)
- Warning: Errors during detection (calculation failures, DB errors)
- Info: Significant anomalies detected
Configuration Schema
Configuration is stored in StorageConfig:
from typing import List

from pydantic import BaseModel


class StorageConfig(BaseModel):
    enable_anomaly_detection: bool = False
    anomaly_enabled_methods: List[str] = [...]
    anomaly_iqr_threshold: float = 1.5
    anomaly_mad_threshold: float = 3.0
    anomaly_ewma_deviation_threshold: float = 2.0
    anomaly_seasonality_enabled: bool = True
    anomaly_regime_shift_enabled: bool = True
    anomaly_regime_shift_window: int = 3
    anomaly_regime_shift_sensitivity: float = 0.05
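For illustration, enabling detection might look like the following; the values are hypothetical and the method-name strings are assumptions, since the valid entries for `anomaly_enabled_methods` are not listed here:

```python
# Hypothetical configuration; field names follow the StorageConfig schema above.
config = StorageConfig(
    enable_anomaly_detection=True,
    anomaly_enabled_methods=["control_limits", "iqr", "mad", "ewma"],
    anomaly_iqr_threshold=1.5,
    anomaly_ewma_deviation_threshold=2.5,
    anomaly_regime_shift_window=5,
)
```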
Future Enhancements
- Caching: Cache historical data queries and detection results
- Machine Learning: Add ML-based anomaly detection (Isolation Forest, etc.)
- Adaptive Thresholds: Automatically tune thresholds based on false positive rate
- Anomaly Scoring: Combine multiple methods into single anomaly score
- Trend Prediction: Use trend models to predict expected values
- Multi-variate Detection: Detect anomalies across multiple metrics simultaneously
Testing Strategy
Unit Tests
- Detection Methods: Test each method independently with known inputs
- AnomalyDetector: Test orchestration logic, event emission
- Edge Cases: Insufficient data, missing expectations, zero variance
Integration Tests
- End-to-End Workflow: Learn expectations → Detect anomalies
- Event Emission: Verify events are emitted and stored
- ResultWriter Integration: Test integration with profiling workflow
Test Coverage
- All detection methods have unit tests
- Detector orchestration has unit tests
- Integration tests cover full workflow
- Edge cases covered (insufficient data, errors, etc.)