HieraChain continuously monitors system health across four risk domains: consensus, security, performance, and storage. When a metric breaches its threshold, AlertManager creates an alert, suppresses duplicates using a cooldown, sends notifications by email or webhook, and automatically escalates any alert that remains unacknowledged after a configurable timeout (30 minutes by default).
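As a rough illustration of the per-domain analysis, the sketch below runs two of the four checks and returns one result list per domain. It is not the HieraChain implementation: the `system_data` keys (`node_count`, `max_faulty_nodes`, `cpu_percent`), the message strings, and the empty placeholders for the security and storage domains are assumptions made for the example. Only the function names, the `3f+1` rule, and the 85% CPU figure come from this page.

```python
from typing import Dict, List


def analyze_consensus_risks(data: Dict[str, float]) -> List[str]:
    """Flag a cluster too small to tolerate f faulty nodes (needs 3f+1)."""
    risks = []
    f = int(data.get("max_faulty_nodes", 1))  # hypothetical key
    if data.get("node_count", 0) < 3 * f + 1:
        risks.append(f"node_count below 3f+1 for f={f}")
    return risks


def analyze_performance_risks(data: Dict[str, float]) -> List[str]:
    """Flag CPU usage above the 85% warning level."""
    risks = []
    if data.get("cpu_percent", 0.0) > 85.0:  # hypothetical key
        risks.append("cpu_percent above 85")
    return risks


def perform_comprehensive_analysis(system_data: Dict[str, float]) -> Dict[str, List[str]]:
    """Run each domain check and group the findings by domain."""
    return {
        "consensus": analyze_consensus_risks(system_data),
        "security": [],  # placeholder: certificate, auth, and encryption checks omitted
        "performance": analyze_performance_risks(system_data),
        "storage": [],  # placeholder: world-state size and backup-age checks omitted
    }
```

With `f = 1`, a three-node cluster is flagged because tolerating one faulty node requires at least four nodes.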
Flow Diagram
sequenceDiagram
autonumber
participant PM as 📊 PerformanceMonitor
participant RA as 🔍 RiskAnalyzer
participant AM as 🚨 AlertManager
participant AD as 📈 AnomalyDetector
participant NTF as 📧 Email / Webhook Notifier
PM->>RA: perform_comprehensive_analysis(system_data)
par Consensus risks
RA->>RA: analyze_consensus_risks()<br/>Check: node_count >= 3f+1, leader_timeout, msg_verify_rate
and Security risks
RA->>RA: analyze_security_risks()<br/>Check: cert_expiry, failed_auth, encryption_strength
and Performance risks
RA->>RA: analyze_performance_risks()<br/>Check: CPU%, memory%, event_pool_size
and Storage risks
RA->>RA: analyze_storage_risks()<br/>Check: world_state_size, backup_age
end
RA->>RA: Update active_risks + risk_history
RA-->>PM: all_risks { consensus, security, performance, storage }
PM->>AM: check_metric(metric_name, value, source)
AM->>AD: add_data_point(metric_name, value)
AM->>AM: _evaluate_rule_condition(rule, value)
AM->>AM: _is_in_cooldown(rule)
alt Threshold breached AND not in cooldown
AM->>AM: _create_alert(rule, value, source)
AM->>AM: _is_duplicate_alert() → suppress if duplicate
AM->>AM: active_alerts[alert_id] = Alert
AM->>NTF: _send_notifications(alert)
NTF-->>AM: sent / failed
Note over AM: Escalation timer starts (default 30 min)
alt Alert not acknowledged within escalation_time
AM->>AM: _escalate_alert(alert_id)<br/>alert.escalation_level += 1
AM->>NTF: Re-notify with ESCALATED prefix
end
end
Note over AM: Operator acknowledges or system auto-resolves
AM->>AM: acknowledge_alert(alert_id) → ACKNOWLEDGED
AM->>AM: resolve_alert(alert_id) → RESOLVED + remove from active_alerts
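The alerting half of the flow above can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the actual AlertManager: the `AlertRule` and `Alert` field names, the 300-second cooldown, the "same rule and same source" duplicate test, the injectable `clock`, and the choice to re-escalate at each further multiple of the escalation time are all hypothetical. The method names `check_metric`, `acknowledge_alert`, and `resolve_alert` and the 30-minute default are taken from the diagram.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AlertRule:
    name: str
    metric_name: str
    threshold: float
    cooldown_seconds: float = 300.0      # assumed value
    escalation_seconds: float = 1800.0   # 30 minutes, the default in the diagram
    last_fired: Optional[float] = None


@dataclass
class Alert:
    alert_id: str
    rule_name: str
    value: float
    source: str
    created_at: float
    status: str = "ACTIVE"
    escalation_level: int = 0


class SketchAlertManager:
    def __init__(self, notify: Callable[[str], None],
                 clock: Callable[[], float] = time.time) -> None:
        self.rules: Dict[str, AlertRule] = {}
        self.active_alerts: Dict[str, Alert] = {}
        self._notify = notify
        self._clock = clock
        self._next_id = 0

    def add_rule(self, rule: AlertRule) -> None:
        self.rules[rule.name] = rule

    def check_metric(self, metric_name: str, value: float, source: str) -> None:
        now = self._clock()
        for rule in self.rules.values():
            # Rule condition: the metric must exceed the rule's threshold.
            if rule.metric_name != metric_name or value <= rule.threshold:
                continue
            # Cooldown: skip if the same rule fired too recently.
            if rule.last_fired is not None and now - rule.last_fired < rule.cooldown_seconds:
                continue
            # Duplicate suppression: one active alert per rule and source.
            if any(a.rule_name == rule.name and a.source == source
                   for a in self.active_alerts.values()):
                continue
            rule.last_fired = now
            self._next_id += 1
            alert = Alert(f"alert-{self._next_id}", rule.name, value, source, now)
            self.active_alerts[alert.alert_id] = alert
            self._notify(f"{rule.name}: {metric_name}={value} from {source}")

    def escalate_overdue(self) -> None:
        """Escalate alerts left unacknowledged past the escalation time."""
        now = self._clock()
        for alert in self.active_alerts.values():
            rule = self.rules[alert.rule_name]
            due = rule.escalation_seconds * (alert.escalation_level + 1)
            if alert.status == "ACTIVE" and now - alert.created_at >= due:
                alert.escalation_level += 1
                self._notify(f"ESCALATED {alert.rule_name} (level {alert.escalation_level})")

    def acknowledge_alert(self, alert_id: str) -> None:
        self.active_alerts[alert_id].status = "ACKNOWLEDGED"

    def resolve_alert(self, alert_id: str) -> Alert:
        alert = self.active_alerts.pop(alert_id)  # removed from active_alerts
        alert.status = "RESOLVED"
        return alert
```

Passing the clock in as a parameter keeps the cooldown and escalation logic testable without waiting for real time to pass. In this sketch `escalate_overdue` would be called periodically by whatever loop drives monitoring.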
Alert Severity Levels
| Severity | Trigger Example | Auto-Escalate After |
| --- | --- | --- |
| INFO | Normal metric fluctuation | Never |
| WARNING | CPU > 85%, minor risk detected | 30 minutes |
| CRITICAL | CPU > 95%, consensus success < 95% | 5 minutes |
| EMERGENCY | Manual declaration or compound failure | Immediate |
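One way to encode these severity levels is a lookup from severity to escalation delay. The sketch below is illustrative only: the `AlertSeverity` enum, the `ESCALATE_AFTER` mapping, and both helper functions are hypothetical names, and the CRITICAL delay is read as 5 minutes. The CPU cut-offs reuse the trigger examples listed for WARNING and CRITICAL.

```python
from datetime import timedelta
from enum import Enum
from typing import Dict, Optional


class AlertSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"
    EMERGENCY = "emergency"


# None means the alert is never auto-escalated.
ESCALATE_AFTER: Dict[AlertSeverity, Optional[timedelta]] = {
    AlertSeverity.INFO: None,
    AlertSeverity.WARNING: timedelta(minutes=30),
    AlertSeverity.CRITICAL: timedelta(minutes=5),  # assumed reading of the CRITICAL entry
    AlertSeverity.EMERGENCY: timedelta(0),         # escalate immediately
}


def should_escalate(severity: AlertSeverity, unacknowledged_for: timedelta) -> bool:
    """Return True once an unacknowledged alert has waited long enough."""
    delay = ESCALATE_AFTER[severity]
    return delay is not None and unacknowledged_for >= delay


def classify_cpu(cpu_percent: float) -> AlertSeverity:
    """Map a CPU reading to a severity using the example thresholds."""
    if cpu_percent > 95.0:
        return AlertSeverity.CRITICAL
    if cpu_percent > 85.0:
        return AlertSeverity.WARNING
    return AlertSeverity.INFO
```

For example, `should_escalate(classify_cpu(97.0), timedelta(minutes=6))` evaluates to `True`, while the same wait on a WARNING alert does not trigger escalation.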
Risk Domains
| Domain | Key Metrics Checked |
| --- | --- |
| Consensus | node_count >= 3f+1, leader election time, message verification rate |