SLO Reference

HatiData V2 defines explicit Service Level Objectives (SLOs) for every runtime operation. Use these targets for alerting configuration and capacity planning.

Operation Latency Targets

Write Operations

| Operation | P50 | P95 | P99 | Max | Notes |
|---|---|---|---|---|---|
| Task create | 3ms | 15ms | 50ms | 200ms | Single INSERT |
| Attempt claim | 5ms | 25ms | 80ms | 300ms | SELECT FOR UPDATE SKIP LOCKED |
| Heartbeat renew | 2ms | 10ms | 30ms | 100ms | Single UPDATE on lease |
| Attempt complete | 5ms | 20ms | 60ms | 200ms | UPDATE + event INSERT |
| Attempt fail | 5ms | 20ms | 60ms | 200ms | UPDATE + event + recovery INSERT |
| Memory store | 8ms | 30ms | 80ms | 300ms | DELETE + INSERT (upsert pattern) |
| Reasoning step | 5ms | 20ms | 50ms | 200ms | INSERT with hash chain validation |
| Branch create | 10ms | 40ms | 100ms | 500ms | Metadata only (copy-on-write) |
| Branch merge | 50ms | 200ms | 500ms | 2s | Depends on divergence size |
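
For alerting configuration, the targets above can be compared against observed latency samples. The following is a minimal sketch, not part of HatiData itself: the dictionary keys, the function names, and the nearest-rank percentile method are assumptions made for illustration, and only three rows of the table are copied in.

```python
# P50/P95/P99 targets in milliseconds, copied from the Write Operations table.
# The key names are hypothetical; use whatever labels your metrics carry.
WRITE_SLO_MS = {
    "task_create": {"p50": 3, "p95": 15, "p99": 50},
    "attempt_claim": {"p50": 5, "p95": 25, "p99": 80},
    "heartbeat_renew": {"p50": 2, "p95": 10, "p99": 30},
}


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids float rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def slo_violations(operation, samples_ms):
    """Return {percentile_name: (observed_ms, target_ms)} for exceeded targets."""
    violations = {}
    for name, target in WRITE_SLO_MS[operation].items():
        observed = percentile(samples_ms, int(name[1:]))  # "p95" -> 95
        if observed > target:
            violations[name] = (observed, target)
    return violations
```

A production setup would more likely read percentiles from a metrics backend than recompute them from raw samples; the comparison against the table stays the same.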

Read Operations

| Operation | P50 | P95 | P99 | Max | Notes |
|---|---|---|---|---|---|
| Memory search | 10ms | 50ms | 150ms | 500ms | ILIKE + optional vector similarity |
| Memory exact | 3ms | 10ms | 30ms | 100ms | Primary key lookup |
| ExplainBundle | 20ms | 100ms | 300ms | 1s | Joins across 5 tables |
| Task list | 5ms | 25ms | 80ms | 300ms | Indexed by project_id |
| Attempt chain | 10ms | 40ms | 100ms | 300ms | Indexed by task_id |
| Gate evaluation | 15ms | 60ms | 150ms | 500ms | Predicate evaluation + fact bag |
| Reward signals | 20ms | 80ms | 200ms | 500ms | View materialization |

SQL Proxy (V1 Path)

| Operation | P50 | P95 | P99 | Notes |
|---|---|---|---|---|
| Simple query | 5ms | 20ms | 50ms | Direct DuckDB execution |
| Transpiled query | 10ms | 40ms | 100ms | Snowflake → DuckDB transpilation |
| Vector search | 15ms | 60ms | 200ms | Embedding + cosine similarity |
| Cache hit | 1ms | 3ms | 10ms | SQL hash cache |

View Freshness Targets

V2 views are derived from the runtime tables. Freshness targets define how quickly changes propagate:

| View | Target Freshness | Refresh Trigger | Staleness Alert |
|---|---|---|---|
| v_task_summary | < 5s | On task state change | > 30s |
| v_attempt_chain | < 5s | On attempt completion | > 30s |
| v_branch_divergence | < 30s | On memory write to branch | > 120s |
| v_reward_signals | < 60s | On verification complete | > 300s |
| v_lineage_graph | < 10s | On artifact state change | > 60s |
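
One way to apply these targets is to classify each view's measured lag into three states. This is a sketch under assumptions: how the lag is measured (for example, time since the last refresh) is not specified here, and the function name and the status labels are hypothetical.

```python
# (target_seconds, alert_seconds) per view, copied from the table above.
VIEW_FRESHNESS_S = {
    "v_task_summary": (5, 30),
    "v_attempt_chain": (5, 30),
    "v_branch_divergence": (30, 120),
    "v_reward_signals": (60, 300),
    "v_lineage_graph": (10, 60),
}


def freshness_status(view, lag_seconds):
    """Classify a view's lag as 'fresh', 'behind', or 'alert'."""
    target, alert = VIEW_FRESHNESS_S[view]
    if lag_seconds > alert:      # table: staleness alert is "> alert"
        return "alert"
    if lag_seconds >= target:    # table: target is "< target"
        return "behind"
    return "fresh"
```

For example, `freshness_status("v_task_summary", 12)` returns `"behind"`: the view has missed its 5-second target but has not yet crossed the 30-second alert line.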

Availability Targets

| Component | Target | Measurement Window | Downtime Budget |
|---|---|---|---|
| SQL Proxy | 99.95% | Monthly | 21.9 min/month |
| Control Plane API | 99.9% | Monthly | 43.8 min/month |
| MCP Server | 99.9% | Monthly | 43.8 min/month |
| Runtime API (V2) | 99.9% | Monthly | 43.8 min/month |
| Lease Worker | 99.99% | Monthly | 4.4 min/month |

Lease Worker SLO

The lease worker has the highest availability target because lease expiry directly affects agent throughput. If the worker is down for more than 3 minutes, all leased tasks expire and return to the queue, causing unnecessary retries and cost.
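
The downtime budgets in the table follow from the availability targets. The sketch below reproduces them, assuming an average month of 365/12 days (43,800 minutes); that month length is inferred from the published figures rather than stated here, and the function name is illustrative.

```python
# Average month length in minutes: 365 days * 24 h * 60 min / 12 months.
# Assumed; it is the value that reproduces the budgets in the table.
MINUTES_PER_MONTH = 365 * 24 * 60 / 12


def downtime_budget_minutes(availability_pct):
    """Allowed downtime per month, in minutes, for an availability target in percent."""
    return (100 - availability_pct) / 100 * MINUTES_PER_MONTH


for component, target in [("SQL Proxy", 99.95), ("Runtime API (V2)", 99.9), ("Lease Worker", 99.99)]:
    print(f"{component}: {downtime_budget_minutes(target):.1f} min/month")
```

Note that a single 3-minute lease worker outage would consume roughly two thirds of its 4.4-minute monthly budget.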

Error Budget Policy

| SLO Violation | Action |
|---|---|
| < 1% budget remaining | Feature freeze: no deploys until SLO recovers |
| Budget exhausted | Incident declared: rollback to last known good |
| 3 consecutive SLO misses | Post-incident review required |
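
The policy can be expressed as a small decision function. This is a sketch only: the function name and the action strings are hypothetical, and it assumes that an exhausted budget supersedes the feature freeze rather than stacking with it, which the policy does not spell out.

```python
def error_budget_actions(budget_remaining_fraction, consecutive_misses):
    """Map error budget state to the policy actions listed above.

    budget_remaining_fraction: 1.0 means the full budget is left, 0.0 means exhausted.
    consecutive_misses: number of consecutive SLO misses.
    """
    actions = []
    if budget_remaining_fraction <= 0:
        # Assumption: exhaustion replaces the freeze instead of adding to it.
        actions.append("incident: roll back to last known good")
    elif budget_remaining_fraction < 0.01:
        actions.append("feature freeze: no deploys until SLO recovers")
    if consecutive_misses >= 3:
        actions.append("post-incident review required")
    return actions
```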

Alerting Thresholds

Critical (Pages On-Call)

| Condition | Threshold | Window |
|---|---|---|
| SQL proxy down | 0 healthy instances | Instant |
| Lease worker down | No heartbeats for 3 min | 3 min |
| ExplainBundle P99 > 5s | Sustained for 5 min | 5 min |
| Orphan artifacts detected | v2_lineage_orphan_artifacts_total > 0 | Instant |

Warning (Slack Notification)

| Condition | Threshold | Window |
|---|---|---|
| Task claim P95 > 100ms | Sustained for 10 min | 10 min |
| Memory search P95 > 200ms | Sustained for 10 min | 10 min |
| View freshness > 2x target | Any view | 5 min |
| Error rate > 1% | Any V2 endpoint | 5 min |
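
Several of these alerts fire only when a condition is "sustained" for a window. The sketch below shows one possible reading of that: every sample inside the window exceeds the threshold, and the recorded history reaches back at least to the start of the window. The function name and the sample format are assumptions; most monitoring systems offer an equivalent built-in setting.

```python
def sustained_breach(samples, threshold, window_s, now):
    """Return True if the metric exceeded threshold for the whole window.

    samples: list of (timestamp_seconds, value) pairs, in any order.
    window_s: window length in seconds, e.g. 600 for "sustained for 10 min".
    now: current time in seconds, on the same clock as the timestamps.
    """
    window_start = now - window_s
    if not samples or min(t for t, _ in samples) > window_start:
        return False  # history does not yet cover the whole window
    in_window = [v for t, v in samples if window_start <= t <= now]
    return bool(in_window) and all(v > threshold for v in in_window)


# Task claim P95 sampled once a minute for 10 minutes, always at 120 ms.
history = [(t, 120) for t in range(0, 601, 60)]
print(sustained_breach(history, threshold=100, window_s=600, now=600))
```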

Capacity Planning

| Metric | Current | Warning | Scale Trigger |
|---|---|---|---|
| Tasks created/min | 50 | 200 | 500 |
| Concurrent leases | 10 | 50 | 100 |
| Memory entries/project | 500 | 5,000 | 10,000 |
| ExplainBundle joins/sec | 20 | 100 | 200 |
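
The thresholds above can be checked with a simple lookup. This is a minimal sketch with hypothetical metric keys and status labels; it assumes the thresholds are inclusive, which the table does not specify.

```python
# (warning, scale_trigger) per metric, copied from the table above.
CAPACITY_THRESHOLDS = {
    "tasks_created_per_min": (200, 500),
    "concurrent_leases": (50, 100),
    "memory_entries_per_project": (5_000, 10_000),
    "explainbundle_joins_per_sec": (100, 200),
}


def capacity_status(metric, value):
    """Classify a metric reading as 'ok', 'warning', or 'scale'."""
    warning, scale = CAPACITY_THRESHOLDS[metric]
    if value >= scale:       # assumed inclusive
        return "scale"
    if value >= warning:     # assumed inclusive
        return "warning"
    return "ok"
```

At the listed current values every metric classifies as `"ok"`; for instance, `capacity_status("concurrent_leases", 10)` returns `"ok"`.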
