Data Durability & Consistency
HatiData guarantees zero data loss (RPO = 0) through a three-layer defense-in-depth model.
Three-Layer Durability Model
| Layer | Protects Against | RPO |
|---|---|---|
| File-backed DuckDB | Container restart, OOM | 0 |
| Write-Ahead Log (fsync) | Process crash between flushes | 0 |
| Periodic Parquet flush (30s) | Node loss, disk failure | ≤30s |
Each layer provides independent protection. Even if one layer fails, the next layer ensures data can be recovered.
How It Works
The durability pipeline processes every write through three stages:
```
INSERT → DuckDB (RAM + file) → WAL fsync → ack to client → mark dirty
              ↓ (every 30s)
PeriodicFlusher → Parquet + Iceberg metadata → S3/MinIO → WAL truncate
              ↓ (on crash)
Restart → WAL replay → mark dirty → Parquet flush → TableLoader → serve
```
Layer 1: File-Backed DuckDB
DuckDB runs in file-backed mode (not in-memory). The database file on disk survives container restarts and OOM kills. This is the first line of defense — if the process dies and restarts on the same node, the data is already there.
Layer 2: Write-Ahead Log
Every mutation is appended to a newline-delimited JSON WAL and fsynced to disk before the client receives acknowledgment. If the process crashes between DuckDB writes and Parquet flushes, the WAL is replayed on restart to recover the exact state.
The WAL provides:
- fsync on every write — data hits stable storage before acknowledgment
- Replay on restart — full state recovery from the log
- Checkpoint after flush — WAL is truncated after a successful Parquet flush
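The append-fsync-replay cycle can be sketched as follows. This is a minimal illustration, not HatiData's actual WAL implementation; the type and method names (`Wal`, `append`, `replay`) are hypothetical, and real entries would be serialized JSON mutations rather than raw strings.

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Write};
use std::path::Path;

/// Minimal sketch of a newline-delimited JSON WAL (illustrative names).
struct Wal {
    file: File,
}

impl Wal {
    fn open(path: &Path) -> std::io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Wal { file })
    }

    /// Append one mutation and fsync before returning, so the client is
    /// only acknowledged once the entry is on stable storage.
    fn append(&mut self, json_line: &str) -> std::io::Result<()> {
        writeln!(self.file, "{}", json_line)?;
        self.file.sync_all()?; // fsync: flush data and metadata to disk
        Ok(())
    }

    /// Replay on restart: yield every logged mutation in order.
    fn replay(path: &Path) -> std::io::Result<Vec<String>> {
        let reader = BufReader::new(File::open(path)?);
        reader.lines().collect()
    }
}
```

The key property is that `sync_all` (which wraps `fsync`) completes before `append` returns, so an acknowledged write can always be recovered by `replay`.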
Layer 3: Periodic Parquet Flush
A background loop exports dirty tables to Parquet files with Iceberg v1 metadata every 30 seconds (configurable). These files are uploaded to object storage (S3, GCS, Azure Blob, or MinIO for local dev). This protects against node loss and disk failure — even if the entire machine is destroyed, data from the last flush is preserved in durable object storage.
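The shape of the background loop can be sketched as below. This is a simplified, assumption-laden illustration: the function and parameter names are hypothetical, the dirty state is collapsed to a single counter, and the Parquet export and object-storage upload are elided to a comment.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Sketch of a periodic-flush loop. In the real system the body would
/// export dirty tables to Parquet with Iceberg metadata, upload them,
/// and truncate the WAL.
fn spawn_flusher(
    interval: Duration,
    dirty: Arc<AtomicUsize>,
    stop: Arc<AtomicBool>,
    flushes: Arc<AtomicUsize>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        while !stop.load(Ordering::Relaxed) {
            thread::sleep(interval);
            // Only export when something changed since the last cycle.
            if dirty.swap(0, Ordering::SeqCst) > 0 {
                // ...export dirty tables, upload, truncate WAL...
                flushes.fetch_add(1, Ordering::SeqCst);
            }
        }
    })
}
```

Because the loop checks the dirty counter before exporting, idle cycles cost nothing: a table that received no writes since the last flush is skipped entirely.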
FLUSH TABLE Command
You can trigger an immediate Parquet export without waiting for the periodic flush:
```sql
-- Flush a specific table
FLUSH TABLE customers;

-- Flush all dirty tables
FLUSH;
```
This is useful before maintenance windows or when you need to guarantee that specific data is in object storage before proceeding.
Dirty Tracking
HatiData tracks which tables have been modified since the last flush using an atomic counter per table. Only dirty tables are exported during periodic flush, minimizing I/O and storage writes.
The dirty tracker is a lock-free DashMap<String, AtomicUsize> — writes increment the counter, and the flusher resets it after a successful export.
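A standard-library approximation of the tracker is sketched below. The real code uses a lock-free DashMap<String, AtomicUsize>; here a Mutex<HashMap> stands in so the example stays dependency-free, and the method names are illustrative rather than HatiData's actual API.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// std-only stand-in for the dirty tracker (the real code uses a
/// lock-free DashMap<String, AtomicUsize>).
struct DirtyTracker {
    counters: Mutex<HashMap<String, AtomicUsize>>,
}

impl DirtyTracker {
    fn new() -> Self {
        DirtyTracker { counters: Mutex::new(HashMap::new()) }
    }

    /// Called on every write: bump the table's pending-write count.
    fn mark_dirty(&self, table: &str) {
        let mut map = self.counters.lock().unwrap();
        map.entry(table.to_string())
            .or_insert_with(|| AtomicUsize::new(0))
            .fetch_add(1, Ordering::SeqCst);
    }

    /// Called by the flusher after a successful export: reset the counter
    /// and report how many writes the flush covered.
    fn reset(&self, table: &str) -> usize {
        let map = self.counters.lock().unwrap();
        map.get(table).map_or(0, |c| c.swap(0, Ordering::SeqCst))
    }

    /// Tables with pending writes; only these get exported.
    fn dirty_tables(&self) -> Vec<String> {
        let map = self.counters.lock().unwrap();
        map.iter()
            .filter(|(_, c)| c.load(Ordering::SeqCst) > 0)
            .map(|(name, _)| name.clone())
            .collect()
    }
}
```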
Configuration
| Variable | Default | Description |
|---|---|---|
| HATIDATA_AUTO_PERSIST_ENABLED | true | Enable periodic Parquet flush |
| HATIDATA_FLUSH_INTERVAL_SECS | 30 | Seconds between flush cycles |
| HATIDATA_FLUSH_SIZE_THRESHOLD_MB | 10 | Minimum dirty data size (MB) to trigger a flush |
| HATIDATA_WAL_ENABLED | true | Enable write-ahead log |
| HATIDATA_WAL_FSYNC | true | fsync WAL on every write (disable only for testing) |
Disabling HATIDATA_WAL_FSYNC removes the crash-safety guarantee. Only disable this in test environments where performance matters more than durability.
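One plausible way to read these settings, using only the standard library, is sketched below. The environment variable names come from the table above; the struct, field, and helper names are hypothetical.

```rust
use std::env;
use std::time::Duration;

/// Illustrative container for the durability settings.
struct DurabilityConfig {
    auto_persist: bool,
    flush_interval: Duration,
    wal_enabled: bool,
    wal_fsync: bool,
}

/// Parse a boolean env var, falling back to the documented default.
fn env_bool(key: &str, default: bool) -> bool {
    env::var(key).map(|v| v == "true" || v == "1").unwrap_or(default)
}

/// Parse a numeric env var, falling back to the documented default.
fn env_u64(key: &str, default: u64) -> u64 {
    env::var(key).ok().and_then(|v| v.parse().ok()).unwrap_or(default)
}

impl DurabilityConfig {
    fn from_env() -> Self {
        DurabilityConfig {
            auto_persist: env_bool("HATIDATA_AUTO_PERSIST_ENABLED", true),
            flush_interval: Duration::from_secs(env_u64("HATIDATA_FLUSH_INTERVAL_SECS", 30)),
            wal_enabled: env_bool("HATIDATA_WAL_ENABLED", true),
            wal_fsync: env_bool("HATIDATA_WAL_FSYNC", true),
        }
    }
}
```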
Snapshot Isolation
HatiData uses an RwLock to prevent mid-query view updates during concurrent writes. When a query begins execution, it acquires a read lock on the current snapshot. Parquet flush and table loader operations acquire a write lock, ensuring that:
- Running queries always see a consistent point-in-time view
- Concurrent writes do not cause partial reads
- Flush operations do not interfere with in-flight queries
This provides snapshot isolation without the overhead of MVCC — appropriate for the append-heavy, read-mostly workloads typical of agent data layers.
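The read-lock/write-lock pattern can be sketched with std::sync::RwLock. This is a simplified illustration under assumed names (`SnapshotStore`, `begin_query`, `install`); the real snapshot holds DuckDB view state rather than a Vec of strings.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for a point-in-time view of the data.
type Snapshot = Vec<String>;

struct SnapshotStore {
    current: RwLock<Arc<Snapshot>>,
}

impl SnapshotStore {
    /// A query clones the Arc under a read lock, then runs against that
    /// fixed point-in-time view even if a flush lands mid-query.
    fn begin_query(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Flush / table-loader operations swap in a new view under the write
    /// lock; they wait for in-flight readers but never mutate a snapshot
    /// a running query already holds.
    fn install(&self, next: Snapshot) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

Swapping an Arc rather than mutating in place is what makes the read path cheap: a query holds the lock only long enough to clone a pointer, and the snapshot it received stays immutable for the life of the query.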
Recovery Sequence
On startup after a crash, HatiData follows this sequence:
- Load DuckDB file — recover any data that was already persisted to the database file
- Replay WAL — apply any mutations that were fsynced but not yet flushed to Parquet
- Mark all tables dirty — ensure the next flush cycle exports everything
- Run Parquet flush — immediately export to object storage for durability
- Load from object storage — TableLoader discovers Parquet files and registers them as DuckDB views
- Begin serving — accept new connections
The entire recovery sequence is automatic. No manual intervention is required.
Related Concepts
- Query Pipeline — Where durability fits in the 13-step execution pipeline
- Two-Plane Model — Data plane architecture where durability operates
- Configuration Reference — Full list of environment variables