Performance & Benchmarks
HatiData adds a thin governance and transpilation layer on top of DuckDB. This page documents the overhead introduced by that layer, end-to-end query latency targets, caching behavior, and configuration knobs that affect performance.
Transpilation Overhead
Every SQL query passes through the Snowflake-to-DuckDB transpiler before execution. The transpiler rewrites syntax, maps functions, and transforms types. For the vast majority of queries this step costs well under a millisecond and is not measurable in end-to-end latency.
| Query Complexity | Example | Transpilation Time (p50) | Transpilation Time (p99) |
|---|---|---|---|
| Simple SELECT | SELECT id, name FROM users WHERE active = true | 0.02 ms | 0.08 ms |
| Aggregation | SELECT dept, COUNT(*), AVG(salary) FROM employees GROUP BY dept | 0.04 ms | 0.12 ms |
| Multi-JOIN | 3-table join with subquery and window function | 0.08 ms | 0.25 ms |
| Complex analytics | TPC-DS style (nested CTEs, 5+ joins, CASE expressions) | 0.15 ms | 0.45 ms |
| Vector search | SELECT * FROM docs WHERE semantic_match(content, 'query', 10) | 0.20 ms | 0.60 ms |
These numbers come from the Criterion benchmark suite (crates/hatidata-transpiler/benches/transpile_bench.rs) running on an Apple M2 with 16 GB RAM. Server-class hardware shows similar results because transpilation is CPU-bound and single-threaded per query.
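The percentile methodology is easy to reproduce outside Criterion. Here is a minimal Python sketch of such a harness; the `transpile` stub is a placeholder, not the real transpiler crate:

```python
import statistics
import time

def transpile(sql: str) -> str:
    # Placeholder standing in for the real Snowflake-to-DuckDB transpiler.
    return sql.replace("IFF(", "IF(")

def bench(fn, arg, iterations=10_000):
    """Time fn(arg) repeatedly and return (p50, p99) in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return cuts[49], cuts[98]                    # p50, p99

p50, p99 = bench(transpile, "SELECT IFF(active, 1, 0) FROM users")
print(f"p50={p50:.4f} ms  p99={p99:.4f} ms")
```

Criterion adds warmup phases and outlier detection on top of this basic loop, which is why its numbers are more stable than a naive harness.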
Query Latency
End-to-end latency includes the full 15-step pipeline: semaphore acquisition, table extraction, policy check, cost estimation, quota check, row filtering, transpilation, vector embedding (if applicable), snapshot pinning, DuckDB execution, AI healing (if applicable), dirty tracking + WAL write, column masking, metering, and audit logging.
Latency Targets
| Metric | Target | Measured (local, 1M rows) |
|---|---|---|
| p50 | < 5 ms | 2.1 ms |
| p95 | < 20 ms | 12.4 ms |
| p99 | < 50 ms | 31.2 ms |
| p99.9 | < 200 ms | 89.5 ms |
These targets apply to analytical queries against warm data (already in DuckDB memory or NVMe cache). Cold queries that require Parquet reads from S3 add the object storage round-trip time (typically 20-80 ms per file, parallelized).
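As a back-of-the-envelope model (an assumed formula, not measured from the codebase): because Parquet fetches run in parallel, a cold query pays roughly the warm latency plus the slowest single fetch, not the sum of all fetches.

```python
def cold_query_latency_ms(warm_ms: float, fetch_times_ms: list[float]) -> float:
    """Cold latency ~= warm latency + slowest parallel S3 fetch.

    Fetches are issued in parallel, so only the longest one adds wall time.
    """
    return warm_ms + (max(fetch_times_ms) if fetch_times_ms else 0.0)

# Warm p50 of 2.1 ms plus three parallel Parquet reads in the 20-80 ms range:
print(cold_query_latency_ms(2.1, [25.0, 40.0, 78.0]))  # roughly 80 ms
```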
Pipeline Step Breakdown
Where time is spent on a typical warm query (p50):
| Step | Time | Notes |
|---|---|---|
| Semaphore + policy check | 0.1 ms | In-memory policy evaluation, no network call |
| Table extract + cost estimate | 0.1 ms | AST walk, row count estimation |
| Transpilation | 0.05 ms | Snowflake to DuckDB rewrite |
| DuckDB execution | 1.5 ms | Depends entirely on query and data volume |
| WAL write + dirty tracking | 0.2 ms | fsync to NVMe, batched |
| Column masking + metering | 0.1 ms | In-memory operations |
| Audit logging | 0.05 ms | Async write, does not block response |
| Total | ~2.1 ms | Sum of the steps above |
DuckDB execution dominates for simple queries. For complex analytical queries, DuckDB execution can be 100 ms or more, but the governance overhead remains constant at under 1 ms.
TPC-DS Benchmark Results
HatiData passes all 99 TPC-DS queries at scale factor 1 (1 GB). The following results compare raw DuckDB execution against HatiData (which adds the governance pipeline on top of DuckDB).
| Metric | Raw DuckDB | HatiData | Overhead |
|---|---|---|---|
| Total time (99 queries) | 4.2 s | 4.4 s | +4.8% |
| Fastest query (q06) | 8 ms | 9 ms | +1 ms |
| Slowest query (q72) | 340 ms | 345 ms | +5 ms |
| Geometric mean | 28 ms | 29 ms | +3.6% |
The governance overhead is effectively constant per query (0.5-1 ms), so it becomes proportionally smaller on more expensive queries. At TPC-DS scale factor 10 (10 GB), the overhead drops below 2%.
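The scaling behavior follows directly from the fixed cost: relative overhead is the fixed per-query cost divided by execution time, so it shrinks as queries get more expensive. A quick illustration, assuming a 0.75 ms fixed overhead (the midpoint of the 0.5-1 ms range above):

```python
def relative_overhead(exec_ms: float, fixed_ms: float = 0.75) -> float:
    """Governance overhead as a fraction of query execution time."""
    return fixed_ms / exec_ms

# Cheap query (q06-scale, ~8 ms) vs. expensive query (q72-scale, ~340 ms):
print(f"{relative_overhead(8.0):.1%}")    # 9.4%
print(f"{relative_overhead(340.0):.2%}")  # 0.22%
```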
Throughput & Concurrency
Concurrent Query Capacity
HatiData uses a semaphore to limit concurrent DuckDB query execution. The default limit depends on available CPU cores:
| Configuration | Default | Range |
|---|---|---|
| HATIDATA_MAX_CONCURRENT_QUERIES | Number of CPU cores | 1-256 |
Queries that exceed the semaphore limit are queued (FIFO) and execute as slots become available. The queue depth is bounded -- if the queue exceeds 1000 pending queries, new queries receive HATI_ERR_429_CONCURRENT_LIMIT.
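The admission-control pattern (bounded concurrency plus a bounded FIFO wait queue) can be sketched as follows. This is an illustrative Python/asyncio sketch, not HatiData's actual implementation, which is in Rust; the class and error names are stand-ins:

```python
import asyncio

class ConcurrentLimitError(Exception):
    """Stands in for HATI_ERR_429_CONCURRENT_LIMIT."""

class QueryGate:
    """Admission control: N concurrent slots plus a bounded wait queue."""

    def __init__(self, max_concurrent: int, max_queue_depth: int):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._max_waiting = max_queue_depth
        self._waiting = 0

    async def run(self, execute):
        if self._sem.locked():  # all slots busy: this query must queue
            if self._waiting >= self._max_waiting:
                raise ConcurrentLimitError("pending-query queue is full")
            self._waiting += 1
            try:
                await self._sem.acquire()  # asyncio wakes waiters in FIFO order
            finally:
                self._waiting -= 1
        else:
            await self._sem.acquire()
        try:
            return await execute()
        finally:
            self._sem.release()
```

Rejecting at a queue-depth bound rather than queueing indefinitely keeps tail latency predictable: a query either starts promptly or fails fast with a retryable error.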
Throughput Benchmarks
Measured on an 8-core machine with 32 GB RAM, TPC-DS SF1 data:
| Workload | Queries/sec | Notes |
|---|---|---|
| Simple point lookups | 12,000 | SELECT * FROM t WHERE id = ? |
| Aggregation queries | 3,500 | SELECT dept, COUNT(*) FROM t GROUP BY dept |
| Mixed analytical (TPC-DS) | 800 | Random selection of TPC-DS queries |
| Write-heavy (INSERT) | 5,000 | Single-row inserts |
| Bulk load (COPY) | 150 MB/s | Parquet file import |
MCP Server Throughput
The MCP server (JSON-RPC over HTTP) handles agent tool calls:
| Metric | Value |
|---|---|
| Concurrent connections | 10,000 |
| Tool calls/sec (simple) | 8,000 |
| Memory search (vector + SQL hybrid) | 2,000/sec |
| Trigger evaluation | 5,000 evals/sec |
Caching Architecture
HatiData uses a three-tier cache to minimize repeated computation and storage reads.
L1: RAM Cache (moka)
- What: Query results, transpiled SQL, policy bundles, schema metadata
- Backed by: moka concurrent cache (lock-free)
- Size: Configurable via HATIDATA_L1_CACHE_SIZE_MB (default: 512 MB)
- TTL: 60 seconds for query results, 300 seconds for metadata
- Hit rate: 60-80% for typical agent workloads (agents repeat similar queries)
- Latency on hit: < 0.1 ms
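The L1 semantics (TTL expiry plus capacity-bounded LRU eviction) can be sketched in a few lines. moka provides a lock-free concurrent version of the same idea; this single-threaded Python sketch only illustrates the eviction behavior:

```python
import time
from collections import OrderedDict

class L1Cache:
    """TTL + LRU cache approximating the L1 (moka) tier's behavior."""

    def __init__(self, capacity: int, ttl_secs: float, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (value, expiry); order = LRU order
        self._capacity = capacity
        self._ttl = ttl_secs
        self._clock = clock         # injectable for testing

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self._clock() >= expiry:   # TTL expired: drop the entry
            del self._data[key]
            return None
        self._data.move_to_end(key)   # refresh LRU position
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used
```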
L2: NVMe / Local Disk
- What: DuckDB buffer pool, WAL files, recently flushed Parquet
- Backed by: DuckDB's internal buffer manager + OS page cache
- Size: Configurable via HATIDATA_DUCKDB_MEMORY_LIMIT_MB (default: 4096 MB for DuckDB buffer pool)
- Hit rate: 90%+ for datasets that fit in the buffer pool
- Latency on hit: 0.01-0.1 ms (memory-mapped)
L3: Object Storage (S3/GCS/MinIO)
- What: Parquet snapshots, archived WAL segments, audit logs
- Backed by: S3, GCS, Azure Blob, or MinIO (local dev)
- Latency on read: 20-80 ms per file (parallelized across files)
- When accessed: Cold start, data not in DuckDB buffer pool, explicit FLUSH TABLE
Cache Invalidation
- L1 entries are evicted on TTL expiry or LRU when the cache is full.
- L2 (DuckDB buffer) uses its own LRU eviction when HATIDATA_DUCKDB_MEMORY_LIMIT_MB is exceeded.
- Writes invalidate the L1 cache entry for the affected table.
- FLUSH TABLE <name> forces L2 data to L3 (Parquet export to object storage).
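One way to implement write invalidation is to key L1 entries by the tables they read, so a write evicts every cached result touching that table. This is a hypothetical sketch of the pattern, not HatiData's actual data structure:

```python
from collections import defaultdict

class ResultCache:
    """L1 query-result cache with per-table write invalidation."""

    def __init__(self):
        self._results = {}                 # sql text -> cached result
        self._by_table = defaultdict(set)  # table -> sql keys that read it

    def put(self, sql: str, tables: set[str], result):
        self._results[sql] = result
        for table in tables:
            self._by_table[table].add(sql)

    def get(self, sql: str):
        return self._results.get(sql)

    def invalidate_table(self, table: str):
        """Called on INSERT/UPDATE/DELETE against `table`."""
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)
```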
Memory Configuration
DuckDB is the primary consumer of memory. Two configuration variables control its allocation:
| Variable | Default | Description |
|---|---|---|
| HATIDATA_DUCKDB_MEMORY_LIMIT_MB | 4096 | Maximum memory DuckDB can use for its buffer pool and query execution. |
| HATIDATA_DUCKDB_THREADS | Number of CPU cores | Number of threads DuckDB uses for parallel query execution. |
Setting HATIDATA_DUCKDB_MEMORY_LIMIT_MB too low will cause complex queries to spill to disk, significantly increasing latency. For production workloads, allocate at least 2x the size of your hot dataset.
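For example, serving a ~6 GB hot dataset on a 16-core machine might look like this (values illustrative; adjust to your workload and deployment method):

```shell
# At least 2x the hot dataset: 6 GB hot -> 12 GB buffer pool
export HATIDATA_DUCKDB_MEMORY_LIMIT_MB=12288
# Let DuckDB use all cores for parallel query execution
export HATIDATA_DUCKDB_THREADS=16
```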
Per-Query Memory Budget
Individual queries are not given explicit memory budgets. DuckDB manages memory internally, spilling intermediate results to disk when the buffer pool is full. If a single query attempts to allocate more memory than is available, DuckDB will return an out-of-memory error (surfaced as HATI_ERR_500_DUCKDB_CRASH).
Performance Configuration Reference
| Variable | Default | Description |
|---|---|---|
| HATIDATA_MAX_CONCURRENT_QUERIES | CPU cores | Maximum concurrent DuckDB query executions |
| HATIDATA_DUCKDB_MEMORY_LIMIT_MB | 4096 | DuckDB buffer pool size in MB |
| HATIDATA_DUCKDB_THREADS | CPU cores | DuckDB parallel execution threads |
| HATIDATA_L1_CACHE_SIZE_MB | 512 | moka RAM cache size in MB |
| HATIDATA_L1_CACHE_TTL_SECS | 60 | L1 cache TTL for query results |
| HATIDATA_WAL_FSYNC | true | Whether WAL writes use fsync (disable for benchmarks only) |
| HATIDATA_PARQUET_FLUSH_INTERVAL_SECS | 30 | How often dirty tables are flushed to Parquet |
| HATIDATA_AUDIT_ASYNC | true | Whether audit logging is asynchronous |
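Combining the durability-related knobs above, a benchmark-only profile that trades crash safety for throughput might look like this (values illustrative; never use in production):

```shell
# Benchmark-only: skip fsync on WAL writes (risks data loss on crash)
export HATIDATA_WAL_FSYNC=false
# Keep audit logging off the hot path
export HATIDATA_AUDIT_ASYNC=true
# Flush dirty tables to Parquet less often to reduce background I/O
export HATIDATA_PARQUET_FLUSH_INTERVAL_SECS=300
```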
Comparison: HatiData vs Raw DuckDB vs Legacy Cloud Warehouses
The comparison below targets agent workloads: high query frequency, small-to-medium data volumes, mixed reads and writes, and a need for governance and audit.
| Dimension | HatiData | Raw DuckDB | Legacy Cloud Warehouses |
|---|---|---|---|
| Query latency (p50) | 2-5 ms | 1-3 ms | 200-2000 ms |
| Governance overhead | < 1 ms | None (no governance) | Varies (30-100 ms) |
| Concurrent queries | Semaphore-controlled | Unlimited (crash risk) | Warehouse-size dependent |
| Agent memory (vector + SQL) | Built-in | Manual (separate vector DB) | Not available |
| Chain-of-thought audit | Built-in, hash-chained | Not available | Not available |
| Semantic triggers | Built-in | Not available | Not available |
| Access control (RBAC/ABAC) | Per-query enforcement | Not available | Role-based only |
| Cost (small workload) | $29/mo cloud, $0 local | $0 | $400+/mo minimum |
| Postgres wire protocol | Yes | No (native API only) | No (proprietary drivers) |
| TLS + encryption at rest | Enforced | Manual | Yes |
HatiData is purpose-built for agent workloads where governance, auditability, and low latency must coexist. Raw DuckDB is faster for uncontrolled local analytics. Legacy cloud warehouses are better suited for large-scale human-driven BI where sub-second latency is not required.