
Performance & Benchmarks

HatiData adds a thin governance and transpilation layer on top of DuckDB. This page documents the overhead introduced by that layer, end-to-end query latency targets, caching behavior, and configuration knobs that affect performance.

Transpilation Overhead

Every SQL query passes through the Snowflake-to-DuckDB transpiler before execution. The transpiler rewrites syntax, maps functions, and transforms types. For the vast majority of queries, this step adds a negligible fraction of a millisecond.

| Query Complexity | Example | Transpilation Time (p50) | Transpilation Time (p99) |
|---|---|---|---|
| Simple SELECT | `SELECT id, name FROM users WHERE active = true` | 0.02 ms | 0.08 ms |
| Aggregation | `SELECT dept, COUNT(*), AVG(salary) FROM employees GROUP BY dept` | 0.04 ms | 0.12 ms |
| Multi-JOIN | 3-table join with subquery and window function | 0.08 ms | 0.25 ms |
| Complex analytics | TPC-DS style (nested CTEs, 5+ joins, CASE expressions) | 0.15 ms | 0.45 ms |
| Vector search | `SELECT * FROM docs WHERE semantic_match(content, 'query', 10)` | 0.20 ms | 0.60 ms |

These numbers come from the Criterion benchmark suite (crates/hatidata-transpiler/benches/transpile_bench.rs) running on an Apple M2 with 16 GB RAM. Server-class hardware shows similar results because transpilation is CPU-bound and single-threaded per query.
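To give a feel for the kind of rewriting involved, here is a toy sketch of Snowflake-to-DuckDB function mapping. The real transpiler operates on an AST; this regex version, with a hypothetical `FUNCTION_MAP`, only illustrates the flavor of the substitutions:

```python
import re

# Toy Snowflake -> DuckDB function mapping (illustrative only; the
# actual transpiler rewrites an AST, not raw text).
FUNCTION_MAP = {
    "IFF": "IF",        # Snowflake IFF(cond, a, b) -> DuckDB IF(cond, a, b)
    "NVL": "COALESCE",  # Snowflake NVL(a, b)       -> COALESCE(a, b)
}

def transpile_sketch(sql: str) -> str:
    for snowflake_fn, duckdb_fn in FUNCTION_MAP.items():
        sql = re.sub(rf"\b{snowflake_fn}\b", duckdb_fn, sql)
    return sql

print(transpile_sketch("SELECT IFF(active, NVL(name, 'n/a'), 'x') FROM users"))
# -> SELECT IF(active, COALESCE(name, 'n/a'), 'x') FROM users
```

Because the rewrite is pure string/AST manipulation with no I/O, its cost stays in the tens of microseconds regardless of data volume.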


Query Latency

End-to-end latency includes the full 15-step pipeline: semaphore acquisition, table extraction, policy check, cost estimation, quota check, row filtering, transpilation, vector embedding (if applicable), snapshot pinning, DuckDB execution, AI healing (if applicable), dirty tracking + WAL write, column masking, metering, and audit logging.
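The step order above can be written out explicitly. The stage names below are descriptive labels taken from the list, not HatiData's internal function names; the two conditional stages are marked:

```python
# The 15-step query pipeline, in execution order. Labels are
# illustrative, derived from the documentation text.
PIPELINE_STAGES = [
    "semaphore_acquisition",
    "table_extraction",
    "policy_check",
    "cost_estimation",
    "quota_check",
    "row_filtering",
    "transpilation",
    "vector_embedding",        # only if the query uses semantic matching
    "snapshot_pinning",
    "duckdb_execution",
    "ai_healing",              # only if execution needs healing
    "dirty_tracking_wal_write",
    "column_masking",
    "metering",
    "audit_logging",           # async, does not block the response
]

print(len(PIPELINE_STAGES), "stages")
```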

Latency Targets

| Metric | Target | Measured (local, 1M rows) |
|---|---|---|
| p50 | < 5 ms | 2.1 ms |
| p95 | < 20 ms | 12.4 ms |
| p99 | < 50 ms | 31.2 ms |
| p99.9 | < 200 ms | 89.5 ms |

These targets apply to analytical queries against warm data (already in DuckDB memory or NVMe cache). Cold queries that require Parquet reads from S3 add the object storage round-trip time (typically 20-80 ms per file, parallelized).

Pipeline Step Breakdown

Where time is spent on a typical warm query (p50):

| Step | Time | Notes |
|---|---|---|
| Semaphore + policy check | 0.1 ms | In-memory policy evaluation, no network call |
| Table extract + cost estimate | 0.1 ms | AST walk, row count estimation |
| Transpilation | 0.05 ms | Snowflake to DuckDB rewrite |
| DuckDB execution | 1.5 ms | Depends entirely on query and data volume |
| WAL write + dirty tracking | 0.2 ms | fsync to NVMe, batched |
| Column masking + metering | 0.1 ms | In-memory operations |
| Audit logging | 0.05 ms | Async write, does not block response |
| **Total** | ~2.1 ms | |

DuckDB execution dominates for simple queries. For complex analytical queries, DuckDB execution can be 100 ms or more, but the governance overhead remains constant at under 1 ms.
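As a sanity check, the per-step p50 figures do sum to the quoted total, and subtracting DuckDB execution gives the constant governance overhead:

```python
# Per-step p50 timings from the breakdown table above (milliseconds).
step_ms = {
    "semaphore_policy_check": 0.1,
    "table_extract_cost_estimate": 0.1,
    "transpilation": 0.05,
    "duckdb_execution": 1.5,
    "wal_write_dirty_tracking": 0.2,
    "column_masking_metering": 0.1,
    "audit_logging": 0.05,
}

total = sum(step_ms.values())
governance = total - step_ms["duckdb_execution"]
print(f"total p50: {total:.2f} ms")          # 2.10 ms
print(f"governance overhead: {governance:.2f} ms")  # 0.60 ms, under 1 ms
```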


TPC-DS Benchmark Results

HatiData passes all 99 TPC-DS queries at scale factor 1 (1 GB). The following results compare raw DuckDB execution against HatiData (which adds the governance pipeline on top of DuckDB).

| Metric | Raw DuckDB | HatiData | Overhead |
|---|---|---|---|
| Total time (99 queries) | 4.2 s | 4.4 s | +4.8% |
| Fastest query (q06) | 8 ms | 9 ms | +1 ms |
| Slowest query (q72) | 340 ms | 345 ms | +5 ms |
| Geometric mean | 28 ms | 29 ms | +3.6% |

The governance overhead is effectively constant per query (0.5-1 ms), so it becomes proportionally smaller on more expensive queries. At TPC-DS scale factor 10 (10 GB), the overhead drops below 2%.
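The arithmetic behind that claim is straightforward: a roughly fixed per-query cost shrinks relative to slower queries. Using the table's figures and an assumed ~1 ms governance cost:

```python
# Overhead percentage from the TPC-DS totals above.
raw_total_s, hati_total_s = 4.2, 4.4
overhead_pct = (hati_total_s - raw_total_s) / raw_total_s * 100
print(f"total overhead: +{overhead_pct:.1f}%")  # matches the +4.8% in the table

# A fixed ~1 ms governance cost becomes proportionally smaller
# as per-query execution time grows:
for query_ms in (9, 50, 345):
    print(f"{query_ms} ms query -> +{1 / query_ms * 100:.2f}% overhead")
```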


Throughput & Concurrency

Concurrent Query Capacity

HatiData uses a semaphore to limit concurrent DuckDB query execution. The default limit depends on available CPU cores:

| Configuration | Default | Range |
|---|---|---|
| `HATIDATA_MAX_CONCURRENT_QUERIES` | Number of CPU cores | 1 - 256 |

Queries that exceed the semaphore limit are queued (FIFO) and execute as slots become available. The queue depth is bounded -- if the queue exceeds 1000 pending queries, new queries receive HATI_ERR_429_CONCURRENT_LIMIT.
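This admission-control behavior can be modeled with a semaphore plus a bounded wait counter. The following is an illustrative Python sketch (HatiData's implementation is in Rust); the error string comes from the text above, and `QueryGate` is a made-up name:

```python
import asyncio

class QueryGate:
    """Toy model: a semaphore caps concurrent execution and a
    bounded wait queue rejects overload with a 429-style error."""

    def __init__(self, max_concurrent: int, max_queue: int = 1000):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._max_queue = max_queue
        self._waiting = 0

    async def run(self, query_fn):
        # Reject outright once the pending queue is full.
        if self._waiting >= self._max_queue:
            raise RuntimeError("HATI_ERR_429_CONCURRENT_LIMIT")
        self._waiting += 1
        try:
            await self._sem.acquire()  # waiters are served roughly FIFO
        finally:
            self._waiting -= 1
        try:
            return await query_fn()
        finally:
            self._sem.release()

async def demo():
    gate = QueryGate(max_concurrent=2)
    async def query():
        await asyncio.sleep(0.01)
        return "ok"
    # Four queries share two execution slots; the rest wait their turn.
    return await asyncio.gather(*(gate.run(query) for _ in range(4)))

print(asyncio.run(demo()))  # ['ok', 'ok', 'ok', 'ok']
```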

Throughput Benchmarks

Measured on an 8-core machine with 32 GB RAM, TPC-DS SF1 data:

| Workload | Queries/sec | Notes |
|---|---|---|
| Simple point lookups | 12,000 | `SELECT * FROM t WHERE id = ?` |
| Aggregation queries | 3,500 | `SELECT dept, COUNT(*) FROM t GROUP BY dept` |
| Mixed analytical (TPC-DS) | 800 | Random selection of TPC-DS queries |
| Write-heavy (INSERT) | 5,000 | Single-row inserts |
| Bulk load (COPY) | 150 MB/s | Parquet file import |

MCP Server Throughput

The MCP server (JSON-RPC over HTTP) handles agent tool calls:

| Metric | Value |
|---|---|
| Concurrent connections | 10,000 |
| Tool calls/sec (simple) | 8,000 |
| Memory search (vector + SQL hybrid) | 2,000/sec |
| Trigger evaluation | 5,000 evals/sec |

Caching Architecture

HatiData uses a three-tier cache to minimize repeated computation and storage reads.

L1: RAM Cache (moka)

  • What: Query results, transpiled SQL, policy bundles, schema metadata
  • Backed by: moka concurrent cache (lock-free)
  • Size: Configurable via HATIDATA_L1_CACHE_SIZE_MB (default: 512 MB)
  • TTL: 60 seconds for query results, 300 seconds for metadata
  • Hit rate: 60-80% for typical agent workloads (agents repeat similar queries)
  • Latency on hit: < 0.1 ms

L2: NVMe / Local Disk

  • What: DuckDB buffer pool, WAL files, recently flushed Parquet
  • Backed by: DuckDB's internal buffer manager + OS page cache
  • Size: Configurable via HATIDATA_DUCKDB_MEMORY_LIMIT_MB (default: 4096 MB for DuckDB buffer pool)
  • Hit rate: 90%+ for datasets that fit in the buffer pool
  • Latency on hit: 0.01-0.1 ms (memory-mapped)

L3: Object Storage (S3/GCS/MinIO)

  • What: Parquet snapshots, archived WAL segments, audit logs
  • Backed by: S3, GCS, Azure Blob, or MinIO (local dev)
  • Latency on read: 20-80 ms per file (parallelized across files)
  • When accessed: Cold start, data not in DuckDB buffer pool, explicit FLUSH TABLE
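Because Parquet reads are parallelized across files, the cold-start penalty scales with the number of sequential batches rather than the raw file count. A rough model, assuming the 20-80 ms per-file range above and a hypothetical parallelism of 16 concurrent fetches:

```python
import math

def cold_read_estimate_ms(num_files: int, per_file_ms: float = 50.0,
                          parallelism: int = 16) -> float:
    """Rough cold-read model: files are fetched in parallel batches,
    each batch costing one object-storage round trip. Both the
    per-file latency and parallelism here are illustrative."""
    batches = math.ceil(num_files / parallelism)
    return batches * per_file_ms

print(cold_read_estimate_ms(8))   # one batch   -> 50.0 ms
print(cold_read_estimate_ms(64))  # four batches -> 200.0 ms
```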

Cache Invalidation

  • L1 entries are evicted on TTL expiry or LRU when the cache is full.
  • L2 (DuckDB buffer) uses its own LRU eviction when HATIDATA_DUCKDB_MEMORY_LIMIT_MB is exceeded.
  • Writes invalidate the L1 cache entry for the affected table.
  • FLUSH TABLE <name> forces L2 data to L3 (Parquet export to object storage).
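The L1 semantics above (TTL expiry, LRU eviction, write invalidation) can be sketched in a few lines. This is a toy single-threaded model; the real L1 is moka, a lock-free concurrent cache in Rust:

```python
import time
from collections import OrderedDict

class L1Cache:
    """Toy TTL + LRU cache mirroring the L1 behavior described above."""

    def __init__(self, max_entries: int = 4, ttl_secs: float = 60.0):
        self._entries = OrderedDict()  # key -> (expires_at, value)
        self._max = max_entries
        self._ttl = ttl_secs

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() > expires_at:     # TTL expiry
            del self._entries[key]
            return None
        self._entries.move_to_end(key)        # refresh LRU position
        return value

    def put(self, key, value):
        self._entries[key] = (time.monotonic() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:  # LRU eviction when full
            self._entries.popitem(last=False)

    def invalidate_table(self, table):
        """Writes invalidate cached results for the affected table."""
        for key in [k for k in self._entries if table in k]:
            del self._entries[key]

cache = L1Cache(max_entries=2, ttl_secs=60.0)
cache.put("q:users", "result")
print(cache.get("q:users"))       # result
cache.invalidate_table("users")   # e.g. after an INSERT into users
print(cache.get("q:users"))       # None
```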

Memory Configuration

DuckDB is the primary consumer of memory. Two configuration variables control its allocation:

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_DUCKDB_MEMORY_LIMIT_MB` | 4096 | Maximum memory DuckDB can use for its buffer pool and query execution. |
| `HATIDATA_DUCKDB_THREADS` | Number of CPU cores | Number of threads DuckDB uses for parallel query execution. |
**Warning:** Setting `HATIDATA_DUCKDB_MEMORY_LIMIT_MB` too low will cause complex queries to spill to disk, significantly increasing latency. For production workloads, allocate at least 2x the size of your hot dataset.
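The 2x rule of thumb translates into a simple sizing helper. The multiplier and floor below reflect the guidance above; the function itself is illustrative, not part of HatiData:

```python
def recommended_memory_limit_mb(hot_dataset_mb: float,
                                multiplier: float = 2.0,
                                floor_mb: int = 4096) -> int:
    """Suggest a HATIDATA_DUCKDB_MEMORY_LIMIT_MB value: at least 2x
    the hot dataset, never below the 4096 MB default."""
    return max(floor_mb, int(hot_dataset_mb * multiplier))

print(recommended_memory_limit_mb(1000))  # small dataset -> stays at 4096
print(recommended_memory_limit_mb(8000))  # 8 GB hot data -> 16000
```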

Per-Query Memory Budget

Individual queries are not given explicit memory budgets. DuckDB manages memory internally, spilling intermediate results to disk when the buffer pool is full. If a single query attempts to allocate more memory than is available, DuckDB will return an out-of-memory error (surfaced as HATI_ERR_500_DUCKDB_CRASH).


Performance Configuration Reference

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_MAX_CONCURRENT_QUERIES` | CPU cores | Maximum concurrent DuckDB query executions |
| `HATIDATA_DUCKDB_MEMORY_LIMIT_MB` | 4096 | DuckDB buffer pool size in MB |
| `HATIDATA_DUCKDB_THREADS` | CPU cores | DuckDB parallel execution threads |
| `HATIDATA_L1_CACHE_SIZE_MB` | 512 | moka RAM cache size in MB |
| `HATIDATA_L1_CACHE_TTL_SECS` | 60 | L1 cache TTL for query results |
| `HATIDATA_WAL_FSYNC` | true | Whether WAL writes use fsync (disable for benchmarks only) |
| `HATIDATA_PARQUET_FLUSH_INTERVAL_SECS` | 30 | How often dirty tables are flushed to Parquet |
| `HATIDATA_AUDIT_ASYNC` | true | Whether audit logging is asynchronous |
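A process that honors these variables can read them with their documented defaults like so. The variable names and defaults come from the table above; the parsing helpers are ordinary `os.environ` handling, not a HatiData API:

```python
import os

def env_int(name: str, default: int) -> int:
    return int(os.environ.get(name, default))

def env_bool(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    return default if raw is None else raw.lower() in ("1", "true", "yes")

# Defaults mirror the performance configuration reference table.
config = {
    "max_concurrent_queries": env_int("HATIDATA_MAX_CONCURRENT_QUERIES",
                                      os.cpu_count() or 1),
    "duckdb_memory_limit_mb": env_int("HATIDATA_DUCKDB_MEMORY_LIMIT_MB", 4096),
    "duckdb_threads": env_int("HATIDATA_DUCKDB_THREADS", os.cpu_count() or 1),
    "l1_cache_size_mb": env_int("HATIDATA_L1_CACHE_SIZE_MB", 512),
    "l1_cache_ttl_secs": env_int("HATIDATA_L1_CACHE_TTL_SECS", 60),
    "wal_fsync": env_bool("HATIDATA_WAL_FSYNC", True),
    "parquet_flush_interval_secs":
        env_int("HATIDATA_PARQUET_FLUSH_INTERVAL_SECS", 30),
    "audit_async": env_bool("HATIDATA_AUDIT_ASYNC", True),
}
print(config)
```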

Comparison: HatiData vs Raw DuckDB vs Legacy Cloud Warehouses

For agent workloads -- high query frequency, small-to-medium data volumes, mixed reads and writes, need for governance and audit:

| Dimension | HatiData | Raw DuckDB | Legacy Cloud Warehouses |
|---|---|---|---|
| Query latency (p50) | 2-5 ms | 1-3 ms | 200-2000 ms |
| Governance overhead | < 1 ms | None (no governance) | Varies (30-100 ms) |
| Concurrent queries | Semaphore-controlled | Unlimited (crash risk) | Warehouse-size dependent |
| Agent memory (vector + SQL) | Built-in | Manual (separate vector DB) | Not available |
| Chain-of-thought audit | Built-in, hash-chained | Not available | Not available |
| Semantic triggers | Built-in | Not available | Not available |
| Access control (RBAC/ABAC) | Per-query enforcement | Not available | Role-based only |
| Cost (small workload) | $29/mo cloud, $0 local | $0 | $400+/mo minimum |
| Postgres wire protocol | Yes | No (native API only) | No (proprietary drivers) |
| TLS + encryption at rest | Enforced | Manual | Yes |

HatiData is purpose-built for agent workloads where governance, auditability, and low latency must coexist. Raw DuckDB is faster for uncontrolled local analytics. Legacy cloud warehouses are better suited for large-scale human-driven BI where sub-second latency is not required.
