Performance & Benchmarks
HatiData adds a thin governance and transpilation layer on top of DuckDB. This page documents the overhead introduced by that layer, end-to-end query latency targets, caching behavior, and configuration knobs that affect performance.
Transpilation Overhead
Every SQL query passes through the Snowflake-to-DuckDB transpiler before execution. The transpiler rewrites syntax, maps functions, and transforms types. For the vast majority of queries this step costs well under a millisecond and is not measurable in end-to-end latency.
| Query Complexity | Example | Transpilation Time (p50) | Transpilation Time (p99) |
|---|---|---|---|
| Simple SELECT | SELECT id, name FROM users WHERE active = true | 0.02 ms | 0.08 ms |
| Aggregation | SELECT dept, COUNT(*), AVG(salary) FROM employees GROUP BY dept | 0.04 ms | 0.12 ms |
| Multi-JOIN | 3-table join with subquery and window function | 0.08 ms | 0.25 ms |
| Complex analytics | TPC-DS style (nested CTEs, 5+ joins, CASE expressions) | 0.15 ms | 0.45 ms |
| Vector search | SELECT * FROM docs WHERE semantic_match(content, 'query', 10) | 0.20 ms | 0.60 ms |
These numbers come from the Criterion benchmark suite (crates/hatidata-transpiler/benches/transpile_bench.rs) running on an Apple M2 with 16 GB RAM. Server-class hardware shows similar results because transpilation is CPU-bound and single-threaded per query.
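The percentile methodology is easy to reproduce outside Criterion. Here is a minimal Python sketch of such a harness; the `transpile` stub is a placeholder, not the real transpiler crate:

```python
import statistics
import time

def transpile(sql: str) -> str:
    # Placeholder standing in for the real Snowflake-to-DuckDB transpiler.
    return sql.replace("IFF(", "IF(")

def bench(fn, arg, iterations=10_000):
    """Time fn(arg) repeatedly and return (p50, p99) in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return cuts[49], cuts[98]                    # p50, p99

p50, p99 = bench(transpile, "SELECT IFF(active, 1, 0) FROM users")
print(f"p50={p50:.4f} ms  p99={p99:.4f} ms")
```

Criterion adds warmup phases and outlier detection on top of this basic loop, which is why its numbers are more stable than a naive harness.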
Query Latency
End-to-end latency includes the full 15-step pipeline: semaphore acquisition, table extraction, policy check, cost estimation, quota check, row filtering, transpilation, vector embedding (if applicable), snapshot pinning, DuckDB execution, AI healing (if applicable), dirty tracking + WAL write, column masking, metering, and audit logging.
Latency Targets
| Metric | Target | Measured (local, 1M rows) |
|---|---|---|
| p50 | < 5 ms | 2.1 ms |
| p95 | < 20 ms | 12.4 ms |
| p99 | < 50 ms | 31.2 ms |
| p99.9 | < 200 ms | 89.5 ms |
These targets apply to analytical queries against warm data (already in DuckDB memory or NVMe cache). Cold queries that require Parquet reads from S3 add the object storage round-trip time (typically 20-80 ms per file, parallelized).
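As a back-of-the-envelope model (an assumed formula, not measured from the codebase): because Parquet fetches run in parallel, a cold query pays roughly the warm latency plus the slowest single fetch, not the sum of all fetches.

```python
def cold_query_latency_ms(warm_ms: float, fetch_times_ms: list[float]) -> float:
    """Cold latency ~= warm latency + slowest parallel S3 fetch.

    Fetches are issued in parallel, so only the longest one adds wall time.
    """
    return warm_ms + (max(fetch_times_ms) if fetch_times_ms else 0.0)

# Warm p50 of 2.1 ms plus three parallel Parquet reads in the 20-80 ms range:
print(cold_query_latency_ms(2.1, [25.0, 40.0, 78.0]))  # roughly 80 ms
```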
Pipeline Step Breakdown
Where time is spent on a typical warm query (p50):
| Step | Time | Notes |
|---|---|---|
| Semaphore + policy check | 0.1 ms | In-memory policy evaluation, no network call |
| Table extract + cost estimate | 0.1 ms | AST walk, row count estimation |
| Transpilation | 0.05 ms | Snowflake to DuckDB rewrite |
| DuckDB execution | 1.5 ms | Depends entirely on query and data volume |
| WAL write + dirty tracking | 0.2 ms | fsync to NVMe, batched |
| Column masking + metering | 0.1 ms | In-memory operations |
| Audit logging | 0.05 ms | Async write, does not block response |
| Total | ~2.1 ms | Sum of the steps above |
DuckDB execution dominates for simple queries. For complex analytical queries, DuckDB execution can be 100 ms or more, but the governance overhead remains constant at under 1 ms.
TPC-DS Benchmark Results
HatiData passes all 99 TPC-DS queries at scale factor 1 (1 GB). The following results compare raw DuckDB execution against HatiData (which adds the governance pipeline on top of DuckDB).
| Metric | Raw DuckDB | HatiData | Overhead |
|---|---|---|---|
| Total time (99 queries) | 4.2 s | 4.4 s | +4.8% |
| Fastest query (q06) | 8 ms | 9 ms | +1 ms |
| Slowest query (q72) | 340 ms | 345 ms | +5 ms |
| Geometric mean | 28 ms | 29 ms | +3.6% |
The governance overhead is effectively constant per query (0.5-1 ms), so it becomes proportionally smaller on more expensive queries. At TPC-DS scale factor 10 (10 GB), the overhead drops below 2%.
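The scaling behavior follows directly from the fixed cost: relative overhead is the fixed per-query cost divided by execution time, so it shrinks as queries get more expensive. A quick illustration, assuming a 0.75 ms fixed overhead (the midpoint of the 0.5-1 ms range above):

```python
def relative_overhead(exec_ms: float, fixed_ms: float = 0.75) -> float:
    """Governance overhead as a fraction of query execution time."""
    return fixed_ms / exec_ms

# Cheap query (q06-scale, ~8 ms) vs. expensive query (q72-scale, ~340 ms):
print(f"{relative_overhead(8.0):.1%}")    # 9.4%
print(f"{relative_overhead(340.0):.2%}")  # 0.22%
```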
Throughput & Concurrency
Concurrent Query Capacity
HatiData uses a semaphore to limit concurrent DuckDB query execution. The default limit depends on available CPU cores:
| Configuration | Default | Range |
|---|---|---|
| HATIDATA_MAX_CONCURRENT_QUERIES | Number of CPU cores | 1-256 |
Queries that exceed the semaphore limit are queued (FIFO) and execute as slots become available. The queue depth is bounded -- if the queue exceeds 1000 pending queries, new queries receive HATI_ERR_429_CONCURRENT_LIMIT.
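The admission-control pattern (bounded concurrency plus a bounded FIFO wait queue) can be sketched as follows. This is an illustrative Python/asyncio sketch, not HatiData's actual implementation, which is in Rust; the class and error names are stand-ins:

```python
import asyncio

class ConcurrentLimitError(Exception):
    """Stands in for HATI_ERR_429_CONCURRENT_LIMIT."""

class QueryGate:
    """Admission control: N concurrent slots plus a bounded wait queue."""

    def __init__(self, max_concurrent: int, max_queue_depth: int):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._max_waiting = max_queue_depth
        self._waiting = 0

    async def run(self, execute):
        if self._sem.locked():  # all slots busy: this query must queue
            if self._waiting >= self._max_waiting:
                raise ConcurrentLimitError("pending-query queue is full")
            self._waiting += 1
            try:
                await self._sem.acquire()  # asyncio wakes waiters in FIFO order
            finally:
                self._waiting -= 1
        else:
            await self._sem.acquire()
        try:
            return await execute()
        finally:
            self._sem.release()
```

Rejecting at a queue-depth bound rather than queueing indefinitely keeps tail latency predictable: a query either starts promptly or fails fast with a retryable error.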
Throughput Benchmarks
Measured on an 8-core machine with 32 GB RAM, TPC-DS SF1 data:
| Workload | Queries/sec | Notes |
|---|---|---|
| Simple point lookups | 12,000 | SELECT * FROM t WHERE id = ? |
| Aggregation queries | 3,500 | SELECT dept, COUNT(*) FROM t GROUP BY dept |
| Mixed analytical (TPC-DS) | 800 | Random selection of TPC-DS queries |
| Write-heavy (INSERT) | 5,000 | Single-row inserts |
| Bulk load (COPY) | 150 MB/s | Parquet file import |
MCP Server Throughput
The MCP server (JSON-RPC over HTTP) handles agent tool calls:
| Metric | Value |
|---|---|
| Concurrent connections | 10,000 |
| Tool calls/sec (simple) | 8,000 |
| Memory search (vector + SQL hybrid) | 2,000/sec |
| Trigger evaluation | 5,000 evals/sec |
Caching Architecture
HatiData uses a three-tier cache to minimize repeated computation and storage reads.
L1: RAM Cache (moka)
- What: Query results, transpiled SQL, policy bundles, schema metadata
- Backed by: moka concurrent cache (lock-free)
- Size: Configurable via HATIDATA_L1_CACHE_SIZE_MB (default: 512 MB)
- TTL: 60 seconds for query results, 300 seconds for metadata
- Hit rate: 60-80% for typical agent workloads (agents repeat similar queries)
- Latency on hit: < 0.1 ms
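The L1 semantics (TTL expiry plus capacity-bounded LRU eviction) can be sketched in a few lines. moka provides a lock-free concurrent version of the same idea; this single-threaded Python sketch only illustrates the eviction behavior:

```python
import time
from collections import OrderedDict

class L1Cache:
    """TTL + LRU cache approximating the L1 (moka) tier's behavior."""

    def __init__(self, capacity: int, ttl_secs: float, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (value, expiry); order = LRU order
        self._capacity = capacity
        self._ttl = ttl_secs
        self._clock = clock         # injectable for testing

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self._clock() >= expiry:   # TTL expired: drop the entry
            del self._data[key]
            return None
        self._data.move_to_end(key)   # refresh LRU position
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used
```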
L2: NVMe / Local Disk
- What: DuckDB buffer pool, WAL files, recently flushed Parquet
- Backed by: DuckDB's internal buffer manager + OS page cache
- Size: Configurable via HATIDATA_DUCKDB_MEMORY_LIMIT_MB (default: 4096 MB for DuckDB buffer pool)
- Hit rate: 90%+ for datasets that fit in the buffer pool
- Latency on hit: 0.01-0.1 ms (memory-mapped)
L3: Object Storage (S3/GCS/MinIO)
- What: Parquet snapshots, archived WAL segments, audit logs
- Backed by: S3, GCS, Azure Blob, or MinIO (local dev)
- Latency on read: 20-80 ms per file (parallelized across files)
- When accessed: Cold start, data not in DuckDB buffer pool, explicit FLUSH TABLE
Cache Invalidation
- L1 entries are evicted on TTL expiry or LRU when the cache is full.
- L2 (DuckDB buffer) uses its own LRU eviction when HATIDATA_DUCKDB_MEMORY_LIMIT_MB is exceeded.
- Writes invalidate the L1 cache entry for the affected table.
- FLUSH TABLE <name> forces L2 data to L3 (Parquet export to object storage).
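One way to implement write invalidation is to key L1 entries by the tables they read, so a write evicts every cached result touching that table. This is a hypothetical sketch of the pattern, not HatiData's actual data structure:

```python
from collections import defaultdict

class ResultCache:
    """L1 query-result cache with per-table write invalidation."""

    def __init__(self):
        self._results = {}                 # sql text -> cached result
        self._by_table = defaultdict(set)  # table -> sql keys that read it

    def put(self, sql: str, tables: set[str], result):
        self._results[sql] = result
        for table in tables:
            self._by_table[table].add(sql)

    def get(self, sql: str):
        return self._results.get(sql)

    def invalidate_table(self, table: str):
        """Called on INSERT/UPDATE/DELETE against `table`."""
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)
```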
Memory Configuration
DuckDB is the primary consumer of memory. Two configuration variables control its allocation:
| Variable | Default | Description |
|---|---|---|
| HATIDATA_DUCKDB_MEMORY_LIMIT_MB | 4096 | Maximum memory DuckDB can use for its buffer pool and query execution. |
| HATIDATA_DUCKDB_THREADS | Number of CPU cores | Number of threads DuckDB uses for parallel query execution. |
Setting HATIDATA_DUCKDB_MEMORY_LIMIT_MB too low will cause complex queries to spill to disk, significantly increasing latency. For production workloads, allocate at least 2x the size of your hot dataset.
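For example, serving a ~6 GB hot dataset on a 16-core machine might look like this (values illustrative; adjust to your workload and deployment method):

```shell
# At least 2x the hot dataset: 6 GB hot -> 12 GB buffer pool
export HATIDATA_DUCKDB_MEMORY_LIMIT_MB=12288
# Let DuckDB use all cores for parallel query execution
export HATIDATA_DUCKDB_THREADS=16
```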
Per-Query Memory Budget
Individual queries are not given explicit memory budgets. DuckDB manages memory internally, spilling intermediate results to disk when the buffer pool is full. If a single query attempts to allocate more memory than is available, DuckDB will return an out-of-memory error (surfaced as HATI_ERR_500_DUCKDB_CRASH).
Performance Configuration Reference
| Variable | Default | Description |
|---|---|---|
| HATIDATA_MAX_CONCURRENT_QUERIES | CPU cores | Maximum concurrent DuckDB query executions |
| HATIDATA_DUCKDB_MEMORY_LIMIT_MB | 4096 | DuckDB buffer pool size in MB |
| HATIDATA_DUCKDB_THREADS | CPU cores | DuckDB parallel execution threads |
| HATIDATA_L1_CACHE_SIZE_MB | 512 | moka RAM cache size in MB |
| HATIDATA_L1_CACHE_TTL_SECS | 60 | L1 cache TTL for query results |
| HATIDATA_WAL_FSYNC | true | Whether WAL writes use fsync (disable for benchmarks only) |
| HATIDATA_PARQUET_FLUSH_INTERVAL_SECS | 30 | How often dirty tables are flushed to Parquet |
| HATIDATA_AUDIT_ASYNC | true | Whether audit logging is asynchronous |
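Combining the durability-related knobs above, a benchmark-only profile that trades crash safety for throughput might look like this (values illustrative; never use in production):

```shell
# Benchmark-only: skip fsync on WAL writes (risks data loss on crash)
export HATIDATA_WAL_FSYNC=false
# Keep audit logging off the hot path
export HATIDATA_AUDIT_ASYNC=true
# Flush dirty tables to Parquet less often to reduce background I/O
export HATIDATA_PARQUET_FLUSH_INTERVAL_SECS=300
```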
Comparison: HatiData vs Raw DuckDB vs Legacy Cloud Warehouses
The comparison below targets agent workloads: high query frequency, small-to-medium data volumes, mixed reads and writes, and a need for governance and audit.
| Dimension | HatiData | Raw DuckDB | Legacy Cloud Warehouses |
|---|---|---|---|
| Query latency (p50) | 2-5 ms | 1-3 ms | 200-2000 ms |
| Governance overhead | < 1 ms | None (no governance) | Varies (30-100 ms) |
| Concurrent queries | Semaphore-controlled | Unlimited (crash risk) | Warehouse-size dependent |
| Agent memory (vector + SQL) | Built-in | Manual (separate vector DB) | Not available |
| Chain-of-thought audit | Built-in, hash-chained | Not available | Not available |
| Semantic triggers | Built-in | Not available | Not available |
| Access control (RBAC/ABAC) | Per-query enforcement | Not available | Role-based only |
| Cost (small workload) | $29/mo cloud, $0 local | $0 | $400+/mo minimum |
| Postgres wire protocol | Yes | No (native API only) | No (proprietary drivers) |
| TLS + encryption at rest | Enforced | Manual | Yes |
HatiData is purpose-built for agent workloads where governance, auditability, and low latency must coexist. Raw DuckDB is faster for uncontrolled local analytics. Legacy cloud warehouses are better suited for large-scale human-driven BI where sub-second latency is not required.