Concurrency Model

HatiData's proxy handles concurrent agent connections while managing query execution through its embedded columnar engine. This page describes how HatiData coordinates concurrency with a semaphore-based admission queue, isolated query execution, and a multi-tier cache that keeps the most expensive work from repeating.

Semaphore-Based Admission Control

Allowing hundreds of queries to execute simultaneously would exhaust memory and slow every query rather than completing any of them quickly. HatiData instead gates query execution with a bounded semaphore.

Primary Semaphore

The primary semaphore limits total concurrent query execution. The default capacity is 64 concurrent queries. Queries that arrive when all 64 slots are occupied are queued rather than rejected.

Incoming queries --> Queue --> Semaphore (64 slots) --> Blocking pool --> Query engine
                     (FIFO by  (acquire slot;           (one per slot)    (executes)
                     priority)  block if full)

The queue is priority-ordered rather than strictly FIFO. Agents with admin or analyst RBAC roles have their queries dequeued ahead of developer and auditor roles when the semaphore is contested. Within the same priority tier, the queue is FIFO.

Queue depth is bounded (default: 256). If the queue is full, new queries are rejected with a service_unavailable error. This bound prevents unbounded memory growth under extreme load — 256 queued connections plus 64 executing connections is the maximum in-flight state the proxy will hold.
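The admission flow above can be sketched as a bounded priority queue in front of the primary semaphore. This is an illustrative Python sketch, not HatiData's implementation: the class name, the role-to-priority mapping, and the `service_unavailable` error shape are all assumptions made for the example.

```python
import heapq
import itertools
import threading

# Hypothetical role-to-priority mapping; lower numbers dequeue first.
ROLE_PRIORITY = {"admin": 0, "analyst": 0, "developer": 1, "auditor": 1}

class AdmissionControl:
    """Bounded priority queue feeding a fixed pool of execution slots."""

    def __init__(self, slots=64, max_depth=256):
        self._slots = threading.Semaphore(slots)  # primary semaphore
        self._heap = []                           # (priority, seq, query)
        self._seq = itertools.count()             # preserves FIFO within a tier
        self._max_depth = max_depth
        self._lock = threading.Lock()

    def enqueue(self, query, role):
        """Queue a query, or reject it outright when the queue bound is hit."""
        with self._lock:
            if len(self._heap) >= self._max_depth:
                raise RuntimeError("service_unavailable")
            heapq.heappush(self._heap, (ROLE_PRIORITY[role], next(self._seq), query))

    def next_query(self):
        """Called by a worker that has acquired a semaphore slot: returns the
        highest-priority queued query, FIFO within equal priority."""
        with self._lock:
            return heapq.heappop(self._heap)[2] if self._heap else None
```

The sequence counter is what makes the queue FIFO within a priority tier: two queries with the same role priority are ordered by arrival.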

Full-Scan Semaphore

Full-table scans — queries without selective WHERE predicates on indexed columns — consume significantly more memory than point queries. To prevent a burst of full-scan queries from pushing total memory usage beyond the engine's configured limit, a separate, smaller semaphore gates full-scan operations.

The full-scan semaphore has a default capacity of 8. A query that is classified as a full scan (detected during cost estimation in the pipeline's security stage) must acquire both the primary semaphore slot and a full-scan slot before executing.

The full-scan classification uses the same logic as cost estimation: a query is a full scan if it does not reference an indexed column in its WHERE clause for every table it reads. This classification is conservative — some queries classified as full scans may in practice read only a small fraction of a table if the query optimizer applies partition pruning.
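The classification rule above reduces to a set intersection per table. A minimal sketch, assuming hypothetical inputs (table names mapped to the columns referenced in the WHERE clause, and to the columns that are indexed):

```python
def is_full_scan(tables_read, where_columns, indexed_columns):
    """Conservative full-scan classification: a query is a full scan unless
    every table it reads has at least one indexed column referenced in the
    WHERE clause."""
    return any(
        not (where_columns.get(table, set()) & indexed_columns.get(table, set()))
        for table in tables_read
    )
```

A query classified as a full scan must then hold both a primary slot and one of the 8 full-scan slots before dispatch.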

Priority and Fairness

HatiData does not implement strict fairness across agents — a low-priority agent that has been waiting in the queue for a long time does not get promoted. This is intentional: HatiData's use case is agent workloads where latency variance on individual queries is acceptable, and complex priority aging schemes would add overhead to every queue operation.

Organizations that require strict fairness or dedicated concurrency lanes for specific agents should contact HatiData about Enterprise tier reserved concurrency allocation.

Multi-Tier Cache

The cache sits in front of the query engine. A query that hits the cache returns its result without acquiring a semaphore slot, without dispatching to the blocking pool, and without touching the query engine at all.

Query arrives
|
v
In-memory cache
Hit? --> return cached result (sub-millisecond)
Miss?
|
v
Local disk cache
Hit? --> deserialize + return (~1-5ms)
Miss?
|
v
Object storage cache (S3 / GCS / Azure Blob)
Hit? --> download + deserialize + return (~50-200ms)
Miss?
|
v
Query engine execution
|
v
Write result to all three tiers simultaneously
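The lookup order above can be sketched as a single fall-through function. This is a schematic sketch, not HatiData's code: the tiers are modeled as plain dicts, and the synchronous write-back stands in for the concurrent population of all three tiers.

```python
def cache_lookup(key, memory, disk, objstore, execute):
    """Check tiers fastest-first; only a miss in all three reaches the engine."""
    if key in memory:
        return memory[key]          # sub-millisecond: hash lookup + memory copy
    if key in disk:
        return disk[key]            # ~1-5 ms: deserialize from local disk
    if key in objstore:
        return objstore[key]        # ~50-200 ms: download + deserialize
    result = execute()              # full query engine execution
    memory[key] = disk[key] = objstore[key] = result  # populate all tiers
    return result
```

Note that only the engine-execution path writes back; a hit at any tier returns immediately without acquiring a semaphore slot.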

Each tier has different characteristics:

In-Memory Cache

The in-memory cache uses a concurrent, high-performance eviction policy that combines frequency-based admission with LRU eviction — frequently accessed entries are kept in memory even under high churn.

Property             Value
Capacity             Configurable (default: 1,000 entries)
Max entry size       Configurable (default: 10 MB per entry)
TTL                  Configurable (default: 300s)
TTI (time-to-idle)   Configurable (default: 60s)
Thread safety        Lock-free concurrent reads and writes

The in-memory cache is keyed on a hash of the normalized SQL plus the agent's organization ID and active RLS parameters. Two agents from the same organization with identical RLS parameters will share cache entries. Agents from different organizations never share cache entries — their organization ID is part of the cache key.

This is the fastest path — a cache hit returns results without any I/O and with only a hash lookup and a memory copy. Target latency for hits is under 1 millisecond.
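The keying scheme described above (normalized SQL plus organization ID plus active RLS parameters) can be sketched as follows. The function name and the SHA-256/JSON encoding are assumptions for illustration; only the key's ingredients come from the text.

```python
import hashlib
import json

def cache_key(normalized_sql, org_id, rls_params):
    """Derive a cache key from the normalized SQL, the agent's organization
    ID, and its active RLS parameters. Including the org ID guarantees that
    entries are never shared across organizations."""
    payload = json.dumps(
        {"sql": normalized_sql, "org": org_id, "rls": sorted(rls_params.items())},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Two agents in the same organization with identical RLS parameters produce the same key and therefore share entries; any difference in organization or RLS state yields a distinct key.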

Local Disk Cache

The local disk cache stores serialized result sets on local SSD, which is typically 10–100x faster than network storage and provides substantially larger capacity than memory at lower cost.

Property               Value
Capacity               Configurable (default: 100 GB)
Serialization format   Compressed columnar format
Encryption             AES-256-GCM (inherits disk encryption)
TTL                    Configurable (default: 3,600s)
Eviction               LRU, enforced by background compaction task

Result sets are serialized to a compact columnar format before being written. This format deserializes quickly and preserves the columnar layout that makes downstream analytics efficient.

The local disk cache is particularly valuable for repeated analytical queries: a nightly reporting agent that runs the same aggregation query against yesterday's data will hit this tier on every execution after the first, paying only the deserialization cost rather than the full query engine execution cost.

Object Storage Cache

The object storage cache stores result sets in cloud object storage (S3, GCS, or Azure Blob depending on cloud provider). This tier is slower than local SSD but persists across proxy restarts and is accessible from multiple proxy instances in a horizontal scale-out deployment.

Property               Value
Capacity               Effectively unbounded (object storage limits apply)
Serialization format   Compressed Parquet
Encryption             Provider-managed (S3-SSE, GCS CMEK, Azure Storage encryption)
TTL                    Configurable (default: 24h)
Eviction               Lifecycle policies on the storage bucket

The object storage tier is populated by a background task that promotes local SSD entries exceeding a configurable age and size threshold. Small, short-lived query results are not promoted — the object storage API overhead would exceed the benefit.
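The promotion rule can be sketched as a simple predicate over entry age and size. The threshold values here are illustrative placeholders, not HatiData's configurable defaults:

```python
def should_promote(entry_age_s, entry_size_bytes,
                   min_age_s=600, min_size_bytes=1_000_000):
    """Promote a local disk entry to object storage only when it is old and
    large enough to amortize the object storage API overhead."""
    return entry_age_s >= min_age_s and entry_size_bytes >= min_size_bytes
```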

This tier is most valuable in two scenarios:

  1. Post-restart warm-up: After a proxy restart, the in-memory and local SSD caches are cold. Object storage provides a warm cache for queries that were common before the restart, avoiding a full cold-start performance regression.
  2. Horizontal scale-out: Multiple proxy instances sharing the same object storage bucket can read each other's cached results. This allows read-heavy agent workloads to scale across multiple proxy instances without each instance independently re-executing the same queries.

Cache Invalidation

Cache entries are invalidated when the underlying data changes. HatiData uses snapshot-based invalidation: each cache entry is tagged with the snapshot version of the tables it read. When a write commits to a table, the snapshot version increments. On the next query against that table, the cache layer checks whether the entry's snapshot version matches the current version; stale entries are evicted.

This approach avoids the classic cache invalidation problem for read-heavy analytical workloads: most agent queries read historical or aggregated data that changes infrequently. Snapshot versioning allows the cache to be highly effective for read-only workloads while correctly invalidating on write.
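The snapshot check reduces to a per-table version comparison. A minimal sketch, assuming each cache entry carries a map of table name to the snapshot version it read:

```python
def is_fresh(entry_snapshots, current_snapshots):
    """A cache entry is fresh only if every table it read is still at the
    snapshot version recorded when the entry was written; any committed
    write increments the table's version and so invalidates the entry."""
    return all(
        current_snapshots.get(table) == version
        for table, version in entry_snapshots.items()
    )
```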

For tables that are written frequently (event tables, streaming ingestion), the cache TTL effectively determines the staleness bound. The in-memory cache TTL can be set to zero to disable caching for latency-sensitive, high-write tables.

Transpilation Cache

Separate from the result cache, HatiData maintains a transpilation cache keyed on the hash of the normalized SQL statement. Successful transpilation results are stored in this cache.

Transpilation is CPU-bound (AST parsing + tree rewriting) and typically takes 1-5 milliseconds for complex queries. For an agent that issues the same parameterized query repeatedly — for example, a monitoring agent that checks a metric table with the same query structure but different timestamp bounds — the transpilation cache eliminates this overhead after the first execution.

The transpilation cache is bounded (default: 10,000 entries) with LRU eviction. It is per-process (not shared across instances or persisted across restarts) because transpilation results are deterministic — if the cache is cold after a restart, the first execution of each unique query re-populates it at no correctness cost.
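The bounded LRU behavior described above can be sketched with an ordered map. The class and method names are illustrative; only the capacity bound, LRU eviction, and per-process scope come from the text.

```python
from collections import OrderedDict

class TranspilationCache:
    """Per-process LRU cache of transpilation results, keyed on the hash of
    the normalized SQL. Safe to lose on restart: transpilation is
    deterministic, so a cold cache only costs latency, never correctness."""

    def __init__(self, capacity=10_000):
        self._entries = OrderedDict()
        self._capacity = capacity

    def get_or_transpile(self, sql_hash, transpile):
        if sql_hash in self._entries:
            self._entries.move_to_end(sql_hash)  # mark most recently used
            return self._entries[sql_hash]
        result = transpile()                     # CPU-bound: parse + rewrite
        self._entries[sql_hash] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)    # evict least recently used
        return result
```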

Session Lifecycle and Suspend/Resume

HatiData proxy manages query engine connection state as a session:

  1. Session start: A new connection is initialized. The session semaphore (if using session-scoped concurrency) is acquired.
  2. Active: Queries execute through the multi-stage pipeline. The auto-suspend timer resets on each query completion.
  3. Idle: The auto-suspend timer counts down (default: 5 seconds). The in-memory cache remains warm.
  4. Suspend: In-process state is serialized to local disk cache. Memory is released.
  5. Resume: On the next query, state is deserialized from local disk and the engine resumes from the serialized checkpoint. Queries execute normally.

The suspend/resume cycle is transparent to the agent — there is no protocol-level indication that a suspend occurred. Resume latency is typically 10–50 milliseconds for small to medium session states.
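The lifecycle above can be sketched as a small state machine. This is a schematic sketch: the state names mirror the steps listed, but the class shape, the `tick` polling model, and the checkpoint representation are assumptions made for the example.

```python
import time

class Session:
    """Minimal session lifecycle sketch: active -> idle -> suspended -> resumed."""

    def __init__(self, auto_suspend_s=5.0):
        self.auto_suspend_s = auto_suspend_s
        self.state = "active"
        self._last_query_at = time.monotonic()
        self._checkpoint = None

    def on_query(self, execute):
        if self.state == "suspended":
            self._resume()                       # deserialize from local disk
        result = execute()
        self._last_query_at = time.monotonic()   # reset the auto-suspend timer
        self.state = "active"
        return result

    def tick(self):
        """Called periodically; suspend once idle past the timeout."""
        idle = time.monotonic() - self._last_query_at
        if self.state == "active" and idle >= self.auto_suspend_s:
            self._checkpoint = {"serialized": True}  # serialize in-process state
            self.state = "suspended"                 # memory released

    def _resume(self):
        assert self._checkpoint is not None      # engine resumes from checkpoint
        self.state = "active"
```

The agent never observes the state field: suspend and resume happen entirely between queries, which is what makes the cycle protocol-transparent.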

  • Query Pipeline — The stages where concurrency controls are applied
  • Two-Plane Model — The data plane where all concurrency controls run
  • Cost Model — How auto-suspend interacts with per-second billing
  • Branch Isolation — How branching interacts with connection state and caching
