# Configuration Reference

All configuration is supplied via environment variables prefixed with `HATIDATA_`. For Docker deployments, set these in `docker-compose.yml` or a `.env` file. For Kubernetes, use ConfigMaps or Secrets.
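A minimal `.env` sketch to get started; the values shown are the documented defaults, and every other variable on this page can be set the same way:

```shell
# .env -- example HatiData proxy configuration (documented defaults)
HATIDATA_LISTEN_ADDR=0.0.0.0:5439
HATIDATA_LOG_LEVEL=info
HATIDATA_CONTROL_PLANE_URL=http://localhost:8080
```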
## Server & Network

Core listener and transport settings for the proxy.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_LISTEN_ADDR` | `0.0.0.0:5439` | Postgres wire protocol listener address and port. |
| `HATIDATA_TLS_ENABLED` | `false` | Enable TLS termination on the proxy listener. |
| `HATIDATA_TLS_CERT_PATH` | `/etc/hatidata/tls/cert.pem` | Path to the TLS certificate file (PEM format). |
| `HATIDATA_TLS_KEY_PATH` | `/etc/hatidata/tls/key.pem` | Path to the TLS private key file (PEM format). |
| `HATIDATA_MAX_CONNECTIONS` | `500` | Maximum number of concurrent TCP connections the proxy will accept. |
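For example, enabling TLS termination on the listener might look like this (the certificate paths are the documented defaults; point them at your own PEM files):

```shell
# Enable TLS on the Postgres wire-protocol listener.
export HATIDATA_TLS_ENABLED=true
export HATIDATA_TLS_CERT_PATH=/etc/hatidata/tls/cert.pem
export HATIDATA_TLS_KEY_PATH=/etc/hatidata/tls/key.pem
export HATIDATA_MAX_CONNECTIONS=500
```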
## Authentication

Identity, JWT verification, and control plane connectivity.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_JWT_SECRET` | `hatidata-dev-secret-change-me` | HMAC secret used to sign and verify JWT tokens. Must be changed in production. |
| `HATIDATA_CONTROL_PLANE_URL` | `http://localhost:8080` | Base URL of the HatiData control plane API. |
| `HATIDATA_CONTROL_PLANE_API_KEY` | `hd_internal_dev` | API key used to authenticate requests to the control plane. |
| `HATIDATA_ORG_ID` | `""` | Organization ID this proxy instance belongs to. |
| `HATIDATA_ENV_ID` | `""` | Environment ID for multi-environment deployments. |
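The default JWT secret must never reach production. One way to generate a strong replacement (assumes `openssl` is available; the org and environment IDs are hypothetical placeholders):

```shell
# Generate a 256-bit random secret and export it for the proxy.
export HATIDATA_JWT_SECRET="$(openssl rand -hex 32)"
export HATIDATA_ORG_ID="org_example"   # hypothetical organization ID
export HATIDATA_ENV_ID="env_prod"      # hypothetical environment ID
```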
## Query Engine

Tuning parameters for the embedded columnar query engine.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_DUCKDB_MEMORY_LIMIT_MB` | `24000` | Engine heap memory cap in megabytes (24 GB default). |
| `HATIDATA_DUCKDB_THREADS` | `0` (auto-detect) | Number of engine worker threads. `0` uses all available CPU cores. |
| `HATIDATA_TEMP_DIRECTORY` | `/tmp/hatidata` | Scratch directory for spill-to-disk during large sorts and joins. |
| `HATIDATA_DUCKDB_PATH` | `""` (in-memory) | Path to a persistent database file. Empty uses an in-memory database. |
| `HATIDATA_QUERY_TIMEOUT_SECS` | `300` | Per-query timeout in seconds. Queries exceeding this are cancelled. |
| `HATIDATA_MAX_CONCURRENT_QUERIES` | `100` | Maximum number of queries executing concurrently across all connections. |
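A tuning sketch for a shared host where the engine should not claim all resources; the numbers and scratch path are illustrative, not recommendations:

```shell
export HATIDATA_DUCKDB_MEMORY_LIMIT_MB=8000           # cap the engine at ~8 GB
export HATIDATA_DUCKDB_THREADS=4                      # fixed count instead of auto-detect
export HATIDATA_TEMP_DIRECTORY=/mnt/scratch/hatidata  # hypothetical spill directory
export HATIDATA_QUERY_TIMEOUT_SECS=120                # tighter per-query timeout
```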
## Cloud Storage

Object storage, encryption, and audit log configuration for multi-cloud deployments.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_CLOUD_PROVIDER` | `local` | Cloud provider backend: `local`, `aws`, `gcp`, or `azure`. |
| `HATIDATA_STORAGE_BUCKET` | `customer-data-lake` | Object storage bucket name for customer data. |
| `HATIDATA_STORAGE_PREFIX` | `data/` | Key prefix for objects within the storage bucket. |
| `HATIDATA_CLOUD_REGION` | `ap-southeast-1` | Cloud region for storage and compute resources. |
| `HATIDATA_STORAGE_ENDPOINT` | `""` | Custom storage endpoint URL (e.g., for S3-compatible storage or LocalStack). |
| `HATIDATA_STORAGE_FORCE_PATH_STYLE` | `false` | Use S3 path-style access instead of virtual-hosted-style. Required for S3-compatible storage. |
| `HATIDATA_KMS_PROVIDER` | `local` | Key management provider: `local`, `aws`, `gcp`, or `azure`. |
| `HATIDATA_LOCAL_ENCRYPTION_KEY` | `""` | Encryption key for the `local` KMS provider. |
| `HATIDATA_AUDIT_BUCKET` | `hatidata-audit` | Object storage bucket for audit log archives. |
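A sketch for pointing the proxy at an S3-compatible store such as MinIO; the endpoint hostname is hypothetical, and path-style access is what typically makes such endpoints resolve correctly:

```shell
export HATIDATA_CLOUD_PROVIDER=aws
export HATIDATA_STORAGE_ENDPOINT=http://minio.internal:9000  # hypothetical endpoint
export HATIDATA_STORAGE_FORCE_PATH_STYLE=true                # needed for S3-compatible stores
export HATIDATA_STORAGE_BUCKET=customer-data-lake
export HATIDATA_STORAGE_PREFIX=data/
```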
## Caching

Multi-tier caching configuration: in-memory, Redis, local disk, and transpilation cache.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_CACHE_RAM_MAX_ENTRIES` | `10000` | Maximum entries in the in-memory cache. |
| `HATIDATA_CACHE_RAM_TTL_SECS` | `300` | Time-to-live in seconds for in-memory cache entries. |
| `HATIDATA_NVME_CACHE_PATH` | `/var/cache/hatidata` | Directory path for the local disk cache. |
| `HATIDATA_NVME_CACHE_MAX_GB` | `700` | Maximum size of the local disk cache in gigabytes. |
| `HATIDATA_REDIS_ENABLED` | `false` | Enable the Redis-backed cache layer. |
| `HATIDATA_REDIS_URL` | `redis://localhost:6379` | Redis connection URL. |
| `HATIDATA_REDIS_TTL_SECS` | `300` | Time-to-live in seconds for Redis cache entries. |
| `HATIDATA_REDIS_PREFIX` | `hatidata` | Key prefix for all Redis entries (supports multi-tenant isolation). |
| `HATIDATA_TRANSPILE_CACHE_MAX_ENTRIES` | `10000` | Maximum entries in the SQL transpilation result cache. |
| `HATIDATA_TRANSPILE_CACHE_TTL_SECS` | `3600` | Time-to-live in seconds for cached transpilation results. |
| `HATIDATA_CACHE_WARMUP_ENABLED` | `false` | Pre-load popular queries into the cache on proxy startup. |
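Adding a shared Redis tier on top of the in-memory cache might look like this; the Redis hostname and tenant prefix are illustrative:

```shell
# Enable the Redis-backed cache layer with a per-tenant key prefix.
export HATIDATA_REDIS_ENABLED=true
export HATIDATA_REDIS_URL=redis://redis.internal:6379  # hypothetical host
export HATIDATA_REDIS_TTL_SECS=600
export HATIDATA_REDIS_PREFIX=tenant-a                  # illustrative tenant prefix
```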
## MCP Server

Model Context Protocol server settings for AI agent connectivity.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_MCP_ENABLED` | `true` | Enable the MCP server alongside the Postgres wire protocol listener. |
| `HATIDATA_MCP_LISTEN_ADDR` | `0.0.0.0:5440` | MCP HTTP listener address and port. |
| `HATIDATA_MCP_TLS_ENABLED` | `false` | Enable TLS on the MCP listener. |
| `HATIDATA_MCP_MAX_SESSIONS` | `1000` | Maximum number of concurrent MCP sessions. |
| `HATIDATA_MCP_RATE_LIMIT_PER_SECOND` | `50.0` | Sustained request rate limit per MCP session. |
| `HATIDATA_MCP_RATE_BURST_SIZE` | `100` | Maximum burst size for MCP rate limiting. |
## Policy Sync

Controls how the proxy fetches and applies governance policies from the control plane.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_POLICY_SYNC_ENABLED` | `true` | Enable periodic policy synchronization from the control plane. |
| `HATIDATA_POLICY_SYNC_INTERVAL_SECS` | `30` | Interval in seconds between policy refresh cycles. |
| `HATIDATA_POLICY_SYNC_STARTUP_TIMEOUT_SECS` | `30` | Maximum seconds to wait for the initial policy sync on startup. |
| `HATIDATA_POLICY_SYNC_ALLOW_DEGRADED` | `false` | INSECURE: If `true`, the proxy will serve queries with empty policies when the initial sync times out. Not recommended for production. |
## Agent Engine

Settings for agent-native features including long-term memory, chain-of-thought, and hot context.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_AGENT_ENGINE_ENABLED` | `true` | Enable the agent engine (MCP tools, hot context, RAG search). |
| `HATIDATA_AGENT_MAX_CONCURRENT_PER_AGENT` | `10` | Maximum concurrent operations per individual agent. |
| `HATIDATA_AGENT_MAX_CONCURRENT_GLOBAL` | `100` | Maximum concurrent agent operations across all agents. |
| `HATIDATA_AGENT_RATE_LIMIT_PER_SECOND` | `50.0` | Sustained request rate limit for agent operations. |
| `HATIDATA_AGENT_RATE_BURST_SIZE` | `100` | Maximum burst size for agent rate limiting. |
| `HATIDATA_AGENT_HOT_TABLES` | `""` | Comma-separated list of table names to pin in the agent hot context cache. |
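Pinning frequently queried tables into the hot context cache takes a comma-separated list; the table names here are illustrative:

```shell
# Pin hot tables for agent queries (illustrative names).
export HATIDATA_AGENT_HOT_TABLES="orders,customers,events"
export HATIDATA_AGENT_MAX_CONCURRENT_PER_AGENT=10
```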
## Auto-Suspend

Query engine lifecycle management for cost optimization.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_AUTO_SUSPEND_ENABLED` | `true` | Enable automatic suspension of the query engine after idle periods. |
| `HATIDATA_AUTO_SUSPEND_IDLE_SECS` | `5` | Seconds of inactivity before the query engine is suspended. |
| `HATIDATA_AUTO_SUSPEND_RESUME_TIMEOUT_SECS` | `30` | Maximum seconds allowed for the query engine to resume from a suspended state. |
## Embedding & Vector Search

Configuration for embedding providers and the vector database used by agent memory and semantic triggers.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_EMBEDDING_PROVIDER` | `mock` | Embedding provider: `mock`, `sidecar`, `openai`, `voyage`, `together-embed`, or `cohere`. |
| `HATIDATA_EMBEDDING_SIDECAR_URL` | `http://localhost:8090` | URL of the local embedding service. |
| `HATIDATA_OPENAI_API_KEY` | `""` | OpenAI API key for the `openai` embedding provider. |
| `HATIDATA_OPENAI_EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model name. |
| `HATIDATA_EMBEDDING_BASE_URL` | `""` | Custom base URL for OpenAI-compatible embedding APIs. |
| `HATIDATA_COHERE_API_KEY` | `""` | Cohere API key for the `cohere` embedding provider. |
| `HATIDATA_COHERE_EMBEDDING_MODEL` | `embed-v4.0` | Cohere embedding model name. |
| `HATIDATA_QDRANT_URL` | `http://localhost:6334` | Vector database gRPC endpoint. |
| `HATIDATA_QDRANT_API_KEY` | `""` | API key for vector database authentication. |
| `HATIDATA_QDRANT_COLLECTION_PREFIX` | `hatidata` | Prefix for vector collection names (supports multi-tenant isolation). |
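Switching from the `mock` provider to OpenAI embeddings might look like this; the API key is a placeholder and the Qdrant hostname is hypothetical:

```shell
export HATIDATA_EMBEDDING_PROVIDER=openai
export HATIDATA_OPENAI_API_KEY="sk-example"              # placeholder, not a real key
export HATIDATA_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
export HATIDATA_QDRANT_URL=http://qdrant.internal:6334   # hypothetical host
```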
## AI Healing

Automatic query repair when transpilation or execution fails.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_TRANSPILATION_ENABLED` | `true` | Enable Snowflake SQL auto-transpilation. |
| `HATIDATA_HEALER_API_ENDPOINT` | `""` | External AI healer API endpoint. Empty disables AI healing. |
| `HATIDATA_HEALER_MAX_RETRIES` | `3` | Maximum retry attempts when the healer API returns an error. |
| `HATIDATA_HEALER_PII_REDACTION` | `true` | Redact literal values from SQL before sending to the external healer API. |
## Cost & Billing

Credit metering, cost estimation, and budget enforcement.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_COST_ESTIMATION_ENABLED` | `false` | Enable enhanced cost estimation for query planning. |
| `HATIDATA_COST_PER_CREDIT` | `0.40` | Dollar cost per compute credit for billing calculations. |
| `HATIDATA_HARD_CAP_ENABLED` | `true` | Hard-block queries that would exceed the organization's credit budget. |
| `HATIDATA_METERING_FLUSH_INTERVAL_SECS` | `60` | Interval in seconds for flushing accumulated metering data to the control plane. |
| `HATIDATA_AGENT_ACTIVITY_FLUSH_INTERVAL_SECS` | `10` | Interval in seconds for flushing agent activity metrics to the control plane. |
## Observability

Logging, metrics, tracing, and SIEM integration.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_LOG_LEVEL` | `info` | Log verbosity level: `trace`, `debug`, `info`, `warn`, or `error`. |
| `HATIDATA_METRICS_PORT` | `9090` | Port for the Prometheus-compatible metrics endpoint. |
| `HATIDATA_OTEL_ENABLED` | `false` | Enable OpenTelemetry distributed tracing. |
| `HATIDATA_OTEL_ENDPOINT` | `http://localhost:4317` | OpenTelemetry Collector gRPC endpoint. |
| `HATIDATA_OTEL_SERVICE_NAME` | `hatidata-proxy` | Service name reported in OpenTelemetry spans. |
| `HATIDATA_SIEM_ENABLED` | `false` | Enable forwarding of security events to a SIEM collector. |
| `HATIDATA_SIEM_TARGET_TYPE` | `webhook` | SIEM target type: `webhook`, `splunk_hec`, or `datadog`. |
| `HATIDATA_SIEM_ENDPOINT_URL` | `""` | SIEM collector endpoint URL. |
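Turning on tracing and SIEM forwarding together might look like this; the collector hostname and webhook URL are placeholders:

```shell
# Enable OpenTelemetry tracing and generic-webhook SIEM forwarding.
export HATIDATA_OTEL_ENABLED=true
export HATIDATA_OTEL_ENDPOINT=http://otel-collector:4317            # hypothetical collector
export HATIDATA_SIEM_ENABLED=true
export HATIDATA_SIEM_TARGET_TYPE=webhook
export HATIDATA_SIEM_ENDPOINT_URL=https://siem.example.com/ingest   # placeholder URL
```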
## Security & Governance

Runtime security controls, tenant isolation, and data governance features.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_RESOURCE_GOVERNOR_ENABLED` | `false` | Enable per-organization compute resource isolation. |
| `HATIDATA_TIER_GATE_ENABLED` | `false` | Enable per-organization feature enforcement based on subscription tier. |
| `HATIDATA_TENANT_ISOLATION_ENABLED` | `false` | Prevent cross-tenant JOINs at the query level. |
| `HATIDATA_SNAPSHOT_PINNING_ENABLED` | `true` | Enable snapshot pinning for consistent point-in-time reads. |
| `HATIDATA_RESULT_STREAMING_ENABLED` | `false` | Stream query results in batches instead of buffering the full result set. |
| `HATIDATA_RESULT_STREAMING_BATCH_SIZE` | `1000` | Number of rows per batch when result streaming is enabled. |
## Data Durability

Write-ahead log and periodic Parquet flush settings for crash-safe data persistence.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_AUTO_PERSIST_ENABLED` | `true` | Enable periodic Parquet flush from in-memory DuckDB to object storage. |
| `HATIDATA_FLUSH_INTERVAL_SECS` | `30` | Interval in seconds between periodic Parquet flush cycles. |
| `HATIDATA_FLUSH_SIZE_THRESHOLD_MB` | `10` | Size threshold in megabytes for triggering an immediate flush outside the periodic interval. |
| `HATIDATA_WAL_ENABLED` | `true` | Enable write-ahead log for crash recovery. All mutations are appended to the WAL before being applied. |
| `HATIDATA_WAL_DIR` | `/data/wal/` | Directory for WAL segment files. Must be on a durable volume in production. |
| `HATIDATA_WAL_FSYNC` | `true` | Call fsync after each WAL append for durability. Disabling improves write throughput at the risk of losing the last few writes on crash. |
| `HATIDATA_DUCKDB_PATH` | `""` (in-memory) | Path to a file-backed DuckDB database. Empty uses an in-memory database (data is lost on restart unless WAL + Parquet flush are enabled). |
For production deployments, enable both `HATIDATA_WAL_ENABLED` and `HATIDATA_AUTO_PERSIST_ENABLED`. The WAL provides crash recovery for the most recent writes, while the Parquet flush provides durable long-term storage in object storage.
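That production guidance as an env sketch, using the documented defaults (WAL with fsync on a durable volume, plus periodic Parquet flush):

```shell
export HATIDATA_WAL_ENABLED=true
export HATIDATA_WAL_FSYNC=true
export HATIDATA_WAL_DIR=/data/wal/        # must be on a durable volume
export HATIDATA_AUTO_PERSIST_ENABLED=true
export HATIDATA_FLUSH_INTERVAL_SECS=30
```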
## Related
- Query Pipeline -- 15-step query execution flow from connection to response.
- Security Model -- Authentication, authorization, encryption, and audit architecture.
- Deployment Modes -- Docker Compose, Kubernetes, and multi-cloud deployment options.