# Configuration Reference

All configuration is supplied via environment variables prefixed with `HATIDATA_`. For Docker deployments, set these in `docker-compose.yml` or a `.env` file. For Kubernetes, use ConfigMaps or Secrets.
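A minimal `.env` sketch to get started; the values shown are the documented defaults, and every other variable on this page can be set the same way:

```shell
# .env -- example HatiData proxy configuration (documented defaults)
HATIDATA_LISTEN_ADDR=0.0.0.0:5439
HATIDATA_LOG_LEVEL=info
HATIDATA_CONTROL_PLANE_URL=http://localhost:8080
```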
## Server & Network

Core listener and transport settings for the proxy.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_LISTEN_ADDR` | `0.0.0.0:5439` | Postgres wire protocol listener address and port. |
| `HATIDATA_TLS_ENABLED` | `false` | Enable TLS termination on the proxy listener. |
| `HATIDATA_TLS_CERT_PATH` | `/etc/hatidata/tls/cert.pem` | Path to the TLS certificate file (PEM format). |
| `HATIDATA_TLS_KEY_PATH` | `/etc/hatidata/tls/key.pem` | Path to the TLS private key file (PEM format). |
| `HATIDATA_MAX_CONNECTIONS` | `500` | Maximum number of concurrent TCP connections the proxy will accept. |
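For example, enabling TLS termination on the listener might look like this (the certificate paths are the documented defaults; point them at your own PEM files):

```shell
# Enable TLS on the Postgres wire-protocol listener.
export HATIDATA_TLS_ENABLED=true
export HATIDATA_TLS_CERT_PATH=/etc/hatidata/tls/cert.pem
export HATIDATA_TLS_KEY_PATH=/etc/hatidata/tls/key.pem
export HATIDATA_MAX_CONNECTIONS=500
```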
## Authentication

Identity, JWT verification, and control plane connectivity.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_JWT_SECRET` | `hatidata-dev-secret-change-me` | HMAC secret used to sign and verify JWT tokens. Must be changed in production. |
| `HATIDATA_CONTROL_PLANE_URL` | `http://localhost:8080` | Base URL of the HatiData control plane API. |
| `HATIDATA_CONTROL_PLANE_API_KEY` | `hd_internal_dev` | API key used to authenticate requests to the control plane. |
| `HATIDATA_ORG_ID` | `""` | Organization ID this proxy instance belongs to. |
| `HATIDATA_ENV_ID` | `""` | Environment ID for multi-environment deployments. |
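The default JWT secret must never reach production. One way to generate a strong replacement (assumes `openssl` is available; the org and environment IDs are hypothetical placeholders):

```shell
# Generate a 256-bit random secret and export it for the proxy.
export HATIDATA_JWT_SECRET="$(openssl rand -hex 32)"
export HATIDATA_ORG_ID="org_example"   # hypothetical organization ID
export HATIDATA_ENV_ID="env_prod"      # hypothetical environment ID
```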
## Query Engine

Tuning parameters for the embedded columnar query engine.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_DUCKDB_MEMORY_LIMIT_MB` | `24000` | Engine heap memory cap in megabytes (24 GB default). |
| `HATIDATA_DUCKDB_THREADS` | `0` (auto-detect) | Number of engine worker threads. `0` uses all available CPU cores. |
| `HATIDATA_TEMP_DIRECTORY` | `/tmp/hatidata` | Scratch directory for spill-to-disk during large sorts and joins. |
| `HATIDATA_DUCKDB_PATH` | `""` (in-memory) | Path to a persistent database file. Empty uses an in-memory database. |
| `HATIDATA_QUERY_TIMEOUT_SECS` | `300` | Per-query timeout in seconds. Queries exceeding this are cancelled. |
| `HATIDATA_MAX_CONCURRENT_QUERIES` | `100` | Maximum number of queries executing concurrently across all connections. |
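A tuning sketch for a shared host where the engine should not claim all resources; the numbers and scratch path are illustrative, not recommendations:

```shell
export HATIDATA_DUCKDB_MEMORY_LIMIT_MB=8000           # cap the engine at ~8 GB
export HATIDATA_DUCKDB_THREADS=4                      # fixed count instead of auto-detect
export HATIDATA_TEMP_DIRECTORY=/mnt/scratch/hatidata  # hypothetical spill directory
export HATIDATA_QUERY_TIMEOUT_SECS=120                # tighter per-query timeout
```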
## Cloud Storage

Object storage, encryption, and audit log configuration for multi-cloud deployments.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_CLOUD_PROVIDER` | `local` | Cloud provider backend: `local`, `aws`, `gcp`, or `azure`. |
| `HATIDATA_STORAGE_BUCKET` | `customer-data-lake` | Object storage bucket name for customer data. |
| `HATIDATA_STORAGE_PREFIX` | `data/` | Key prefix for objects within the storage bucket. |
| `HATIDATA_CLOUD_REGION` | `ap-southeast-1` | Cloud region for storage and compute resources. |
| `HATIDATA_STORAGE_ENDPOINT` | `""` | Custom storage endpoint URL (e.g., for S3-compatible storage or LocalStack). |
| `HATIDATA_STORAGE_FORCE_PATH_STYLE` | `false` | Use S3 path-style access instead of virtual-hosted-style. Required for S3-compatible storage. |
| `HATIDATA_KMS_PROVIDER` | `local` | Key management provider: `local`, `aws`, `gcp`, or `azure`. |
| `HATIDATA_LOCAL_ENCRYPTION_KEY` | `""` | Encryption key for the `local` KMS provider. |
| `HATIDATA_AUDIT_BUCKET` | `hatidata-audit` | Object storage bucket for audit log archives. |
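A sketch for pointing the proxy at an S3-compatible store such as MinIO; the endpoint hostname is hypothetical, and path-style access is what typically makes such endpoints resolve correctly:

```shell
export HATIDATA_CLOUD_PROVIDER=aws
export HATIDATA_STORAGE_ENDPOINT=http://minio.internal:9000  # hypothetical endpoint
export HATIDATA_STORAGE_FORCE_PATH_STYLE=true                # needed for S3-compatible stores
export HATIDATA_STORAGE_BUCKET=customer-data-lake
export HATIDATA_STORAGE_PREFIX=data/
```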
## Caching

Multi-tier caching configuration: in-memory, Redis, local disk, and transpilation cache.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_CACHE_RAM_MAX_ENTRIES` | `10000` | Maximum entries in the in-memory cache. |
| `HATIDATA_CACHE_RAM_TTL_SECS` | `300` | Time-to-live in seconds for in-memory cache entries. |
| `HATIDATA_NVME_CACHE_PATH` | `/var/cache/hatidata` | Directory path for the local disk cache. |
| `HATIDATA_NVME_CACHE_MAX_GB` | `700` | Maximum size of the local disk cache in gigabytes. |
| `HATIDATA_REDIS_ENABLED` | `false` | Enable the Redis-backed cache layer. |
| `HATIDATA_REDIS_URL` | `redis://localhost:6379` | Redis connection URL. |
| `HATIDATA_REDIS_TTL_SECS` | `300` | Time-to-live in seconds for Redis cache entries. |
| `HATIDATA_REDIS_PREFIX` | `hatidata` | Key prefix for all Redis entries (supports multi-tenant isolation). |
| `HATIDATA_TRANSPILE_CACHE_MAX_ENTRIES` | `10000` | Maximum entries in the SQL transpilation result cache. |
| `HATIDATA_TRANSPILE_CACHE_TTL_SECS` | `3600` | Time-to-live in seconds for cached transpilation results. |
| `HATIDATA_CACHE_WARMUP_ENABLED` | `false` | Pre-load popular queries into the cache on proxy startup. |
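Adding a shared Redis tier on top of the in-memory cache might look like this; the Redis hostname and tenant prefix are illustrative:

```shell
# Enable the Redis-backed cache layer with a per-tenant key prefix.
export HATIDATA_REDIS_ENABLED=true
export HATIDATA_REDIS_URL=redis://redis.internal:6379  # hypothetical host
export HATIDATA_REDIS_TTL_SECS=600
export HATIDATA_REDIS_PREFIX=tenant-a                  # illustrative tenant prefix
```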
## MCP Server

Model Context Protocol server settings for AI agent connectivity.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_MCP_ENABLED` | `true` | Enable the MCP server alongside the Postgres wire protocol listener. |
| `HATIDATA_MCP_LISTEN_ADDR` | `0.0.0.0:5440` | MCP HTTP listener address and port. |
| `HATIDATA_MCP_TLS_ENABLED` | `false` | Enable TLS on the MCP listener. |
| `HATIDATA_MCP_MAX_SESSIONS` | `1000` | Maximum number of concurrent MCP sessions. |
| `HATIDATA_MCP_RATE_LIMIT_PER_SECOND` | `50.0` | Sustained request rate limit per MCP session. |
| `HATIDATA_MCP_RATE_BURST_SIZE` | `100` | Maximum burst size for MCP rate limiting. |
## Policy Sync

Controls how the proxy fetches and applies governance policies from the control plane.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_POLICY_SYNC_ENABLED` | `true` | Enable periodic policy synchronization from the control plane. |
| `HATIDATA_POLICY_SYNC_INTERVAL_SECS` | `30` | Interval in seconds between policy refresh cycles. |
| `HATIDATA_POLICY_SYNC_STARTUP_TIMEOUT_SECS` | `30` | Maximum seconds to wait for the initial policy sync on startup. |
| `HATIDATA_POLICY_SYNC_ALLOW_DEGRADED` | `false` | INSECURE: If `true`, the proxy will serve queries with empty policies when the initial sync times out. Not recommended for production. |
## Agent Engine

Settings for agent-native features including long-term memory, chain-of-thought, and hot context.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_AGENT_ENGINE_ENABLED` | `true` | Enable the agent engine (MCP tools, hot context, RAG search). |
| `HATIDATA_AGENT_MAX_CONCURRENT_PER_AGENT` | `10` | Maximum concurrent operations per individual agent. |
| `HATIDATA_AGENT_MAX_CONCURRENT_GLOBAL` | `100` | Maximum concurrent agent operations across all agents. |
| `HATIDATA_AGENT_RATE_LIMIT_PER_SECOND` | `50.0` | Sustained request rate limit for agent operations. |
| `HATIDATA_AGENT_RATE_BURST_SIZE` | `100` | Maximum burst size for agent rate limiting. |
| `HATIDATA_AGENT_HOT_TABLES` | `""` | Comma-separated list of table names to pin in the agent hot context cache. |
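Pinning frequently queried tables into the hot context cache takes a comma-separated list; the table names here are illustrative:

```shell
# Pin hot tables for agent queries (illustrative names).
export HATIDATA_AGENT_HOT_TABLES="orders,customers,events"
export HATIDATA_AGENT_MAX_CONCURRENT_PER_AGENT=10
```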
## Auto-Suspend

Query engine lifecycle management for cost optimization.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_AUTO_SUSPEND_ENABLED` | `true` | Enable automatic suspension of the query engine after idle periods. |
| `HATIDATA_AUTO_SUSPEND_IDLE_SECS` | `5` | Seconds of inactivity before the query engine is suspended. |
| `HATIDATA_AUTO_SUSPEND_RESUME_TIMEOUT_SECS` | `30` | Maximum seconds allowed for the query engine to resume from a suspended state. |
## Embedding & Vector Search

Configuration for embedding providers and the vector database used by agent memory and semantic triggers.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_EMBEDDING_PROVIDER` | `mock` | Embedding provider: `mock`, `sidecar`, `openai`, `voyage`, `together-embed`, or `cohere`. |
| `HATIDATA_EMBEDDING_SIDECAR_URL` | `http://localhost:8090` | URL of the local embedding service. |
| `HATIDATA_OPENAI_API_KEY` | `""` | OpenAI API key for the `openai` embedding provider. |
| `HATIDATA_OPENAI_EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model name. |
| `HATIDATA_EMBEDDING_BASE_URL` | `""` | Custom base URL for OpenAI-compatible embedding APIs. |
| `HATIDATA_COHERE_API_KEY` | `""` | Cohere API key for the `cohere` embedding provider. |
| `HATIDATA_COHERE_EMBEDDING_MODEL` | `embed-v4.0` | Cohere embedding model name. |
| `HATIDATA_QDRANT_URL` | `http://localhost:6334` | Vector database gRPC endpoint. |
| `HATIDATA_QDRANT_API_KEY` | `""` | API key for vector database authentication. |
| `HATIDATA_QDRANT_COLLECTION_PREFIX` | `hatidata` | Prefix for vector collection names (supports multi-tenant isolation). |
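Switching from the `mock` provider to OpenAI embeddings might look like this; the API key is a placeholder and the Qdrant hostname is hypothetical:

```shell
export HATIDATA_EMBEDDING_PROVIDER=openai
export HATIDATA_OPENAI_API_KEY="sk-example"              # placeholder, not a real key
export HATIDATA_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
export HATIDATA_QDRANT_URL=http://qdrant.internal:6334   # hypothetical host
```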
## AI Healing

Automatic query repair when transpilation or execution fails.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_TRANSPILATION_ENABLED` | `true` | Enable Snowflake SQL auto-transpilation. |
| `HATIDATA_HEALER_API_ENDPOINT` | `""` | External AI healer API endpoint. Empty disables AI healing. |
| `HATIDATA_HEALER_MAX_RETRIES` | `3` | Maximum retry attempts when the healer API returns an error. |
| `HATIDATA_HEALER_PII_REDACTION` | `true` | Redact literal values from SQL before sending to the external healer API. |
## Cost & Billing

Credit metering, cost estimation, and budget enforcement.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_COST_ESTIMATION_ENABLED` | `false` | Enable enhanced cost estimation for query planning. |
| `HATIDATA_COST_PER_CREDIT` | `0.40` | Dollar cost per compute credit for billing calculations. |
| `HATIDATA_HARD_CAP_ENABLED` | `true` | Hard-block queries that would exceed the organization's credit budget. |
| `HATIDATA_METERING_FLUSH_INTERVAL_SECS` | `60` | Interval in seconds for flushing accumulated metering data to the control plane. |
| `HATIDATA_AGENT_ACTIVITY_FLUSH_INTERVAL_SECS` | `10` | Interval in seconds for flushing agent activity metrics to the control plane. |
## Observability

Logging, metrics, tracing, and SIEM integration.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_LOG_LEVEL` | `info` | Log verbosity level: `trace`, `debug`, `info`, `warn`, or `error`. |
| `HATIDATA_METRICS_PORT` | `9090` | Port for the Prometheus-compatible metrics endpoint. |
| `HATIDATA_OTEL_ENABLED` | `false` | Enable OpenTelemetry distributed tracing. |
| `HATIDATA_OTEL_ENDPOINT` | `http://localhost:4317` | OpenTelemetry Collector gRPC endpoint. |
| `HATIDATA_OTEL_SERVICE_NAME` | `hatidata-proxy` | Service name reported in OpenTelemetry spans. |
| `HATIDATA_SIEM_ENABLED` | `false` | Enable forwarding of security events to a SIEM collector. |
| `HATIDATA_SIEM_TARGET_TYPE` | `webhook` | SIEM target type: `webhook`, `splunk_hec`, or `datadog`. |
| `HATIDATA_SIEM_ENDPOINT_URL` | `""` | SIEM collector endpoint URL. |
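Turning on tracing and SIEM forwarding together might look like this; the collector hostname and webhook URL are placeholders:

```shell
# Enable OpenTelemetry tracing and generic-webhook SIEM forwarding.
export HATIDATA_OTEL_ENABLED=true
export HATIDATA_OTEL_ENDPOINT=http://otel-collector:4317            # hypothetical collector
export HATIDATA_SIEM_ENABLED=true
export HATIDATA_SIEM_TARGET_TYPE=webhook
export HATIDATA_SIEM_ENDPOINT_URL=https://siem.example.com/ingest   # placeholder URL
```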
## Security & Governance

Runtime security controls, tenant isolation, and data governance features.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_RESOURCE_GOVERNOR_ENABLED` | `false` | Enable per-organization compute resource isolation. |
| `HATIDATA_TIER_GATE_ENABLED` | `false` | Enable per-organization feature enforcement based on subscription tier. |
| `HATIDATA_TENANT_ISOLATION_ENABLED` | `false` | Prevent cross-tenant JOINs at the query level. |
| `HATIDATA_SNAPSHOT_PINNING_ENABLED` | `true` | Enable snapshot pinning for consistent point-in-time reads. |
| `HATIDATA_RESULT_STREAMING_ENABLED` | `false` | Stream query results in batches instead of buffering the full result set. |
| `HATIDATA_RESULT_STREAMING_BATCH_SIZE` | `1000` | Number of rows per batch when result streaming is enabled. |
## Data Durability

Write-ahead log and periodic Parquet flush settings for crash-safe data persistence.

| Variable | Default | Description |
|---|---|---|
| `HATIDATA_AUTO_PERSIST_ENABLED` | `true` | Enable periodic Parquet flush from in-memory DuckDB to object storage. |
| `HATIDATA_FLUSH_INTERVAL_SECS` | `30` | Interval in seconds between periodic Parquet flush cycles. |
| `HATIDATA_FLUSH_SIZE_THRESHOLD_MB` | `10` | Size threshold in megabytes for triggering an immediate flush outside the periodic interval. |
| `HATIDATA_WAL_ENABLED` | `true` | Enable write-ahead log for crash recovery. All mutations are appended to the WAL before being applied. |
| `HATIDATA_WAL_DIR` | `/data/wal/` | Directory for WAL segment files. Must be on a durable volume in production. |
| `HATIDATA_WAL_FSYNC` | `true` | Call fsync after each WAL append for durability. Disabling improves write throughput at the risk of losing the last few writes on crash. |
| `HATIDATA_DUCKDB_PATH` | `""` (in-memory) | Path to a file-backed DuckDB database. Empty uses an in-memory database (data is lost on restart unless WAL + Parquet flush are enabled). |
For production deployments, enable both `HATIDATA_WAL_ENABLED` and `HATIDATA_AUTO_PERSIST_ENABLED`. The WAL provides crash recovery for the most recent writes, while the Parquet flush provides durable long-term storage in object storage.
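That production guidance as an env sketch, using the documented defaults (WAL with fsync on a durable volume, plus periodic Parquet flush):

```shell
export HATIDATA_WAL_ENABLED=true
export HATIDATA_WAL_FSYNC=true
export HATIDATA_WAL_DIR=/data/wal/        # must be on a durable volume
export HATIDATA_AUTO_PERSIST_ENABLED=true
export HATIDATA_FLUSH_INTERVAL_SECS=30
```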
## Related
- Query Pipeline -- 15-step query execution flow from connection to response.
- Security Model -- Authentication, authorization, encryption, and audit architecture.
- Deployment Modes -- Docker Compose, Kubernetes, and multi-cloud deployment options.