
Configuration Reference

All configuration is via environment variables prefixed with `HATIDATA_`. For Docker deployments, set these in your `docker-compose.yml` or `.env` file. For Kubernetes, use ConfigMaps or Secrets.


Server & Network

Core listener and transport settings for the proxy.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_LISTEN_ADDR` | `0.0.0.0:5439` | Postgres wire protocol listener address and port. |
| `HATIDATA_TLS_ENABLED` | `false` | Enable TLS termination on the proxy listener. |
| `HATIDATA_TLS_CERT_PATH` | `/etc/hatidata/tls/cert.pem` | Path to the TLS certificate file (PEM format). |
| `HATIDATA_TLS_KEY_PATH` | `/etc/hatidata/tls/key.pem` | Path to the TLS private key file (PEM format). |
| `HATIDATA_MAX_CONNECTIONS` | `500` | Maximum number of concurrent TCP connections the proxy will accept. |
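As an illustration, a `.env` fragment for a TLS-enabled listener might look like the following (certificate paths and the connection limit are examples, not recommendations):

```ini
# Listen on the default HatiData port on all interfaces
HATIDATA_LISTEN_ADDR=0.0.0.0:5439
# Terminate TLS at the proxy; adjust paths to wherever your certs are mounted
HATIDATA_TLS_ENABLED=true
HATIDATA_TLS_CERT_PATH=/etc/hatidata/tls/cert.pem
HATIDATA_TLS_KEY_PATH=/etc/hatidata/tls/key.pem
HATIDATA_MAX_CONNECTIONS=500
```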

Authentication

Identity, JWT verification, and control plane connectivity.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_JWT_SECRET` | `hatidata-dev-secret-change-me` | HMAC secret used to sign and verify JWT tokens. Must be changed in production. |
| `HATIDATA_CONTROL_PLANE_URL` | `http://localhost:8080` | Base URL of the HatiData control plane API. |
| `HATIDATA_CONTROL_PLANE_API_KEY` | `hd_internal_dev` | API key used to authenticate requests to the control plane. |
| `HATIDATA_ORG_ID` | `""` | Organization ID this proxy instance belongs to. |
| `HATIDATA_ENV_ID` | `""` | Environment ID for multi-environment deployments. |
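A hedged production example: the values below are placeholders, and in practice the secret and API key should be injected from a secret manager (e.g. a Kubernetes Secret) rather than committed to a file.

```ini
# Never ship the default dev secret to production
HATIDATA_JWT_SECRET=<long-random-secret-from-your-secret-manager>
HATIDATA_CONTROL_PLANE_URL=https://control.example.com
HATIDATA_CONTROL_PLANE_API_KEY=<api-key-from-your-secret-manager>
# IDs are illustrative; use the values issued by your control plane
HATIDATA_ORG_ID=org_example
HATIDATA_ENV_ID=env_production
```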

Query Engine

Tuning parameters for the embedded columnar query engine.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_DUCKDB_MEMORY_LIMIT_MB` | `24000` | Engine heap memory cap in megabytes (24 GB default). |
| `HATIDATA_DUCKDB_THREADS` | `0` (auto-detect) | Number of engine worker threads. `0` uses all available CPU cores. |
| `HATIDATA_TEMP_DIRECTORY` | `/tmp/hatidata` | Scratch directory for spill-to-disk during large sorts and joins. |
| `HATIDATA_DUCKDB_PATH` | `""` (in-memory) | Path to a persistent database file. Empty uses an in-memory database. |
| `HATIDATA_QUERY_TIMEOUT_SECS` | `300` | Per-query timeout in seconds. Queries exceeding this are cancelled. |
| `HATIDATA_MAX_CONCURRENT_QUERIES` | `100` | Maximum number of queries executing concurrently across all connections. |
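For instance, a host with 32 GB of RAM could be tuned roughly as follows; the sizing is an illustrative starting point, not a benchmark-derived recommendation:

```ini
# Cap the engine below total RAM to leave headroom for the proxy and caches
HATIDATA_DUCKDB_MEMORY_LIMIT_MB=24000
# 0 = use all available CPU cores
HATIDATA_DUCKDB_THREADS=0
# Put spill-to-disk on fast local storage (path is an example)
HATIDATA_TEMP_DIRECTORY=/mnt/scratch/hatidata
HATIDATA_QUERY_TIMEOUT_SECS=300
```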

Cloud Storage

Object storage, encryption, and audit log configuration for multi-cloud deployments.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_CLOUD_PROVIDER` | `local` | Cloud provider backend: `local`, `aws`, `gcp`, or `azure`. |
| `HATIDATA_STORAGE_BUCKET` | `customer-data-lake` | Object storage bucket name for customer data. |
| `HATIDATA_STORAGE_PREFIX` | `data/` | Key prefix for objects within the storage bucket. |
| `HATIDATA_CLOUD_REGION` | `ap-southeast-1` | Cloud region for storage and compute resources. |
| `HATIDATA_STORAGE_ENDPOINT` | `""` | Custom storage endpoint URL (e.g., for S3-compatible storage or localstack). |
| `HATIDATA_STORAGE_FORCE_PATH_STYLE` | `false` | Use S3 path-style access instead of virtual-hosted-style. Required for S3-compatible storage. |
| `HATIDATA_KMS_PROVIDER` | `local` | Key management provider: `local`, `aws`, `gcp`, or `azure`. |
| `HATIDATA_LOCAL_ENCRYPTION_KEY` | `""` | Encryption key for the `local` KMS provider. |
| `HATIDATA_AUDIT_BUCKET` | `hatidata-audit` | Object storage bucket for audit log archives. |
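Pointing the proxy at an S3-compatible store such as MinIO might look like this; the endpoint host and bucket names are placeholders for your own infrastructure:

```ini
HATIDATA_CLOUD_PROVIDER=aws
HATIDATA_STORAGE_BUCKET=customer-data-lake
HATIDATA_STORAGE_PREFIX=data/
HATIDATA_CLOUD_REGION=us-east-1
# S3-compatible stores need a custom endpoint and path-style access
HATIDATA_STORAGE_ENDPOINT=http://minio.internal:9000
HATIDATA_STORAGE_FORCE_PATH_STYLE=true
```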

Caching

Multi-tier caching configuration: in-memory, Redis, local disk, and transpilation cache.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_CACHE_RAM_MAX_ENTRIES` | `10000` | Maximum entries in the in-memory cache. |
| `HATIDATA_CACHE_RAM_TTL_SECS` | `300` | Time-to-live in seconds for in-memory cache entries. |
| `HATIDATA_NVME_CACHE_PATH` | `/var/cache/hatidata` | Directory path for the local disk cache. |
| `HATIDATA_NVME_CACHE_MAX_GB` | `700` | Maximum size of the local disk cache in gigabytes. |
| `HATIDATA_REDIS_ENABLED` | `false` | Enable the Redis-backed cache layer. |
| `HATIDATA_REDIS_URL` | `redis://localhost:6379` | Redis connection URL. |
| `HATIDATA_REDIS_TTL_SECS` | `300` | Time-to-live in seconds for Redis cache entries. |
| `HATIDATA_REDIS_PREFIX` | `hatidata` | Key prefix for all Redis entries (supports multi-tenant isolation). |
| `HATIDATA_TRANSPILE_CACHE_MAX_ENTRIES` | `10000` | Maximum entries in the SQL transpilation result cache. |
| `HATIDATA_TRANSPILE_CACHE_TTL_SECS` | `3600` | Time-to-live in seconds for cached transpilation results. |
| `HATIDATA_CACHE_WARMUP_ENABLED` | `false` | Pre-load popular queries into the cache on proxy startup. |
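To turn on the shared Redis tier for a multi-proxy deployment, a fragment like the following could be used (hostname and prefix are illustrative):

```ini
HATIDATA_REDIS_ENABLED=true
HATIDATA_REDIS_URL=redis://redis.internal:6379
HATIDATA_REDIS_TTL_SECS=600
# Distinct prefixes keep tenants isolated when sharing one Redis instance
HATIDATA_REDIS_PREFIX=hatidata-tenant-a
```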

MCP Server

Model Context Protocol server settings for AI agent connectivity.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_MCP_ENABLED` | `true` | Enable the MCP server alongside the Postgres wire protocol listener. |
| `HATIDATA_MCP_LISTEN_ADDR` | `0.0.0.0:5440` | MCP HTTP listener address and port. |
| `HATIDATA_MCP_TLS_ENABLED` | `false` | Enable TLS on the MCP listener. |
| `HATIDATA_MCP_MAX_SESSIONS` | `1000` | Maximum number of concurrent MCP sessions. |
| `HATIDATA_MCP_RATE_LIMIT_PER_SECOND` | `50.0` | Sustained request rate limit per MCP session. |
| `HATIDATA_MCP_RATE_BURST_SIZE` | `100` | Maximum burst size for MCP rate limiting. |
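For example, tightening the per-session rate limits for a public-facing MCP endpoint could look like this (the specific numbers are illustrative):

```ini
HATIDATA_MCP_ENABLED=true
HATIDATA_MCP_LISTEN_ADDR=0.0.0.0:5440
# 20 requests/second sustained per session, with bursts of up to 40
HATIDATA_MCP_RATE_LIMIT_PER_SECOND=20.0
HATIDATA_MCP_RATE_BURST_SIZE=40
```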

Policy Sync

Controls how the proxy fetches and applies governance policies from the control plane.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_POLICY_SYNC_ENABLED` | `true` | Enable periodic policy synchronization from the control plane. |
| `HATIDATA_POLICY_SYNC_INTERVAL_SECS` | `30` | Interval in seconds between policy refresh cycles. |
| `HATIDATA_POLICY_SYNC_STARTUP_TIMEOUT_SECS` | `30` | Maximum seconds to wait for the initial policy sync on startup. |
| `HATIDATA_POLICY_SYNC_ALLOW_DEGRADED` | `false` | **INSECURE:** If `true`, the proxy will serve queries with empty policies when the initial sync times out. Not recommended for production. |

Agent Engine

Settings for agent-native features including long-term memory, chain-of-thought, and hot context.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_AGENT_ENGINE_ENABLED` | `true` | Enable the agent engine (MCP tools, hot context, RAG search). |
| `HATIDATA_AGENT_MAX_CONCURRENT_PER_AGENT` | `10` | Maximum concurrent operations per individual agent. |
| `HATIDATA_AGENT_MAX_CONCURRENT_GLOBAL` | `100` | Maximum concurrent agent operations across all agents. |
| `HATIDATA_AGENT_RATE_LIMIT_PER_SECOND` | `50.0` | Sustained request rate limit for agent operations. |
| `HATIDATA_AGENT_RATE_BURST_SIZE` | `100` | Maximum burst size for agent rate limiting. |
| `HATIDATA_AGENT_HOT_TABLES` | `""` | Comma-separated list of table names to pin in the agent hot context cache. |
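A sketch of pinning frequently read tables into the hot context cache; the table names below are hypothetical and should be replaced with the tables your agents actually query:

```ini
HATIDATA_AGENT_ENGINE_ENABLED=true
# Comma-separated list, no spaces; names are illustrative
HATIDATA_AGENT_HOT_TABLES=orders,customers,events
```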

Auto-Suspend

Query engine lifecycle management for cost optimization.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_AUTO_SUSPEND_ENABLED` | `true` | Enable automatic suspension of the query engine after idle periods. |
| `HATIDATA_AUTO_SUSPEND_IDLE_SECS` | `5` | Seconds of inactivity before the query engine is suspended. |
| `HATIDATA_AUTO_SUSPEND_RESUME_TIMEOUT_SECS` | `30` | Maximum seconds allowed for the query engine to resume from a suspended state. |

Embeddings & Vector Database

Configuration for embedding providers and the vector database used by agent memory and semantic triggers.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_EMBEDDING_PROVIDER` | `mock` | Embedding provider: `mock`, `sidecar`, `openai`, `voyage`, `together-embed`, or `cohere`. |
| `HATIDATA_EMBEDDING_SIDECAR_URL` | `http://localhost:8090` | URL of the local embedding service. |
| `HATIDATA_OPENAI_API_KEY` | `""` | OpenAI API key for the `openai` embedding provider. |
| `HATIDATA_OPENAI_EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model name. |
| `HATIDATA_EMBEDDING_BASE_URL` | `""` | Custom base URL for OpenAI-compatible embedding APIs. |
| `HATIDATA_COHERE_API_KEY` | `""` | Cohere API key for the `cohere` embedding provider. |
| `HATIDATA_COHERE_EMBEDDING_MODEL` | `embed-v4.0` | Cohere embedding model name. |
| `HATIDATA_QDRANT_URL` | `http://localhost:6334` | Vector database gRPC endpoint. |
| `HATIDATA_QDRANT_API_KEY` | `""` | API key for vector database authentication. |
| `HATIDATA_QDRANT_COLLECTION_PREFIX` | `hatidata` | Prefix for vector collection names (supports multi-tenant isolation). |
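Switching from the default `mock` provider to OpenAI embeddings with an external vector database might look like the following; the API key placeholder and Qdrant hostname are illustrative:

```ini
HATIDATA_EMBEDDING_PROVIDER=openai
HATIDATA_OPENAI_API_KEY=<your-openai-api-key>
HATIDATA_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Vector database gRPC endpoint (hostname is an example)
HATIDATA_QDRANT_URL=http://qdrant.internal:6334
HATIDATA_QDRANT_COLLECTION_PREFIX=hatidata-prod
```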

AI Healing

Automatic query repair when transpilation or execution fails.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_TRANSPILATION_ENABLED` | `true` | Enable Snowflake SQL auto-transpilation. |
| `HATIDATA_HEALER_API_ENDPOINT` | `""` | External AI healer API endpoint. Empty disables AI healing. |
| `HATIDATA_HEALER_MAX_RETRIES` | `3` | Maximum retry attempts when the healer API returns an error. |
| `HATIDATA_HEALER_PII_REDACTION` | `true` | Redact literal values from SQL before sending to the external healer API. |

Cost & Billing

Credit metering, cost estimation, and budget enforcement.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_COST_ESTIMATION_ENABLED` | `false` | Enable enhanced cost estimation for query planning. |
| `HATIDATA_COST_PER_CREDIT` | `0.40` | Dollar cost per compute credit for billing calculations. |
| `HATIDATA_HARD_CAP_ENABLED` | `true` | Hard-block queries that would exceed the organization's credit budget. |
| `HATIDATA_METERING_FLUSH_INTERVAL_SECS` | `60` | Interval in seconds for flushing accumulated metering data to the control plane. |
| `HATIDATA_AGENT_ACTIVITY_FLUSH_INTERVAL_SECS` | `10` | Interval in seconds for flushing agent activity metrics to the control plane. |

Observability

Logging, metrics, tracing, and SIEM integration.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_LOG_LEVEL` | `info` | Log verbosity level: `trace`, `debug`, `info`, `warn`, or `error`. |
| `HATIDATA_METRICS_PORT` | `9090` | Port for the Prometheus-compatible metrics endpoint. |
| `HATIDATA_OTEL_ENABLED` | `false` | Enable OpenTelemetry distributed tracing. |
| `HATIDATA_OTEL_ENDPOINT` | `http://localhost:4317` | OpenTelemetry Collector gRPC endpoint. |
| `HATIDATA_OTEL_SERVICE_NAME` | `hatidata-proxy` | Service name reported in OpenTelemetry spans. |
| `HATIDATA_SIEM_ENABLED` | `false` | Enable forwarding of security events to a SIEM collector. |
| `HATIDATA_SIEM_TARGET_TYPE` | `webhook` | SIEM target type: `webhook`, `splunk_hec`, or `datadog`. |
| `HATIDATA_SIEM_ENDPOINT_URL` | `""` | SIEM collector endpoint URL. |
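A combined tracing and SIEM setup might be sketched as follows; the collector and Splunk endpoints are placeholders for your own infrastructure:

```ini
HATIDATA_LOG_LEVEL=info
# Export spans to an OpenTelemetry Collector (hostname is an example)
HATIDATA_OTEL_ENABLED=true
HATIDATA_OTEL_ENDPOINT=http://otel-collector.internal:4317
HATIDATA_OTEL_SERVICE_NAME=hatidata-proxy
# Forward security events to a Splunk HEC collector (URL is an example)
HATIDATA_SIEM_ENABLED=true
HATIDATA_SIEM_TARGET_TYPE=splunk_hec
HATIDATA_SIEM_ENDPOINT_URL=https://splunk.example.com:8088/services/collector
```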

Security & Governance

Runtime security controls, tenant isolation, and data governance features.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_RESOURCE_GOVERNOR_ENABLED` | `false` | Enable per-organization compute resource isolation. |
| `HATIDATA_TIER_GATE_ENABLED` | `false` | Enable per-organization feature enforcement based on subscription tier. |
| `HATIDATA_TENANT_ISOLATION_ENABLED` | `false` | Prevent cross-tenant JOINs at the query level. |
| `HATIDATA_SNAPSHOT_PINNING_ENABLED` | `true` | Enable snapshot pinning for consistent point-in-time reads. |
| `HATIDATA_RESULT_STREAMING_ENABLED` | `false` | Stream query results in batches instead of buffering the full result set. |
| `HATIDATA_RESULT_STREAMING_BATCH_SIZE` | `1000` | Number of rows per batch when result streaming is enabled. |

Data Durability

Write-ahead log and periodic Parquet flush settings for crash-safe data persistence.

| Variable | Default | Description |
| --- | --- | --- |
| `HATIDATA_AUTO_PERSIST_ENABLED` | `true` | Enable periodic Parquet flush from in-memory DuckDB to object storage. |
| `HATIDATA_FLUSH_INTERVAL_SECS` | `30` | Interval in seconds between periodic Parquet flush cycles. |
| `HATIDATA_FLUSH_SIZE_THRESHOLD_MB` | `10` | Size threshold in megabytes for triggering an immediate flush outside the periodic interval. |
| `HATIDATA_WAL_ENABLED` | `true` | Enable write-ahead log for crash recovery. All mutations are appended to the WAL before being applied. |
| `HATIDATA_WAL_DIR` | `/data/wal/` | Directory for WAL segment files. Must be on a durable volume in production. |
| `HATIDATA_WAL_FSYNC` | `true` | Call fsync after each WAL append for durability. Disabling improves write throughput at the risk of losing the last few writes on crash. |
| `HATIDATA_DUCKDB_PATH` | `""` (in-memory) | Path to a file-backed DuckDB database. Empty uses an in-memory database (data is lost on restart unless WAL + Parquet flush are enabled). |
> **Tip:** For production deployments, enable both `HATIDATA_WAL_ENABLED` and `HATIDATA_AUTO_PERSIST_ENABLED`. The WAL provides crash recovery for the most recent writes, while Parquet flush provides durable long-term storage in object storage.
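A crash-safe production fragment combining the WAL with periodic Parquet flush could look like this; the WAL directory is the documented default and must sit on a durable volume:

```ini
# Append every mutation to the WAL before applying it
HATIDATA_WAL_ENABLED=true
HATIDATA_WAL_DIR=/data/wal/
HATIDATA_WAL_FSYNC=true
# Flush to Parquet in object storage every 30 seconds
HATIDATA_AUTO_PERSIST_ENABLED=true
HATIDATA_FLUSH_INTERVAL_SECS=30
```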


- Query Pipeline -- 15-step query execution flow from connection to response.
- Security Model -- Authentication, authorization, encryption, and audit architecture.
- Deployment Modes -- Docker Compose, Kubernetes, and multi-cloud deployment options.
