Security Model

HatiData is designed for agent workloads where a single identity can issue thousands of queries per minute against sensitive organizational data. This requires a security model that is both expressive enough to enforce fine-grained policies and fast enough to evaluate those policies on every query without adding perceptible latency.

The model is organized as overlapping layers. Authentication establishes who is asking. Authorization determines what they can do. Encryption protects data in transit and at rest. Audit creates a tamper-evident record of everything that happened.

Security Data Flow

Every query passes through a sequence of security-relevant stages in the proxy before reaching the execution engine. The following diagram shows the complete data flow with security checkpoints:

```mermaid
flowchart LR
    A[Agent] -->|TLS 1.3| B[Proxy :5439]
    B --> C[Auth]
    C --> D[Table Extract]
    D --> E[Policy Check]
    E --> F[Row Filter]
    F --> G[Transpile]
    G --> H[Execute]
    H --> I[Column Mask]
    I --> J[Audit]
    J --> K[Response]

    style C fill:#7c5cfc,color:#fff
    style E fill:#7c5cfc,color:#fff
    style F fill:#7c5cfc,color:#fff
    style I fill:#7c5cfc,color:#fff
    style J fill:#7c5cfc,color:#fff
```

Stage-by-Stage Security Breakdown

| Stage | Security function | What happens on failure |
|---|---|---|
| TLS | Encrypts the connection between the agent and the proxy. TLS 1.3 is required in production; plaintext connections are rejected. | Connection refused with a TLS handshake error. |
| Auth | Validates the agent's identity. The proxy verifies the API key (password field in the Postgres wire protocol) or JWT against the control plane's cached credentials, and extracts `agent_id`, `org_id`, roles, and ABAC attributes from the credential. | `ERROR: authentication failed` — the query is never parsed. |
| Table Extract | Parses the SQL AST to identify every table referenced in the query (including subqueries, CTEs, and JOINs). Cross-tenant schema references are detected and blocked here. | `ERROR: cross-tenant table reference detected` — the query is rejected before any data access. |
| Policy Check | Evaluates RBAC roles and ABAC attribute conditions against the requested operation and tables, checking whether the agent's role permits the query type (SELECT, INSERT, etc.) on the identified tables. | `ERROR: permission denied`, with a reason string from the matching deny rule. |
| Row Filter | Injects row-level security (RLS) predicates into the query's WHERE clause. Predicates are resolved from the agent's JWT claims (e.g., `:org_id`, `:allowed_regions`). This stage modifies the SQL before transpilation so the filter cannot be bypassed. | No failure mode — RLS predicates are always injected when configured. If no RLS rules match, the query passes through unmodified. |
| Transpile | Rewrites Snowflake-dialect SQL into DuckDB-compatible SQL. This stage is security-neutral but operates on the already-filtered query. | `ERROR: unsupported SQL syntax` — transpilation errors are returned to the agent. |
| Execute | Runs the transpiled query against DuckDB. The execution engine operates on the organization's isolated schema; it has no access to other tenants' data at the storage level. | Standard SQL errors (syntax, missing table, type mismatch) are returned to the agent. |
| Column Mask | Applies column masking rules (redact or hash) to the result set before serialization. Sensitive column values are replaced based on the agent's role and the policy bundle. | No failure mode — masking is applied transparently. If no masking rules match, results pass through unmodified. |
| Audit | Writes a hash-chained audit record capturing the query text, agent identity, tables accessed, row count, latency, credits consumed, and outcome (permitted/denied/healed). The record is appended to the session's hash chain. | Audit write failures are logged but do not block the response; the proxy provides an audit-best-effort guarantee so that audit failures cannot cause query outages. |

The key security property of this pipeline is that filtering happens before execution. Row-level security predicates and policy checks are applied to the SQL before it reaches DuckDB. This means an agent cannot construct SQL that bypasses security — the proxy rewrites the query itself, not just the results.
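This filter-before-execute property can be sketched in a few lines. The snippet below is an illustration, not HatiData's implementation: a real proxy rewrites the parsed AST, while this sketch uses plain string handling (with a hypothetical `inject_rls` helper) only to show why rewriting the SQL itself is stronger than filtering results afterwards.

```python
# Sketch: inject an RLS predicate into the SQL before it ever reaches the
# engine. Illustrative only; the real proxy operates on the parsed AST.

def inject_rls(sql: str, predicate: str) -> str:
    """Append an RLS predicate on top of any user-supplied WHERE clause."""
    lowered = sql.lower()
    if " where " in lowered:
        idx = lowered.index(" where ") + len(" where ")
        # Parenthesize the original condition so an OR clause cannot
        # escape the injected filter.
        return f"{sql[:idx]}({sql[idx:]}) AND {predicate}"
    return f"{sql} WHERE {predicate}"

rewritten = inject_rls(
    "SELECT id FROM orders WHERE status = 'open' OR status = 'late'",
    "tenant_id = 'acme'",
)
# The user's OR condition is wrapped in parentheses, so every row the
# engine returns must also satisfy the tenant predicate.
```

Because the predicate is ANDed around the parenthesized original condition, there is no SQL the agent can write inside the query that removes it.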


Authentication

JWT (JSON Web Tokens)

The primary authentication mechanism for agents. JWTs are issued by the control plane after a successful credential exchange and contain a standard set of claims:

| Claim | Description |
|---|---|
| `sub` | Agent identity (unique, stable identifier) |
| `org_id` | Organization the agent belongs to |
| `agent_id` | Agent name or logical role (e.g., `analytics-bot`, `reporting-service`) |
| `roles` | Array of RBAC roles assigned to this agent |
| `attrs` | Key-value map of ABAC attributes |
| `exp` | Expiry timestamp (default: 1 hour) |
| `iat` | Issued-at timestamp |

JWTs are signed with RS256. The data plane caches the control plane's public key and validates JWTs locally without a network round-trip on every query — this keeps authentication overhead below 1 millisecond even under high query rates.

Token refresh is handled automatically by the HatiData SDK clients. When a token is within 5 minutes of expiry, the SDK exchanges it for a fresh token in the background while continuing to use the existing token for in-flight queries.
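The refresh policy above reduces to a simple predicate. This is a sketch of the described behavior, not the SDK's actual API; the function name and the explicit `now` parameter are illustrative.

```python
# Sketch of the SDK's refresh policy: exchange the token once it is
# within 5 minutes of expiry. Names here are illustrative, not the
# actual SDK interface.

REFRESH_WINDOW_SECONDS = 5 * 60

def needs_refresh(exp: float, now: float) -> bool:
    """True when the token expires within the refresh window."""
    return (exp - now) <= REFRESH_WINDOW_SECONDS

issued_at = 1_700_000_000
exp = issued_at + 3600                                  # default 1-hour validity
assert not needs_refresh(exp, now=issued_at)            # freshly issued
assert needs_refresh(exp, now=issued_at + 3600 - 240)   # 4 minutes to expiry
```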

API Keys

Long-lived API keys are available for service-to-service integrations where JWT refresh cycles are inconvenient. API keys are:

  • Stored in the control plane as hashed values (cryptographic hash + per-key salt) — the plaintext is shown only once at creation
  • Associated with a fixed set of RBAC roles and ABAC attributes at creation time, not dynamically resolvable
  • Revocable immediately — revocation propagates to the data plane within one policy bundle refresh cycle (default: 30 seconds)
  • Rate-limited independently from JWT-authenticated traffic

API keys are appropriate for CI pipelines, scheduled jobs, and infrastructure tooling. For interactive agent sessions, JWT is preferred because it supports dynamic attribute injection and shorter validity windows.
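The stored-as-salted-hash property from the bullet list above can be sketched as follows. The hash function, iteration count, and key format are assumptions for illustration; they are not HatiData's actual parameters.

```python
# Sketch: only a salted hash of the API key is persisted, so the
# plaintext cannot be recovered from the control plane's database.
# PBKDF2 parameters and the key format are assumptions.
import hashlib
import hmac
import os

def hash_api_key(plaintext: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", plaintext.encode(), salt, 100_000)

def verify_api_key(candidate: str, salt: bytes, stored: bytes) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(hash_api_key(candidate, salt), stored)

salt = os.urandom(16)                         # per-key salt
stored = hash_api_key("hd_live_abc123", salt) # plaintext shown once, then discarded
assert verify_api_key("hd_live_abc123", salt, stored)
assert not verify_api_key("hd_live_wrong", salt, stored)
```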

Federated Identity

Organizations that operate their own identity providers can authenticate agents using federated identity. HatiData supports:

| Protocol | Provider examples |
|---|---|
| OIDC (OpenID Connect) | Okta, Auth0, Google Workspace, Azure Entra |
| SAML 2.0 | Okta, ADFS, PingFederate |
| AWS IAM Roles | EC2 instance profiles, ECS task roles, Lambda execution roles |
| GCP Workload Identity | GKE pod identity, Cloud Run service accounts |
| Azure Managed Identity | AKS pod identity, App Service managed identity |

With federated identity, the control plane acts as a token exchange endpoint: the agent presents a token from the external IdP and receives a HatiData JWT with roles and attributes mapped according to organization-defined mapping rules.

For cloud-native deployments, workload identity integration means agents running in GKE pods or ECS tasks do not need to manage any credentials at all — they exchange their cloud-provider identity for a HatiData JWT automatically.

Authorization

RBAC — Role-Based Access Control

HatiData defines six built-in roles with escalating privileges:

| Role | Read tables | Write tables | DML | Schema changes | Manage policies | Manage users |
|---|---|---|---|---|---|---|
| `auditor` | Audit logs + policies only | No | No | No | No | No |
| `analyst` | All non-restricted | No | No | No | No | No |
| `developer` | All non-restricted | Allowed tables | INSERT, UPDATE | Yes | No | No |
| `admin` | All | All | All | Yes | Yes | No |
| `owner` | All | All | All | Yes | Yes | Yes |
| `service_account` | Per API key scopes | Per API key scopes | Per API key scopes | No | No | No |

The auditor role provides read-only access to audit logs, policies, and compliance data — designed for compliance officers who should not execute arbitrary queries. The analyst role provides broad read access for data exploration. The developer role adds write access and schema management for building and testing agent workflows. The service_account role is used by API keys with fine-grained scope control (see API Key Scopes below).

Roles are assigned at the organization level and can be scoped to specific schemas or tables. An agent can hold multiple roles simultaneously — permissions are the union of all assigned roles.
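Union semantics for multi-role agents can be sketched directly with set union. The permission names below are simplified stand-ins for the role table above, not HatiData's internal permission identifiers.

```python
# Sketch: an agent's effective permissions are the union of all assigned
# roles. Permission names are illustrative placeholders.
ROLE_PERMS = {
    "auditor": {"read_audit", "read_policies"},
    "analyst": {"read_tables"},
    "developer": {"read_tables", "write_tables", "schema_changes"},
}

def effective_permissions(roles: list) -> set:
    perms = set()
    for role in roles:
        perms |= ROLE_PERMS.get(role, set())
    return perms

# Holding auditor + analyst grants everything either role grants.
assert effective_permissions(["auditor", "analyst"]) == {
    "read_audit", "read_policies", "read_tables",
}
```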

API Key Scopes

API keys support 22 granular scopes for fine-grained access control:

| Scope | Description |
|---|---|
| `QueryRead` | Execute SELECT queries |
| `QueryWrite` | Execute INSERT, UPDATE, DELETE, DDL |
| `SchemaRead` | List tables, describe columns |
| `SchemaWrite` | Create/alter/drop tables |
| `PolicyRead` | View policies |
| `PolicyWrite` | Create/update/delete policies |
| `AuditRead` | View audit logs |
| `AuditExport` | Export audit data |
| `BillingRead` | View billing and quota data |
| `BillingWrite` | Update billing settings |
| `UserRead` | View org users |
| `UserWrite` | Invite/remove users, change roles |
| `KeyRead` | View API keys (metadata only) |
| `KeyWrite` | Create/rotate/revoke API keys |
| `MemoryRead` | Search and retrieve agent memories |
| `MemoryWrite` | Store and delete agent memories |
| `CotRead` | Replay and verify CoT sessions |
| `CotWrite` | Log reasoning steps |
| `TriggerRead` | List semantic triggers |
| `TriggerManage` | Create/delete/test triggers |
| `BranchCreate` | Create, query, discard branches |
| `BranchMerge` | Merge branches to main |

Pre-defined scope bundles:

  • `read_only`: QueryRead, SchemaRead, MemoryRead, CotRead, TriggerRead
  • `developer`: `read_only` plus QueryWrite, SchemaWrite, MemoryWrite, CotWrite, BranchCreate
  • `admin`: all scopes
  • `agent_default`: all query and agent-feature scopes
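The bundle definitions above can be expressed as sets, which also makes the "22 scopes" count and the subset relationships easy to check. The set contents come from this section; everything else is illustrative.

```python
# Scope bundles expressed as sets. ALL_SCOPES mirrors the scope table in
# this section; read_only and developer mirror the bundle definitions.
ALL_SCOPES = {
    "QueryRead", "QueryWrite", "SchemaRead", "SchemaWrite",
    "PolicyRead", "PolicyWrite", "AuditRead", "AuditExport",
    "BillingRead", "BillingWrite", "UserRead", "UserWrite",
    "KeyRead", "KeyWrite", "MemoryRead", "MemoryWrite",
    "CotRead", "CotWrite", "TriggerRead", "TriggerManage",
    "BranchCreate", "BranchMerge",
}

READ_ONLY = {"QueryRead", "SchemaRead", "MemoryRead", "CotRead", "TriggerRead"}
DEVELOPER = READ_ONLY | {"QueryWrite", "SchemaWrite", "MemoryWrite",
                         "CotWrite", "BranchCreate"}

assert len(ALL_SCOPES) == 22
assert DEVELOPER < ALL_SCOPES   # developer is a strict subset of admin (all scopes)
```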

ABAC — Attribute-Based Access Control

ABAC evaluates fine-grained conditions against request-time attributes. Where RBAC answers "can this role access this table?", ABAC answers "can this specific request access this resource given all contextual factors?".

HatiData evaluates 10 standard ABAC attributes:

| Attribute | Description | Example values |
|---|---|---|
| `agent_id` | The agent's logical identity | `analytics-bot`, `nightly-report` |
| `agent_purpose` | Declared purpose tag | `analytics`, `compliance`, `operations` |
| `data_classification` | Sensitivity level of the requested table | `public`, `internal`, `confidential`, `restricted` |
| `request_time_hour` | Hour of day (0–23, UTC) | `9`, `22` |
| `request_day_of_week` | Day of week (0 = Monday) | `0`, `5` |
| `source_ip_cidr` | Client IP network | `10.0.0.0/8`, `192.168.1.0/24` |
| `query_type` | Type of SQL statement | `SELECT`, `INSERT`, `UPDATE`, `DELETE` |
| `row_count_limit` | Maximum allowed result rows | `1000`, `50000` |
| `environment` | Deployment environment tag | `production`, `staging`, `dev` |
| `org_tier` | Organization's subscription tier | `free`, `cloud`, `growth`, `enterprise` |

ABAC rules are expressed as conditions over these attributes. Deny rules are always evaluated before permit rules, regardless of their position in the rule list; within each group, rules are evaluated in order and the first matching rule's action applies.

```yaml
# Example: restrict confidential table access to business hours, production agents only
- condition:
    data_classification: confidential
    request_time_hour:
      range: [0, 8]   # midnight to 8am UTC
  action: deny
  reason: "Confidential data not accessible outside business hours"

- condition:
    data_classification: confidential
    environment: [dev, staging]
  action: deny
  reason: "Confidential data not accessible from non-production agents"
```
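A minimal sketch of this evaluation order follows. It is an illustration only: the rule structure mirrors the YAML example but is simplified (no `range` conditions), the default-deny fallback is an assumption, and none of the function names are HatiData APIs.

```python
# Sketch of ABAC evaluation: deny rules are checked before permit rules;
# within each group, the first matching rule wins. Simplified rule
# structure; default-deny fallback is an assumption.
def matches(condition: dict, attrs: dict) -> bool:
    """A rule matches when every condition holds for the request attributes."""
    for key, expected in condition.items():
        actual = attrs.get(key)
        if isinstance(expected, list):
            if actual not in expected:
                return False
        elif actual != expected:
            return False
    return True

def evaluate(rules: list, attrs: dict) -> str:
    ordered = [r for r in rules if r["action"] == "deny"] + \
              [r for r in rules if r["action"] == "permit"]
    for rule in ordered:
        if matches(rule["condition"], attrs):
            return rule["action"]
    return "deny"   # nothing matched: fall back to deny (assumption)

rules = [
    {"condition": {"data_classification": "internal"}, "action": "permit"},
    {"condition": {"data_classification": "confidential",
                   "environment": ["dev", "staging"]}, "action": "deny"},
]
# The deny rule wins even though it appears after the permit rule.
assert evaluate(rules, {"data_classification": "confidential",
                        "environment": "dev"}) == "deny"
assert evaluate(rules, {"data_classification": "internal",
                        "environment": "dev"}) == "permit"
```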

Row-Level Security (RLS)

RLS predicates are injected into every query during the processing stage of the Query Pipeline. Each rule associates a table with a predicate that is appended to the query's WHERE clause.

RLS rules can reference the authenticated agent's claims as variables:

```sql
-- RLS rule for the "orders" table:
tenant_id = :org_id AND region IN (:allowed_regions)
```

At runtime, :org_id and :allowed_regions are resolved from the agent's JWT claims before injection. This means the same RLS rule enforces different filters for different organizations without requiring separate rule definitions per tenant.

RLS is enforced by the data plane — it cannot be bypassed by constructing clever SQL, because the injection happens before transpilation and execution.
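Claim resolution can be sketched as a placeholder substitution over the rule text. This is illustrative only: the naive quoting below is for readability, and a real implementation would bind values through the SQL layer rather than interpolate strings.

```python
# Sketch: resolve :claim placeholders in an RLS predicate from JWT claims
# before injection. Quoting is simplistic and for illustration only.
import re

def resolve_predicate(predicate: str, claims: dict) -> str:
    def substitute(match):
        value = claims[match.group(1)]
        if isinstance(value, list):
            return ", ".join(f"'{v}'" for v in value)
        return f"'{value}'"
    return re.sub(r":(\w+)", substitute, predicate)

resolved = resolve_predicate(
    "tenant_id = :org_id AND region IN (:allowed_regions)",
    {"org_id": "acme", "allowed_regions": ["us-east-1", "eu-west-1"]},
)
assert resolved == "tenant_id = 'acme' AND region IN ('us-east-1', 'eu-west-1')"
```

The same rule text thus yields a different concrete filter for each organization, which is what makes a single per-table rule sufficient for all tenants.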

Column Masking

Column masking is applied after query execution during the post-processing stage of the pipeline. Sensitive column values are replaced before results are serialized and returned to the agent.

Two masking modes are available:

Redact: The column value is replaced with the string ***REDACTED***. The agent can see that the column exists and that it had a value, but cannot see or infer the value.

Hash: The column value is replaced with a keyed cryptographic hash computed with a per-organization secret key. Hashing preserves equality semantics — agents can group by, count distinct, and join on hashed columns — but cannot reverse the hash to obtain the original value.

Column masking rules are defined in the policy bundle and are evaluated per-query based on the requesting agent's role and attributes. An agent with the admin role may see unmasked values; the same agent running with a scoped API key may see masked values for the same column.
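The equality-preserving property of the hash mode can be demonstrated with a keyed hash. HMAC-SHA256 is used here as an assumption; the source only states that the hash is keyed with a per-organization secret.

```python
# Sketch of hash masking: a keyed hash (HMAC-SHA256, as an assumption)
# with a per-org secret. Equal inputs map to equal outputs, so GROUP BY,
# COUNT DISTINCT, and JOINs still work on masked columns, but the
# original value cannot be recovered without the key.
import hashlib
import hmac

def mask_hash(value: str, org_key: bytes) -> str:
    return hmac.new(org_key, value.encode(), hashlib.sha256).hexdigest()

key = b"per-org-secret"            # illustrative; real keys come from the KMS
a = mask_hash("alice@example.com", key)
b = mask_hash("alice@example.com", key)
c = mask_hash("bob@example.com", key)
assert a == b    # equality preserved across rows
assert a != c    # distinct values stay distinct
```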

JIT (Just-In-Time) Access

For highly sensitive operations — bulk exports, schema changes, cross-tenant queries — HatiData supports JIT access elevation. An agent requests elevated access through the control plane, specifying the operation, the tables involved, and the justification. A human approver (or an automated policy) grants or denies the request. Approved elevations are time-bounded (default: 30 minutes) and create an explicit audit record regardless of whether the elevated operation is ultimately performed.

JIT access is integrated with the hatiOS Pause & Pivot system for organizations that use both products.

Encryption

TLS 1.3 in Transit

All connections between agents and the data plane, and between the data plane and the control plane, use TLS 1.3. Older protocol versions are explicitly disabled. The Postgres wire protocol listener requires TLS — plaintext connections are rejected.

For PrivateLink deployments, TLS is maintained end-to-end even over the private backbone. The private network does not substitute for transport encryption.

Encryption at Rest

Data at rest is encrypted using AES-256-GCM. This applies to:

  • Query engine data files and WAL segments
  • Cached query results stored on local SSD (tier 2 cache)
  • Audit log files
  • Snapshot storage in object storage (tier 3 cache)

Encryption at rest is handled by the storage layer on cloud deployments (EBS encryption, Cloud Storage CMEK, Azure Disk Encryption). For local deployments, disk encryption is the operator's responsibility.

CMEK (Customer-Managed Encryption Keys)

Enterprise organizations can bring their own encryption keys managed in their own KMS:

| Cloud | Key service |
|---|---|
| AWS | AWS KMS (CMK) |
| GCP | Cloud KMS (CMEK) |
| Azure | Azure Key Vault (customer-managed key) |

With CMEK, HatiData's infrastructure never has access to the plaintext encryption key. The data plane holds only an encrypted data key; the master key remains in the customer's KMS and must be accessible for the data plane to start. Key rotation is handled by the KMS — HatiData re-encrypts data keys automatically when the master key is rotated.

See CMEK Configuration for setup instructions per cloud provider.

Audit

Hash-Chained Records

Every query — whether permitted, rejected, or healed — produces an audit record. Records are hash-chained: each record contains a cryptographic hash of the previous record in the same session. The chain is seeded with a cryptographically random session seed stored in the control plane at session start.

```text
session_seed --> record_1 --> record_2 --> ... --> record_n
     |              |             |                   |
  hash_0          hash_1        hash_2              hash_n
```

To verify that a record has not been tampered with, you re-compute the chain from the session seed and compare each computed hash to the stored hash. Any modification to any field of any record — including the timestamp, the SQL, or the outcome — produces a hash mismatch that is detectable at the point of modification and for all subsequent records.
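The chain construction and verification described above can be sketched as follows. The record serialization and hash construction are assumptions for illustration; the real on-disk format is defined by HatiData's audit specification.

```python
# Sketch of hash-chaining and tamper detection. Serialization format and
# hash construction are illustrative assumptions.
import hashlib
import json

def chain_hash(prev_hash: str, record: dict) -> str:
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(seed: str, records: list) -> list:
    hashes, prev = [], seed
    for record in records:
        prev = chain_hash(prev, record)
        hashes.append(prev)
    return hashes

def first_invalid(seed: str, records: list, stored: list):
    """Re-compute the chain; return the index of the first mismatch, else None."""
    for i, h in enumerate(build_chain(seed, records)):
        if h != stored[i]:
            return i
    return None

records = [{"sql": "SELECT 1", "outcome": "permitted"},
           {"sql": "SELECT 2", "outcome": "denied"}]
stored = build_chain("session-seed", records)
assert first_invalid("session-seed", records, stored) is None

records[0]["outcome"] = "healed"   # tamper with an early record
assert first_invalid("session-seed", records, stored) == 0
```

Because each hash folds in the previous one, modifying record 0 invalidates its own hash and, transitively, every hash after it.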

Tamper Detection

The control plane exposes a chain verification API (GET /v1/audit/sessions/{id}/verify) that replays and re-hashes the entire chain for a session. The response reports the chain status (valid, tampered, incomplete) and the index of the first invalid record if tampering is detected.

Organizations can export audit chains and verify them independently using the open verification specification published in the HatiData audit documentation.

Compliance Exports

Audit records can be exported in formats suitable for common compliance frameworks:

| Format | Use case |
|---|---|
| NDJSON | General-purpose, machine-readable |
| CSV | Spreadsheet analysis, manual review |
| Parquet | Large-scale analytics over audit history |
| SIEM (CEF / JSON) | Integration with Splunk, Datadog, Elastic |

Exports include the full hash chain so that the exported records can be independently verified.

Multi-Tenancy

Tenant Isolation

Each organization's data in HatiData is isolated at the schema level. Tables belonging to organization acme live in a schema namespace that is inaccessible to queries from organization globex. Schema separation is enforced at the proxy layer — agents cannot submit SQL that references schemas outside their organization's namespace, even if they somehow knew the schema names.

Cross-Tenant JOIN Prevention

Queries that attempt to JOIN across organization schemas are detected during table extraction (in the security stage of the pipeline) and rejected before any execution occurs. The error message identifies which table references caused the cross-tenant violation without revealing the schema names of other organizations.

This protection exists at the SQL level as well as the filesystem level — even if an operator misconfigured storage permissions, the proxy would still prevent cross-tenant data access.
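The proxy-level check reduces to validating every schema-qualified table reference against the requesting organization's namespace. The sketch below takes the extracted table list as input (real extraction walks the SQL AST, including CTEs and subqueries); the function name and error text are illustrative.

```python
# Sketch of the cross-tenant check at table extraction: every
# schema-qualified reference must fall inside the requesting org's
# namespace. Table extraction itself is out of scope here.
def check_tenant_isolation(tables: list, org_schema: str) -> None:
    for table in tables:
        schema, _, _name = table.rpartition(".")
        if schema and schema != org_schema:
            # Deliberately vague: never leak other tenants' schema names.
            raise PermissionError("cross-tenant table reference detected")

check_tenant_isolation(["acme.orders", "acme.customers"], "acme")  # ok
try:
    check_tenant_isolation(["acme.orders", "globex.orders"], "acme")
except PermissionError as exc:
    assert "cross-tenant" in str(exc)
```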
