Security Model

HatiData is designed for agent workloads where a single identity can issue thousands of queries per minute against sensitive organizational data. This requires a security model that is both expressive enough to enforce fine-grained policies and fast enough to evaluate those policies on every query without adding perceptible latency.

The model is organized as overlapping layers. Authentication establishes who is asking. Authorization determines what they can do. Encryption protects data in transit and at rest. Audit creates a tamper-evident record of everything that happened.

Security Data Flow

Every query passes through a sequence of security-relevant stages in the proxy before reaching the execution engine. The following diagram shows the complete data flow with security checkpoints:

```mermaid
flowchart LR
    A[Agent] -->|TLS 1.3| B[Proxy :5439]
    B --> C[Auth]
    C --> D[Table Extract]
    D --> E[Policy Check]
    E --> F[Row Filter]
    F --> G[Transpile]
    G --> H[Execute]
    H --> I[Column Mask]
    I --> J[Audit]
    J --> K[Response]

    style C fill:#7c5cfc,color:#fff
    style E fill:#7c5cfc,color:#fff
    style F fill:#7c5cfc,color:#fff
    style I fill:#7c5cfc,color:#fff
    style J fill:#7c5cfc,color:#fff
```

Stage-by-Stage Security Breakdown

| Stage | Security function | What happens on failure |
|---|---|---|
| TLS | Encrypts the connection between the agent and the proxy. TLS 1.3 is required in production; plaintext connections are rejected. | Connection refused with a TLS handshake error. |
| Auth | Validates the agent's identity. The proxy verifies the API key (password field in the Postgres wire protocol) or JWT against the control plane's cached credentials, and extracts `agent_id`, `org_id`, roles, and ABAC attributes from the credential. | `ERROR: authentication failed` — the query is never parsed. |
| Table Extract | Parses the SQL AST to identify every table referenced in the query (including subqueries, CTEs, and JOINs). Cross-tenant schema references are detected and blocked here. | `ERROR: cross-tenant table reference detected` — the query is rejected before any data access. |
| Policy Check | Evaluates RBAC roles and ABAC attribute conditions against the requested operation and tables, checking whether the agent's role permits the query type (SELECT, INSERT, etc.) on the identified tables. | `ERROR: permission denied`, with a reason string from the matching deny rule. |
| Row Filter | Injects row-level security (RLS) predicates into the query's WHERE clause. Predicates are resolved from the agent's JWT claims (e.g., `:org_id`, `:allowed_regions`). This stage modifies the SQL before transpilation so the filter cannot be bypassed. | No failure mode — RLS predicates are always injected when configured. If no RLS rules match, the query passes through unmodified. |
| Transpile | Rewrites Snowflake-dialect SQL into DuckDB-compatible SQL. This stage is security-neutral but operates on the already-filtered query. | `ERROR: unsupported SQL syntax` — transpilation errors are returned to the agent. |
| Execute | Runs the transpiled query against DuckDB. The execution engine operates on the organization's isolated schema; it has no access to other tenants' data at the storage level. | Standard SQL errors (syntax, missing table, type mismatch) are returned to the agent. |
| Column Mask | Applies column masking rules (redact or hash) to the result set before serialization. Sensitive column values are replaced based on the agent's role and the policy bundle. | No failure mode — masking is applied transparently. If no masking rules match, results pass through unmodified. |
| Audit | Writes a hash-chained audit record capturing the query text, agent identity, tables accessed, row count, latency, credits consumed, and outcome (permitted/denied/healed). The record is appended to the session's hash chain. | Audit write failures are logged but do not block the response; the proxy provides an audit-best-effort guarantee so that audit failures cannot cause query outages. |

The key security property of this pipeline is that filtering happens before execution. Row-level security predicates and policy checks are applied to the SQL before it reaches DuckDB. This means an agent cannot construct SQL that bypasses security — the proxy rewrites the query itself, not just the results.
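This filter-before-execute property can be sketched in a few lines. The snippet below is an illustration, not HatiData's implementation: a real proxy rewrites the parsed AST, while this sketch uses plain string handling (with a hypothetical `inject_rls` helper) only to show why rewriting the SQL itself is stronger than filtering results afterwards.

```python
# Sketch: inject an RLS predicate into the SQL before it ever reaches the
# engine. Illustrative only; the real proxy operates on the parsed AST.

def inject_rls(sql: str, predicate: str) -> str:
    """Append an RLS predicate on top of any user-supplied WHERE clause."""
    lowered = sql.lower()
    if " where " in lowered:
        idx = lowered.index(" where ") + len(" where ")
        # Parenthesize the original condition so an OR clause cannot
        # escape the injected filter.
        return f"{sql[:idx]}({sql[idx:]}) AND {predicate}"
    return f"{sql} WHERE {predicate}"

rewritten = inject_rls(
    "SELECT id FROM orders WHERE status = 'open' OR status = 'late'",
    "tenant_id = 'acme'",
)
# The user's OR condition is wrapped in parentheses, so every row the
# engine returns must also satisfy the tenant predicate.
```

Because the predicate is ANDed around the parenthesized original condition, there is no SQL the agent can write inside the query that removes it.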


Authentication

JWT (JSON Web Tokens)

The primary authentication mechanism for agents. JWTs are issued by the control plane after a successful credential exchange and contain a standard set of claims:

| Claim | Description |
|---|---|
| `sub` | Agent identity (unique, stable identifier) |
| `org_id` | Organization the agent belongs to |
| `agent_id` | Agent name or logical role (e.g., `analytics-bot`, `reporting-service`) |
| `roles` | Array of RBAC roles assigned to this agent |
| `attrs` | Key-value map of ABAC attributes |
| `exp` | Expiry timestamp (default: 1 hour) |
| `iat` | Issued-at timestamp |

JWTs are signed with RS256. The data plane caches the control plane's public key and validates JWTs locally without a network round-trip on every query — this keeps authentication overhead below 1 millisecond even under high query rates.

Token refresh is handled automatically by the HatiData SDK clients. When a token is within 5 minutes of expiry, the SDK exchanges it for a fresh token in the background while continuing to use the existing token for in-flight queries.
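The refresh policy above reduces to a simple predicate. This is a sketch of the described behavior, not the SDK's actual API; the function name and the explicit `now` parameter are illustrative.

```python
# Sketch of the SDK's refresh policy: exchange the token once it is
# within 5 minutes of expiry. Names here are illustrative, not the
# actual SDK interface.

REFRESH_WINDOW_SECONDS = 5 * 60

def needs_refresh(exp: float, now: float) -> bool:
    """True when the token expires within the refresh window."""
    return (exp - now) <= REFRESH_WINDOW_SECONDS

issued_at = 1_700_000_000
exp = issued_at + 3600                                  # default 1-hour validity
assert not needs_refresh(exp, now=issued_at)            # freshly issued
assert needs_refresh(exp, now=issued_at + 3600 - 240)   # 4 minutes to expiry
```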

API Keys

Long-lived API keys are available for service-to-service integrations where JWT refresh cycles are inconvenient. API keys are:

  • Stored in the control plane as hashed values (cryptographic hash + per-key salt) — the plaintext is shown only once at creation
  • Associated with a fixed set of RBAC roles and ABAC attributes at creation time, not dynamically resolvable
  • Revocable immediately — revocation propagates to the data plane within one policy bundle refresh cycle (default: 30 seconds)
  • Rate-limited independently from JWT-authenticated traffic

API keys are appropriate for CI pipelines, scheduled jobs, and infrastructure tooling. For interactive agent sessions, JWT is preferred because it supports dynamic attribute injection and shorter validity windows.
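The stored-as-salted-hash property from the bullet list above can be sketched as follows. The hash function, iteration count, and key format are assumptions for illustration; they are not HatiData's actual parameters.

```python
# Sketch: only a salted hash of the API key is persisted, so the
# plaintext cannot be recovered from the control plane's database.
# PBKDF2 parameters and the key format are assumptions.
import hashlib
import hmac
import os

def hash_api_key(plaintext: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", plaintext.encode(), salt, 100_000)

def verify_api_key(candidate: str, salt: bytes, stored: bytes) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(hash_api_key(candidate, salt), stored)

salt = os.urandom(16)                         # per-key salt
stored = hash_api_key("hd_live_abc123", salt) # plaintext shown once, then discarded
assert verify_api_key("hd_live_abc123", salt, stored)
assert not verify_api_key("hd_live_wrong", salt, stored)
```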

Federated Identity

Organizations that operate their own identity providers can authenticate agents using federated identity. HatiData supports:

| Protocol | Provider examples |
|---|---|
| OIDC (OpenID Connect) | Okta, Auth0, Google Workspace, Azure Entra |
| SAML 2.0 | Okta, ADFS, PingFederate |
| AWS IAM Roles | EC2 instance profiles, ECS task roles, Lambda execution roles |
| GCP Workload Identity | GKE pod identity, Cloud Run service accounts |
| Azure Managed Identity | AKS pod identity, App Service managed identity |

With federated identity, the control plane acts as a token exchange endpoint: the agent presents a token from the external IdP and receives a HatiData JWT with roles and attributes mapped according to organization-defined mapping rules.

For cloud-native deployments, workload identity integration means agents running in GKE pods or ECS tasks do not need to manage any credentials at all — they exchange their cloud-provider identity for a HatiData JWT automatically.

Authorization

RBAC — Role-Based Access Control

HatiData defines six built-in roles with escalating privileges:

| Role | Read tables | Write tables | DML | Schema changes | Manage policies | Manage users |
|---|---|---|---|---|---|---|
| `auditor` | Audit logs + policies only | No | No | No | No | No |
| `analyst` | All non-restricted | No | No | No | No | No |
| `developer` | All non-restricted | Allowed tables | INSERT, UPDATE | Yes | No | No |
| `admin` | All | All | All | Yes | Yes | No |
| `owner` | All | All | All | Yes | Yes | Yes |
| `service_account` | Per API key scopes | Per API key scopes | Per API key scopes | No | No | No |

The auditor role provides read-only access to audit logs, policies, and compliance data — designed for compliance officers who should not execute arbitrary queries. The analyst role provides broad read access for data exploration. The developer role adds write access and schema management for building and testing agent workflows. The service_account role is used by API keys with fine-grained scope control (see API Key Scopes below).

Roles are assigned at the organization level and can be scoped to specific schemas or tables. An agent can hold multiple roles simultaneously — permissions are the union of all assigned roles.
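Union semantics for multi-role agents can be sketched directly with set union. The permission names below are simplified stand-ins for the role table above, not HatiData's internal permission identifiers.

```python
# Sketch: an agent's effective permissions are the union of all assigned
# roles. Permission names are illustrative placeholders.
ROLE_PERMS = {
    "auditor": {"read_audit", "read_policies"},
    "analyst": {"read_tables"},
    "developer": {"read_tables", "write_tables", "schema_changes"},
}

def effective_permissions(roles: list) -> set:
    perms = set()
    for role in roles:
        perms |= ROLE_PERMS.get(role, set())
    return perms

# Holding auditor + analyst grants everything either role grants.
assert effective_permissions(["auditor", "analyst"]) == {
    "read_audit", "read_policies", "read_tables",
}
```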

API Key Scopes

API keys support 22 granular scopes for fine-grained access control:

| Scope | Description |
|---|---|
| `QueryRead` | Execute SELECT queries |
| `QueryWrite` | Execute INSERT, UPDATE, DELETE, DDL |
| `SchemaRead` | List tables, describe columns |
| `SchemaWrite` | Create/alter/drop tables |
| `PolicyRead` | View policies |
| `PolicyWrite` | Create/update/delete policies |
| `AuditRead` | View audit logs |
| `AuditExport` | Export audit data |
| `BillingRead` | View billing and quota data |
| `BillingWrite` | Update billing settings |
| `UserRead` | View org users |
| `UserWrite` | Invite/remove users, change roles |
| `KeyRead` | View API keys (metadata only) |
| `KeyWrite` | Create/rotate/revoke API keys |
| `MemoryRead` | Search and retrieve agent memories |
| `MemoryWrite` | Store and delete agent memories |
| `CotRead` | Replay and verify CoT sessions |
| `CotWrite` | Log reasoning steps |
| `TriggerRead` | List semantic triggers |
| `TriggerManage` | Create/delete/test triggers |
| `BranchCreate` | Create, query, discard branches |
| `BranchMerge` | Merge branches to main |

Pre-defined scope bundles:

  • `read_only`: QueryRead, SchemaRead, MemoryRead, CotRead, TriggerRead
  • `developer`: `read_only` plus QueryWrite, SchemaWrite, MemoryWrite, CotWrite, BranchCreate
  • `admin`: all scopes
  • `agent_default`: all query and agent-feature scopes
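The bundle definitions above can be expressed as sets, which also makes the "22 scopes" count and the subset relationships easy to check. The set contents come from this section; everything else is illustrative.

```python
# Scope bundles expressed as sets. ALL_SCOPES mirrors the scope table in
# this section; read_only and developer mirror the bundle definitions.
ALL_SCOPES = {
    "QueryRead", "QueryWrite", "SchemaRead", "SchemaWrite",
    "PolicyRead", "PolicyWrite", "AuditRead", "AuditExport",
    "BillingRead", "BillingWrite", "UserRead", "UserWrite",
    "KeyRead", "KeyWrite", "MemoryRead", "MemoryWrite",
    "CotRead", "CotWrite", "TriggerRead", "TriggerManage",
    "BranchCreate", "BranchMerge",
}

READ_ONLY = {"QueryRead", "SchemaRead", "MemoryRead", "CotRead", "TriggerRead"}
DEVELOPER = READ_ONLY | {"QueryWrite", "SchemaWrite", "MemoryWrite",
                         "CotWrite", "BranchCreate"}

assert len(ALL_SCOPES) == 22
assert DEVELOPER < ALL_SCOPES   # developer is a strict subset of admin (all scopes)
```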

ABAC — Attribute-Based Access Control

ABAC evaluates fine-grained conditions against request-time attributes. Where RBAC answers "can this role access this table?", ABAC answers "can this specific request access this resource given all contextual factors?".

HatiData evaluates 10 standard ABAC attributes:

| Attribute | Description | Example values |
|---|---|---|
| `agent_id` | The agent's logical identity | `analytics-bot`, `nightly-report` |
| `agent_purpose` | Declared purpose tag | `analytics`, `compliance`, `operations` |
| `data_classification` | Sensitivity level of the requested table | `public`, `internal`, `confidential`, `restricted` |
| `request_time_hour` | Hour of day (0–23, UTC) | `9`, `22` |
| `request_day_of_week` | Day of week (0 = Monday) | `0`, `5` |
| `source_ip_cidr` | Client IP network | `10.0.0.0/8`, `192.168.1.0/24` |
| `query_type` | Type of SQL statement | `SELECT`, `INSERT`, `UPDATE`, `DELETE` |
| `row_count_limit` | Maximum allowed result rows | `1000`, `50000` |
| `environment` | Deployment environment tag | `production`, `staging`, `dev` |
| `org_tier` | Organization's subscription tier | `free`, `cloud`, `growth`, `enterprise` |

ABAC rules are expressed as conditions over these attributes. Deny rules are always evaluated before permit rules, regardless of their position in the rule list; within each group, rules are evaluated in order and the first matching rule's action applies.

```yaml
# Example: restrict confidential table access to business hours, production agents only
- condition:
    data_classification: confidential
    request_time_hour:
      range: [0, 8]   # midnight to 8am UTC
  action: deny
  reason: "Confidential data not accessible outside business hours"

- condition:
    data_classification: confidential
    environment: [dev, staging]
  action: deny
  reason: "Confidential data not accessible from non-production agents"
```
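A minimal sketch of this evaluation order follows. It is an illustration only: the rule structure mirrors the YAML example but is simplified (no `range` conditions), the default-deny fallback is an assumption, and none of the function names are HatiData APIs.

```python
# Sketch of ABAC evaluation: deny rules are checked before permit rules;
# within each group, the first matching rule wins. Simplified rule
# structure; default-deny fallback is an assumption.
def matches(condition: dict, attrs: dict) -> bool:
    """A rule matches when every condition holds for the request attributes."""
    for key, expected in condition.items():
        actual = attrs.get(key)
        if isinstance(expected, list):
            if actual not in expected:
                return False
        elif actual != expected:
            return False
    return True

def evaluate(rules: list, attrs: dict) -> str:
    ordered = [r for r in rules if r["action"] == "deny"] + \
              [r for r in rules if r["action"] == "permit"]
    for rule in ordered:
        if matches(rule["condition"], attrs):
            return rule["action"]
    return "deny"   # nothing matched: fall back to deny (assumption)

rules = [
    {"condition": {"data_classification": "internal"}, "action": "permit"},
    {"condition": {"data_classification": "confidential",
                   "environment": ["dev", "staging"]}, "action": "deny"},
]
# The deny rule wins even though it appears after the permit rule.
assert evaluate(rules, {"data_classification": "confidential",
                        "environment": "dev"}) == "deny"
assert evaluate(rules, {"data_classification": "internal",
                        "environment": "dev"}) == "permit"
```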

Row-Level Security (RLS)

RLS predicates are injected into every query during the processing stage of the Query Pipeline. Each rule associates a table with a predicate that is appended to the query's WHERE clause.

RLS rules can reference the authenticated agent's claims as variables:

```sql
-- RLS rule for the "orders" table:
tenant_id = :org_id AND region IN (:allowed_regions)
```

At runtime, :org_id and :allowed_regions are resolved from the agent's JWT claims before injection. This means the same RLS rule enforces different filters for different organizations without requiring separate rule definitions per tenant.

RLS is enforced by the data plane — it cannot be bypassed by constructing clever SQL, because the injection happens before transpilation and execution.
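Claim resolution can be sketched as a placeholder substitution over the rule text. This is illustrative only: the naive quoting below is for readability, and a real implementation would bind values through the SQL layer rather than interpolate strings.

```python
# Sketch: resolve :claim placeholders in an RLS predicate from JWT claims
# before injection. Quoting is simplistic and for illustration only.
import re

def resolve_predicate(predicate: str, claims: dict) -> str:
    def substitute(match):
        value = claims[match.group(1)]
        if isinstance(value, list):
            return ", ".join(f"'{v}'" for v in value)
        return f"'{value}'"
    return re.sub(r":(\w+)", substitute, predicate)

resolved = resolve_predicate(
    "tenant_id = :org_id AND region IN (:allowed_regions)",
    {"org_id": "acme", "allowed_regions": ["us-east-1", "eu-west-1"]},
)
assert resolved == "tenant_id = 'acme' AND region IN ('us-east-1', 'eu-west-1')"
```

The same rule text thus yields a different concrete filter for each organization, which is what makes a single per-table rule sufficient for all tenants.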

Column Masking

Column masking is applied after query execution during the post-processing stage of the pipeline. Sensitive column values are replaced before results are serialized and returned to the agent.

Two masking modes are available:

Redact: The column value is replaced with the string ***REDACTED***. The agent can see that the column exists and that it had a value, but cannot see or infer the value.

Hash: The column value is replaced with a keyed cryptographic hash computed with a per-organization secret key. Hashing preserves equality semantics — agents can group by, count distinct, and join on hashed columns — but cannot reverse the hash to obtain the original value.

Column masking rules are defined in the policy bundle and are evaluated per-query based on the requesting agent's role and attributes. An agent with the admin role may see unmasked values; the same agent running with a scoped API key may see masked values for the same column.
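The equality-preserving property of the hash mode can be demonstrated with a keyed hash. HMAC-SHA256 is used here as an assumption; the source only states that the hash is keyed with a per-organization secret.

```python
# Sketch of hash masking: a keyed hash (HMAC-SHA256, as an assumption)
# with a per-org secret. Equal inputs map to equal outputs, so GROUP BY,
# COUNT DISTINCT, and JOINs still work on masked columns, but the
# original value cannot be recovered without the key.
import hashlib
import hmac

def mask_hash(value: str, org_key: bytes) -> str:
    return hmac.new(org_key, value.encode(), hashlib.sha256).hexdigest()

key = b"per-org-secret"            # illustrative; real keys come from the KMS
a = mask_hash("alice@example.com", key)
b = mask_hash("alice@example.com", key)
c = mask_hash("bob@example.com", key)
assert a == b    # equality preserved across rows
assert a != c    # distinct values stay distinct
```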

JIT (Just-In-Time) Access

For highly sensitive operations — bulk exports, schema changes, cross-tenant queries — HatiData supports JIT access elevation. An agent requests elevated access through the control plane, specifying the operation, the tables involved, and the justification. A human approver (or an automated policy) grants or denies the request. Approved elevations are time-bounded (default: 30 minutes) and create an explicit audit record regardless of whether the elevated operation is ultimately performed.

JIT access is integrated with the hatiOS Pause & Pivot system for organizations that use both products.

Encryption

TLS 1.3 in Transit

All connections between agents and the data plane, and between the data plane and the control plane, use TLS 1.3. Older protocol versions are explicitly disabled. The Postgres wire protocol listener requires TLS — plaintext connections are rejected.

For PrivateLink deployments, TLS is maintained end-to-end even over the private backbone. The private network does not substitute for transport encryption.

Encryption at Rest

Data at rest is encrypted using AES-256-GCM. This applies to:

  • Query engine data files and WAL segments
  • Cached query results stored on local SSD (tier 2 cache)
  • Audit log files
  • Snapshot storage in object storage (tier 3 cache)

Encryption at rest is handled by the storage layer on cloud deployments (EBS encryption, Cloud Storage CMEK, Azure Disk Encryption). For local deployments, disk encryption is the operator's responsibility.

CMEK (Customer-Managed Encryption Keys)

Enterprise organizations can bring their own encryption keys managed in their own KMS:

| Cloud | Key service |
|---|---|
| AWS | AWS KMS (CMK) |
| GCP | Cloud KMS (CMEK) |
| Azure | Azure Key Vault (customer-managed key) |

With CMEK, HatiData's infrastructure never has access to the plaintext encryption key. The data plane holds only an encrypted data key; the master key remains in the customer's KMS and must be accessible for the data plane to start. Key rotation is handled by the KMS — HatiData re-encrypts data keys automatically when the master key is rotated.

See CMEK Configuration for setup instructions per cloud provider.

Audit

Hash-Chained Records

Every query — whether permitted, rejected, or healed — produces an audit record. Records are hash-chained: each record contains a cryptographic hash of the previous record in the same session. The chain is seeded with a cryptographically random session seed stored in the control plane at session start.

```text
session_seed --> record_1 --> record_2 --> ... --> record_n
     |              |             |                   |
  hash_0          hash_1        hash_2              hash_n
```

To verify that a record has not been tampered with, you re-compute the chain from the session seed and compare each computed hash to the stored hash. Any modification to any field of any record — including the timestamp, the SQL, or the outcome — produces a hash mismatch that is detectable at the point of modification and for all subsequent records.
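The chain construction and verification described above can be sketched as follows. The record serialization and hash construction are assumptions for illustration; the real on-disk format is defined by HatiData's audit specification.

```python
# Sketch of hash-chaining and tamper detection. Serialization format and
# hash construction are illustrative assumptions.
import hashlib
import json

def chain_hash(prev_hash: str, record: dict) -> str:
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(seed: str, records: list) -> list:
    hashes, prev = [], seed
    for record in records:
        prev = chain_hash(prev, record)
        hashes.append(prev)
    return hashes

def first_invalid(seed: str, records: list, stored: list):
    """Re-compute the chain; return the index of the first mismatch, else None."""
    for i, h in enumerate(build_chain(seed, records)):
        if h != stored[i]:
            return i
    return None

records = [{"sql": "SELECT 1", "outcome": "permitted"},
           {"sql": "SELECT 2", "outcome": "denied"}]
stored = build_chain("session-seed", records)
assert first_invalid("session-seed", records, stored) is None

records[0]["outcome"] = "healed"   # tamper with an early record
assert first_invalid("session-seed", records, stored) == 0
```

Because each hash folds in the previous one, modifying record 0 invalidates its own hash and, transitively, every hash after it.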

Tamper Detection

The control plane exposes a chain verification API (GET /v1/audit/sessions/{id}/verify) that replays and re-hashes the entire chain for a session. The response reports the chain status (valid, tampered, incomplete) and the index of the first invalid record if tampering is detected.

Organizations can export audit chains and verify them independently using the open verification specification published in the HatiData audit documentation.

Compliance Exports

Audit records can be exported in formats suitable for common compliance frameworks:

| Format | Use case |
|---|---|
| NDJSON | General-purpose, machine-readable |
| CSV | Spreadsheet analysis, manual review |
| Parquet | Large-scale analytics over audit history |
| SIEM (CEF / JSON) | Integration with Splunk, Datadog, Elastic |

Exports include the full hash chain so that the exported records can be independently verified.

Multi-Tenancy

Tenant Isolation

Each organization's data in HatiData is isolated at the schema level. Tables belonging to organization acme live in a schema namespace that is inaccessible to queries from organization globex. Schema separation is enforced at the proxy layer — agents cannot submit SQL that references schemas outside their organization's namespace, even if they somehow knew the schema names.

Cross-Tenant JOIN Prevention

Queries that attempt to JOIN across organization schemas are detected during table extraction (in the security stage of the pipeline) and rejected before any execution occurs. The error message identifies which table references caused the cross-tenant violation without revealing the schema names of other organizations.

This protection exists at the SQL level as well as the filesystem level — even if an operator misconfigured storage permissions, the proxy would still prevent cross-tenant data access.
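The proxy-level check reduces to validating every schema-qualified table reference against the requesting organization's namespace. The sketch below takes the extracted table list as input (real extraction walks the SQL AST, including CTEs and subqueries); the function name and error text are illustrative.

```python
# Sketch of the cross-tenant check at table extraction: every
# schema-qualified reference must fall inside the requesting org's
# namespace. Table extraction itself is out of scope here.
def check_tenant_isolation(tables: list, org_schema: str) -> None:
    for table in tables:
        schema, _, _name = table.rpartition(".")
        if schema and schema != org_schema:
            # Deliberately vague: never leak other tenants' schema names.
            raise PermissionError("cross-tenant table reference detected")

check_tenant_isolation(["acme.orders", "acme.customers"], "acme")  # ok
try:
    check_tenant_isolation(["acme.orders", "globex.orders"], "acme")
except PermissionError as exc:
    assert "cross-tenant" in str(exc)
```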
