Security Overview

HatiData is built with a security-first architecture. All data processing occurs within the customer's VPC, so customer data stays under the customer's control and is never sent outside the VPC. This page summarizes each security layer in HatiData.

Architecture Principle: Data Never Leaves Your VPC

Unlike traditional cloud warehouses that process data on shared infrastructure, HatiData deploys the query engine inside your VPC. The control plane communicates over AWS PrivateLink, and no public IP addresses are assigned to any resource.

Your VPC                              HatiData VPC
┌──────────────────────┐              ┌──────────────────┐
│  HatiData Proxy      │   Private    │  Control Plane   │
│  (:5439)             │◄── Link ───► │  (:8080)         │
│                      │              │                  │
│  NVMe Cache (LUKS)   │              │  Auth / Billing  │
│  DuckDB Engine       │              │  Policy Engine   │
│                      │              │  Audit Storage   │
│  S3 (SSE-KMS)        │              └──────────────────┘
└──────────────────────┘

Encryption

At Rest

- Customer-Managed Encryption Keys (CMEK) via AWS KMS / GCP Cloud KMS / Azure Key Vault
- NVMe SSD cache encrypted with LUKS (AES-256-XTS), key derived from customer KMS at boot
- S3 / GCS / Azure Blob server-side encryption with KMS-managed keys (SSE-KMS)
- Automatic key rotation enabled by default

In Transit

TLS 1.3 mandatory -- no fallback to TLS 1.2
Supported cipher suites:
- TLS_AES_256_GCM_SHA384
- TLS_AES_128_GCM_SHA256
- TLS_CHACHA20_POLY1305_SHA256
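
A client can enforce the same floor on its own side, so a misconfigured endpoint fails fast instead of negotiating an older protocol. The following is a minimal sketch using Python's standard `ssl` module; the helper name `make_tls13_context` is illustrative and not part of HatiData.

```python
import ssl

def make_tls13_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.3."""
    # create_default_context() keeps certificate and hostname verification on.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx

ctx = make_tls13_context()
```

The context can then wrap a socket with `ctx.wrap_socket(sock, server_hostname=host)`; a handshake with a server that only offers TLS 1.2 would fail.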

RBAC (Role-Based Access Control)

Six predefined roles with least-privilege permissions:

Role	Query	Manage Policies	Manage Users	Billing	Audit Logs
Owner	Yes	Yes	Yes	Yes	Yes
Admin	Yes	Yes	Yes	No	Yes
Analyst	Yes	No	No	No	No
Auditor	No	No	No	No	Yes
Developer	Yes	No	No	No	No
ServiceAccount	Yes	No	No	No	No

For detailed role definitions and permission matrices, see Authorization.
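
The role table above can be read as a deny-by-default lookup. The sketch below encodes it as a Python mapping; the `Permission` enum and `is_allowed` helper are illustrative names, not HatiData's actual API.

```python
from enum import Enum

class Permission(Enum):
    QUERY = "query"
    MANAGE_POLICIES = "manage_policies"
    MANAGE_USERS = "manage_users"
    BILLING = "billing"
    AUDIT_LOGS = "audit_logs"

P = Permission

# One entry per row of the role table.
ROLE_PERMISSIONS = {
    "Owner": {P.QUERY, P.MANAGE_POLICIES, P.MANAGE_USERS, P.BILLING, P.AUDIT_LOGS},
    "Admin": {P.QUERY, P.MANAGE_POLICIES, P.MANAGE_USERS, P.AUDIT_LOGS},
    "Analyst": {P.QUERY},
    "Auditor": {P.AUDIT_LOGS},
    "Developer": {P.QUERY},
    "ServiceAccount": {P.QUERY},
}

def is_allowed(role: str, permission: Permission) -> bool:
    """Deny by default: an unknown role has no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```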

ABAC (Attribute-Based Access Control)

HatiData evaluates policies against an evaluation context built from the session, request, and environment:

Evaluation Context Attributes

Attribute	Type	Description
`user_role`	`Role`	RBAC role of the requester
`user_id`	`UUID`	Authenticated user identity
`org_id`	`UUID`	Organization scope
`environment`	`String`	Target environment (production, staging, dev)
`source_ip`	`IpAddr`	Client IP address
`query_origin`	`String`	Origin type (dashboard, api, agent, sdk)
`agent_framework`	`Option<String>`	Agent framework (langchain, crewai, custom)
`agent_id`	`Option<String>`	Unique agent identifier
`time_of_day`	`NaiveTime`	Current server time
`day_of_week`	`Weekday`	Current day

Rule Conditions

Condition	Example
`QueryOriginIs`	Block queries not from `dashboard` or `sdk`
`AgentFrameworkIs`	Allow only `langchain` agents
`TimeOfDay`	Deny queries outside business hours
`DayOfWeek`	Read-only on weekends
`AttributeEquals`	Match custom key-value attributes
`LicenseTierIs`	Gate features by tier (Free, Cloud, Growth, Enterprise)
`ScopeRequired`	Require specific API key scope

For full details, see Authorization.
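
To illustrate how conditions combine with the evaluation context, here is a small Python sketch. It models only a few attributes and three of the conditions, and it assumes a first-match-wins rule order with a default of deny; those semantics, and all function names, are assumptions for illustration rather than HatiData's documented behavior.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Optional

@dataclass(frozen=True)
class EvalContext:
    user_role: str
    environment: str
    query_origin: str
    agent_framework: Optional[str]
    time_of_day: time

Condition = Callable[[EvalContext], bool]

def query_origin_is(*origins: str) -> Condition:
    return lambda ctx: ctx.query_origin in origins

def agent_framework_is(framework: str) -> Condition:
    return lambda ctx: ctx.agent_framework == framework

def time_of_day_between(start: time, end: time) -> Condition:
    return lambda ctx: start <= ctx.time_of_day < end

@dataclass(frozen=True)
class Rule:
    effect: str          # "allow" or "deny"
    conditions: tuple    # every condition must hold for the rule to match

def evaluate(rules, ctx: EvalContext) -> str:
    """Return the effect of the first matching rule, or "deny" if none match."""
    for rule in rules:
        if all(cond(ctx) for cond in rule.conditions):
            return rule.effect
    return "deny"

# Allow langchain agents during business hours only.
rules = [
    Rule("allow", (
        query_origin_is("agent"),
        agent_framework_is("langchain"),
        time_of_day_between(time(9), time(17)),
    )),
]
```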

API Key Scopes

API keys use the format hd_live_[32 alphanumeric] (production) and hd_test_[32 alphanumeric] (staging). Keys are hashed with Argon2id before storage -- the plaintext is shown only once at creation time.
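
A client can check the key format locally before sending a request. The sketch below assumes "alphanumeric" means ASCII letters and digits; the helper name `key_environment` is hypothetical.

```python
import re
from typing import Optional

# Assumption: the 32-character suffix is limited to ASCII letters and digits.
KEY_PATTERN = re.compile(r"hd_(live|test)_[A-Za-z0-9]{32}")

def key_environment(key: str) -> Optional[str]:
    """Return "production" or "staging" for a well-formed key, else None."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        return None
    return "production" if match.group(1) == "live" else "staging"
```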

22 granular ApiScope variants are organized into scope bundles:

Bundle	Included Scopes
ReadOnly	`query:read`, `schema:read`, `audit:read`
Developer	ReadOnly + `query:write`, `schema:write`, `environment:read`
Admin	Developer + `policy:*`, `user:*`, `key:*`, `webhook:*`, `billing:read`
Agent	`query:read`, `query:write`, `schema:read`, `agent:*`
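
As a rough illustration of how bundles expand into scopes, the sketch below models the ReadOnly, Developer, and Agent bundles as Python sets (Admin is omitted for brevity). It assumes that a `prefix:*` scope such as `agent:*` grants every scope under that prefix; `scope_granted` and the scope name `agent:invoke` are hypothetical.

```python
READ_ONLY = frozenset({"query:read", "schema:read", "audit:read"})
DEVELOPER = READ_ONLY | {"query:write", "schema:write", "environment:read"}
AGENT = frozenset({"query:read", "query:write", "schema:read", "agent:*"})

def scope_granted(granted, required: str) -> bool:
    """True if `required` is granted exactly or through a `prefix:*` wildcard."""
    prefix = required.split(":", 1)[0]
    return required in granted or f"{prefix}:*" in granted
```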

Keys support IP allowlisting and automatic rotation with a 72-hour grace period.

For details on key management, see API Keys.

Audit Logging

Every query and administrative action is logged to an immutable audit trail.

Query Audit

Each query audit entry captures:

- Query ID, user, source IP
- SQL text (PII-redacted)
- Tables accessed, rows returned, columns masked
- Execution time, cache hit status
- Policy verdicts (allow/deny with reason)
- Agent metadata (agent_id, framework)

Logs are stored in the customer's object storage bucket with Object Lock (7-year retention, Governance mode). Format: JSONL partitioned by date.
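
For orientation, a query audit entry in date-partitioned JSONL might look like the sketch below. The field names, the example values, and the object key layout are all assumptions made for illustration; the actual schema is not specified on this page.

```python
import json
from datetime import datetime, timezone

def audit_object_key(ts: datetime) -> str:
    """Date-partitioned object key (hypothetical layout)."""
    return f"audit/queries/date={ts:%Y-%m-%d}/events.jsonl"

def to_jsonl_line(entry: dict) -> str:
    """Serialize one entry as a single compact JSON line."""
    return json.dumps(entry, sort_keys=True, separators=(",", ":"))

# Hypothetical entry covering the fields listed above.
entry = {
    "query_id": "q-0001",
    "user": "analyst@example.com",
    "source_ip": "10.0.0.12",
    "sql": "SELECT email FROM customers WHERE id = ?",
    "tables": ["customers"],
    "rows_returned": 1,
    "columns_masked": ["email"],
    "execution_ms": 12,
    "cache_hit": True,
    "verdict": {"decision": "allow", "reason": "role=Analyst"},
    "agent": {"agent_id": None, "framework": None},
}
ts = datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc)
line = to_jsonl_line(entry)
```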

IAM Audit (Hash-Chained)

Administrative actions are recorded in a tamper-evident hash chain:

- 27 event types covering policy CRUD, key rotation, user management, SSO configuration
- SHA-256 chain verification -- each event references the hash of the previous event
- Before/after values for change tracking
- Chain integrity can be verified via the Audit API
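
The hash-chain idea can be sketched in a few lines of Python. This is a generic SHA-256 chain under stated assumptions, not HatiData's actual event format: the event type names, the canonical JSON encoding, and the all-zero genesis hash are illustrative choices.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first event

def event_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_event(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload,
                  "hash": event_hash(prev, payload)})

def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered event breaks the chain."""
    prev = GENESIS
    for event in chain:
        if event["prev_hash"] != prev:
            return False
        if event["hash"] != event_hash(prev, event["payload"]):
            return False
        prev = event["hash"]
    return True

chain = []
append_event(chain, {"type": "policy.update",
                     "before": {"effect": "allow"}, "after": {"effect": "deny"}})
append_event(chain, {"type": "key.rotate",
                     "before": None, "after": {"key_id": "k2"}})
```

Because each hash covers the previous one, changing an earlier event invalidates every later link, which is what makes the trail tamper-evident.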

Retention

Tier	Hot Storage	Glacier	Deep Archive
Duration	90 days	90 days -- 1 year	1 -- 7 years
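
The tier boundaries can be expressed as a simple age lookup. The sketch below assumes a year is 365 days and that each boundary day belongs to the colder tier; the function name is illustrative.

```python
from typing import Optional

def storage_tier(age_days: int) -> Optional[str]:
    """Map a log's age in days to its retention tier, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 90:
        return "Hot Storage"
    if age_days < 365:
        return "Glacier"
    if age_days < 7 * 365:
        return "Deep Archive"
    return None  # past the 7-year retention window
```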

Row-Level Security

Row-level security (RLS) injects WHERE clauses at the SQL AST level before execution. Filters support agent-aware placeholders:

-- Policy: agents can only see their own data
WHERE agent_id = '{agent_id}'

-- Policy: department-scoped access
WHERE department = '{department}'

-- Policy: organization isolation
WHERE org_id = '{org_id}'

Placeholders are resolved from the authenticated session context. See Data Protection for details.
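
Placeholder resolution can be illustrated with the sketch below. HatiData injects filters at the AST level; this example is a deliberate simplification that works on text, and it only shows how a quoted `'{name}'` placeholder could be replaced by an escaped literal from the session. The function names are hypothetical.

```python
import re

# Matches a quoted placeholder such as '{agent_id}'.
PLACEHOLDER = re.compile(r"'\{(\w+)\}'")

def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def resolve_filter(template: str, session: dict) -> str:
    """Replace each placeholder with the matching session value."""
    def substitute(match):
        name = match.group(1)
        if name not in session:
            # Fail closed: never emit a filter with an unresolved placeholder.
            raise KeyError(f"no session value for placeholder {name!r}")
        return sql_literal(str(session[name]))
    return PLACEHOLDER.sub(substitute, template)
```

Failing closed on a missing attribute matters here: silently dropping the filter would widen access instead of narrowing it.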

Column Masking

Dynamic column masking is applied at the proxy layer after query execution. Masking functions:

Function	Output	Example
Full	`***`	`alice@example.com` -> `***`
Partial	Last N chars visible	`4111111111111111` -> `***1111`
Hash	SHA-256 digest	`alice@example.com` -> `a1b2c3...`
Null	`NULL`	`555-0100` -> `NULL`

Masking rules are role-based with agent-specific overrides. The underlying data is never modified.
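
The four masking functions can be sketched as plain Python functions applied to a result row. The function names, the default of four visible characters, and the example row are assumptions for illustration; the outputs follow the table above.

```python
import hashlib

def mask_full(value: str) -> str:
    return "***"

def mask_partial(value: str, visible: int = 4) -> str:
    """Keep only the last `visible` characters."""
    tail = value[-visible:] if visible > 0 else ""
    return "***" + tail

def mask_hash(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def mask_null(value: str) -> None:
    return None

def apply_masks(row: dict, rules: dict) -> dict:
    """Return a masked copy of `row`; the original row is not modified."""
    masked = {}
    for column, value in row.items():
        mask = rules.get(column)
        masked[column] = mask(value) if mask and value is not None else value
    return masked

row = {"email": "alice@example.com", "card": "4111111111111111",
       "phone": "555-0100", "name": "Alice"}
rules = {"email": mask_full, "card": mask_partial, "phone": mask_null}
masked_row = apply_masks(row, rules)
```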

JIT Access

Just-In-Time access provides time-bounded privilege escalation:

- For humans: Temporary role elevation with configurable duration and automatic revocation
- For agents: Structured AgentCapabilityGrant with table allowlists, query count limits, and expiration

All JIT grants are recorded in the IAM audit trail with full before/after tracking.
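
An agent grant of this kind can be modeled as a small record with three checks. Only the name `AgentCapabilityGrant` comes from this page; the field names and the `authorize` method in the Python sketch below are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class AgentCapabilityGrant:
    agent_id: str
    allowed_tables: frozenset
    max_queries: int
    expires_at: datetime
    queries_used: int = 0

    def authorize(self, tables: set, now: datetime) -> bool:
        """Consume one query from the grant if it still covers `tables`."""
        if now >= self.expires_at:
            return False                      # grant has expired
        if self.queries_used >= self.max_queries:
            return False                      # query budget exhausted
        if not tables <= self.allowed_tables:
            return False                      # touches a table outside the allowlist
        self.queries_used += 1
        return True

issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = AgentCapabilityGrant(
    agent_id="agent-42",
    allowed_tables=frozenset({"orders"}),
    max_queries=2,
    expires_at=issued + timedelta(minutes=30),
)
```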

Tenant Isolation

Multi-tenant deployments enforce strict isolation:

- Automatic WHERE org_id = '{org_id}' injection on every query
- Cross-tenant JOIN prevention at the AST level
- Parent-child organization hierarchy support
- Per-tenant resource quotas

Federated Authentication

HatiData supports federation with cloud identity providers:

Provider	Method
AWS	STS `AssumeRoleWithWebIdentity`
GCP	Workload Identity Federation
Azure	Managed Identity + Azure AD

Federation tokens are cached in a DashMap with TTL-based expiration.
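
TTL-based expiration can be sketched as follows. DashMap is a concurrent Rust map; this Python version uses a plain dict as a single-threaded stand-in, and the class name and injectable clock are illustrative assumptions.

```python
import time
from typing import Callable, Optional

class TtlTokenCache:
    """Cache tokens for a fixed time-to-live, evicting lazily on read."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (token, expires_at)

    def put(self, key: str, token: str) -> None:
        self._entries[key] = (token, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return token
```

Passing the clock in as a parameter keeps expiry testable without sleeping.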

Next Steps

- Authentication -- JWT, API keys, federated tokens, SSO
- Authorization -- RBAC roles, ABAC policies, policy simulation
- Data Protection -- Column masking, RLS, PII redaction
- Security Whitepaper -- Full architecture deep dive
- Compliance & CISO FAQ -- SOC 2, HIPAA, GDPR, PCI DSS
