Security Overview
HatiData is built on a security-first architecture. All data processing occurs within the customer's VPC, so customer data stays under the customer's control and is never sent outside that VPC. This page summarizes each security layer in HatiData.
Architecture Principle: Data Never Leaves Your VPC
Unlike traditional cloud warehouses that process data on shared infrastructure, HatiData deploys the query engine inside your VPC. The control plane communicates with the in-VPC proxy over AWS PrivateLink, and no public IP addresses are assigned to any resource.
        Your VPC                          HatiData VPC
┌──────────────────────┐              ┌──────────────────┐
│ HatiData Proxy       │   Private    │ Control Plane    │
│ (:5439)              │◄─── Link ───►│ (:8080)          │
│                      │              │                  │
│ NVMe Cache (LUKS)    │              │ Auth / Billing   │
│ DuckDB Engine        │              │ Policy Engine    │
│                      │              │ Audit Storage    │
│ S3 (SSE-KMS)         │              └──────────────────┘
└──────────────────────┘
Encryption
At Rest
- Customer-Managed Encryption Keys (CMEK) via AWS KMS / GCP Cloud KMS / Azure Key Vault
- NVMe SSD cache encrypted with LUKS (AES-256-XTS), key derived from customer KMS at boot
- S3 / GCS / Azure Blob server-side encryption with KMS-managed keys (SSE-KMS)
- Automatic key rotation enabled by default
In Transit
- TLS 1.3 mandatory -- no fallback to TLS 1.2
- Supported cipher suites:
  - TLS_AES_256_GCM_SHA384
  - TLS_AES_128_GCM_SHA256
  - TLS_CHACHA20_POLY1305_SHA256
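A client that wants to match the TLS 1.3-only requirement can refuse older protocol versions on its own side as well. The following is a minimal sketch using Python's standard `ssl` module; the helper name `build_tls13_context` is illustrative and not part of any HatiData SDK.

```python
import ssl


def build_tls13_context() -> ssl.SSLContext:
    """Return a client context that refuses anything older than TLS 1.3."""
    # create_default_context() keeps certificate and hostname verification on.
    ctx = ssl.create_default_context()
    # Raising the floor to TLS 1.3 means a TLS 1.2-only peer fails the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = build_tls13_context()
```

The context can then be passed to any client library that accepts an `ssl.SSLContext`.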
RBAC (Role-Based Access Control)
Six predefined roles with least-privilege permissions:
| Role | Query | Manage Policies | Manage Users | Billing | Audit Logs |
|---|---|---|---|---|---|
| Owner | Yes | Yes | Yes | Yes | Yes |
| Admin | Yes | Yes | Yes | No | Yes |
| Analyst | Yes | No | No | No | No |
| Auditor | No | No | No | No | Yes |
| Developer | Yes | No | No | No | No |
| ServiceAccount | Yes | No | No | No | No |
For detailed role definitions and permission matrices, see Authorization.
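The role matrix above can be read as a simple lookup from role to granted permissions. The sketch below encodes the table as data; the permission identifiers (`query`, `manage_policies`, and so on) are hypothetical names derived from the column headers, not HatiData API values.

```python
# Permission sets transcribed from the role table; identifiers are illustrative.
PERMISSIONS = {
    "Owner": {"query", "manage_policies", "manage_users", "billing", "audit_logs"},
    "Admin": {"query", "manage_policies", "manage_users", "audit_logs"},
    "Analyst": {"query"},
    "Auditor": {"audit_logs"},
    "Developer": {"query"},
    "ServiceAccount": {"query"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles get an empty permission set."""
    return permission in PERMISSIONS.get(role, frozenset())
```

Note that only Owner holds the billing permission, and Auditor is the only role that can read audit logs without being able to query.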
ABAC (Attribute-Based Access Control)
HatiData evaluates policies against a rich evaluation context built from the session, request, and environment:
Evaluation Context Attributes
| Attribute | Type | Description |
|---|---|---|
| user_role | Role | RBAC role of the requester |
| user_id | UUID | Authenticated user identity |
| org_id | UUID | Organization scope |
| environment | String | Target environment (production, staging, dev) |
| source_ip | IpAddr | Client IP address |
| query_origin | String | Origin type (dashboard, api, agent, sdk) |
| agent_framework | Option<String> | Agent framework (langchain, crewai, custom) |
| agent_id | Option<String> | Unique agent identifier |
| time_of_day | NaiveTime | Current server time |
| day_of_week | Weekday | Current day |
Rule Conditions
| Condition | Example |
|---|---|
| QueryOriginIs | Block queries not from dashboard or sdk |
| AgentFrameworkIs | Allow only langchain agents |
| TimeOfDay | Deny queries outside business hours |
| DayOfWeek | Read-only on weekends |
| AttributeEquals | Match custom key-value attributes |
| LicenseTierIs | Gate features by tier (Free, Cloud, Growth, Enterprise) |
| ScopeRequired | Require specific API key scope |
For full details, see Authorization.
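To illustrate how conditions combine against an evaluation context, here is a minimal sketch. It models only a few of the attributes listed above, and the function names (`query_origin_is`, `time_of_day_between`, `all_of`) are hypothetical stand-ins for the documented condition types, not HatiData's policy syntax.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Optional


@dataclass(frozen=True)
class EvaluationContext:
    # A small subset of the documented attributes, for illustration only.
    user_role: str
    query_origin: str
    agent_framework: Optional[str]
    time_of_day: time


Condition = Callable[[EvaluationContext], bool]


def query_origin_is(*origins: str) -> Condition:
    return lambda ctx: ctx.query_origin in origins


def time_of_day_between(start: time, end: time) -> Condition:
    # Half-open interval: start is included, end is excluded.
    return lambda ctx: start <= ctx.time_of_day < end


def all_of(*conditions: Condition) -> Condition:
    return lambda ctx: all(cond(ctx) for cond in conditions)


# Allow only dashboard or SDK queries during assumed business hours (09:00-17:00).
business_hours_rule = all_of(
    query_origin_is("dashboard", "sdk"),
    time_of_day_between(time(9, 0), time(17, 0)),
)
```

Evaluating `business_hours_rule` against a context returns `True` only when every condition holds, which mirrors the allow/deny verdicts recorded in the audit log.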
API Key Scopes
API keys use the format hd_live_[32 alphanumeric] (production) and hd_test_[32 alphanumeric] (staging). Keys are hashed with Argon2id before storage -- the plaintext is shown only once at creation time.
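The key format lends itself to a quick client-side sanity check before a request is sent. The sketch below assumes "alphanumeric" means ASCII letters and digits; the function name `key_environment` is illustrative.

```python
import re
from typing import Optional

# Assumption: the 32-character suffix uses ASCII letters and digits only.
KEY_PATTERN = re.compile(r"hd_(live|test)_[A-Za-z0-9]{32}")


def key_environment(key: str) -> Optional[str]:
    """Return 'production' or 'staging' for a well-formed key, else None."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        return None
    return "production" if match.group(1) == "live" else "staging"
```

A format check like this only catches typos and truncated keys; it says nothing about whether the key is valid or active.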
22 granular ApiScope variants are organized into scope bundles:
| Bundle | Included Scopes |
|---|---|
| ReadOnly | query:read, schema:read, audit:read |
| Developer | ReadOnly + query:write, schema:write, environment:read |
| Admin | Developer + policy:*, user:*, key:*, webhook:*, billing:read |
| Agent | query:read, query:write, schema:read, agent:* |
Keys support IP allowlisting and automatic rotation with a 72-hour grace period.
For details on key management, see API Keys.
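The bundle table describes inheritance (Developer extends ReadOnly, Admin extends Developer) plus wildcard scopes such as `policy:*`. A minimal sketch of how a scope check could work follows. It uses shell-style matching from the standard library as an assumption about wildcard semantics, and `policy:write` in the usage note is a hypothetical scope name.

```python
from fnmatch import fnmatchcase

READ_ONLY = {"query:read", "schema:read", "audit:read"}
DEVELOPER = READ_ONLY | {"query:write", "schema:write", "environment:read"}
ADMIN = DEVELOPER | {"policy:*", "user:*", "key:*", "webhook:*", "billing:read"}
AGENT = {"query:read", "query:write", "schema:read", "agent:*"}

BUNDLES = {
    "ReadOnly": READ_ONLY,
    "Developer": DEVELOPER,
    "Admin": ADMIN,
    "Agent": AGENT,
}


def bundle_grants(bundle: str, scope: str) -> bool:
    """True if any pattern in the bundle matches the requested scope."""
    return any(fnmatchcase(scope, pattern) for pattern in BUNDLES[bundle])
```

For example, `bundle_grants("Admin", "policy:write")` is true through the `policy:*` wildcard, while `bundle_grants("Agent", "audit:read")` is false because the Agent bundle does not inherit from ReadOnly.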
Audit Logging
Every query and administrative action is logged to an immutable audit trail.
Query Audit
Each query audit entry captures:
- Query ID, user, source IP
- SQL text (PII-redacted)
- Tables accessed, rows returned, columns masked
- Execution time, cache hit status
- Policy verdicts (allow/deny with reason)
- Agent metadata (agent_id, framework)
Logs are stored in the customer's object storage bucket with Object Lock (7-year retention, Governance mode). Format: JSONL partitioned by date.
IAM Audit (Hash-Chained)
Administrative actions are recorded in a tamper-evident hash chain:
- 27 event types covering policy CRUD, key rotation, user management, SSO configuration
- SHA-256 chain verification -- each event references the hash of the previous event
- Before/after values for change tracking
- Chain integrity can be verified via the Audit API
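The hash-chain idea can be sketched in a few lines: each event stores the previous event's hash, and verification recomputes every hash in order. This is a generic illustration, not HatiData's on-disk format; the field names, the all-zero genesis value, and the JSON serialization are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash deterministic.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": event_hash(prev_hash, payload),
        }
    )


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered event breaks the chain."""
    prev_hash = GENESIS
    for event in chain:
        if event["prev_hash"] != prev_hash:
            return False
        if event["hash"] != event_hash(prev_hash, event["payload"]):
            return False
        prev_hash = event["hash"]
    return True
```

Because each hash covers the previous hash, changing one event invalidates that event and every later one, which is what makes the trail tamper-evident.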
Retention
| Tier | Hot Storage | Glacier | Deep Archive |
|---|---|---|---|
| Duration | 0 -- 90 days | 90 days -- 1 year | 1 -- 7 years |
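Read as a function of log age, the retention table maps each entry to one tier. The sketch below assumes the boundaries are half-open (an entry moves tiers on day 90 and day 365) and that entries expire after seven 365-day years; the source does not state these edge cases.

```python
def storage_tier(age_days: int) -> str:
    """Map the age of an audit entry to its retention tier."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 90:
        return "hot"
    if age_days < 365:
        return "glacier"
    if age_days < 7 * 365:
        return "deep_archive"
    return "expired"  # past the 7-year retention window
```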
Row-Level Security
Row-level security (RLS) injects WHERE clauses at the SQL AST level before execution. Filters support agent-aware placeholders:
-- Policy: agents can only see their own data
WHERE agent_id = '{agent_id}'
-- Policy: department-scoped access
WHERE department = '{department}'
-- Policy: organization isolation
WHERE org_id = '{org_id}'
Placeholders are resolved from the authenticated session context. See Data Protection for details.
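Placeholder resolution can be sketched as a substitution from session attributes into the filter template. HatiData performs injection at the AST level; the string-level version below is a deliberate simplification for illustration, and `resolve_filter` is a hypothetical name.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_filter(template: str, session: dict) -> str:
    """Fill {name} placeholders from the session, failing closed on gaps."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in session:
            # A missing attribute must never silently widen access.
            raise KeyError(f"session has no attribute {name!r}")
        # Double single quotes so the value stays inside the SQL string literal.
        return str(session[name]).replace("'", "''")

    return PLACEHOLDER.sub(substitute, template)
```

For example, `resolve_filter("agent_id = '{agent_id}'", {"agent_id": "agent-42"})` yields `agent_id = 'agent-42'`. Raising on an unresolved placeholder is a design choice: rejecting the query is safer than running it without the filter.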
Column Masking
Dynamic column masking is applied at the proxy layer after query execution. Masking functions:
| Function | Output | Example |
|---|---|---|
| Full | *** | alice@example.com -> *** |
| Partial | Last N chars visible | 4111111111111111 -> ***1111 |
| Hash | SHA-256 digest | alice@example.com -> a1b2c3... |
| Null | NULL | 555-0100 -> NULL |
Masking rules are role-based with agent-specific overrides. The underlying data is never modified.
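The four masking functions in the table are simple enough to sketch directly. The function names and the default of four visible characters are assumptions chosen to reproduce the table's examples.

```python
import hashlib


def mask_full(value: str) -> str:
    return "***"


def mask_partial(value: str, visible: int = 4) -> str:
    # Guard visible == 0: value[-0:] would return the whole string.
    if visible <= 0:
        return "***"
    return "***" + value[-visible:]


def mask_hash(value: str) -> str:
    # Hex-encoded SHA-256 digest of the UTF-8 bytes.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_null(value: str) -> None:
    return None
```

All four functions return a new value and leave the input untouched, in line with masking being applied to query results rather than to stored data.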
JIT Access
Just-In-Time access provides time-bounded privilege escalation:
- For humans: Temporary role elevation with configurable duration and automatic revocation
- For agents: Structured AgentCapabilityGrant with table allowlists, query count limits, and expiration
All JIT grants are recorded in the IAM audit trail with full before/after tracking.
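An agent grant combines three limits: a table allowlist, a query budget, and an expiry. The sketch below reuses the name AgentCapabilityGrant from the text, but its fields and the `authorize` method are hypothetical and only illustrate how the limits might be checked together.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass
class AgentCapabilityGrant:
    agent_id: str
    allowed_tables: frozenset
    max_queries: int
    expires_at: datetime
    queries_used: int = 0

    def authorize(self, tables: Iterable[str], now: datetime) -> bool:
        """Allow a query only while every limit still holds."""
        if now >= self.expires_at:
            return False  # grant has expired
        if self.queries_used >= self.max_queries:
            return False  # query budget exhausted
        if not set(tables) <= self.allowed_tables:
            return False  # touches a table outside the allowlist
        self.queries_used += 1  # only successful queries consume budget
        return True


issued_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = AgentCapabilityGrant(
    agent_id="agent-42",
    allowed_tables=frozenset({"orders"}),
    max_queries=2,
    expires_at=issued_at + timedelta(minutes=30),
)
```

In this sketch a denied query does not consume budget; whether the real system counts denied attempts is not stated in the source.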
Tenant Isolation
Multi-tenant deployments enforce strict isolation:
- Automatic WHERE org_id = '{org_id}' injection on every query
- Cross-tenant JOIN prevention at the AST level
- Parent-child organization hierarchy support
- Per-tenant resource quotas
Federated Authentication
HatiData supports federation with cloud identity providers:
| Provider | Method |
|---|---|
| AWS | STS AssumeRoleWithWebIdentity |
| GCP | Workload Identity Federation |
| Azure | Managed Identity + Azure AD |
Federation tokens are cached in a DashMap with TTL-based expiration.
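TTL-based expiration means a cached token is returned only until its deadline and is evicted on the first lookup after that. The following single-threaded Python sketch shows the idea; it is not a concurrent map like DashMap, and the class name `TtlCache` and the injectable clock are illustrative.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, token: str) -> None:
        self._entries[key] = (token, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Lazy eviction: expired entries are removed on lookup.
            del self._entries[key]
            return None
        return token
```

A monotonic clock is used by default so that wall-clock adjustments cannot extend a token's lifetime.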
Next Steps
- Authentication -- JWT, API keys, federated tokens, SSO
- Authorization -- RBAC roles, ABAC policies, policy simulation
- Data Protection -- Column masking, RLS, PII redaction
- Security Whitepaper -- Full architecture deep dive
- Compliance & CISO FAQ -- SOC 2, HIPAA, GDPR, PCI DSS