Security Whitepaper
Executive Summary
HatiData provides enterprise-grade data warehouse capabilities with a security-first architecture. All data processing occurs within the customer's VPC, ensuring complete data sovereignty and zero data egress. This whitepaper details the seven security pillars that underpin every HatiData deployment.
1. Network Architecture
1.1 VPC Isolation
HatiData deploys the query engine entirely inside the customer's VPC. This architecture guarantees that data never traverses the public internet and never reaches HatiData-managed infrastructure.
- No public IP addresses on any resource (EC2 instances, NLB, VPC endpoints)
- All communication via AWS PrivateLink between data plane and control plane
- VPC Gateway Endpoint for S3 access (no internet gateway required)
- Customer selects the AWS region; all compute, caching, and audit storage remain in that region
┌────────────────────── Customer VPC ──────────────────────┐
│                                                          │
│  ┌──────────────┐        ┌──────────────────┐            │
│  │ Application  │───────►│ HatiData Proxy   │            │
│  │ (your code)  │ :5439  │ (DuckDB Engine)  │            │
│  └──────────────┘        └────────┬─────────┘            │
│                                   │                      │
│                           ┌───────┴───────┐              │
│                           │  NVMe Cache   │              │
│                           │ (LUKS AES-256)│              │
│                           └───────┬───────┘              │
│                                   │                      │
│                           ┌───────┴───────┐              │
│                           │   S3 Bucket   │              │
│                           │   (SSE-KMS)   │              │
│                           └───────────────┘              │
│                                   │                      │
│                   ┌───────────────┴──────────────┐       │
│                   │   AWS PrivateLink Endpoint   │       │
│                   └───────────────┬──────────────┘       │
└───────────────────────────────────┼──────────────────────┘
                                    │
┌───────────────────────────────────┼──────────────────────┐
│ HatiData VPC                      │                      │
│                   ┌───────────────┴──────────────┐       │
│                   │    Control Plane (:8080)     │       │
│                   │ Auth, Billing, Policy Engine │       │
│                   └──────────────────────────────┘       │
└──────────────────────────────────────────────────────────┘
1.2 Network Controls
- Security groups with allowlist-only ingress rules
- Port 5439 (Postgres wire protocol) restricted to customer-approved CIDR ranges
- Port 9090 (Prometheus metrics) restricted to monitoring infrastructure
- No internet gateway required for HatiData operation
- AWS Shield Standard included; NLB provides built-in connection-level DDoS protection
- Configurable concurrency limits (default: 100 concurrent queries)
2. Encryption
2.1 At Rest
All data at rest is encrypted with customer-controlled keys:
| Layer | Algorithm | Key Management |
|---|---|---|
| S3 data | AES-256 (SSE-KMS) | Customer-Managed Key (CMEK) via AWS KMS |
| NVMe cache | AES-256-XTS (LUKS) | KMS-derived key, provisioned at instance boot |
| Audit logs | AES-256 (SSE-KMS) | Separate audit KMS key |
- CMEK is required -- there is no option to use AWS-managed keys
- Automatic annual key rotation enabled by default via AWS KMS
- Customers can trigger manual key rotation at any time
- NVMe LUKS partition is cryptographically erased on instance termination via ASG lifecycle hook
2.2 In Transit
- TLS 1.3 mandatory -- no fallback to TLS 1.2, no downgrade negotiation
- Supported cipher suites:
| Cipher Suite | Key Exchange | Authentication |
|---|---|---|
| `TLS_AES_256_GCM_SHA384` | ECDHE | RSA / ECDSA |
| `TLS_AES_128_GCM_SHA256` | ECDHE | RSA / ECDSA |
| `TLS_CHACHA20_POLY1305_SHA256` | ECDHE | RSA / ECDSA |
- Internal communication between proxy components uses mTLS where applicable
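As an illustration of the TLS 1.3-only policy, the following minimal sketch pins a Python `ssl` context to TLS 1.3. It is not HatiData's implementation; the function name `make_tls13_context` is hypothetical, and certificate loading (for example via `load_cert_chain`) is left out.

```python
import ssl


def make_tls13_context(server: bool = False) -> ssl.SSLContext:
    """Return a context that only negotiates TLS 1.3 (hypothetical helper)."""
    if server:
        # A real server would also call ctx.load_cert_chain(...) here.
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    else:
        # Client context with certificate and hostname verification enabled.
        ctx = ssl.create_default_context()
    # Refuse TLS 1.2 and below, so there is no downgrade path.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.maximum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

A client built this way fails the handshake against any endpoint that cannot speak TLS 1.3, which is the behavior the "no fallback" rule describes.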
3. Access Control
3.1 RBAC
Six predefined roles with least-privilege permissions:
| Role | Query | Manage Policies | Manage Users | Billing | Audit Logs |
|---|---|---|---|---|---|
| Owner | Yes | Yes | Yes | Yes | Yes |
| Admin | Yes | Yes | Yes | No | Yes |
| Analyst | Yes | No | No | No | No |
| Auditor | No | No | No | No | Yes |
| Developer | Yes | No | No | No | No |
| ServiceAccount | Yes | No | No | No | No |
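The role matrix above can be expressed as a deny-by-default lookup. This is a sketch for illustration only: the `Permission` enum, `ROLE_PERMISSIONS`, and `is_allowed` are hypothetical names, and only the role-to-permission mapping comes from the table.

```python
from enum import Enum


class Permission(Enum):
    QUERY = "query"
    MANAGE_POLICIES = "manage_policies"
    MANAGE_USERS = "manage_users"
    BILLING = "billing"
    AUDIT_LOGS = "audit_logs"


# Mirrors the table: one permission set per predefined role.
ROLE_PERMISSIONS = {
    "Owner": frozenset(Permission),
    "Admin": frozenset(Permission) - {Permission.BILLING},
    "Analyst": frozenset({Permission.QUERY}),
    "Auditor": frozenset({Permission.AUDIT_LOGS}),
    "Developer": frozenset({Permission.QUERY}),
    "ServiceAccount": frozenset({Permission.QUERY}),
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Deny by default: an unknown role has no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```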
3.2 API Keys
- Format: `hd_live_[32 alphanumeric]` (production) / `hd_test_[32 alphanumeric]` (staging)
- Storage: Argon2id hash with per-key salt. Plaintext shown once at creation.
- Scoping: 22 granular `ApiScope` variants bound to specific environments
- IP allowlisting: Keys restricted to approved CIDR ranges
- Rotation: Automatic with 72-hour grace period (old and new keys both valid)
- Expiration alerts: Sent via configured webhooks
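The key format, CIDR allowlisting, and rotation grace period can be sketched with the Python standard library. All function names here are hypothetical, and the sketch makes assumptions the source does not state, such as the alphabet being ASCII letters and digits. Argon2id hashing is omitted because it requires a third-party package.

```python
import ipaddress
import re
import secrets
import string
from datetime import datetime, timedelta

ALPHABET = string.ascii_letters + string.digits
KEY_PATTERN = re.compile(r"hd_(live|test)_[A-Za-z0-9]{32}")
GRACE_PERIOD = timedelta(hours=72)


def generate_key(production: bool) -> str:
    """Create a key with 32 random alphanumeric characters after the prefix."""
    prefix = "hd_live_" if production else "hd_test_"
    return prefix + "".join(secrets.choice(ALPHABET) for _ in range(32))


def is_well_formed(key: str) -> bool:
    """Check the documented prefix and length; this does not authenticate."""
    return KEY_PATTERN.fullmatch(key) is not None


def ip_allowed(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """True when the client address falls inside any approved CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def old_key_still_valid(rotated_at: datetime, now: datetime) -> bool:
    """After rotation, the previous key stays valid for the grace period."""
    return now - rotated_at < GRACE_PERIOD
```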
3.3 SSO Integration
- WorkOS integration for enterprise SSO
- Protocols: SAML 2.0 and OIDC
- Supported providers: Okta, Azure AD, Google Workspace, OneLogin, PingFederate
- MFA enforcement configurable per organization (TOTP, WebAuthn)
3.4 Federated Authentication
- AWS STS: `AssumeRoleWithWebIdentity` for AWS-native workloads
- GCP: Workload Identity Federation
- Azure: Managed Identity + Azure AD tokens
- Token caching with TTL-based expiration to minimize provider round-trips
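TTL-based token caching can be sketched as follows. The `TokenCache` class and its `fetch` callback are hypothetical; the callback stands in for whichever identity provider call returns a token and its lifetime in seconds.

```python
import time
from typing import Callable, Dict, Tuple


class TokenCache:
    """Cache provider tokens until their TTL expires (illustrative sketch)."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # returns (token, ttl_seconds)
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, identity: str) -> str:
        now = self._clock()
        cached = self._entries.get(identity)
        if cached is not None and cached[1] > now:
            return cached[0]     # still fresh: no provider round-trip
        token, ttl_seconds = self._fetch(identity)
        self._entries[identity] = (token, now + ttl_seconds)
        return token
```

Injecting the clock keeps expiry deterministic in tests; a production cache would likely also refresh slightly before expiry and bound its size.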
4. Data Protection
4.1 Column-Level Masking
Dynamic masking is applied at the proxy layer after query execution:
| Function | Description | Example |
|---|---|---|
| Full redact | Replace with *** | alice@example.com -> *** |
| Partial redact | Show last N chars | 4111...1111 -> ***1111 |
| Hash (SHA-256) | One-way digest | alice@example.com -> a1b2c3... |
| Null | Replace with NULL | 555-0100 -> NULL |
- Role-based exemptions per masking rule
- Agent-specific rules -- different masking for AI agents vs. human users
- Underlying data is never modified
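The four masking functions and the role-based exemption can be sketched as pure functions over a result value. The names below (`MASKS`, `apply_mask`, and the rule keys) are hypothetical, and the default of four visible characters is an assumption taken from the table's example.

```python
import hashlib
from typing import Callable, Dict, Optional, Set


def full_redact(value: str) -> Optional[str]:
    return "***"


def partial_redact(value: str, keep_last: int = 4) -> Optional[str]:
    # Values too short to hide anything are fully redacted instead.
    if keep_last <= 0 or len(value) <= keep_last:
        return "***"
    return "***" + value[-keep_last:]


def hash_sha256(value: str) -> Optional[str]:
    # One-way digest, returned as 64 hexadecimal characters.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def null_mask(value: str) -> Optional[str]:
    return None


MASKS: Dict[str, Callable[[str], Optional[str]]] = {
    "full": full_redact,
    "partial": partial_redact,
    "hash": hash_sha256,
    "null": null_mask,
}


def apply_mask(value: str, rule: str, role: str, exempt_roles: Set[str]) -> Optional[str]:
    """Mask a result value unless the caller's role is exempt from the rule."""
    if role in exempt_roles:
        return value
    return MASKS[rule](value)
```

Because the functions operate on values in the result set, the stored data is untouched, consistent with the last point above.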
4.2 Row-Level Security
- WHERE clause injection at the AST level (pre-execution)
- Attribute-based filtering with session-resolved placeholders: `{user_id}`, `{org_id}`, `{agent_id}`, `{agent_framework}`, `{department}`, `{region}`
- Applied to all query types: SELECT, UPDATE, DELETE, subqueries, CTEs
- Cross-tenant JOIN prevention blocks queries joining across organization boundaries
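The sketch below covers only the placeholder-resolution step, not the AST rewriting described above. It assumes a filter template is turned into a predicate with bound parameters so that session values are never interpolated into SQL text. The `?` marker and the `resolve_filter` name are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple

# The six placeholders listed above.
ALLOWED_PLACEHOLDERS = {
    "user_id", "org_id", "agent_id", "agent_framework", "department", "region",
}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_filter(template: str, session: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace each {placeholder} with '?' and collect values in order."""
    params: List[str] = []

    def bind(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in ALLOWED_PLACEHOLDERS:
            raise ValueError(f"unknown placeholder: {name}")
        if name not in session:
            # Fail closed: a missing attribute must not widen access.
            raise KeyError(f"session has no value for: {name}")
        params.append(session[name])
        return "?"

    return PLACEHOLDER.sub(bind, template), params
```

For example, `resolve_filter("org_id = {org_id}", {"org_id": "acme"})` yields the predicate `org_id = ?` with the parameter list `["acme"]`, which a rewriter could then attach to the query's WHERE clause.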
4.3 PII Redaction in Audit Logs
- Automatic detection of PII patterns in query text before audit storage
- Patterns: email, SSN, credit card, phone number
- Compiled regular expressions for microsecond-level scanning
- Redaction applied to audit entries only -- query results are not affected
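A minimal sketch of regex-based redaction for the four pattern classes follows. The expressions are simplified, US-centric assumptions rather than HatiData's actual patterns, and the `[REDACTED_*]` labels are hypothetical. The card pattern in particular will also match other long digit runs.

```python
import re

# Compiled once at import time; order matters (SSN before card and phone).
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # 13 to 16 digits, optionally separated by spaces or hyphens.
    ("CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),
    # 7-digit or 10-digit numbers with separators, e.g. 555-0100.
    ("PHONE", re.compile(r"\b(?:\d{3}[-. ])?\d{3}[-. ]\d{4}\b")),
]


def redact_pii(sql: str) -> str:
    """Replace PII-looking substrings in query text before it is stored."""
    for label, pattern in PII_PATTERNS:
        sql = pattern.sub(f"[REDACTED_{label}]", sql)
    return sql
```

The function only rewrites the text destined for the audit entry; the query sent to the engine and its results are left unchanged.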
5. Audit & Compliance
5.1 Query Audit Trail
Every query is logged immutably:
- Storage: Customer's S3 bucket with Object Lock (Governance mode, 7-year retention)
- Format: JSONL partitioned by date (`/audit/queries/YYYY/MM/DD/`)
- Fields captured:
  - Query ID, user, source IP
  - SQL text (PII-redacted)
  - Tables accessed, rows returned, columns masked
  - Execution time, cache hit status
  - Policy verdicts (allow/deny with reason)
  - Agent metadata (agent_id, framework)
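The date-partitioned JSONL layout can be sketched as below. The path scheme comes from the format described above; the record field names used in the test are hypothetical, since the whitepaper lists the captured fields but not their JSON keys.

```python
import json
from datetime import datetime, timezone


def audit_prefix(ts: datetime) -> str:
    """Return the date partition for an audit entry, based on UTC date."""
    ts = ts.astimezone(timezone.utc)
    return f"/audit/queries/{ts:%Y/%m/%d}/"


def to_jsonl_line(record: dict) -> str:
    """Serialize one audit record as a single newline-terminated JSON line."""
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return json.dumps(record, sort_keys=True) + "\n"
```

Normalizing to UTC before partitioning is an assumption; it keeps entries from clients in different time zones in a single, consistent partition per day.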
5.2 IAM Audit Trail (Hash-Chained)
Administrative actions are recorded in a tamper-evident hash chain:
- 27 event types: policy CRUD, key rotation, user management, SSO configuration, role changes
- SHA-256 chain: each event includes the hash of the previous event, enabling integrity verification
- Before/after values for change tracking
- Chain verification available via the API (`GET /v1/environments/{env_id}/audit/admin/verify-chain`)
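The hash-chain idea can be sketched in a few lines: each entry stores the previous entry's hash, and verification recomputes every link. The entry layout, the all-zero genesis value, and the canonical JSON encoding are assumptions for illustration, not the documented wire format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def event_hash(prev_hash: str, payload: Dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: List[Dict], payload: Dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": event_hash(prev, payload)})


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered event fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != event_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Changing a single before/after value in an earlier event changes its hash, which breaks every later link and makes the tampering detectable.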
5.3 Retention
| Period | Storage Tier | Access Latency |
|---|---|---|
| 0 -- 90 days | S3 Standard | Milliseconds |
| 90 days -- 1 year | S3 Glacier | Minutes |
| 1 -- 7 years | S3 Glacier Deep Archive | Hours |
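All tiers are protected by S3 Object Lock. Within the retention period, locked log versions cannot be deleted or have their retention shortened unless the caller holds the `s3:BypassGovernanceRetention` permission, which Governance mode reserves for explicit overrides; the instance permissions listed in Section 6.2 do not include it.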
6. Instance Security
6.1 NVMe Cache Encryption
- LUKS full-disk encryption with AES-256-XTS
- Key derived from the customer's KMS key at instance boot time
- Automatic wipe on instance termination:
  - ASG lifecycle hook triggers a Lambda function
  - Lambda sends SSM RunCommand to execute `cryptsetup luksErase`
  - Instance termination proceeds only after cryptographic erasure
- LRU eviction at 80% capacity to prevent disk exhaustion
- Cache is ephemeral -- rebuilds from S3 on cache miss
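The eviction rule can be sketched as an LRU index with a high-water mark. This assumes "LRU eviction at 80% capacity" means that least-recently-used entries are dropped whenever usage exceeds 80% of the disk. The `LruIndex` class is hypothetical and tracks sizes only, not file contents.

```python
from collections import OrderedDict
from typing import List


class LruIndex:
    """Track cached object sizes and evict LRU entries above a threshold."""

    def __init__(self, capacity_bytes: int, high_water: float = 0.80) -> None:
        self.limit = int(capacity_bytes * high_water)
        self.used = 0
        self._sizes: "OrderedDict[str, int]" = OrderedDict()

    def touch(self, key: str) -> bool:
        """Mark a cache hit; returns False on a miss (caller reloads from S3)."""
        if key not in self._sizes:
            return False
        self._sizes.move_to_end(key)
        return True

    def put(self, key: str, size: int) -> List[str]:
        """Record a newly cached object and return the keys that were evicted."""
        if key in self._sizes:
            self.used -= self._sizes.pop(key)
        self._sizes[key] = size
        self.used += size
        evicted: List[str] = []
        # Drop oldest entries first, but never the one just inserted.
        while self.used > self.limit and len(self._sizes) > 1:
            old_key, old_size = self._sizes.popitem(last=False)
            self.used -= old_size
            evicted.append(old_key)
        return evicted
```

Evicted entries are safe to drop because the cache is ephemeral: a later miss simply triggers a reload from S3.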
6.2 IAM Least Privilege
HatiData compute instances operate with minimal IAM permissions:
| Permission | Target | Purpose |
|---|---|---|
| `s3:GetObject` | Customer data bucket | Read data files |
| `s3:ListBucket` | Customer data bucket | List objects |
| `s3:PutObject` | Audit bucket | Write audit logs |
| `kms:Decrypt` | Customer KMS key | Decrypt data at rest |
- No destructive permissions: no `s3:DeleteObject`, no `kms:ScheduleKeyDeletion`
- Instance profile is automatically provisioned by Terraform with these exact permissions
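For illustration, the four permissions in the table can be rendered as an IAM policy document. The source states that Terraform provisions the real instance profile; this Python sketch only shows the shape of an equivalent policy. The function name, statement IDs, and the bucket and key identifiers passed in are hypothetical.

```python
import json


def instance_policy(data_bucket: str, audit_bucket: str, kms_key_arn: str) -> dict:
    """Build a policy granting only the four actions listed in the table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Object-level actions apply to keys inside the bucket ("/*").
            {"Sid": "ReadData", "Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{data_bucket}/*"},
            # ListBucket is a bucket-level action, so it targets the bucket ARN.
            {"Sid": "ListData", "Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": f"arn:aws:s3:::{data_bucket}"},
            {"Sid": "WriteAudit", "Effect": "Allow", "Action": "s3:PutObject",
             "Resource": f"arn:aws:s3:::{audit_bucket}/*"},
            {"Sid": "DecryptData", "Effect": "Allow", "Action": "kms:Decrypt",
             "Resource": kms_key_arn},
        ],
    }


def policy_json(policy: dict) -> str:
    """Pretty-print the policy for review or for diffing against Terraform output."""
    return json.dumps(policy, indent=2)
```

A simple review check is to confirm that no statement uses a wildcard action and that no delete or key-deletion action appears anywhere in the rendered document.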
7. Compliance Readiness
SOC 2 Type II
HatiData's architecture is designed for SOC 2 Type II compliance. All controls align with the SOC 2 Trust Services Criteria:
- Security: Encryption, RBAC, network isolation
- Availability: Auto Scaling Groups, health checks, automated failover
- Processing Integrity: Immutable audit trails, hash-chained verification
- Confidentiality: Column masking, row-level security, PII redaction
- Privacy: Data stays in customer VPC, no HatiData access to customer data
HIPAA
- BAA template available for healthcare customers
- PHI stays in the customer's VPC and is never accessible to HatiData staff
- Column masking protects PHI fields in query results
- Immutable audit trails provide accountability required by HIPAA
GDPR
- Data residency: All processing occurs in the customer's chosen region
- HatiData acts as a data processor per the DPA
- Customers retain full control over data subject rights (HatiData cannot access underlying data)
- Only anonymized usage metrics (query count, latency distributions) are collected for billing
PCI DSS
- Column-level masking for cardholder data (PAN, CVV)
- Row-level security restricts access to payment records by role
- Immutable audit logs provide the accountability trail required by PCI DSS
Contact
For security inquiries, vulnerability reports, or to request the full security questionnaire:
Email: security@hatidata.com
Legal entity: Marviy Pte Ltd (Singapore, UEN: 202014065D)