Security Whitepaper

Executive Summary

HatiData provides enterprise-grade data warehouse capabilities with a security-first architecture. All data processing occurs within the customer's VPC, ensuring complete data sovereignty and zero data egress. This whitepaper details the seven security pillars that underpin every HatiData deployment.


1. Network Architecture

1.1 VPC Isolation

HatiData deploys the query engine entirely inside the customer's VPC. This architecture guarantees that data never traverses the public internet and never reaches HatiData-managed infrastructure.

  • No public IP addresses on any resource (EC2 instances, NLB, VPC endpoints)
  • All communication between the data plane and the control plane travels over AWS PrivateLink
  • VPC Gateway Endpoint for S3 access (no internet gateway required)
  • Customer selects the AWS region; all compute, caching, and audit storage remain in that region
┌────────────────────── Customer VPC ──────────────────────┐
│                                                          │
│  ┌──────────────┐        ┌──────────────────┐            │
│  │ Application  │───────►│  HatiData Proxy  │            │
│  │ (your code)  │ :5439  │ (DuckDB Engine)  │            │
│  └──────────────┘        └────────┬─────────┘            │
│                                   │                      │
│                           ┌───────┴───────┐              │
│                           │  NVMe Cache   │              │
│                           │ (LUKS AES-256)│              │
│                           └───────┬───────┘              │
│                                   │                      │
│                           ┌───────┴───────┐              │
│                           │   S3 Bucket   │              │
│                           │   (SSE-KMS)   │              │
│                           └───────────────┘              │
│                                   │                      │
│                   ┌───────────────┴──────────────┐       │
│                   │   AWS PrivateLink Endpoint   │       │
│                   └───────────────┬──────────────┘       │
└───────────────────────────────────┼──────────────────────┘
                                    │
┌───────────────────────────────────┼──────────────────────┐
│ HatiData VPC                      │                      │
│                   ┌───────────────┴──────────────┐       │
│                   │    Control Plane (:8080)     │       │
│                   │ Auth, Billing, Policy Engine │       │
│                   └──────────────────────────────┘       │
└──────────────────────────────────────────────────────────┘

1.2 Network Controls

  • Security groups with allowlist-only ingress rules
  • Port 5439 (Postgres wire protocol) restricted to customer-approved CIDR ranges
  • Port 9090 (Prometheus metrics) restricted to monitoring infrastructure
  • No internet gateway required for HatiData operation
  • AWS Shield Standard included; NLB provides built-in connection-level DDoS protection
  • Configurable concurrency limits (default: 100 concurrent queries)
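
The allowlist-only ingress rule for port 5439 can be illustrated with a minimal Python sketch. This is not HatiData code: the function name and the CIDR ranges are hypothetical, and in a real deployment the check is enforced by AWS security groups rather than by application logic.

```python
import ipaddress

def is_allowed(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True only if source_ip falls inside one of the approved ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)

# Hypothetical customer-approved ranges for port 5439.
APPROVED = ["10.20.0.0/16", "192.168.10.0/24"]

if __name__ == "__main__":
    for ip in ("10.20.4.7", "203.0.113.9"):
        print(ip, "allowed" if is_allowed(ip, APPROVED) else "denied")
```

An empty allowlist denies everything, which matches the allowlist-only model: access exists only where a range has been explicitly approved.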

2. Encryption

2.1 At Rest

All data at rest is encrypted with customer-controlled keys:

Layer      | Algorithm          | Key Management
S3 data    | AES-256 (SSE-KMS)  | Customer-Managed Key (CMEK) via AWS KMS
NVMe cache | AES-256-XTS (LUKS) | KMS-derived key, provisioned at instance boot
Audit logs | AES-256 (SSE-KMS)  | Separate audit KMS key
  • CMEK is required -- there is no option to use AWS-managed keys
  • Automatic annual key rotation enabled by default via AWS KMS
  • Customers can trigger manual key rotation at any time
  • NVMe LUKS partition is cryptographically erased on instance termination via ASG lifecycle hook

2.2 In Transit

  • TLS 1.3 is mandatory: no fallback to TLS 1.2 and no downgrade negotiation
  • Supported cipher suites:
Cipher Suite                 | Key Exchange | Authentication
TLS_AES_256_GCM_SHA384       | ECDHE        | RSA / ECDSA
TLS_AES_128_GCM_SHA256       | ECDHE        | RSA / ECDSA
TLS_CHACHA20_POLY1305_SHA256 | ECDHE        | RSA / ECDSA
  • Internal communication between proxy components uses mTLS where applicable
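
On the client side, a TLS 1.3 floor can be pinned so that a connection fails instead of negotiating an older protocol. The sketch below uses only Python's standard `ssl` module; the helper name is hypothetical, and how a given database driver accepts an `SSLContext` depends on the driver.

```python
import ssl

def tls13_only_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Raise the protocol floor; a handshake with a TLS 1.2-only peer will fail.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx

if __name__ == "__main__":
    ctx = tls13_only_context()
    print(ctx.minimum_version.name)
```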

3. Access Control

3.1 RBAC

Six predefined roles with least-privilege permissions:

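Role           | Query | Manage Policies | Manage Users | Billing | Audit Logs
Owner          | Yes   | Yes             | Yes          | Yes     | Yes
Admin          | Yes   | Yes             | Yes          | No      | Yes
Analyst        | Yes   | No              | No           | No      | No
Auditor        | No    | No              | No           | No      | Yes
Developer      | Yes   | No              | No           | No      | No
ServiceAccount | Yes   | No              | No           | No      | No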
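
The role matrix above can be expressed as a simple lookup. This is an illustrative sketch only: the permission identifiers (`query`, `manage_policies`, and so on) are hypothetical names chosen to mirror the table columns, not HatiData's actual scope names.

```python
# Permission sets transcribed from the role table; identifiers are hypothetical.
PERMISSIONS = {
    "Owner": {"query", "manage_policies", "manage_users", "billing", "audit_logs"},
    "Admin": {"query", "manage_policies", "manage_users", "audit_logs"},
    "Analyst": {"query"},
    "Auditor": {"audit_logs"},
    "Developer": {"query"},
    "ServiceAccount": {"query"},
}

def can(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions return False."""
    return permission in PERMISSIONS.get(role, set())

if __name__ == "__main__":
    print(can("Admin", "billing"))      # Admin has no billing access
    print(can("Auditor", "audit_logs"))
```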

3.2 API Keys

  • Format: hd_live_[32 alphanumeric] (production) / hd_test_[32 alphanumeric] (staging)
  • Storage: Argon2id hash with per-key salt. Plaintext shown once at creation.
  • Scoping: 22 granular ApiScope variants bound to specific environments
  • IP allowlisting: Keys restricted to approved CIDR ranges
  • Rotation: Automatic with 72-hour grace period (old and new keys both valid)
  • Expiration alerts: Sent via configured webhooks

3.3 SSO Integration

  • WorkOS integration for enterprise SSO
  • Protocols: SAML 2.0 and OIDC
  • Supported providers: Okta, Azure AD, Google Workspace, OneLogin, PingFederate
  • MFA enforcement configurable per organization (TOTP, WebAuthn)

3.4 Federated Authentication

  • AWS STS: AssumeRoleWithWebIdentity for AWS-native workloads
  • GCP: Workload Identity Federation
  • Azure: Managed Identity + Azure AD tokens
  • Token caching with TTL-based expiration to minimize provider round-trips
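
TTL-based token caching can be sketched as below. The class, the `fetch` callback signature, and the injectable clock are all assumptions made for illustration; the actual cache implementation is not described in this whitepaper.

```python
import time
from typing import Callable

class TokenCache:
    """Cache provider tokens until their TTL elapses, then fetch again."""

    def __init__(self, fetch: Callable[[str], tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # hypothetical: returns (token, ttl_seconds)
        self._clock = clock      # injectable so expiry can be tested
        self._entries: dict[str, tuple[str, float]] = {}

    def get(self, identity: str) -> str:
        now = self._clock()
        entry = self._entries.get(identity)
        if entry is not None and entry[1] > now:
            return entry[0]      # still fresh: no provider round-trip
        token, ttl = self._fetch(identity)
        self._entries[identity] = (token, now + ttl)
        return token

if __name__ == "__main__":
    cache = TokenCache(lambda identity: (f"token-for-{identity}", 300.0))
    print(cache.get("svc-a"))
    print(cache.get("svc-a"))    # served from cache
```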

4. Data Protection

4.1 Column-Level Masking

Dynamic masking is applied at the proxy layer after query execution:

Function       | Description       | Example
Full redact    | Replace with ***  | alice@example.com -> ***
Partial redact | Show last N chars | 4111...1111 -> ***1111
Hash (SHA-256) | One-way digest    | alice@example.com -> a1b2c3...
Null           | Replace with NULL | 555-0100 -> NULL
  • Role-based exemptions per masking rule
  • Agent-specific rules -- different masking for AI agents vs. human users
  • Underlying data is never modified
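
The four masking functions and the role-based exemption can be illustrated with a short sketch. All function names here are hypothetical; the code only shows the transformations described above, applied to a result value rather than to stored data.

```python
import hashlib
from typing import Callable, Optional

def full_redact(value: str) -> str:
    return "***"

def partial_redact(value: str, keep_last: int = 4) -> str:
    """Keep only the last keep_last characters."""
    if keep_last <= 0:
        return "***"
    return "***" + value[-keep_last:]

def hash_sha256(value: str) -> str:
    """One-way digest; the same input always yields the same output."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def null_mask(value: str) -> Optional[str]:
    return None

def apply_mask(value: str, mask: Callable[[str], Optional[str]],
               role: str, exempt_roles: frozenset = frozenset()) -> Optional[str]:
    """Exempt roles see the raw value; everyone else sees the masked result."""
    return value if role in exempt_roles else mask(value)

if __name__ == "__main__":
    print(apply_mask("4111111111111111", partial_redact, role="Analyst"))
    print(apply_mask("4111111111111111", partial_redact, role="Owner",
                     exempt_roles=frozenset({"Owner"})))
```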

4.2 Row-Level Security

  • WHERE clause injection at the AST level (pre-execution)
  • Attribute-based filtering with session-resolved placeholders: {user_id}, {org_id}, {agent_id}, {agent_framework}, {department}, {region}
  • Applied to all query types: SELECT, UPDATE, DELETE, subqueries, CTEs
  • Cross-tenant JOIN prevention blocks queries joining across organization boundaries
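
The placeholder step can be sketched in isolation: a policy predicate template is resolved against session attributes before it is attached to the query. This sketch covers only placeholder resolution on strings. It does not perform the AST-level injection described above, and the function names and quoting strategy are assumptions.

```python
import re

# Placeholder names listed in this section.
ALLOWED = {"user_id", "org_id", "agent_id", "agent_framework", "department", "region"}
_PLACEHOLDER = re.compile(r"\{(\w+)\}")

def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def resolve_predicate(template: str, session: dict[str, str]) -> str:
    """Replace each {placeholder} with the quoted session value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in ALLOWED:
            raise ValueError(f"unknown placeholder: {name}")
        if name not in session:
            raise KeyError(f"session has no value for: {name}")
        return sql_literal(session[name])
    return _PLACEHOLDER.sub(substitute, template)

if __name__ == "__main__":
    session = {"org_id": "acme", "region": "eu-west-1"}
    print(resolve_predicate("org_id = {org_id} AND region = {region}", session))
```

Failing closed on an unknown or unresolved placeholder is a deliberate choice in this sketch: a filter that cannot be fully resolved should block the query rather than run without it.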

4.3 PII Redaction in Audit Logs

  • Automatic detection of PII patterns in query text before audit storage
  • Patterns: email, SSN, credit card, phone number
  • Compiled regular expressions for microsecond-level scanning
  • Redaction applied to audit entries only -- query results are not affected
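
Regex-based redaction of query text can be sketched as follows. The patterns and replacement tokens are illustrative assumptions, not HatiData's production rules, and they are tuned for US-style SSN and phone formats only.

```python
import re

# Patterns are compiled once at import time and reused for every query.
# Order matters: longer digit runs (cards) are handled before SSN and phone.
PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),    # 13 to 16 digits
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), "[PHONE]"),
]

def redact(sql_text: str) -> str:
    """Return a copy of the query text with PII-like literals replaced."""
    for pattern, token in PATTERNS:
        sql_text = pattern.sub(token, sql_text)
    return sql_text

if __name__ == "__main__":
    print(redact("SELECT * FROM users WHERE email = 'alice@example.com'"))
```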

5. Audit & Compliance

5.1 Query Audit Trail

Every query is logged immutably:

  • Storage: Customer's S3 bucket with Object Lock (Governance mode, 7-year retention)
  • Format: JSONL partitioned by date (/audit/queries/YYYY/MM/DD/)
  • Fields captured:
    • Query ID, user, source IP
    • SQL text (PII-redacted)
    • Tables accessed, rows returned, columns masked
    • Execution time, cache hit status
    • Policy verdicts (allow/deny with reason)
    • Agent metadata (agent_id, framework)
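
A date-partitioned JSONL entry can be sketched as below. The JSON field names are hypothetical stand-ins for the fields listed above; only the `/audit/queries/YYYY/MM/DD/` prefix layout comes from this document.

```python
import json
from datetime import datetime, timezone

def audit_prefix(ts: datetime) -> str:
    """Date partition under which the entry is stored."""
    return f"/audit/queries/{ts:%Y/%m/%d}/"

def audit_line(record: dict) -> str:
    """Serialize one audit entry as a single JSONL line."""
    return json.dumps(record, sort_keys=True) + "\n"

if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)
    record = {  # hypothetical field names
        "query_id": "q-123",
        "user": "analyst@example.com",
        "source_ip": "10.20.4.7",
        "sql": "SELECT * FROM users WHERE email = '[EMAIL]'",
        "tables": ["users"],
        "rows_returned": 1,
        "columns_masked": ["email"],
        "execution_ms": 12,
        "cache_hit": True,
        "verdict": {"decision": "allow", "reason": "policy matched"},
        "agent": {"agent_id": None, "framework": None},
        "timestamp": ts.isoformat(),
    }
    print(audit_prefix(ts))
    print(audit_line(record), end="")
```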

5.2 IAM Audit Trail (Hash-Chained)

Administrative actions are recorded in a tamper-evident hash chain:

  • 27 event types: policy CRUD, key rotation, user management, SSO configuration, role changes
  • SHA-256 chain: each event includes the hash of the previous event, enabling integrity verification
  • Before/after values for change tracking
  • Chain verification available via the API (GET /v1/environments/{env_id}/audit/admin/verify-chain)
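
The hash-chain idea can be illustrated with a minimal sketch. The event layout, the all-zero genesis value, and the canonical JSON serialization are assumptions; the actual record format behind the verify-chain endpoint is not specified here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first event

def event_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_event(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev,
                  "hash": event_hash(prev, payload)})

def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered event breaks it."""
    prev = GENESIS
    for event in chain:
        if event["prev_hash"] != prev:
            return False
        if event["hash"] != event_hash(prev, event["payload"]):
            return False
        prev = event["hash"]
    return True

if __name__ == "__main__":
    chain = []
    append_event(chain, {"type": "role_change", "before": "Analyst", "after": "Admin"})
    append_event(chain, {"type": "key_rotation", "before": "k1", "after": "k2"})
    print(verify_chain(chain))
    chain[0]["payload"]["after"] = "Owner"   # tamper with history
    print(verify_chain(chain))
```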

5.3 Retention

Period            | Storage Tier            | Access Latency
0 to 90 days      | S3 Standard             | Milliseconds
90 days to 1 year | S3 Glacier              | Minutes
1 to 7 years      | S3 Glacier Deep Archive | Hours

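All tiers are protected by S3 Object Lock. Because the lock runs in Governance mode, logs cannot be deleted or overwritten within the retention period except by principals explicitly granted the s3:BypassGovernanceRetention permission.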
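
The retention tiers map naturally onto an S3 lifecycle configuration. The sketch below shows one plausible shape, written as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID and the `audit/` prefix are hypothetical, and treating day 90 and day 365 as the transition points is an assumption about how the tier boundaries are applied.

```python
# Hypothetical lifecycle rule mirroring the retention table.
LIFECYCLE = {
    "Rules": [{
        "ID": "audit-log-tiering",
        "Status": "Enabled",
        "Filter": {"Prefix": "audit/"},
        "Transitions": [
            {"Days": 90, "StorageClass": "GLACIER"},
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }]
}

def tier_for_age(age_days: int) -> str:
    """Storage class an object of the given age would occupy under LIFECYCLE."""
    tier = "STANDARD"
    for transition in LIFECYCLE["Rules"][0]["Transitions"]:
        if age_days >= transition["Days"]:
            tier = transition["StorageClass"]
    return tier

if __name__ == "__main__":
    for days in (10, 120, 1000):
        print(days, tier_for_age(days))
```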


6. Instance Security

6.1 NVMe Cache Encryption

  • LUKS full-disk encryption with AES-256-XTS
  • Key derived from the customer's KMS key at instance boot time
  • Automatic wipe on instance termination:
    1. ASG lifecycle hook triggers a Lambda function
    2. Lambda sends SSM RunCommand to execute cryptsetup luksErase
    3. Instance termination proceeds only after cryptographic erasure
  • LRU eviction at 80% capacity to prevent disk exhaustion
  • Cache is ephemeral -- rebuilds from S3 on cache miss
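
LRU eviction at an 80% high-water mark can be sketched as below. This in-memory class is a simplified stand-in for the on-disk NVMe cache; the class name, byte accounting, and eviction loop are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Optional

class LruCache:
    """Evict least-recently-used entries once usage exceeds the high-water mark."""

    def __init__(self, capacity_bytes: int, high_water: float = 0.80):
        self.limit = capacity_bytes * high_water
        self.used = 0
        self.entries = OrderedDict()   # key -> bytes, oldest first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.entries:
            return None                # miss: caller re-reads from S3
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key: str, data: bytes) -> None:
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used += len(data)
        # Evict from the cold end until usage is back under the threshold.
        while self.used > self.limit and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)

if __name__ == "__main__":
    cache = LruCache(capacity_bytes=100)
    cache.put("a", b"x" * 40)
    cache.put("b", b"x" * 30)
    cache.get("a")                     # "a" is now hotter than "b"
    cache.put("c", b"x" * 30)          # pushes usage over 80 bytes
    print(sorted(cache.entries))
```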

6.2 IAM Least Privilege

HatiData compute instances operate with minimal IAM permissions:

Permission    | Target               | Purpose
s3:GetObject  | Customer data bucket | Read data files
s3:ListBucket | Customer data bucket | List objects
s3:PutObject  | Audit bucket         | Write audit logs
kms:Decrypt   | Customer KMS key     | Decrypt data at rest
  • No destructive permissions: no s3:DeleteObject, no kms:ScheduleKeyDeletion
  • Instance profile is automatically provisioned by Terraform with these exact permissions
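
The permission table corresponds to an IAM policy document along these lines. The sketch builds it as a Python dictionary for illustration only, since the real instance profile is provisioned by Terraform. The bucket names and the KMS key ARN are placeholders, not real resources.

```python
import json

# Placeholder resource names; substitute the customer's own.
DATA_BUCKET = "example-customer-data"
AUDIT_BUCKET = "example-customer-audit"
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        # Object-level actions target objects, so the resource ends in /*.
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": f"arn:aws:s3:::{DATA_BUCKET}/*"},
        # ListBucket is a bucket-level action, so it targets the bucket ARN.
        {"Effect": "Allow", "Action": "s3:ListBucket",
         "Resource": f"arn:aws:s3:::{DATA_BUCKET}"},
        {"Effect": "Allow", "Action": "s3:PutObject",
         "Resource": f"arn:aws:s3:::{AUDIT_BUCKET}/*"},
        {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": KMS_KEY_ARN},
    ],
}

DESTRUCTIVE = {"s3:DeleteObject", "kms:ScheduleKeyDeletion"}

def allowed_actions(policy: dict) -> set:
    """Collect every action granted by Allow statements."""
    return {s["Action"] for s in policy["Statement"] if s["Effect"] == "Allow"}

if __name__ == "__main__":
    print(json.dumps(POLICY, indent=2))
    print("destructive actions granted:", allowed_actions(POLICY) & DESTRUCTIVE)
```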

7. Compliance Readiness

SOC 2 Type II

HatiData's architecture is designed for SOC 2 Type II compliance. Controls map to the five SOC 2 Trust Services Criteria:

  • Security: Encryption, RBAC, network isolation
  • Availability: Auto Scaling Groups, health checks, automated failover
  • Processing Integrity: Immutable audit trails, hash-chained verification
  • Confidentiality: Column masking, row-level security, PII redaction
  • Privacy: Data stays in customer VPC, no HatiData access to customer data

HIPAA

  • BAA template available for healthcare customers
  • PHI stays in the customer's VPC and is never accessible to HatiData staff
  • Column masking protects PHI fields in query results
  • Immutable audit trails provide accountability required by HIPAA

GDPR

  • Data residency: All processing occurs in the customer's chosen region
  • HatiData acts as a data processor per the DPA
  • Customers retain full control over data subject rights (HatiData cannot access underlying data)
  • Only anonymized usage metrics (query count, latency distributions) are collected for billing

PCI DSS

  • Column-level masking for cardholder data (PAN, CVV)
  • Row-level security restricts access to payment records by role
  • Immutable audit logs provide the accountability trail required by PCI DSS

Contact

For security inquiries, vulnerability reports, or to request the full security questionnaire:

Email: security@hatidata.com

Legal entity: Marviy Pte Ltd (Singapore, UEN: 202014065D)
