PrivateLink & VPC
Enterprise deployments run HatiData's entire data plane inside your VPC. Your agents connect to the query proxy over AWS PrivateLink (or equivalent GCP / Azure private networking) without any traffic traversing the public internet. This page covers VPC architecture, Shield binary protection, Shadow Mode validation, and Terraform deployment.
VPC Architecture
```text
┌──────────────────────────────────────────────────────────────────┐
│ Customer VPC (your AWS account, eu-west-1)                       │
│                                                                  │
│ ┌──────────────────┐        ┌──────────────────────────────┐     │
│ │ Agent Workload   │        │ HatiData Data Plane          │     │
│ │ (ECS / EKS /     │        │                              │     │
│ │ Lambda / EC2)    │        │  ┌──────────────────────┐    │     │
│ │                  │        │  │ Network Load Balancer│    │     │
│ │ psql / SDK       │◀──────▶│  │ (TLS 1.3, port 5439) │    │     │
│ │ port 5439        │        │  └──────────┬───────────┘    │     │
│ └──────────────────┘        │             │                │     │
│          ▲                  │  ┌──────────▼───────────┐    │     │
│          │ PrivateLink      │  │ HatiData proxy       │    │     │
│          │ (VPC Endpoint)   │  │ (query engine,       │    │     │
│          │                  │  │ SSD cache, LUKS)     │    │     │
│  ┌───────┴────────┐         │  └──────────┬───────────┘    │     │
│  │ VPC Endpoint   │         │             │                │     │
│  │ Service        │         │  ┌──────────▼───────────┐    │     │
│  └────────────────┘         │  │ S3 VPC Endpoint      │    │     │
│                             │  │ (object storage)     │    │     │
│                             │  └──────────────────────┘    │     │
│                             └──────────────────────────────┘     │
│                                                                  │
│ ┌──────────────────────────────────────────────────────────┐     │
│ │ Security Group: proxy-sg                                 │     │
│ │ Inbound:  5439/tcp from agent_sg only                    │     │
│ │ Inbound:  9090/tcp from monitoring_sg only               │     │
│ │ Outbound: 443/tcp to S3 VPC endpoint only                │     │
│ └──────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────┘
                                 ▲
                                 │  Control plane (metadata only, no data)
                                 │  HTTPS to api.hatidata.com
                                 ▼
                  ┌─────────────────────────────┐
                  │ HatiData Control Plane      │
                  │ (HatiData-managed cloud)    │
                  │ Auth, billing, policies     │
                  │ No access to customer data  │
                  └─────────────────────────────┘
```
Key properties:
- No public IP addresses on any proxy instance, NLB target, or S3 VPC endpoint
- Agent workloads connect over PrivateLink — traffic never traverses the internet
- The control plane receives only anonymized billing metrics, never query content
- S3 access is restricted to the VPC endpoint (no internet-routed S3 access)
PrivateLink Connectivity
AWS PrivateLink Setup
HatiData provisions an NLB-backed VPC Endpoint Service in your VPC. Your agent workloads connect via a VPC Interface Endpoint:
```hcl
# Automatically provisioned by the HatiData Terraform module:
resource "aws_vpc_endpoint_service" "hatidata" {
  acceptance_required        = false
  network_load_balancer_arns = [aws_lb.hatidata_proxy.arn]
  allowed_principals         = [var.agent_account_arn]
}

resource "aws_vpc_endpoint" "hatidata_consumer" {
  # Created in the agent workload account / VPC
  service_name       = aws_vpc_endpoint_service.hatidata.service_name
  vpc_endpoint_type  = "Interface"
  vpc_id             = var.agent_vpc_id
  subnet_ids         = var.agent_private_subnet_ids
  security_group_ids = [aws_security_group.hatidata_endpoint_sg.id]
}
```
Connection string for agents using the PrivateLink endpoint:
```text
postgresql://hd_live_...@vpce-0abc123.hatidata.vpce-svc-xyz.eu-west-1.vpce.amazonaws.com:5439/hatidata?sslmode=require
```
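If you assemble this connection string in code, a small helper keeps the pieces consistent. This is a minimal sketch, not part of any HatiData SDK: `build_privatelink_dsn` is a hypothetical name, and the sample key and hostname are placeholders. It assumes, as in the string above, that the API key occupies the user position and that the database is named `hatidata`.

```python
from urllib.parse import quote


def build_privatelink_dsn(api_key: str, endpoint_dns: str,
                          port: int = 5439, database: str = "hatidata") -> str:
    """Build a PostgreSQL DSN that targets a PrivateLink endpoint (hypothetical helper).

    The API key sits in the user position, as in the documented connection
    string. quote(..., safe="") percent-encodes any character that is not
    URL-safe, so a key containing '/', '@' or ':' cannot break the URL.
    sslmode=require keeps TLS mandatory even on the private path.
    """
    user = quote(api_key, safe="")
    return f"postgresql://{user}@{endpoint_dns}:{port}/{database}?sslmode=require"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_privatelink_dsn(
        "hd_live_example",
        "vpce-0abc123.hatidata.vpce-svc-xyz.eu-west-1.vpce.amazonaws.com",
    ))
```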
GCP Private Service Connect
For GCP deployments, HatiData uses Private Service Connect (PSC) instead of PrivateLink. The proxy is deployed on Cloud Run or GKE with a PSC endpoint exposed to the customer's VPC.
Azure Private Link
For Azure deployments, HatiData uses Azure Private Link Service backed by an Internal Load Balancer. The proxy is deployed on AKS with a private endpoint in the customer's VNet.
Shield: Binary Protection
Shield is HatiData's four-layer protection system for the proxy binary. It prevents unauthorized copying or execution of the Enterprise proxy outside a licensed environment.
Layer 1: Black-Box Binary
The Enterprise proxy binary is distributed as a pre-compiled, stripped native binary with no debug symbols. Source code is not distributed. The binary implements obfuscated license validation that cannot be bypassed by binary patching without triggering the integrity check.
Layer 2: Heartbeat License Validation
The proxy sends a heartbeat to the HatiData license server every 15 minutes. The heartbeat includes:
- Organization ID (from provisioning)
- Environment fingerprint (see Layer 4)
- Binary hash (cryptographic hash of the proxy binary)
If the heartbeat fails validation (revoked license, wrong environment, tampered binary), the proxy enters a graceful degradation mode: it completes in-flight queries and then stops accepting new connections. A grace period of 4 hours is provided before full shutdown, allowing time to investigate connectivity issues.
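The failure path can be read as a small state model. The sketch below is one interpretation, not the proxy's actual logic: `proxy_state` and the state names are hypothetical, the grace period is assumed to start at the first failed validation, and because the text does not say whether a later successful heartbeat restores service, recovery is not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

HEARTBEAT_INTERVAL = timedelta(minutes=15)
GRACE_PERIOD = timedelta(hours=4)

# 4 h / 15 min: the number of heartbeat intervals that fit in the grace period.
HEARTBEATS_IN_GRACE = GRACE_PERIOD // HEARTBEAT_INTERVAL


def proxy_state(first_failed_at: Optional[datetime], now: datetime) -> str:
    """Hypothetical state model for heartbeat validation failures.

    "active":   no failed validation recorded
    "degraded": validation failed; in-flight queries complete, new
                connections are refused, and the grace period is running
    "shutdown": the 4-hour grace period has elapsed
    """
    if first_failed_at is None:
        return "active"
    if now - first_failed_at < GRACE_PERIOD:
        return "degraded"
    return "shutdown"
```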
Layer 3: Locked Instance
The proxy binary is locked to the provisioned instance at boot time using instance metadata:
```text
Lock material = sha256(
    org_id      +  # provisioned organization
    instance_id +  # AWS EC2 / GCP instance / Azure VM ID
    account_id  +  # cloud account where instance runs
    region      +  # must match provisioned region
    launch_time    # set at first boot; cannot be replayed
)
```
A proxy binary that boots outside the provisioned account or region will fail the lock check and refuse to start.
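The `+` in the lock formula is conceptual. A runnable sketch has to pick an encoding; this one assumes UTF-8 fields joined with a NUL separator, because bare concatenation would map `("ab", "c")` and `("a", "bc")` to the same hash input. `lock_material` and `lock_check` are hypothetical names; the real derivation is not documented on this page.

```python
import hashlib
import hmac


def lock_material(org_id: str, instance_id: str, account_id: str,
                  region: str, launch_time: str) -> str:
    """Hex SHA-256 over the five lock fields (hypothetical encoding).

    Fields are UTF-8 encoded and joined with NUL so that values cannot
    run into each other the way bare concatenation allows.
    """
    payload = "\x00".join((org_id, instance_id, account_id, region, launch_time))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def lock_check(expected_digest: str, *fields: str) -> bool:
    """Recompute the lock material from instance metadata and compare it
    in constant time against the value recorded at first boot."""
    return hmac.compare_digest(lock_material(*fields), expected_digest)
```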
Layer 4: Environment Binding
The proxy validates that the runtime environment matches the Terraform-provisioned configuration:
- VPC ID must match the provisioned value
- Security group ID must match
- KMS key ARN must match the CMEK configuration
- Subnet IDs must be within the provisioned region
This prevents copying a running instance's LUKS-unlocked state to a different environment.
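The environment checks amount to comparing a runtime snapshot against the provisioned record. A hypothetical sketch: the `Environment` shape, the subnet-to-region mapping, and the violation labels are assumptions for illustration, not the proxy's internal format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Environment:
    """Hypothetical snapshot of the values the proxy binds to."""
    vpc_id: str
    security_group_id: str
    kms_key_arn: str
    region: str
    subnet_regions: Dict[str, str] = field(default_factory=dict)  # subnet ID -> region


def binding_violations(provisioned: Environment, runtime: Environment) -> List[str]:
    """List every binding that differs; an empty list means all checks pass."""
    problems: List[str] = []
    if runtime.vpc_id != provisioned.vpc_id:
        problems.append("vpc_id")
    if runtime.security_group_id != provisioned.security_group_id:
        problems.append("security_group_id")
    if runtime.kms_key_arn != provisioned.kms_key_arn:
        problems.append("kms_key_arn")
    # Every runtime subnet must sit in the provisioned region.
    for subnet_id, region in sorted(runtime.subnet_regions.items()):
        if region != provisioned.region:
            problems.append(f"subnet:{subnet_id}")
    return problems
```

A proxy following this pattern would refuse to start whenever the returned list is non-empty.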
Shadow Mode
Shadow Mode allows you to validate HatiData against your existing data infrastructure before committing to full cutover. In Shadow Mode, all queries are routed to both your existing infrastructure and HatiData in parallel — results are compared, and a report is generated. No data is modified.
How Shadow Mode Works
```text
Agent Query
    │
    ├──▶ Existing Infrastructure ──▶ Result A ──┐
    │                                           ├──▶ Compare ──▶ Report
    └──▶ HatiData Proxy (shadow) ──▶ Result B ──┘
```
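The Compare step can be sketched as a per-query classifier feeding the report counters. This page does not define `semantic_match`, so the sketch assumes it means the same rows returned in a different order (for example, from a query without `ORDER BY`). `classify` and `tally` are hypothetical names, and result rows are modeled as tuples.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

Row = Tuple[object, ...]


def classify(existing: Optional[Sequence[Row]],
             hatidata: Optional[Sequence[Row]]) -> str:
    """Classify one shadow comparison under assumed semantics.

    None stands for a query that failed on that side.
    """
    if existing is None or hatidata is None:
        return "error"
    if list(existing) == list(hatidata):
        return "exact_match"      # same rows, same order
    if Counter(existing) == Counter(hatidata):
        return "semantic_match"   # same rows (with multiplicity), different order
    return "mismatch"


def tally(pairs: Iterable[Tuple[Optional[Sequence[Row]], Optional[Sequence[Row]]]]) -> Counter:
    """Count classifications across a replay, mirroring the report fields."""
    return Counter(classify(existing, hatidata) for existing, hatidata in pairs)
```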
Step 1: Upload Shadow Query Set
```bash
# Upload a set of representative queries for shadow testing
curl -X POST https://api.hatidata.com/v1/organizations/org_abc/shadow-mode/upload \
  -H "Authorization: Bearer <jwt>" \
  -F "queries=@representative-queries.sql"
```
The query file should include a representative sample of your production query workload: analytical queries, agent memory lookups, compliance queries, and any queries touching sensitive columns.
Step 2: Run Shadow Replay
```bash
# Execute shadow replay against your existing infrastructure
curl -X POST https://api.hatidata.com/v1/organizations/org_abc/shadow-mode/replay \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "comparison_dsn": "postgresql://user:pass@your-existing-host:5432/db",
    "query_set_id": "qs_a1b2c3",
    "sample_rate": 1.0,
    "timeout_seconds": 300
  }'
```
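To trigger a replay from a Python job instead of curl, the same request can be built with the standard library. A minimal sketch: `build_replay_request` is a hypothetical helper that mirrors only the endpoint path, headers, and body fields from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.hatidata.com/v1"


def build_replay_request(org_id: str, jwt: str, comparison_dsn: str,
                         query_set_id: str, sample_rate: float = 1.0,
                         timeout_seconds: int = 300) -> urllib.request.Request:
    """Build (but do not send) a POST to the shadow-mode replay endpoint."""
    body = {
        "comparison_dsn": comparison_dsn,  # DSN of your existing database
        "query_set_id": query_set_id,      # ID of an uploaded query set
        "sample_rate": sample_rate,        # 1.0 replays every query
        "timeout_seconds": timeout_seconds,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/organizations/{org_id}/shadow-mode/replay",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is `urllib.request.urlopen(req)`; the response format is not described on this page, so the sketch stops at building the request.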
Step 3: Review Comparison Report
```bash
curl https://api.hatidata.com/v1/organizations/org_abc/shadow-mode/report/rpt_xyz \
  -H "Authorization: Bearer <jwt>"
```
```json
{
  "report_id": "rpt_xyz",
  "query_count": 1500,
  "exact_match": 1487,
  "semantic_match": 11,
  "mismatch": 2,
  "error": 0,
  "hatidata_p50_ms": 18,
  "hatidata_p99_ms": 94,
  "existing_p50_ms": 210,
  "existing_p99_ms": 1840,
  "mismatch_details": [
    {
      "query": "SELECT ...",
      "hatidata_row_count": 1842,
      "existing_row_count": 1843,
      "delta": "1 row difference — row added between shadow runs"
    }
  ]
}
```
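A few derived numbers from the report are worth checking before cut-over. A sketch, assuming (as the sample suggests, since 1487 + 11 + 2 + 0 = 1500) that `exact_match`, `semantic_match`, `mismatch`, and `error` partition `query_count`; `summarize` is a hypothetical helper.

```python
def summarize(report: dict) -> dict:
    """Derive match rate and latency ratios from a shadow-mode report."""
    total = report["query_count"]
    matched = report["exact_match"] + report["semantic_match"]
    # Assumption: the four outcome counters partition query_count.
    if matched + report["mismatch"] + report["error"] != total:
        raise ValueError("outcome counters do not add up to query_count")
    return {
        "match_rate": matched / total,
        "p50_speedup": report["existing_p50_ms"] / report["hatidata_p50_ms"],
        "p99_speedup": report["existing_p99_ms"] / report["hatidata_p99_ms"],
    }


if __name__ == "__main__":
    sample = {
        "query_count": 1500, "exact_match": 1487, "semantic_match": 11,
        "mismatch": 2, "error": 0,
        "hatidata_p50_ms": 18, "hatidata_p99_ms": 94,
        "existing_p50_ms": 210, "existing_p99_ms": 1840,
    }
    s = summarize(sample)
    print(f"match rate {s['match_rate']:.2%}, "
          f"p50 {s['p50_speedup']:.1f}x, p99 {s['p99_speedup']:.1f}x")
    # prints: match rate 99.87%, p50 11.7x, p99 19.6x
```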
Shadow Mode runs are completely read-only. No writes, schema changes, or audit events are generated on your existing infrastructure.
Terraform Deployment
HatiData provides Terraform modules for AWS, GCP, and Azure. All modules follow the same variable interface for cross-cloud consistency.
AWS Modules (8 modules)
| Module | Purpose |
|---|---|
| terraform/aws/modules/vpc | VPC, subnets, route tables, NAT gateway |
| terraform/aws/modules/proxy | EC2 Auto Scaling Group, NLB, launch template |
| terraform/aws/modules/storage | S3 bucket with Object Lock, lifecycle rules, encryption |
| terraform/aws/modules/kms | KMS key with rotation policy and proxy IAM role |
| terraform/aws/modules/iam | IAM roles and instance profiles |
| terraform/aws/modules/nlb | Network Load Balancer and target group configuration |
| terraform/aws/modules/privatelink | VPC Endpoint Service and consumer endpoint |
| terraform/aws/modules/monitoring | CloudWatch dashboards, alarms, Prometheus scrape config |
GCP Modules (11 modules)
| Module | Purpose |
|---|---|
| terraform/gcp/modules/vpc | VPC network and subnets |
| terraform/gcp/modules/gke | GKE Standard cluster (high-performance SSD node pool) |
| terraform/gcp/modules/cloud_run | Cloud Run service for control plane |
| terraform/gcp/modules/cloudsql | Cloud SQL for control plane metadata |
| terraform/gcp/modules/secrets | Secret Manager for configuration |
| terraform/gcp/modules/registry | Artifact Registry for proxy images |
| terraform/gcp/modules/kms | Cloud KMS key ring and crypto key |
| terraform/gcp/modules/storage | GCS bucket with retention policy |
| terraform/gcp/modules/iam | Service accounts and IAM bindings |
| terraform/gcp/modules/psc | Private Service Connect endpoint |
| terraform/gcp/modules/monitoring | Cloud Monitoring dashboards and alerting |
Azure Modules (6 modules)
| Module | Purpose |
|---|---|
| terraform/azure/modules/vnet | Virtual Network, subnets, NSGs |
| terraform/azure/modules/aks | AKS cluster for proxy workload |
| terraform/azure/modules/storage | Azure Blob Storage with immutability policy |
| terraform/azure/modules/keyvault | Azure Key Vault for CMEK |
| terraform/azure/modules/privatelink | Private Link Service and endpoint |
| terraform/azure/modules/monitoring | Azure Monitor and Log Analytics |
Deployment Example (AWS)
```bash
cd terraform/aws

# Initialize
terraform init -backend-config="bucket=my-terraform-state" \
  -backend-config="key=hatidata/production" \
  -backend-config="region=eu-west-1"

# Plan with production variables
terraform plan -var-file="environments/production.tfvars"

# Apply
terraform apply -var-file="environments/production.tfvars"
```
Compute Tiers
| Tier | Instance Type | vCPU | RAM | SSD Cache | Max Concurrent Queries |
|---|---|---|---|---|---|
| Starter | r6id.xlarge | 4 | 32 GB | 237 GB | 25 |
| Standard | r6id.2xlarge | 8 | 64 GB | 474 GB | 50 |
| Performance | r6id.4xlarge | 16 | 128 GB | 950 GB | 100 |
| High Memory | r6id.8xlarge | 32 | 256 GB | 1.9 TB | 200 |
| Custom | On request | — | — | — | — |
Instance type selection is configurable in `terraform/aws/environments/*.tfvars`. Spot instances are available for dev and staging environments at roughly 70% lower cost; enable them with `devbox_spot = true`.
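Choosing a tier from expected load is a lookup against the Max Concurrent Queries column. A sketch, assuming you size for peak concurrency plus a headroom margin; the 20% default and the helper name `pick_tier` are illustrative choices, not HatiData guidance.

```python
from typing import Tuple

# (tier, instance type, max concurrent queries) from the Compute Tiers table.
TIERS = [
    ("Starter", "r6id.xlarge", 25),
    ("Standard", "r6id.2xlarge", 50),
    ("Performance", "r6id.4xlarge", 100),
    ("High Memory", "r6id.8xlarge", 200),
]


def pick_tier(peak_concurrent_queries: int, headroom_pct: int = 20) -> Tuple[str, str]:
    """Return the smallest tier whose limit covers peak load plus headroom.

    Integer ceiling division avoids floating-point rounding at tier boundaries.
    """
    peak = peak_concurrent_queries
    required = peak + (peak * headroom_pct + 99) // 100
    for tier, instance_type, max_queries in TIERS:
        if required <= max_queries:
            return tier, instance_type
    return "Custom", "On request"
```

For example, a peak of 40 concurrent queries becomes 48 with 20% headroom and lands on Standard (`r6id.2xlarge`).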
SLA
| Tier | Availability SLA | P1 Response | P1 Resolution |
|---|---|---|---|
| Cloud | 99.5% | 30 min | 8 hr |
| Growth | 99.9% | 15 min | 4 hr |
| Enterprise | 99.99% | 15 min | 4 hr |
Enterprise SLA includes a dedicated support Slack channel, quarterly architecture reviews, and a named customer success engineer.
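To put the availability figures in concrete terms, the downtime an SLA allows is (100 - availability) percent of the measurement window. This page does not state the window, so the sketch below assumes a 30-day month (43,200 minutes); `downtime_budget_minutes` is a hypothetical helper.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumed 30-day window: 43,200 minutes

SLA_TARGETS = {"Cloud": 99.5, "Growth": 99.9, "Enterprise": 99.99}


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime a monthly availability target allows."""
    return MINUTES_PER_MONTH * (100 - availability_pct) / 100


if __name__ == "__main__":
    for tier, pct in SLA_TARGETS.items():
        print(f"{tier}: {downtime_budget_minutes(pct):.1f} min/month")
    # Cloud: 216.0, Growth: 43.2, Enterprise: 4.3 (min/month)
```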
Migrating from Cloud to Enterprise
Cloud tier customers use HatiData's managed multi-tenant proxy. Enterprise tier customers run the proxy in their own VPC. The migration path:
- Provision Enterprise environment: Run Terraform in your AWS, GCP, or Azure account. On AWS, this provisions the VPC, NLB, proxy Auto Scaling Group, S3 bucket, and KMS key.
- Migrate data: Copy your Parquet files from the managed S3 bucket to your new Enterprise S3 bucket using standard `aws s3 sync` or equivalent.
- Update connection strings: Point agents and pipelines to the new PrivateLink endpoint (`vpce-...`).
- Run Shadow Mode: Execute a shadow comparison with a 100% sample rate to verify result parity.
- Cut over: Update DNS or connection configuration in your agent framework. The Cloud and Enterprise environments can run in parallel during the transition.
- Decommission Cloud environment: Cancel the Cloud tier subscription after cut-over validation.
Data migration assistance is available from the HatiData solutions engineering team for Growth and Enterprise customers.
Related Concepts
- CMEK & Encryption — Encryption within the VPC deployment
- Audit Guarantees — Audit log storage in the Enterprise VPC
- Data Residency — Region selection for Enterprise deployments
- SOC 2 Architecture — Compliance controls enabled by VPC isolation
- Security Model — Full security architecture reference