PrivateLink & VPC

Enterprise deployments run HatiData's entire data plane inside your VPC. Your agents connect to the query proxy over AWS PrivateLink (or equivalent GCP / Azure private networking) without any traffic traversing the public internet. This page covers VPC architecture, Shield binary protection, Shadow Mode validation, and Terraform deployment.


VPC Architecture

┌──────────────────────────────────────────────────────────────────┐
│ Customer VPC (your AWS account, eu-west-1)                       │
│                                                                  │
│  ┌──────────────────┐        ┌──────────────────────────────┐    │
│  │ Agent Workload   │        │ HatiData Data Plane          │    │
│  │ (ECS / EKS /     │        │                              │    │
│  │  Lambda / EC2)   │        │  ┌──────────────────────┐    │    │
│  │                  │        │  │ Network Load Balancer│    │    │
│  │ psql / SDK       │◀──────▶│  │ (TLS 1.3, port 5439) │    │    │
│  │ port 5439        │        │  └──────────┬───────────┘    │    │
│  └──────────────────┘        │             │                │    │
│          ▲                   │  ┌──────────▼───────────┐    │    │
│          │ PrivateLink       │  │ HatiData proxy       │    │    │
│          │ (VPC Endpoint)    │  │ (query engine,       │    │    │
│          │                   │  │  SSD cache LUKS)     │    │    │
│  ┌───────┴────────┐          │  └──────────┬───────────┘    │    │
│  │ VPC Endpoint   │          │             │                │    │
│  │ Service        │          │  ┌──────────▼───────────┐    │    │
│  └────────────────┘          │  │ S3 VPC Endpoint      │    │    │
│                              │  │ (object storage)     │    │    │
│                              │  └──────────────────────┘    │    │
│                              └──────────────────────────────┘    │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │ Security Group: proxy-sg                                 │    │
│  │ Inbound:  5439/tcp from agent_sg only                    │    │
│  │ Inbound:  9090/tcp from monitoring_sg only               │    │
│  │ Outbound: 443/tcp to S3 VPC endpoint only                │    │
│  └──────────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────────┘
                                  │
                                  │ Control plane (metadata only, no data)
                                  │ HTTPS to api.hatidata.com
                                  ▼
                   ┌─────────────────────────────┐
                   │ HatiData Control Plane      │
                   │ (HatiData-managed cloud)    │
                   │ Auth, billing, policies     │
                   │ No access to customer data  │
                   └─────────────────────────────┘

Key properties:

  • No public IP addresses on any proxy instance, NLB target, or S3 VPC endpoint
  • Agent workloads connect over PrivateLink — traffic never traverses the internet
  • The control plane receives only metadata, such as anonymized billing metrics, never query content
  • S3 access is restricted to the VPC endpoint (no internet-routed S3 access)
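The proxy-sg rules in the diagram behave as an allow-list: traffic that no rule explicitly permits is dropped. The sketch below models only the two inbound rules shown. `InboundRule` and `is_inbound_allowed` are hypothetical names. Real security group rules also match port ranges and CIDR sources, which this simplification ignores.

```python
from typing import NamedTuple


class InboundRule(NamedTuple):
    port: int
    protocol: str
    source_sg: str


# proxy-sg inbound rules as drawn in the diagram; everything else is implicitly denied.
PROXY_SG_INBOUND = frozenset({
    InboundRule(port=5439, protocol="tcp", source_sg="agent_sg"),
    InboundRule(port=9090, protocol="tcp", source_sg="monitoring_sg"),
})


def is_inbound_allowed(port: int, protocol: str, source_sg: str) -> bool:
    """Security groups are allow-lists: traffic passes only if some rule matches."""
    return InboundRule(port, protocol, source_sg) in PROXY_SG_INBOUND


print(is_inbound_allowed(5439, "tcp", "agent_sg"))       # True
print(is_inbound_allowed(5439, "tcp", "monitoring_sg"))  # False
```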

HatiData provisions an NLB-backed VPC Endpoint Service in your VPC. Your agent workloads connect via a VPC Interface Endpoint:

# Automatically provisioned by the HatiData Terraform module:

resource "aws_vpc_endpoint_service" "hatidata" {
  acceptance_required        = false
  network_load_balancer_arns = [aws_lb.hatidata_proxy.arn]
  allowed_principals         = [var.agent_account_arn]
}

resource "aws_vpc_endpoint" "hatidata_consumer" {
  # Created in the agent workload account / VPC
  service_name       = aws_vpc_endpoint_service.hatidata.service_name
  vpc_endpoint_type  = "Interface"
  vpc_id             = var.agent_vpc_id
  subnet_ids         = var.agent_private_subnet_ids
  security_group_ids = [aws_security_group.hatidata_endpoint_sg.id]
}

Connection string for agents using the PrivateLink endpoint:

postgresql://hd_live_...@vpce-0abc123.hatidata.vpce-svc-xyz.eu-west-1.vpce.amazonaws.com:5439/hatidata?sslmode=require
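Before rolling the connection string out to agents, a quick check can catch a DSN that points at a public hostname or drops TLS. This is a minimal, standard-library sketch. `check_privatelink_dsn` is a hypothetical helper. The `.vpce.amazonaws.com` suffix test assumes the AWS-generated endpoint DNS name shown above.

```python
from urllib.parse import parse_qs, urlsplit


def check_privatelink_dsn(dsn: str) -> list[str]:
    """Return the problems found in a PrivateLink DSN (an empty list means it looks OK)."""
    problems = []
    parts = urlsplit(dsn)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    host = parts.hostname or ""
    # Traffic should resolve to the interface endpoint, not a public address.
    if not host.endswith(".vpce.amazonaws.com"):
        problems.append(f"host is not a VPC endpoint DNS name: {host!r}")
    if parts.port != 5439:
        problems.append(f"expected port 5439, got {parts.port}")
    # libpq modes that refuse an unencrypted connection.
    sslmode = parse_qs(parts.query).get("sslmode", [""])[0]
    if sslmode not in ("require", "verify-ca", "verify-full"):
        problems.append(f"TLS not enforced (sslmode={sslmode!r})")
    return problems
```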

GCP Private Service Connect

For GCP deployments, HatiData uses Private Service Connect (PSC) instead of PrivateLink. The proxy is deployed on Cloud Run or GKE with a PSC endpoint exposed to the customer's VPC.

Azure Private Link

For Azure deployments, HatiData uses Azure Private Link Service backed by an Internal Load Balancer. The proxy is deployed on AKS with a private endpoint in the customer's VNet.


Shield: Binary Protection

Shield is HatiData's four-layer protection system for the proxy binary. It prevents unauthorized copying or execution of the Enterprise proxy outside a licensed environment.

Layer 1: Black-Box Binary

The Enterprise proxy binary is distributed as a pre-compiled, stripped native binary with no debug symbols. Source code is not distributed. The binary implements obfuscated license validation that cannot be bypassed by binary patching without triggering the integrity check.

Layer 2: Heartbeat License Validation

The proxy sends a heartbeat to the HatiData license server every 15 minutes. The heartbeat includes:

  • Organization ID (from provisioning)
  • Environment fingerprint (see Layer 4)
  • Binary hash (cryptographic hash of the proxy binary)
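The binary hash listed above can be illustrated with a streaming SHA-256 over the proxy executable. This is a sketch only. The wire format of the heartbeat is not documented here, so the JSON encoding, the field names, and the use of SHA-256 in particular are assumptions.

```python
import hashlib
import json
from pathlib import Path


def binary_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large binary is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_heartbeat(org_id: str, env_fingerprint: str, binary_path: Path) -> str:
    """Assemble a hypothetical heartbeat body carrying the three documented fields."""
    payload = {
        "org_id": org_id,
        "environment_fingerprint": env_fingerprint,
        "binary_hash": binary_sha256(binary_path),
    }
    return json.dumps(payload, sort_keys=True)
```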

If the heartbeat fails validation (revoked license, wrong environment, tampered binary), the proxy enters a graceful degradation mode: it completes in-flight queries and then stops accepting new connections. A grace period of 4 hours is provided before full shutdown, allowing time to investigate connectivity issues.
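Read literally, the paragraph above describes three states:

- **Serving:** the license is valid.
- **Draining:** in-flight queries finish and new connections are refused.
- **Shut down:** 4 hours have passed since the failed validation.

The sketch below models that timeline. The state names, `proxy_state`, and `accepts_new_connections` are hypothetical. The sketch does not describe how the real proxy tracks failures.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

HEARTBEAT_INTERVAL = timedelta(minutes=15)
GRACE_PERIOD = timedelta(hours=4)
# Number of heartbeat intervals that fit inside the grace period.
HEARTBEATS_IN_GRACE_PERIOD = GRACE_PERIOD // HEARTBEAT_INTERVAL


class ProxyState(Enum):
    SERVING = "serving"    # heartbeat valid; new connections accepted
    DRAINING = "draining"  # validation failed; finish in-flight queries only
    SHUTDOWN = "shutdown"  # grace period elapsed


def proxy_state(failed_at: Optional[datetime], now: datetime) -> ProxyState:
    """Derive the state from when validation first failed (None means it never failed)."""
    if failed_at is None:
        return ProxyState.SERVING
    if now - failed_at < GRACE_PERIOD:
        return ProxyState.DRAINING
    return ProxyState.SHUTDOWN


def accepts_new_connections(state: ProxyState) -> bool:
    return state is ProxyState.SERVING
```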

Layer 3: Locked Instance

The proxy binary is locked to the provisioned instance at boot time using instance metadata:

Lock material = sha256(
    org_id +       # provisioned organization
    instance_id +  # AWS EC2 / GCP instance / Azure VM ID
    account_id +   # cloud account where instance runs
    region +       # must match provisioned region
    launch_time    # set at first boot; cannot be replayed
)

A proxy binary that boots outside the provisioned account or region will fail the lock check and refuse to start.
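The lock material formula can be sketched with `hashlib.sha256`. The sketch rests on two assumptions:

- **Separator:** the fields are joined with `|`. Plain concatenation, as written in the formula, lets different field values produce the same input; for example, ("ab", "c") and ("a", "bc") both become "abc".
- **Encoding:** all values are UTF-8 strings.

The example field values are made up.

```python
import hashlib
import hmac


def lock_material(org_id: str, instance_id: str, account_id: str,
                  region: str, launch_time: str) -> str:
    """Hex SHA-256 over the five binding fields, joined with '|' (an assumption)."""
    fields = (org_id, instance_id, account_id, region, launch_time)
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()


def lock_check_passes(expected: str, **runtime_fields: str) -> bool:
    """Recompute the digest from the running instance and compare in constant time."""
    return hmac.compare_digest(lock_material(**runtime_fields), expected)


fields = {
    "org_id": "org_abc",
    "instance_id": "i-0123456789abcdef0",  # made-up example values
    "account_id": "123456789012",
    "region": "eu-west-1",
    "launch_time": "2024-01-01T00:00:00Z",
}
expected = lock_material(**fields)
print(lock_check_passes(expected, **fields))                             # True
print(lock_check_passes(expected, **{**fields, "region": "us-east-1"}))  # False
```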

Layer 4: Environment Binding

The proxy validates that the runtime environment matches the Terraform-provisioned configuration:

  • VPC ID must match the provisioned value
  • Security group ID must match
  • KMS key ARN must match the CMEK configuration
  • Subnet IDs must be within the provisioned region

This prevents copying a running instance's LUKS-unlocked state to a different environment.
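The four environment checks can be sketched as a field-by-field comparison that reports every mismatch rather than stopping at the first one. Several names here are hypothetical:

- the dataclasses and their field names;
- the `subnet_regions` lookup.

In practice these values would presumably come from instance metadata and the Terraform outputs.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProvisionedEnv:
    vpc_id: str
    security_group_id: str
    kms_key_arn: str
    region: str


@dataclass
class RuntimeEnv:
    vpc_id: str
    security_group_id: str
    kms_key_arn: str
    subnet_regions: Mapping[str, str]  # subnet ID -> region the subnet lives in


def binding_violations(expected: ProvisionedEnv, actual: RuntimeEnv) -> list[str]:
    """Return human-readable violations; an empty list means the binding holds."""
    violations = []
    for field in ("vpc_id", "security_group_id", "kms_key_arn"):
        if getattr(expected, field) != getattr(actual, field):
            violations.append(f"{field} mismatch")
    for subnet_id, region in sorted(actual.subnet_regions.items()):
        if region != expected.region:
            violations.append(f"subnet {subnet_id} is in {region}, not {expected.region}")
    return violations
```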


Shadow Mode

Shadow Mode allows you to validate HatiData against your existing data infrastructure before committing to full cutover. In Shadow Mode, all queries are routed to both your existing infrastructure and HatiData in parallel — results are compared, and a report is generated. No data is modified.

How Shadow Mode Works

Agent Query
│
├──▶ Existing Infrastructure ──▶ Result A ──┐
│                                           ├──▶ Compare ──▶ Report
└──▶ HatiData Proxy (shadow) ──▶ Result B ──┘
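The Compare step is not specified in detail on this page. This sketch assumes two definitions, neither of which is HatiData's documented comparison rule:

- **Exact match:** identical rows in identical order.
- **Semantic match:** the same multiset of rows in a different order. This is a common benign difference when a query has no ORDER BY.

The bucket names mirror the keys in the Step 3 report.

```python
from collections import Counter
from typing import Sequence, Tuple

Row = Tuple[object, ...]


def classify(existing: Sequence[Row], shadow: Sequence[Row]) -> str:
    """Bucket one query's pair of results using the report's category names."""
    if list(existing) == list(shadow):
        return "exact_match"
    # Counter compares row multisets: duplicates still count, order is ignored.
    if Counter(existing) == Counter(shadow):
        return "semantic_match"
    return "mismatch"
```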

Step 1: Upload Shadow Query Set

# Upload a set of representative queries for shadow testing
curl -X POST https://api.hatidata.com/v1/organizations/org_abc/shadow-mode/upload \
  -H "Authorization: Bearer <jwt>" \
  -F "queries=@representative-queries.sql"

The query file should include a representative sample of your production query workload: analytical queries, agent memory lookups, compliance queries, and any queries touching sensitive columns.

Step 2: Run Shadow Replay

# Replay the query set against both your existing infrastructure (comparison_dsn) and HatiData
curl -X POST https://api.hatidata.com/v1/organizations/org_abc/shadow-mode/replay \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "comparison_dsn": "postgresql://user:pass@your-existing-host:5432/db",
    "query_set_id": "qs_a1b2c3",
    "sample_rate": 1.0,
    "timeout_seconds": 300
  }'
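The same replay call can be made from Python using only the standard library. This sketch builds the request without sending it, so it can be inspected first.

- **From the curl example:** the endpoint path and body fields.
- **Illustrative choices:** `build_replay_request`, the `API_BASE` constant, and the `(0, 1]` range check on `sample_rate`.

```python
import json
import urllib.request

API_BASE = "https://api.hatidata.com/v1"


def build_replay_request(org_id: str, jwt: str, comparison_dsn: str,
                         query_set_id: str, sample_rate: float = 1.0,
                         timeout_seconds: int = 300) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a shadow replay."""
    # Assumed: sample_rate is a fraction, with 1.0 meaning every query.
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    body = {
        "comparison_dsn": comparison_dsn,
        "query_set_id": query_set_id,
        "sample_rate": sample_rate,
        "timeout_seconds": timeout_seconds,
    }
    return urllib.request.Request(
        f"{API_BASE}/organizations/{org_id}/shadow-mode/replay",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {jwt}", "Content-Type": "application/json"},
        method="POST",
    )

# Sending it would be: urllib.request.urlopen(req)
```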

Step 3: Review Comparison Report

curl https://api.hatidata.com/v1/organizations/org_abc/shadow-mode/report/rpt_xyz \
  -H "Authorization: Bearer <jwt>"

Example response:

{
  "report_id": "rpt_xyz",
  "query_count": 1500,
  "exact_match": 1487,
  "semantic_match": 11,
  "mismatch": 2,
  "error": 0,
  "hatidata_p50_ms": 18,
  "hatidata_p99_ms": 94,
  "existing_p50_ms": 210,
  "existing_p99_ms": 1840,
  "mismatch_details": [
    {
      "query": "SELECT ...",
      "hatidata_row_count": 1842,
      "existing_row_count": 1843,
      "delta": "1 row difference — row added between shadow runs"
    }
  ]
}
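A small helper can reduce the report to the numbers most teams check before cut-over:

- **Parity:** the share of queries that agreed (exact plus semantic).
- **Consistency:** whether the buckets add up to `query_count`.
- **Speedup:** the latency ratios at p50 and p99.

The helper and its output keys are hypothetical. The input is the example report above.

```python
def summarize_report(report: dict) -> dict:
    """Derive parity and latency ratios from a shadow-mode comparison report."""
    buckets = ("exact_match", "semantic_match", "mismatch", "error")
    total = report["query_count"]
    if sum(report[b] for b in buckets) != total:
        raise ValueError("report buckets do not add up to query_count")
    agreed = report["exact_match"] + report["semantic_match"]
    return {
        "parity": agreed / total,
        "p50_speedup": report["existing_p50_ms"] / report["hatidata_p50_ms"],
        "p99_speedup": report["existing_p99_ms"] / report["hatidata_p99_ms"],
    }


example = {
    "query_count": 1500, "exact_match": 1487, "semantic_match": 11,
    "mismatch": 2, "error": 0,
    "hatidata_p50_ms": 18, "hatidata_p99_ms": 94,
    "existing_p50_ms": 210, "existing_p99_ms": 1840,
}
summary = summarize_report(example)
print(f"parity {summary['parity']:.2%}")  # parity 99.87%
```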

Shadow Mode runs are completely read-only. No writes, schema changes, or audit events are generated on your existing infrastructure.


Terraform Deployment

HatiData provides Terraform modules for AWS, GCP, and Azure. All modules follow the same variable interface for cross-cloud consistency.

AWS Modules (8 modules)

Module                             Purpose
terraform/aws/modules/vpc          VPC, subnets, route tables, NAT gateway
terraform/aws/modules/proxy        EC2 Auto Scaling Group, NLB, launch template
terraform/aws/modules/storage      S3 bucket with Object Lock, lifecycle rules, encryption
terraform/aws/modules/kms          KMS key with rotation policy and proxy IAM role
terraform/aws/modules/iam          IAM roles and instance profiles
terraform/aws/modules/nlb          Network Load Balancer and target group configuration
terraform/aws/modules/privatelink  VPC Endpoint Service and consumer endpoint
terraform/aws/modules/monitoring   CloudWatch dashboards, alarms, Prometheus scrape config

GCP Modules (11 modules)

Module                             Purpose
terraform/gcp/modules/vpc          VPC network and subnets
terraform/gcp/modules/gke          GKE Standard cluster (high-performance SSD node pool)
terraform/gcp/modules/cloud_run    Cloud Run service for control plane
terraform/gcp/modules/cloudsql     Cloud SQL for control plane metadata
terraform/gcp/modules/secrets      Secret Manager for configuration
terraform/gcp/modules/registry     Artifact Registry for proxy images
terraform/gcp/modules/kms          Cloud KMS key ring and crypto key
terraform/gcp/modules/storage      GCS bucket with retention policy
terraform/gcp/modules/iam          Service accounts and IAM bindings
terraform/gcp/modules/psc          Private Service Connect endpoint
terraform/gcp/modules/monitoring   Cloud Monitoring dashboards and alerting

Azure Modules (6 modules)

Module                               Purpose
terraform/azure/modules/vnet         Virtual Network, subnets, NSGs
terraform/azure/modules/aks          AKS cluster for proxy workload
terraform/azure/modules/storage      Azure Blob Storage with immutability policy
terraform/azure/modules/keyvault     Azure Key Vault for CMEK
terraform/azure/modules/privatelink  Private Link Service and endpoint
terraform/azure/modules/monitoring   Azure Monitor and Log Analytics

Deployment Example (AWS)

cd terraform/aws

# Initialize
terraform init -backend-config="bucket=my-terraform-state" \
  -backend-config="key=hatidata/production" \
  -backend-config="region=eu-west-1"

# Plan with production variables
terraform plan -var-file="environments/production.tfvars"

# Apply
terraform apply -var-file="environments/production.tfvars"

Compute Tiers

Tier         Instance Type  vCPU  RAM     SSD Cache  Max Concurrent Queries
Starter      r6id.xlarge    4     32 GB   237 GB     25
Standard     r6id.2xlarge   8     64 GB   474 GB     50
Performance  r6id.4xlarge   16    128 GB  950 GB     100
High Memory  r6id.8xlarge   32    256 GB  1.9 TB     200
Custom       On request

Instance type selection is configurable in terraform/aws/environments/*.tfvars. Spot instances are available for dev and staging environments at approximately 70% cost reduction (devbox_spot = true).
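Picking a tier and rendering the matching tfvars lines can be sketched as follows.

- **Sources:** the tier data is copied from the table above. Only `devbox_spot` appears on this page as a tfvars setting; the `instance_type` variable name is an assumption.
- **Spot guard:** the sketch refuses spot outside dev and staging, following the statement that spot is offered for those environments.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    instance_type: str
    max_concurrent_queries: int


# From the Compute Tiers table, smallest first.
TIERS = (
    Tier("Starter", "r6id.xlarge", 25),
    Tier("Standard", "r6id.2xlarge", 50),
    Tier("Performance", "r6id.4xlarge", 100),
    Tier("High Memory", "r6id.8xlarge", 200),
)

SPOT_ENVIRONMENTS = {"dev", "staging"}


def pick_tier(peak_concurrent_queries: int) -> Tier:
    """Return the smallest tier whose concurrency limit covers the expected peak."""
    for tier in TIERS:
        if tier.max_concurrent_queries >= peak_concurrent_queries:
            return tier
    raise ValueError("above High Memory limits: request a Custom tier")


def render_tfvars(environment: str, tier: Tier, spot: bool = False) -> str:
    """Emit tfvars lines; refuses spot outside dev/staging."""
    if spot and environment not in SPOT_ENVIRONMENTS:
        raise ValueError("spot instances are only offered for dev and staging")
    spot_value = "true" if spot else "false"
    return (
        f'instance_type = "{tier.instance_type}"  # hypothetical variable name\n'
        f"devbox_spot   = {spot_value}\n"
    )


print(render_tfvars("staging", pick_tier(40), spot=True), end="")
```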


SLA

Tier        Availability SLA  P1 Response  P1 Resolution
Cloud       99.5%             30 min       8 hr
Growth      99.9%             15 min       4 hr
Enterprise  99.99%            15 min       4 hr

Enterprise SLA includes a dedicated support Slack channel, quarterly architecture reviews, and a named customer success engineer.
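To make the availability percentages in the SLA table concrete, the allowed downtime in a period is (1 − SLA) × period length. The sketch below uses a 30-day month. This page does not state the measurement window, so treat the 30-day period as an assumption.

```python
from datetime import timedelta

SLA_TARGETS = {"Cloud": 0.995, "Growth": 0.999, "Enterprise": 0.9999}


def downtime_budget(availability: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Maximum unavailability that still meets the availability target over the period."""
    return period * (1 - availability)


for tier, target in SLA_TARGETS.items():
    minutes = downtime_budget(target).total_seconds() / 60
    print(f"{tier}: {minutes:.1f} min per 30 days")
```

Over 30 days this works out to roughly 3.6 hours for Cloud, 43 minutes for Growth, and about 4.3 minutes for Enterprise.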


Migrating from Cloud to Enterprise

Cloud tier customers use HatiData's managed multi-tenant proxy. Enterprise tier customers run the proxy in their own VPC. The migration path:

  1. Provision Enterprise environment: Run Terraform in your AWS/GCP/Azure account. This provisions VPC, NLB, proxy ASG, S3 bucket, and KMS key.
  2. Migrate data: Copy your Parquet files from the managed S3 bucket to your new Enterprise S3 bucket using standard aws s3 sync or equivalent.
  3. Update connection strings: Point agents and pipelines to the new PrivateLink endpoint (vpce-...).
  4. Run Shadow Mode: Execute a shadow comparison with 100% sample rate to verify result parity.
  5. Cut over: Update DNS or connection configuration in your agent framework. The Cloud and Enterprise environments can run in parallel during the transition.
  6. Decommission Cloud environment: Cancel the Cloud tier subscription after cut-over validation.
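After step 2, comparing object listings is a cheap way to confirm the copy is complete before running Shadow Mode. This sketch works on plain `{key: size}` dictionaries, so it stays independent of any SDK. Producing the listings (for example with `aws s3 ls --recursive`) is left out, and `diff_listings` is a hypothetical name. It compares sizes rather than ETags, because the ETag of a multipart upload depends on the part size and can legitimately differ after a copy.

```python
from typing import Mapping


def diff_listings(source: Mapping[str, int], dest: Mapping[str, int]) -> dict[str, list[str]]:
    """Compare {object_key: size_in_bytes} listings of the old and new buckets."""
    missing = sorted(k for k in source if k not in dest)
    size_mismatch = sorted(k for k in source if k in dest and source[k] != dest[k])
    extra = sorted(k for k in dest if k not in source)
    return {"missing": missing, "size_mismatch": size_mismatch, "extra": extra}
```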

Data migration assistance is available from the HatiData solutions engineering team for Growth and Enterprise customers.

