# Migrate from BigQuery
HatiData speaks the Postgres wire protocol, so your existing SQL clients, ORMs, and BI tools connect without modification. This guide helps teams move workloads from BigQuery to HatiData: it covers type mapping, connection migration, the dbt switchover, and the agent-native features you unlock. Most queries work with minimal changes; STRUCT types map to JSON with full path navigation support.
## Type Mapping: STRUCT vs JSON
BigQuery uses STRUCT for nested records. HatiData represents nested data as JSON with full path navigation support.
| BigQuery Type | HatiData Type | Access Pattern |
|---|---|---|
| STRUCT<a STRING, b INT64> | JSON | col->>'a', (col->>'b')::INT |
| ARRAY<STRING> | VARCHAR[] | Standard SQL array syntax |
| ARRAY<STRUCT<...>> | JSON[] | UNNEST(col) |
| RECORD | JSON | JSON path operators |
| BIGNUMERIC | NUMERIC(38, 9) | Exact precision |
| BYTES | BLOB | Byte-for-byte |
| DATETIME | TIMESTAMP | UTC assumed |
| TIMESTAMP | TIMESTAMPTZ | Timezone preserved |
| GEOGRAPHY | VARCHAR (WKT) | PostGIS extensions planned |
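For scripted schema audits, the mapping above can be captured in code. The sketch below is illustrative; `TYPE_MAP` and `hatidata_type` are hypothetical helpers based on the table in this guide, not part of any HatiData SDK:

```python
# Hypothetical lookup derived from the type-mapping table above.
# Parameterized types (STRUCT<...>, ARRAY<...>) are matched on their outer form.
TYPE_MAP = {
    "STRUCT": "JSON",
    "ARRAY<STRING>": "VARCHAR[]",
    "ARRAY<STRUCT>": "JSON[]",
    "RECORD": "JSON",
    "BIGNUMERIC": "NUMERIC(38, 9)",
    "BYTES": "BLOB",
    "DATETIME": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMPTZ",
    "GEOGRAPHY": "VARCHAR",  # WKT text; PostGIS extensions planned
}

def hatidata_type(bq_type: str) -> str:
    """Map a BigQuery column type to its HatiData equivalent."""
    t = bq_type.strip().upper()
    if t.startswith("ARRAY<STRUCT"):
        return TYPE_MAP["ARRAY<STRUCT>"]
    if t.startswith("ARRAY<"):
        # Non-struct arrays map to the matching Postgres array type.
        return TYPE_MAP.get(t, "VARCHAR[]")
    if t.startswith("STRUCT"):
        return TYPE_MAP["STRUCT"]
    return TYPE_MAP.get(t, t)
```

Running `hatidata_type` over a table's column list gives a quick preview of the target DDL before you export any data.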
### Example: STRUCT Query Rewrite
```sql
-- BigQuery: STRUCT access
SELECT
  user.name,
  user.address.city
FROM events
WHERE user.address.country = 'SG';

-- HatiData: JSON path operators (auto-suggested by hati shadow report)
SELECT
  user_data->>'name',
  user_data->'address'->>'city'
FROM events
WHERE user_data->'address'->>'country' = 'SG';
```
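The rewrite follows a mechanical rule: every intermediate field uses -> (which keeps the value as JSON) and the final field uses ->> (which extracts text). A hypothetical helper sketches the rule; in practice, hati shadow report suggests these rewrites automatically:

```python
def to_json_path(column: str, fields: list[str]) -> str:
    """Rewrite a BigQuery STRUCT dot path as HatiData JSON operators.

    Intermediate fields use -> (stays JSON); the last uses ->> (extracts text).
    """
    *inner, last = fields
    path = column + "".join(f"->'{f}'" for f in inner)
    return f"{path}->>'{last}'"
```

For example, `to_json_path("user_data", ["address", "city"])` produces `user_data->'address'->>'city'`, matching the rewritten query above.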
## BigQuery ML vs HatiData Agent Features
BigQuery ML provides in-database ML inference via SQL. HatiData provides a broader agent-native runtime.
| Capability | BigQuery ML | HatiData |
|---|---|---|
| Vector similarity search | Approximate only | Hybrid: vector ANN + SQL |
| Long-term agent memory | Not supported | SQL + vector, per-agent |
| Chain-of-thought ledger | Not supported | Cryptographically hash-chained, immutable |
| Semantic triggers | Not supported | Cosine + cooldown debounce |
| Branch isolation | Not supported | Per-agent schema branches |
| MCP tool support | Not supported | 24 native MCP tools |
| Embedding storage | External | Native, with access tracking |
## Connection Migration

### Replace the BigQuery Client
```python
# Before: BigQuery client
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
query_job = client.query("SELECT * FROM dataset.table")

# After: any Postgres client (psycopg2, asyncpg, SQLAlchemy, etc.)
import psycopg2

conn = psycopg2.connect(
    "postgresql://myuser:mypass@localhost:5439/mydb"
)
cur = conn.cursor()
cur.execute("SELECT * FROM mytable")
```
### SQLAlchemy (same engine string for any ORM)
```python
# Before
from sqlalchemy import create_engine

engine = create_engine("bigquery://my-gcp-project/my_dataset")

# After
engine = create_engine(
    "postgresql+psycopg2://myuser:mypass@localhost:5439/mydb"
)
```
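Passwords containing @, /, or : will break a hand-assembled DSN because the parser misreads the host portion. The sketch below percent-encodes credentials with the standard library; the `hati_dsn` helper is hypothetical, and the default host and port match the examples above:

```python
from urllib.parse import quote

def hati_dsn(user: str, password: str, db: str,
             host: str = "localhost", port: int = 5439) -> str:
    """Build a Postgres DSN for HatiData, percent-encoding credentials.

    Special characters in passwords (@, /, :) must be escaped or the
    DSN parser will treat them as URL structure.
    """
    return (f"postgresql://{quote(user, safe='')}:"
            f"{quote(password, safe='')}@{host}:{port}/{db}")
```

The same string works for psycopg2, asyncpg, and SQLAlchemy (prefix it with `postgresql+psycopg2://` for the latter).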
## dbt Path

Switch your dbt adapter from dbt-bigquery to dbt-postgres. Only profiles.yml changes; your model logic stays the same.
```yaml
# Before: profiles.yml (BigQuery)
my_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: my_dataset
```

```yaml
# After: profiles.yml (HatiData via dbt-postgres)
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5439
      user: myuser
      password: "{{ env_var('HATIDATA_PASSWORD') }}"
      dbname: mydb
      schema: my_schema
```
Models, tests, and seeds carry over unchanged. Jinja macros that reference BigQuery-specific functions (e.g., DATE_TRUNC, whose argument order differs between BigQuery and Postgres) may need minor adjustments; run dbt parse to surface errors.
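Beyond dbt parse, a quick pre-flight scan can flag BigQuery-only functions in your model files before the switchover. This is a minimal sketch; `scan_models` is a hypothetical helper and `BQ_ONLY` is an illustrative, non-exhaustive list:

```python
from pathlib import Path

# Illustrative, non-exhaustive list of BigQuery functions with no
# direct Postgres equivalent; flagged models will need rewriting.
BQ_ONLY = ["PARSE_TIMESTAMP", "FORMAT_TIMESTAMP", "SAFE_CAST",
           "SAFE_DIVIDE", "APPROX_QUANTILES", "TIMESTAMP_MICROS"]

def scan_models(models_dir: str) -> dict[str, list[str]]:
    """Return {model_file: [flagged functions]} for every .sql model."""
    hits: dict[str, list[str]] = {}
    for path in Path(models_dir).rglob("*.sql"):
        sql = path.read_text().upper()
        flagged = [fn for fn in BQ_ONLY if fn in sql]
        if flagged:
            hits[str(path)] = flagged
    return hits
```

Point it at your dbt models/ directory and review each hit alongside the dbt parse output.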
## Data Export: BigQuery to HatiData
```shell
# Step 1: Export BigQuery tables to GCS as Parquet
bq extract \
  --destination_format PARQUET \
  --compression SNAPPY \
  my_dataset.orders \
  gs://my-bucket/exports/orders-*.parquet

# Step 2: Download from GCS
gsutil -m cp -r gs://my-bucket/exports/ ./exports/

# Step 3: Push to HatiData
hati push --source ./exports/ --schema my_schema --format parquet
```
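When exporting many tables, Step 1 can be scripted. The `extract_cmd` helper below is a hypothetical sketch that builds the bq extract argument list for one table, mirroring the flags shown above; dataset and bucket names are placeholders:

```python
def extract_cmd(dataset: str, table: str, bucket: str) -> list[str]:
    """Build the bq extract argv for one table (Step 1 above)."""
    return [
        "bq", "extract",
        "--destination_format", "PARQUET",
        "--compression", "SNAPPY",
        f"{dataset}.{table}",
        f"gs://{bucket}/exports/{table}-*.parquet",
    ]
```

Feed the result to `subprocess.run(extract_cmd(...), check=True)` in a loop over your table list.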
For continuous replication during migration, use the hati sync command to mirror BigQuery changes into HatiData until cutover.
## What You Gain
| Capability | Legacy Cloud Warehouse | HatiData |
|---|---|---|
| Wire protocol | Proprietary HTTP/gRPC | Postgres (any client) |
| Query latency | 1–10s typical | Sub-10ms (in-VPC query engine) |
| Agent memory | Not supported | SQL + vector hybrid |
| CoT ledger | Not supported | Immutable, hash-chained |
| Semantic triggers | Not supported | Built-in |
| Deployment | Google Cloud only | Any cloud or on-prem VPC |
| Per-agent billing | Not supported | Native |
## Related Concepts
- Two-Plane Model — Why any Postgres client works
- Persistent Memory — Long-term memory for agents
- Chain-of-Thought Ledger — Immutable reasoning traces
- Quickstart — Get started with data migration