Migrate from BigQuery

HatiData speaks the Postgres wire protocol, which means your existing SQL clients, ORMs, and BI tools connect without modification. This guide covers type mapping, connection migration, the dbt switchover, and what agent-native features you unlock.

Migration guide

This page helps teams move workloads from BigQuery to HatiData. Most queries work with minimal changes; the main rewrite is nested-field access, where BigQuery STRUCT paths become JSON path operators.

Type Mapping: STRUCT vs JSON

BigQuery uses STRUCT for nested records. HatiData represents nested data as JSON with full path navigation support.

| BigQuery Type | HatiData Type | Access Pattern |
|---|---|---|
| STRUCT<a STRING, b INT64> | JSON | col->>'a', (col->>'b')::INT |
| ARRAY<STRING> | VARCHAR[] | Standard SQL array syntax |
| ARRAY<STRUCT<...>> | JSON[] | UNNEST(col) |
| RECORD | JSON | JSON path operators |
| BIGNUMERIC | NUMERIC(38, 9) | Exact precision |
| BYTES | BLOB | Byte-for-byte |
| DATETIME | TIMESTAMP | UTC assumed |
| TIMESTAMP | TIMESTAMPTZ | Timezone preserved |
| GEOGRAPHY | VARCHAR (WKT) | PostGIS extensions planned |
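The DATETIME and TIMESTAMP rows above are the most common source of subtle migration bugs. A minimal Python sketch of the distinction, using illustrative values only:

```python
from datetime import datetime, timezone

# BigQuery DATETIME carries no time zone; HatiData stores it as
# TIMESTAMP and assumes the wall-clock value is UTC.
naive = datetime(2024, 1, 15, 9, 30)

# BigQuery TIMESTAMP is zone-aware; it maps to TIMESTAMPTZ, which
# preserves the instant in time.
aware = naive.replace(tzinfo=timezone.utc)

assert naive.tzinfo is None                    # TIMESTAMP: no zone attached
assert aware.utcoffset().total_seconds() == 0  # TIMESTAMPTZ: zone preserved
```

If your pipelines wrote DATETIME values in a non-UTC zone, convert them to UTC before export, since HatiData will interpret the stored wall-clock value as UTC.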

Example: STRUCT Query Rewrite

-- BigQuery: STRUCT access
SELECT
  user.name,
  user.address.city
FROM events
WHERE user.address.country = 'SG'

-- HatiData: JSON path operators (auto-suggested by hati shadow report)
SELECT
  user_data->>'name',
  user_data->'address'->>'city'
FROM events
WHERE user_data->'address'->>'country' = 'SG'
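The dotted-path rewrite is mechanical enough to script. Below is a sketch of the transformation that hati shadow report applies automatically; the helper name is ours for illustration, not part of the CLI:

```python
def struct_path_to_json(column: str, path: str) -> str:
    """Rewrite a dotted BigQuery STRUCT path into JSON path operators.

    Intermediate keys use -> (returns JSON); the leaf key uses ->>
    (returns text). Illustrative only; hati shadow report suggests
    these rewrites automatically.
    """
    *parents, leaf = path.split(".")
    expr = column
    for key in parents:
        expr += f"->'{key}'"
    return f"{expr}->>'{leaf}'"


print(struct_path_to_json("user_data", "address.country"))
# user_data->'address'->>'country'
```

Note that ->> always yields text, so numeric leaves need an explicit cast, e.g. (user_data->>'age')::INT.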

BigQuery ML vs HatiData Agent Features

BigQuery ML provides in-database ML inference via SQL. HatiData provides a broader agent-native runtime.

| Capability | BigQuery ML | HatiData |
|---|---|---|
| Vector similarity search | Approximate only | Hybrid: vector ANN + SQL |
| Long-term agent memory | Not supported | SQL + vector, per-agent |
| Chain-of-thought ledger | Not supported | Cryptographically hash-chained, immutable |
| Semantic triggers | Not supported | Cosine + cooldown debounce |
| Branch isolation | Not supported | Per-agent schema branches |
| MCP tool support | Not supported | 24 native MCP tools |
| Embedding storage | External | Native, with access tracking |

Connection Migration

Replace the BigQuery Client

# Before: BigQuery client
from google.cloud import bigquery
client = bigquery.Client(project="my-gcp-project")
query_job = client.query("SELECT * FROM dataset.table")

# After: any Postgres client — psycopg2, asyncpg, SQLAlchemy, etc.
import psycopg2

conn = psycopg2.connect(
    "postgresql://myuser:mypass@localhost:5439/mydb"
)
cur = conn.cursor()
cur.execute("SELECT * FROM mytable")

SQLAlchemy (same engine string for any ORM)

# Before
from sqlalchemy import create_engine
engine = create_engine("bigquery://my-gcp-project/my_dataset")

# After
engine = create_engine(
    "postgresql+psycopg2://myuser:mypass@localhost:5439/mydb"
)

dbt Path

Switch your dbt profile from dbt-bigquery to dbt-postgres; only the profiles.yml output block changes.

# Before: profiles.yml (BigQuery)
my_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: my_dataset

# After: profiles.yml (HatiData via dbt-postgres)
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5439
      user: myuser
      password: "{{ env_var('HATIDATA_PASSWORD') }}"
      dbname: mydb
      schema: my_schema

Models, tests, and seeds carry over unchanged. Jinja macros that reference BigQuery-specific functions (e.g., DATE_TRUNC with BigQuery syntax) may need minor adjustments — run dbt parse to surface errors.
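One concrete case of the DATE_TRUNC difference: BigQuery takes the date part as a bare keyword in the second argument, while Postgres-style SQL takes it as a lowercase string in the first. A sketch of the rewrite (order_date and orders are hypothetical names):

```sql
-- BigQuery
SELECT DATE_TRUNC(order_date, MONTH) FROM orders

-- Postgres-style (HatiData)
SELECT DATE_TRUNC('month', order_date) FROM orders
```

Wrapping such calls in a dbt macro keeps the difference in one place if you need to run models against both warehouses during the migration window.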

Data Export: BigQuery to HatiData

# Step 1: Export BigQuery tables to GCS as Parquet
bq extract \
  --destination_format PARQUET \
  --compression SNAPPY \
  my_dataset.orders \
  gs://my-bucket/exports/orders-*.parquet

# Step 2: Download from GCS
gsutil -m cp -r gs://my-bucket/exports/ ./exports/

# Step 3: Push to HatiData
hati push --source ./exports/ --schema my_schema --format parquet

For continuous replication during migration, use the hati sync command to keep HatiData up to date with BigQuery until cutover.

What You Gain

| Capability | Legacy Cloud Warehouse | HatiData |
|---|---|---|
| Wire protocol | Proprietary HTTP/gRPC | Postgres (any client) |
| Query latency | 1–10s typical | Sub-10ms (in-VPC query engine) |
| Agent memory | Not supported | SQL + vector hybrid |
| CoT ledger | Not supported | Immutable, hash-chained |
| Semantic triggers | Not supported | Built-in |
| Deployment | Google Cloud only | Any cloud or on-prem VPC |
| Per-agent billing | Not supported | Native |
