# Migrate from BigQuery
HatiData speaks the Postgres wire protocol, so your existing SQL clients, ORMs, and BI tools connect without modification. This guide helps teams move workloads from BigQuery to HatiData: it covers type mapping, connection migration, the dbt switchover, and the agent-native features you unlock. Most queries work with minimal changes; STRUCT types map to JSON with full path navigation support.
## Type Mapping: STRUCT vs JSON
BigQuery uses STRUCT for nested records. HatiData represents nested data as JSON with full path navigation support.
| BigQuery Type | HatiData Type | Access Pattern |
|---|---|---|
| STRUCT<a STRING, b INT64> | JSON | col->>'a', (col->>'b')::INT |
| ARRAY<STRING> | VARCHAR[] | Standard SQL array syntax |
| ARRAY<STRUCT<...>> | JSON[] | UNNEST(col) |
| RECORD | JSON | JSON path operators |
| BIGNUMERIC | NUMERIC(38, 9) | Exact precision |
| BYTES | BLOB | Byte-for-byte |
| DATETIME | TIMESTAMP | UTC assumed |
| TIMESTAMP | TIMESTAMPTZ | Timezone preserved |
| GEOGRAPHY | VARCHAR (WKT) | PostGIS extensions planned |
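For scripted schema audits, the mapping above can be captured in code. The sketch below is illustrative; `TYPE_MAP` and `hatidata_type` are hypothetical helpers based on the table in this guide, not part of any HatiData SDK:

```python
# Hypothetical lookup derived from the type-mapping table above.
# Parameterized types (STRUCT<...>, ARRAY<...>) are matched on their outer form.
TYPE_MAP = {
    "STRUCT": "JSON",
    "ARRAY<STRING>": "VARCHAR[]",
    "ARRAY<STRUCT>": "JSON[]",
    "RECORD": "JSON",
    "BIGNUMERIC": "NUMERIC(38, 9)",
    "BYTES": "BLOB",
    "DATETIME": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMPTZ",
    "GEOGRAPHY": "VARCHAR",  # WKT text; PostGIS extensions planned
}

def hatidata_type(bq_type: str) -> str:
    """Map a BigQuery column type to its HatiData equivalent."""
    t = bq_type.strip().upper()
    if t.startswith("ARRAY<STRUCT"):
        return TYPE_MAP["ARRAY<STRUCT>"]
    if t.startswith("ARRAY<"):
        # Non-struct arrays map to the matching Postgres array type.
        return TYPE_MAP.get(t, "VARCHAR[]")
    if t.startswith("STRUCT"):
        return TYPE_MAP["STRUCT"]
    return TYPE_MAP.get(t, t)
```

Running `hatidata_type` over a table's column list gives a quick preview of the target DDL before you export any data.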
### Example: STRUCT Query Rewrite
```sql
-- BigQuery: STRUCT access
SELECT
  user.name,
  user.address.city
FROM events
WHERE user.address.country = 'SG';

-- HatiData: JSON path operators (auto-suggested by hati shadow report)
SELECT
  user_data->>'name',
  user_data->'address'->>'city'
FROM events
WHERE user_data->'address'->>'country' = 'SG';
```
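The rewrite follows a mechanical rule: every intermediate field uses -> (which keeps the value as JSON) and the final field uses ->> (which extracts text). A hypothetical helper sketches the rule; in practice, hati shadow report suggests these rewrites automatically:

```python
def to_json_path(column: str, fields: list[str]) -> str:
    """Rewrite a BigQuery STRUCT dot path as HatiData JSON operators.

    Intermediate fields use -> (stays JSON); the last uses ->> (extracts text).
    """
    *inner, last = fields
    path = column + "".join(f"->'{f}'" for f in inner)
    return f"{path}->>'{last}'"
```

For example, `to_json_path("user_data", ["address", "city"])` produces `user_data->'address'->>'city'`, matching the rewritten query above.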
## BigQuery ML vs HatiData Agent Features
BigQuery ML provides in-database ML inference via SQL. HatiData provides a broader agent-native runtime.
| Capability | BigQuery ML | HatiData |
|---|---|---|
| Vector similarity search | Approximate only | Hybrid: vector ANN + SQL |
| Long-term agent memory | Not supported | SQL + vector, per-agent |
| Chain-of-thought ledger | Not supported | Cryptographically hash-chained, immutable |
| Semantic triggers | Not supported | Cosine + cooldown debounce |
| Branch isolation | Not supported | Per-agent schema branches |
| MCP tool support | Not supported | 24 native MCP tools |
| Embedding storage | External | Native, with access tracking |
## Connection Migration

### Replace the BigQuery Client
```python
# Before: BigQuery client
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
query_job = client.query("SELECT * FROM dataset.table")

# After: any Postgres client (psycopg2, asyncpg, SQLAlchemy, etc.)
import psycopg2

conn = psycopg2.connect(
    "postgresql://myuser:mypass@localhost:5439/mydb"
)
cur = conn.cursor()
cur.execute("SELECT * FROM mytable")
```
### SQLAlchemy (same engine string for any ORM)
```python
# Before
from sqlalchemy import create_engine

engine = create_engine("bigquery://my-gcp-project/my_dataset")

# After
engine = create_engine(
    "postgresql+psycopg2://myuser:mypass@localhost:5439/mydb"
)
```
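Passwords containing @, /, or : will break a hand-assembled DSN because the parser misreads the host portion. The sketch below percent-encodes credentials with the standard library; the `hati_dsn` helper is hypothetical, and the default host and port match the examples above:

```python
from urllib.parse import quote

def hati_dsn(user: str, password: str, db: str,
             host: str = "localhost", port: int = 5439) -> str:
    """Build a Postgres DSN for HatiData, percent-encoding credentials.

    Special characters in passwords (@, /, :) must be escaped or the
    DSN parser will treat them as URL structure.
    """
    return (f"postgresql://{quote(user, safe='')}:"
            f"{quote(password, safe='')}@{host}:{port}/{db}")
```

The same string works for psycopg2, asyncpg, and SQLAlchemy (prefix it with `postgresql+psycopg2://` for the latter).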
## dbt Path

Switch your dbt adapter from dbt-bigquery to dbt-postgres. Only profiles.yml changes; your model logic stays the same.
```yaml
# Before: profiles.yml (BigQuery)
my_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: my_dataset
```

```yaml
# After: profiles.yml (HatiData via dbt-postgres)
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5439
      user: myuser
      password: "{{ env_var('HATIDATA_PASSWORD') }}"
      dbname: mydb
      schema: my_schema
```
Models, tests, and seeds carry over unchanged. Jinja macros that reference BigQuery-specific functions (e.g., DATE_TRUNC, whose argument order differs between BigQuery and Postgres) may need minor adjustments; run dbt parse to surface errors.
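Beyond dbt parse, a quick pre-flight scan can flag BigQuery-only functions in your model files before the switchover. This is a minimal sketch; `scan_models` is a hypothetical helper and `BQ_ONLY` is an illustrative, non-exhaustive list:

```python
from pathlib import Path

# Illustrative, non-exhaustive list of BigQuery functions with no
# direct Postgres equivalent; flagged models will need rewriting.
BQ_ONLY = ["PARSE_TIMESTAMP", "FORMAT_TIMESTAMP", "SAFE_CAST",
           "SAFE_DIVIDE", "APPROX_QUANTILES", "TIMESTAMP_MICROS"]

def scan_models(models_dir: str) -> dict[str, list[str]]:
    """Return {model_file: [flagged functions]} for every .sql model."""
    hits: dict[str, list[str]] = {}
    for path in Path(models_dir).rglob("*.sql"):
        sql = path.read_text().upper()
        flagged = [fn for fn in BQ_ONLY if fn in sql]
        if flagged:
            hits[str(path)] = flagged
    return hits
```

Point it at your dbt models/ directory and review each hit alongside the dbt parse output.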
## Data Export: BigQuery to HatiData
```shell
# Step 1: Export BigQuery tables to GCS as Parquet
bq extract \
  --destination_format PARQUET \
  --compression SNAPPY \
  my_dataset.orders \
  gs://my-bucket/exports/orders-*.parquet

# Step 2: Download from GCS
gsutil -m cp -r gs://my-bucket/exports/ ./exports/

# Step 3: Push to HatiData
hati push --source ./exports/ --schema my_schema --format parquet
```
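When exporting many tables, Step 1 can be scripted. The `extract_cmd` helper below is a hypothetical sketch that builds the bq extract argument list for one table, mirroring the flags shown above; dataset and bucket names are placeholders:

```python
def extract_cmd(dataset: str, table: str, bucket: str) -> list[str]:
    """Build the bq extract argv for one table (Step 1 above)."""
    return [
        "bq", "extract",
        "--destination_format", "PARQUET",
        "--compression", "SNAPPY",
        f"{dataset}.{table}",
        f"gs://{bucket}/exports/{table}-*.parquet",
    ]
```

Feed the result to `subprocess.run(extract_cmd(...), check=True)` in a loop over your table list.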
For continuous replication during migration, use the hati sync command to mirror BigQuery changes into HatiData until cutover.
## What You Gain
| Capability | Legacy Cloud Warehouse | HatiData |
|---|---|---|
| Wire protocol | Proprietary HTTP/gRPC | Postgres (any client) |
| Query latency | 1–10s typical | Sub-10ms (in-VPC query engine) |
| Agent memory | Not supported | SQL + vector hybrid |
| CoT ledger | Not supported | Immutable, hash-chained |
| Semantic triggers | Not supported | Built-in |
| Deployment | Google Cloud only | Any cloud or on-prem VPC |
| Per-agent billing | Not supported | Native |
## Related Concepts
- Two-Plane Model — Why any Postgres client works
- Persistent Memory — Long-term memory for agents
- Chain-of-Thought Ledger — Immutable reasoning traces
- Quickstart — Get started with data migration