SQL Compatibility Guide
HatiData exposes a PostgreSQL v3 wire protocol on port 5439, backed by DuckDB as the execution engine. This guide documents exactly what is supported, what is not, and what behaves differently from a standard PostgreSQL server.
Supported Data Types
HatiData supports the following data types through its PostgreSQL wire protocol. Types are mapped from Postgres conventions to DuckDB's internal representation.
| Postgres Type | DuckDB Equivalent | Notes |
|---|---|---|
| INTEGER / INT / INT4 | INTEGER | 32-bit signed integer |
| BIGINT / INT8 | BIGINT | 64-bit signed integer |
| SMALLINT / INT2 | SMALLINT | 16-bit signed integer |
| BOOLEAN | BOOLEAN | true / false |
| REAL / FLOAT4 | FLOAT | 32-bit IEEE 754 float |
| DOUBLE PRECISION / FLOAT8 | DOUBLE | 64-bit IEEE 754 float |
| NUMERIC / DECIMAL | NUMERIC / DECIMAL | Arbitrary precision. DECIMAL(p,s) preserves precision/scale. |
| TEXT | VARCHAR | Unlimited-length string |
| VARCHAR(n) | VARCHAR(n) | Length constraint preserved |
| CHAR(n) | VARCHAR | Fixed-width char (stored as VARCHAR internally) |
| UUID | UUID | 128-bit universally unique identifier |
| DATE | DATE | Calendar date (YYYY-MM-DD) |
| TIMESTAMP | TIMESTAMP | Microsecond precision, no timezone |
| TIMESTAMPTZ | TIMESTAMPTZ | Microsecond precision, stored as UTC |
| INTERVAL | INTERVAL | Duration type |
| JSON | JSON | JSON text storage |
| JSONB | JSON | Stored identically to JSON (no separate binary format) |
| BYTEA | BLOB | Binary data |
| INTEGER[] | LIST(INTEGER) | Array of integers (Postgres array text format) |
| TEXT[] | LIST(VARCHAR) | Array of strings |
| FLOAT4[] | LIST(FLOAT) | Array of 32-bit floats |
| FLOAT8[] | LIST(DOUBLE) | Array of 64-bit floats |
| BIGINT[] | LIST(BIGINT) | Array of 64-bit integers |
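As an illustrative sketch of the mappings above (the table and column names here are hypothetical, not part of any HatiData schema), a single `CREATE TABLE` can mix several of these types:

```sql
-- Hypothetical table exercising several mapped types.
-- Each Postgres type passes through the proxy and lands on the
-- DuckDB equivalent listed in the table above.
CREATE TABLE events (
    event_id    UUID,              -- UUID -> UUID
    user_id     BIGINT,            -- BIGINT -> BIGINT
    kind        TEXT,              -- TEXT -> VARCHAR
    score       DOUBLE PRECISION,  -- FLOAT8 -> DOUBLE
    payload     JSON,              -- JSON -> JSON text storage
    tags        TEXT[],            -- TEXT[] -> LIST(VARCHAR)
    occurred_at TIMESTAMPTZ        -- stored internally as UTC
);
```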
Unsupported Types
These Postgres types are not supported. Use the recommended alternatives:
| Postgres Type | Alternative | Reason |
|---|---|---|
| SERIAL / BIGSERIAL | INTEGER + application-generated UUIDs | DuckDB has no auto-increment sequences. The proxy returns a clear error if SERIAL is used in CREATE TABLE. |
| MONEY | NUMERIC(12,2) | Not a standard SQL type; use explicit decimal precision. |
| CIDR / INET | TEXT | Network address types are Postgres-specific. Store as text and parse in application code. |
| MACADDR | TEXT | Hardware address type is Postgres-specific. |
| XML | TEXT or JSON | XML type is not supported by DuckDB. |
| HSTORE | JSON | Key-value type is Postgres-specific. JSON provides the same capability. |
| TSVECTOR / TSQUERY | TEXT + semantic_match() | Full-text search types are Postgres-specific. Use HatiData's vector search functions instead. |
| Custom domains / composites | Not supported | DuckDB does not support user-defined composite types. |
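For the SERIAL case, the recommended replacement is an application-generated key. A minimal sketch (the table name and the literal UUID are illustrative only):

```sql
-- Instead of: id SERIAL PRIMARY KEY  (rejected with a clear error)
CREATE TABLE orders (
    id         UUID,           -- generated by the application, not the server
    total      NUMERIC(12,2),  -- explicit precision replaces MONEY
    created_at TIMESTAMPTZ
);

-- The application supplies the UUID at insert time:
INSERT INTO orders (id, total, created_at)
VALUES ('3f2504e0-4f89-41d3-9a0c-0305e82c3301', 19.99, now());
```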
Query Protocol Support
HatiData speaks the PostgreSQL v3 wire protocol and supports both of its query modes.
Simple Query Protocol
Status: Fully supported
Used by psql, simple scripts, and most BI tools. Sends SQL as text and receives text-formatted results.
Extended Query Protocol (Prepared Statements)
Status: Supported with limitations
Used by application drivers (node-postgres, psycopg2, SQLAlchemy, tokio-postgres, sqlx, asyncpg). Supports the full Parse/Bind/Describe/Execute message flow.
Binary Parameter Binding
Binary-format parameters in the Bind message are supported for:
| Type | OID | Binary Support |
|---|---|---|
| BOOL | 16 | Supported |
| INT2 | 21 | Supported |
| INT4 | 23 | Supported |
| INT8 | 20 | Supported |
| FLOAT4 | 700 | Supported |
| FLOAT8 | 701 | Supported |
| TEXT | 25 | Supported |
| VARCHAR | 1043 | Supported |
| BYTEA | 17 | Supported |
| JSON | 114 | Supported |
| JSONB | 3802 | Supported |
| DATE | 1082 | Supported |
| TIMESTAMP | 1114 | Supported |
| TIMESTAMPTZ | 1184 | Supported |
| UUID | 2950 | Supported |
Binary Result Columns
The same 15 types listed above support binary encoding in result columns. When a client requests binary format via the Bind message's result format codes, HatiData encodes values in Postgres-compatible binary format.
Types that fall back to text format:
| Type | OID | Reason |
|---|---|---|
| NUMERIC | 1700 | Postgres NUMERIC binary format is BCD-encoded with a complex header structure. Text fallback avoids encoding errors. |
| INTERVAL | 1186 | Binary interval encoding not yet implemented. |
| Array types | 1007, 1009, etc. | Binary array encoding not yet implemented. |
The fallback is spec-compliant: the Describe response advertises format=0 (text) for unsupported types, so clients always know what to expect. All major drivers (tokio-postgres, sqlx, asyncpg, node-postgres) handle this correctly.
If you encounter issues with the extended query protocol, most drivers allow forcing simple query mode. For example, in psycopg2, use cursor.execute() with %s parameters instead of server-side prepared statements.
Supported SQL Features
| Feature | Status |
|---|---|
| SELECT (with subqueries, aliases, expressions) | Fully supported |
| INSERT INTO ... VALUES | Fully supported |
| INSERT INTO ... SELECT | Fully supported |
| UPDATE | Fully supported |
| DELETE | Fully supported |
| CREATE TABLE (with type mapping) | Fully supported |
| CREATE TABLE AS SELECT | Fully supported |
| ALTER TABLE ADD COLUMN (with type mapping) | Fully supported |
| DROP TABLE | Fully supported |
| CREATE VIEW | Fully supported |
| JOIN (INNER, LEFT, RIGHT, FULL, CROSS) | Fully supported |
| LATERAL joins | Fully supported |
| Subqueries (scalar, correlated, IN/EXISTS) | Fully supported |
| Window functions (ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE, SUM OVER, etc.) | Fully supported |
| QUALIFY clause | Fully supported (native DuckDB) |
| Common Table Expressions (WITH) | Fully supported |
| UNION / INTERSECT / EXCEPT | Fully supported |
| GROUP BY, HAVING | Fully supported |
| ORDER BY, LIMIT, OFFSET | Fully supported |
| CASE WHEN ... THEN ... ELSE ... END | Fully supported |
| CAST(expr AS type) / ::type | Fully supported |
| LIKE, ILIKE | Fully supported |
| BETWEEN, IN, EXISTS | Fully supported |
| EXPLAIN | Fully supported |
| TABLESAMPLE / SAMPLE | Supported (syntax may differ slightly) |
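Of the features above, QUALIFY deserves a short illustration, since it has no PostgreSQL equivalent. A sketch against a hypothetical table:

```sql
-- Keep only the most recent row per user (table/columns hypothetical).
-- QUALIFY filters on a window function directly, with no wrapping
-- subquery -- the usual PostgreSQL workaround.
SELECT user_id, event_type, occurred_at
FROM events
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY user_id
    ORDER BY occurred_at DESC
) = 1;
```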
Unsupported SQL Features
| Feature | Error Message | Alternative |
|---|---|---|
| SERIAL / BIGSERIAL / CREATE SEQUENCE | "SERIAL/SEQUENCE types are not supported. Use INTEGER with application-generated IDs (UUID recommended)." | Use INTEGER or BIGINT columns with application-generated UUIDs or IDs. |
| COPY IN / COPY OUT | "COPY is not supported via the wire protocol. Use INSERT...SELECT for bulk data loading, or the HatiData Python/TypeScript SDK for file imports." | Use INSERT...SELECT, the SDK, or DuckDB's read_csv() / read_parquet() functions. |
| DECLARE CURSOR / FETCH / CLOSE | "Server-side cursors (DECLARE/FETCH/CLOSE) are not supported. Use LIMIT/OFFSET for pagination." | Use LIMIT / OFFSET for pagination. |
| LISTEN / NOTIFY | "LISTEN/NOTIFY is not supported. Use the HatiData Semantic Triggers feature for event-driven workflows." | Use Semantic Triggers for event-driven agent workflows. |
| BEGIN / COMMIT / ROLLBACK | Accepted (no error), but a NOTICE is sent on BEGIN: "HatiData operates in auto-commit mode. BEGIN/COMMIT/ROLLBACK are accepted for compatibility but provide no transaction isolation." | Each statement auto-commits. Design for idempotent operations. |
| SAVEPOINT | Not supported | Not available in auto-commit mode. |
| Stored procedures (CREATE FUNCTION) | Not supported | Implement logic in application code or use the MCP server's tool pipeline. |
| CREATE TRIGGER | Not supported | Use Semantic Triggers for event-driven workflows. |
| FOREIGN KEY constraints | Accepted in DDL (parsed) | Foreign keys are not enforced at query time. They are syntactically accepted for migration compatibility. |
| MATERIALIZED VIEW | Not supported | Use regular CREATE VIEW with HatiData's query cache for performance. |
| ON CONFLICT DO UPDATE (UPSERT) | Supported (DuckDB native) | DuckDB supports INSERT ... ON CONFLICT DO UPDATE. Works as expected. |
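The upsert path is worth a concrete sketch, since it replaces both MERGE and SERIAL-based patterns. The table below is hypothetical; note that ON CONFLICT requires a primary key or unique constraint on the conflict column:

```sql
-- Upsert against a primary-key column (hypothetical table).
CREATE TABLE kv (k VARCHAR PRIMARY KEY, v VARCHAR);

-- First insert creates the row; a repeat with the same key updates it.
INSERT INTO kv (k, v) VALUES ('theme', 'dark')
ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;
```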
Snowflake SQL Transpilation
HatiData automatically transpiles Snowflake-dialect SQL to DuckDB-compatible SQL. This happens transparently in the query pipeline. Transpilation results are cached by SQL hash, so repeated queries hit sub-millisecond cache lookups.
Function Mappings
| Snowflake Function | HatiData/DuckDB Equivalent | Notes |
|---|---|---|
| IFF(cond, true_val, false_val) | IF(cond, true_val, false_val) | Direct rename |
| NVL(expr, default) | COALESCE(expr, default) | Direct rename |
| NVL2(expr, not_null, null_val) | CASE WHEN expr IS NOT NULL THEN not_null ELSE null_val END | Expanded to CASE expression |
| DATEDIFF(part, start, end) | DATE_DIFF(part, start, end) | Underscore added |
| DATEADD(part, n, date) | date + INTERVAL 'n' part | Rewritten to interval arithmetic |
| TO_TIMESTAMP_NTZ(expr) | CAST(expr AS TIMESTAMP) | Rewritten to CAST |
| PARSE_JSON(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| TO_VARIANT(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| OBJECT_CONSTRUCT(k, v, ...) | json_object(k, v, ...) | Direct rename |
| GET_PATH(expr, 'path') | json_extract(expr, 'path') | Direct rename |
| FLATTEN(input => expr) | UNNEST(expr) | Extracts input parameter |
| LISTAGG(expr, delim) | STRING_AGG(expr, delim) | Handles both Expr::Function and Expr::ListAgg AST forms |
| ARRAY_AGG(expr) | LIST(expr) | Handles both Expr::Function and Expr::ArrayAgg AST forms |
| SPLIT_PART(str, delim, n) | SPLIT_PART(str, delim, n) | Passthrough (native DuckDB) |
| REGEXP_REPLACE(str, pat, rep) | REGEXP_REPLACE(str, pat, rep) | Passthrough (native DuckDB) |
| COALESCE(...) | COALESCE(...) | Passthrough |
| DECODE(expr, s1, r1, ...) | DECODE(...) | Passthrough (native DuckDB) |
| DATE_TRUNC('part', expr) | DATE_TRUNC('part', expr) | Passthrough |
| TRY_CAST(expr AS type) | TRY_CAST(expr AS type) | Passthrough |
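Putting a few of these mappings together, here is a sketch of a Snowflake query and, per the table above, roughly the DuckDB form the transpiler produces (table and column names are hypothetical):

```sql
-- Snowflake input:
SELECT IFF(amount > 100, 'large', 'small')      AS bucket,
       NVL(region, 'unknown')                   AS region,
       DATEDIFF('day', created_at, shipped_at)  AS days_to_ship
FROM orders;

-- DuckDB output (IFF -> IF, NVL -> COALESCE, DATEDIFF -> DATE_DIFF):
SELECT IF(amount > 100, 'large', 'small')       AS bucket,
       COALESCE(region, 'unknown')              AS region,
       DATE_DIFF('day', created_at, shipped_at) AS days_to_ship
FROM orders;
```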
Type Mappings
Snowflake-specific types in CREATE TABLE, ALTER TABLE, and CAST expressions are automatically rewritten:
| Snowflake Type | DuckDB Type | Notes |
|---|---|---|
| VARIANT | JSON | Universal semi-structured type |
| OBJECT | JSON | Key-value object type |
| ARRAY (untyped) | JSON | Typed ARRAY(T) is preserved |
| NUMBER / NUMBER(p,s) | NUMERIC | Precision/scale simplified |
| TIMESTAMP_NTZ | TIMESTAMP | Timestamp without timezone |
| TIMESTAMP_LTZ | TIMESTAMPTZ | Timestamp with local timezone |
Snowflake Syntax Transpilation
| Snowflake Syntax | HatiData Equivalent | Notes |
|---|---|---|
| col:field (colon notation) | json_extract_string(col, 'field') | Pre-parse normalization. Multi-level access chains are recursively expanded. |
| col:field::TYPE | CAST(json_extract_string(col, 'field') AS TYPE) | Colon notation with type cast |
| TABLE(FLATTEN(input => expr)) | UNNEST(expr) | FLATTEN parameters path, outer, recursive, mode are not supported. |
| CLONE | Not supported | Use CREATE TABLE ... AS SELECT * FROM ... for full data copy. |
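A single-level colon-notation example, following the mapping above (the column name `payload` is hypothetical):

```sql
-- Snowflake input: colon notation with a type cast.
SELECT payload:status::TEXT AS status
FROM events;

-- Transpiled form, per the table above:
SELECT CAST(json_extract_string(payload, 'status') AS TEXT) AS status
FROM events;
```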
For detailed examples and edge cases, see SQL Functions & Types.
Vector Search Functions (HatiData-Specific)
HatiData extends standard SQL with vector-aware functions for semantic search. These are resolved at query time: the proxy embeds the search text and evaluates cosine similarity against the vector index.
semantic_match()
Returns TRUE if cosine similarity exceeds a threshold. Use in WHERE and JOIN ON clauses.
```sql
-- Default threshold (0.7)
SELECT * FROM documents
WHERE semantic_match(content, 'billing dispute resolution');

-- Custom threshold
SELECT * FROM support_tickets
WHERE semantic_match(description, 'refund request', 0.85)
  AND status = 'open';
```
Signature: semantic_match(column, 'search text' [, threshold FLOAT = 0.7])
vector_match() is an alias for semantic_match.
semantic_rank()
Returns the cosine similarity score as a FLOAT (0.0 to 1.0). Use in SELECT and ORDER BY.
```sql
SELECT content,
       semantic_rank(content, 'revenue growth Q4') AS relevance
FROM _hatidata_agent_memory
WHERE agent_id = 'analyst'
ORDER BY relevance DESC
LIMIT 10;
```
JOIN_VECTOR
Syntactic sugar combining a JOIN with semantic_match for readable intent:
```sql
SELECT t.ticket_id, m.content AS related_memory
FROM support_tickets t
JOIN_VECTOR _hatidata_agent_memory m
  ON semantic_match(m.content, t.subject, 0.7)
ORDER BY semantic_rank(m.content, t.subject) DESC;
```
Vector functions require an embedding provider configured on the proxy. Without one, queries using these functions return an error.
Text Search
| Feature | Status | Notes |
|---|---|---|
| LIKE / ILIKE | Supported | Standard pattern matching with % and _ wildcards. |
| contains(col, 'text') | Supported | DuckDB built-in substring search. |
| regexp_matches(col, 'pattern') | Supported | DuckDB built-in regex matching. No lookbehind support (Rust regex crate limitation). |
| similarity() | Not available | Use semantic_rank() for vector-based similarity scoring. |
| pg_trgm extension | Not available | PostgreSQL-specific extension. Use ILIKE or semantic_match() instead. |
System Catalog Support
HatiData presents itself as PostgreSQL 15.0 to connected clients and implements a compatibility layer for system catalog queries.
| Catalog | Status | Details |
|---|---|---|
| information_schema.tables | Real data | Passes through to DuckDB. Returns actual table metadata. |
| information_schema.columns | Real data | Passes through to DuckDB. Returns actual column metadata. |
| information_schema.schemata | Real data | Passes through to DuckDB. |
| pg_class | Real data | Passes through to DuckDB's pg_catalog. Supports \dt and BI tool table discovery. |
| pg_attribute | Real data | Passes through to DuckDB's pg_catalog. Supports \d <table> column inspection. |
| pg_tables | Real data | Passes through to DuckDB's pg_catalog. |
| pg_type | Mocked (36 types) | Returns 36 common Postgres type definitions. Covers all types that BI tools (Tableau, Metabase, DBeaver, dbt) query. |
| pg_namespace | Mocked | Returns pg_catalog, public, and information_schema schemas. |
| pg_database | Mocked | Returns a single hatidata database entry. |
| pg_roles / pg_user | Mocked | Returns a single hatidata superuser. |
| pg_settings | Mocked | Returns server_version (15.0) and max_connections (100). |
| pg_proc | Limited | DuckDB's built-in functions, not Postgres function catalog. |
| pg_description | Empty | Returns empty result set. |
| pg_constraint | Empty | Returns empty result set (foreign keys are not enforced). |
| pg_index | Empty | Returns empty result set. |
| pg_am | Empty | Returns empty result set. |
| pg_stat_* | Not available | Statistics views are not implemented. Returns empty result set. |
Supported Session Commands
| Command | Behavior |
|---|---|
| SET <parameter> = <value> | Acknowledged (returns SET). All SET commands are accepted but most are no-ops. |
| SHOW server_version | Returns 15.0 |
| SHOW server_encoding | Returns UTF8 |
| SHOW client_encoding | Returns UTF8 |
| SHOW timezone | Returns UTC |
| SHOW search_path | Returns main |
| SHOW datestyle | Returns ISO, MDY |
| SHOW standard_conforming_strings | Returns on |
| SHOW transaction_isolation | Returns read committed |
| SHOW max_identifier_length | Returns 63 |
| SHOW ALL | Returns a table of key server settings |
| SELECT version() | Returns PostgreSQL 15.0 (HatiData Proxy) |
| SELECT current_database() | Returns hatidata |
| SELECT current_schema() | Returns public |
| SELECT current_user | Returns hatidata |
| DEALLOCATE / DEALLOCATE ALL | Acknowledged (no-op, returns DEALLOCATE) |
Known Differences from PostgreSQL
- Auto-commit mode --- Each SQL statement is its own transaction. `BEGIN`, `COMMIT`, and `ROLLBACK` are accepted for compatibility but provide no transaction isolation. A `NOTICE` is sent when `BEGIN` is issued.
- TIMESTAMP is always UTC internally --- DuckDB stores timestamps in UTC. `TIMESTAMPTZ` values are converted to UTC on ingestion. Timezone-aware queries should use explicit `AT TIME ZONE` conversions.
- Foreign keys are parsed but not enforced --- `CREATE TABLE` with `FOREIGN KEY` constraints is accepted syntactically (for migration compatibility), but referential integrity is not checked at insert/update time.
- NUMERIC uses text encoding --- Binary wire format for `NUMERIC` is not implemented (Postgres NUMERIC binary is BCD-encoded with a complex header). NUMERIC columns always use text format in result rows. This is transparent to most clients.
- JSON and JSONB are equivalent --- Both are stored as JSON text internally. There is no separate binary JSON storage format. JSONB columns are reported with OID 3802 for wire protocol compatibility, but storage and query behavior is identical.
- ARRAY columns use Postgres text format --- Array values are formatted as Postgres array text literals (e.g., `{1,2,3}`) but may exhibit behavioral differences in edge cases (NULL handling, multi-dimensional arrays).
- String concatenation --- `||` works for string concatenation. `+` does not concatenate strings (it is arithmetic only).
- No sequences or auto-increment --- `SERIAL`, `BIGSERIAL`, `CREATE SEQUENCE`, `nextval()`, and `currval()` are not supported. Use application-generated UUIDs or IDs.
- Microsecond timestamp precision --- Snowflake's `TIMESTAMP_NTZ(9)` nanosecond precision is not preserved. Timestamps have microsecond precision.
- DuckDB-specific functions available --- DuckDB built-in functions (`list_aggregate`, `list_sort`, `regexp_extract`, `strftime`, `read_csv`, `read_parquet`, etc.) are available alongside standard SQL and transpiled Snowflake functions.
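Two of these differences can be shown in one short sketch (table and column names hypothetical; named-timezone support depends on DuckDB's bundled ICU extension):

```sql
-- Timestamps are stored as UTC; convert explicitly for display.
SELECT occurred_at AT TIME ZONE 'America/New_York' AS local_time
FROM events;

-- String concatenation: use ||, never + (which is arithmetic only).
SELECT first_name || ' ' || last_name AS full_name
FROM users;
```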
Connection String Reference
```
postgres://USERNAME:API_KEY@HOST:5439/DATABASE?sslmode=require
```
| Parameter | Description | Example |
|---|---|---|
| USERNAME | Any string (not validated; API key is the authentication mechanism) | admin |
| API_KEY | HatiData API key. Exactly 40 characters: prefix (hd_live_ or hd_test_) + 32 alphanumeric chars. Agent keys use hd_agent_ prefix. | hd_live_abc123def456ghi789jkl012mno345pq |
| HOST | Your HatiData proxy endpoint | data.yourcompany.com |
| PORT | Always 5439 | 5439 |
| DATABASE | Database name. Use hatidata for the default catalog. | hatidata |
| sslmode | require for production, disable for local development | require |
Examples:
```bash
# Production (cloud)
psql "postgres://admin:hd_live_your_api_key@data.yourcompany.com:5439/hatidata?sslmode=require"

# Local development
psql -h localhost -p 5439 -U admin -d hatidata

# Environment variables
export PGHOST=data.yourcompany.com
export PGPORT=5439
export PGDATABASE=hatidata
export PGUSER=admin
export PGPASSWORD=hd_live_your_api_key
export PGSSLMODE=require
psql
```
Known Unsupported Snowflake Features
The following Snowflake-specific features are not supported by HatiData's transpiler. Each has a recommended workaround that achieves the same result using standard SQL or HatiData-native features.
| Feature | Status | Workaround |
|---|---|---|
| MERGE statement | Not supported | Use INSERT ... ON CONFLICT DO UPDATE for upserts, or separate INSERT and UPDATE statements. |
| LATERAL FLATTEN with nested arrays | Partial | Single-level FLATTEN (transpiled to UNNEST) works. Nested/recursive FLATTEN with path, outer, or recursive parameters is not supported. Flatten one level at a time with chained UNNEST calls. |
| CONNECT BY / PRIOR | Not supported | Use recursive CTEs (WITH RECURSIVE), which are fully supported by DuckDB. |
| PIVOT / UNPIVOT | Not supported | Rewrite as CASE expressions with aggregation. For example: SUM(CASE WHEN category = 'A' THEN amount END) AS a_total. |
| Stored procedures (CREATE PROCEDURE) | Not applicable | HatiData is a query engine, not a procedural runtime. Implement logic in your agent code or orchestration layer. |
| Streams and Tasks | Not supported | Use Semantic Triggers for event-driven, content-aware workflows that replace Snowflake's polling-based Streams/Tasks model. |
| CREATE STAGE / PUT / GET | Not supported | Load data via INSERT ... SELECT, the Python/TypeScript SDK, or DuckDB's built-in read_csv() / read_parquet() functions. |
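Two of these workarounds in sketch form, against hypothetical tables:

```sql
-- PIVOT rewritten as conditional aggregation:
SELECT region,
       SUM(CASE WHEN quarter = 'Q1' THEN amount END) AS q1_total,
       SUM(CASE WHEN quarter = 'Q2' THEN amount END) AS q2_total
FROM sales
GROUP BY region;

-- CONNECT BY replaced with a recursive CTE:
WITH RECURSIVE org AS (
    SELECT id, manager_id, name, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL          -- root of the hierarchy
    UNION ALL
    SELECT e.id, e.manager_id, e.name, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id -- walk down one level per iteration
)
SELECT * FROM org;
```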
If you are migrating from Snowflake and encounter an unsupported feature not listed here, the proxy returns a descriptive error message identifying the unsupported syntax. See the Snowflake Migration Guide for a step-by-step migration walkthrough.
Transpilation Performance
HatiData's Snowflake-to-DuckDB transpiler operates entirely in-memory on a parsed AST (abstract syntax tree). Transpilation results are cached by SQL hash, so repeated queries skip transpilation entirely and resolve in sub-millisecond cache lookups.
Benchmarks by Query Complexity
| Complexity | Example | Avg Transpilation Time | Cache Lookup |
|---|---|---|---|
| Simple | SELECT col FROM table WHERE x = 1 with IFF, NVL rewrites | < 0.1 ms | < 0.01 ms |
| Medium | Multi-table JOINs with DATEDIFF, FLATTEN, colon notation, type mappings | < 0.5 ms | < 0.01 ms |
| Complex | Deeply nested subqueries, multiple window functions with QUALIFY, chained CTEs, LISTAGG + ARRAY_AGG | < 2 ms | < 0.01 ms |
What Makes a Query "Complex"
Transpilation time scales with AST depth and the number of rewrite rules that fire. The following patterns increase transpilation cost:
- Deeply nested subqueries — Each nesting level requires a full recursive AST traversal for function and type rewrites.
- Multiple JOINs combined with QUALIFY — The transpiler must rewrite QUALIFY into a window-function subquery wrapper when targeting older SQL dialects, though DuckDB supports QUALIFY natively (passthrough).
- Chained colon notation — Expressions like `col:a:b:c::INT` require recursive pre-parse expansion before AST construction.
- Mixed Snowflake functions — Queries using many distinct Snowflake functions (e.g., `IFF`, `NVL2`, `DATEDIFF`, `FLATTEN`, `OBJECT_CONSTRUCT` in the same query) trigger more rewrite passes.
- Large IN lists or UNION ALL chains — These increase AST node count but do not require complex rewrites. Transpilation is still fast, but parsing dominates.
In practice, transpilation is never the bottleneck. DuckDB query execution and network round-trips dominate end-to-end latency by 10-100x.
Related Pages
- SQL Functions & Types --- Detailed function mapping reference with examples
- Postgres Drivers & BI Tools --- Connection guides for psql, Python, Node.js, Java, DBeaver, Metabase, Grafana, Tableau, Power BI
- Binary Wire Format --- Deep dive into the binary encoding implementation
- Semantic Triggers --- Event-driven alternative to LISTEN/NOTIFY
- Python SDK --- Agent-aware queries with automatic audit attribution