SQL Compatibility
HatiData accepts Snowflake-compatible SQL and automatically transpiles it to DuckDB SQL. The transpilation pipeline runs in five stages (plus an optional vector pre-parse stage), producing DuckDB-compatible output along with a confidence score that indicates how reliable the transpilation is.
Transpilation Pipeline
Every SQL statement passes through these stages in order:
Stage 1: Pre-parse (Colon Notation)
Snowflake colon notation (col:field) is rewritten to json_extract_string(col, 'field') before the SQL parser runs. This is necessary because sqlparser does not understand Snowflake's proprietary colon access syntax. The pre-parser uses regex matching and carefully avoids interfering with PostgreSQL-style :: cast operators.
-- Input
SELECT payload:user:name FROM events;
-- After pre-parse
SELECT json_extract_string(json_extract_string(payload, 'user'), 'name') FROM events;
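Because the pre-parser skips :: casts, a statement that mixes both syntaxes only has its colon access rewritten. A minimal sketch with illustrative column names:
-- Input (colon access alongside a :: cast)
SELECT payload:id, created_at::DATE FROM events;
-- After pre-parse: the cast operator is left untouched
SELECT json_extract_string(payload, 'id'), created_at::DATE FROM events;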
Stage 1.5: Vector Join Pre-parse
If the SQL contains hybrid SQL keywords (JOIN_VECTOR, semantic_match, vector_match, semantic_rank), a second pre-parse stage rewrites them before parsing. This is necessary because sqlparser does not understand HatiData's vector join syntax.
-- Input
SELECT t.id, k.title
FROM tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject)
WHERE semantic_rank(k.embedding, t.subject) > 0.5;
-- After vector pre-parse and embedding resolution
SELECT t.id, k.title
FROM tickets t
INNER JOIN knowledge_base k ON list_cosine_similarity(k.embedding, [0.12, -0.34, ...]) > 0.7
WHERE list_cosine_similarity(k.embedding, [0.12, -0.34, ...]) > 0.5;
The proxy sends the search text to the configured embedding provider and replaces semantic_match / semantic_rank calls with list_cosine_similarity() using the resolved embedding vectors. See Hybrid Search for the full reference.
Stage 2: Parse
The pre-processed SQL is parsed using sqlparser's Snowflake dialect, producing an abstract syntax tree (AST). Parse errors are returned immediately as TranspilerError::ParseError.
Stage 3: Function Rewrites
Snowflake-specific functions are mapped to their DuckDB equivalents by walking the AST. See Function Reference for the complete mapping table.
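For example, IFF and NVL map directly onto their DuckDB counterparts. A minimal sketch with illustrative table and column names:
-- Input
SELECT IFF(status = 'open', 1, 0), NVL(assignee, 'unassigned') FROM tickets;
-- After function rewrites
SELECT IF(status = 'open', 1, 0), COALESCE(assignee, 'unassigned') FROM tickets;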
Stage 4: Type Rewrites
Snowflake data types in CREATE TABLE statements, ALTER TABLE ADD COLUMN, and CAST expressions are mapped to DuckDB types. See Data Types for the complete mapping table.
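For example, a Snowflake type in an ALTER TABLE ADD COLUMN or CAST expression is swapped for its DuckDB equivalent. A sketch with illustrative names:
-- Input
ALTER TABLE events ADD COLUMN metadata VARIANT;
SELECT CAST(amount AS NUMBER) FROM events;
-- After type rewrites
ALTER TABLE events ADD COLUMN metadata JSON;
SELECT CAST(amount AS NUMERIC) FROM events;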
Stage 5: Construct Rewrites
Snowflake-specific constructs such as FLATTEN, OBJECT_CONSTRUCT, GET_PATH, and CLONE are transformed to DuckDB equivalents. Unsupported constructs like COPY INTO and staged file references are rejected with descriptive error messages. See SQL Constructs for details.
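For instance, OBJECT_CONSTRUCT and GET_PATH come out as DuckDB JSON functions. A minimal sketch with illustrative names:
-- Input
SELECT OBJECT_CONSTRUCT('id', id, 'total', total), GET_PATH(payload, 'user') FROM orders;
-- After construct rewrites
SELECT json_object('id', id, 'total', total), json_extract(payload, 'user') FROM orders;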
After all five stages, the rewritten AST is rendered back to a SQL string and cached.
Confidence Score
Each transpilation result includes a confidence score between 0.0 and 1.0:
- 1.0 -- All rewrites applied cleanly, no warnings
- 0.9 -- A function or type rewrite produced a warning but did not fail
- < 0.9 -- Multiple rewrite warnings; review the output carefully
The confidence score decreases by 0.1 for each function or type rewrite warning, so a statement that triggers two warnings transpiles at 0.8. Snowflake construct errors (e.g., COPY INTO) are hard failures and do not produce a reduced confidence score -- they return an error instead.
Function Mapping
| Snowflake Function | DuckDB Equivalent | Notes |
|---|---|---|
| IFF(cond, true, false) | IF(cond, true, false) | Direct rename |
| NVL(expr, default) | COALESCE(expr, default) | Direct rename |
| NVL2(expr, not_null, null) | CASE WHEN expr IS NOT NULL THEN not_null ELSE null END | Expanded to CASE |
| DATEDIFF(part, start, end) | DATE_DIFF(part, start, end) | Direct rename |
| DATEADD(part, amount, date) | date + INTERVAL 'amount' part | Rewritten to interval arithmetic |
| LISTAGG(expr, delimiter) | STRING_AGG(expr, delimiter) | Direct rename |
| ARRAY_AGG(expr) | LIST(expr) | Direct rename |
| PARSE_JSON(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| TO_TIMESTAMP_NTZ(expr) | CAST(expr AS TIMESTAMP) | Rewritten to CAST |
| TO_VARIANT(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| OBJECT_CONSTRUCT(k, v, ...) | json_object(k, v, ...) | Renamed |
| GET_PATH(expr, path) | json_extract(expr, path) | Renamed |
| FLATTEN(input => expr) | UNNEST(expr) | Extracts input parameter |
| SPLIT_PART(str, delim, idx) | SPLIT_PART(str, delim, idx) | Passthrough (same in DuckDB) |
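The two expansions in this table, NVL2 and DATEADD, are the least obvious. A sketch of how they come out, mirroring the forms above (illustrative column names):
-- Input
SELECT NVL2(closed_at, 'closed', 'open'), DATEADD(day, 7, created_at) FROM tickets;
-- After function rewrites
SELECT CASE WHEN closed_at IS NOT NULL THEN 'closed' ELSE 'open' END,
       created_at + INTERVAL '7' day
FROM tickets;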
For a comprehensive guide with examples, see the Function Reference.
Type Mapping
| Snowflake Type | DuckDB Type | Notes |
|---|---|---|
| VARIANT | JSON | Semi-structured data |
| OBJECT | JSON | Key-value objects |
| ARRAY (untyped) | JSON | Snowflake untyped arrays |
| NUMBER | NUMERIC | Arbitrary-precision decimal |
| TIMESTAMP_NTZ | TIMESTAMP | Timestamp without timezone |
| TIMESTAMP_LTZ | TIMESTAMPTZ | Timestamp with timezone |
| INTEGER | INTEGER | Passthrough |
| VARCHAR | VARCHAR | Passthrough |
| BOOLEAN | BOOLEAN | Passthrough |
| FLOAT | FLOAT | Passthrough |
| DOUBLE | DOUBLE | Passthrough |
| DATE | DATE | Passthrough |
| DECIMAL | DECIMAL | Passthrough |
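As a concrete example, here is how a CREATE TABLE written against Snowflake types maps over (hypothetical table definition):
-- Snowflake
CREATE TABLE events (
    id INTEGER,
    payload VARIANT,
    amount NUMBER,
    created_at TIMESTAMP_NTZ
);
-- Transpiled to DuckDB
CREATE TABLE events (
    id INTEGER,
    payload JSON,
    amount NUMERIC,
    created_at TIMESTAMP
);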
For a comprehensive guide with precision handling details, see the Data Types reference.
Supported Statement Types
| Statement | Status |
|---|---|
| SELECT (with subqueries, CTEs, window functions) | Fully supported |
| INSERT INTO ... VALUES | Fully supported |
| INSERT INTO ... SELECT | Fully supported |
| CREATE TABLE | Fully supported (with type mapping) |
| CREATE TABLE AS SELECT | Fully supported |
| ALTER TABLE ADD COLUMN | Supported (with type mapping) |
| DROP TABLE | Fully supported |
| UPDATE | Fully supported |
| DELETE | Fully supported |
| EXPLAIN | Fully supported |
Snowflake Constructs
Colon Notation (JSON Access)
Snowflake's colon notation for semi-structured data is pre-parsed before the SQL parser runs:
-- Snowflake
SELECT payload:user:name FROM events;
-- Transpiled to DuckDB
SELECT json_extract_string(json_extract_string(payload, 'user'), 'name') FROM events;
FLATTEN
Snowflake's FLATTEN is rewritten to DuckDB's UNNEST. The transpiler extracts the input named parameter:
-- Snowflake
SELECT f.value FROM events, TABLE(FLATTEN(input => payload:items)) f;
-- Transpiled to DuckDB
SELECT f.value FROM events, UNNEST(json_extract(payload, 'items')) AS f;
CLONE
Snowflake's zero-copy CLONE is detected and flagged. Manual conversion to CREATE TABLE AS SELECT may be needed:
-- Snowflake
CREATE TABLE orders_backup CLONE orders;
-- Manual equivalent in HatiData
CREATE TABLE orders_backup AS SELECT * FROM orders;
For the complete list of construct transformations, see SQL Constructs.
Unsupported Constructs
The following Snowflake-specific constructs have no DuckDB equivalent and will return an error:
| Construct | Reason | Alternative |
|---|---|---|
| COPY INTO | Snowflake-specific data loading | Use DuckDB COPY, read_csv(), read_parquet() |
| Staged file refs (@stage, $1, $2) | Snowflake stages not supported | Use read_csv(), read_parquet(), read_json() |
| STREAMS | Change data capture is Snowflake-specific | Use audit log queries |
| TASKS | Scheduled SQL is Snowflake-specific | Use external schedulers (cron, Airflow) |
| PIPES | Continuous data loading is Snowflake-specific | Use INSERT INTO ... SELECT from file functions |
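For example, a COPY INTO load can usually be replaced with an INSERT that reads files directly through a DuckDB table function (hypothetical stage, bucket, and table names):
-- Snowflake (rejected with an error)
COPY INTO orders FROM @my_stage/orders/;
-- Manual equivalent in HatiData
INSERT INTO orders SELECT * FROM read_parquet('s3://my-bucket/orders/*.parquet');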
Transpilation Cache
Transpilation results are cached by SQL hash using a concurrent DashMap in-memory cache. The cache key is a 64-bit hash of the input SQL string computed with Rust's DefaultHasher.
Repeated queries skip the entire parse-and-rewrite pipeline, reducing latency to sub-millisecond for cached queries. Each TranspilationResult includes a cache_hit boolean so callers can distinguish fresh transpilations from cached results.
You can monitor and manage the cache:
- Cache size is available via the cache_len() method
- Cache clearing is available via the clear_cache() method
- Cache entries persist for the lifetime of the Transpiler instance
AI Healing
If a transpiled query fails at execution, HatiData can optionally call an external AI API with the error message and original SQL. The AI suggests a corrected query, which is retried automatically. This is step 10 of the 13-step query pipeline in hatidata-proxy.
AI healing is disabled by default and requires configuring the HATIDATA_AI_HEALER_ENDPOINT environment variable.
Next Steps
- Function Reference -- Detailed function mapping with examples
- Data Types -- Complete type mapping reference
- SQL Constructs -- Snowflake construct transformations
- Python SDK -- Run queries from Python agents
- dbt Adapter -- Run dbt models against HatiData
- Quickstart -- Try these features hands-on