
SQL Compatibility

HatiData accepts Snowflake-compatible SQL and automatically transpiles it to DuckDB SQL. The pipeline runs in five numbered stages (plus an optional vector-join pre-parse, Stage 1.5) and produces DuckDB-compatible output along with a confidence score that indicates how reliable the transpilation is.

Transpilation Pipeline

Every SQL statement passes through these stages in order:

Stage 1: Pre-parse (Colon Notation)

Snowflake colon notation (col:field) is rewritten to json_extract_string(col, 'field') before the SQL parser runs. This is necessary because sqlparser does not understand Snowflake's proprietary colon access syntax. The pre-parser uses regex matching and carefully avoids interfering with PostgreSQL-style :: cast operators.

-- Input
SELECT payload:user:name FROM events;

-- After pre-parse
SELECT json_extract_string(json_extract_string(payload, 'user'), 'name') FROM events;

Stage 1.5: Vector Join Pre-parse

If the SQL contains hybrid SQL keywords (JOIN_VECTOR, semantic_match, vector_match, semantic_rank), a second pre-parse stage rewrites them before parsing. This is necessary because sqlparser does not understand HatiData's vector join syntax.

-- Input
SELECT t.id, k.title
FROM tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject)
WHERE semantic_rank(k.embedding, t.subject) > 0.5;

-- After vector pre-parse and embedding resolution
SELECT t.id, k.title
FROM tickets t
INNER JOIN knowledge_base k ON list_cosine_similarity(k.embedding, [0.12, -0.34, ...]) > 0.7
WHERE list_cosine_similarity(k.embedding, [0.12, -0.34, ...]) > 0.5;

The proxy sends the search text to the configured embedding provider and replaces semantic_match / semantic_rank calls with list_cosine_similarity() using the resolved embedding vectors. See Hybrid Search for the full reference.

Stage 2: Parse

The pre-processed SQL is parsed using sqlparser's Snowflake dialect, producing an abstract syntax tree (AST). Parse errors are returned immediately as TranspilerError::ParseError.

Stage 3: Function Rewrites

Snowflake-specific functions are mapped to their DuckDB equivalents by walking the AST. See Function Reference for the complete mapping table.
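
For example, a query that combines IFF and NVL is rewritten roughly as shown below. The tickets table and its columns are illustrative, and the exact formatting of the rendered SQL may differ:

-- Input (illustrative columns)
SELECT IFF(status = 'open', NVL(assignee, 'unassigned'), 'closed') AS owner
FROM tickets;

-- After function rewrites (approximate rendering)
SELECT IF(status = 'open', COALESCE(assignee, 'unassigned'), 'closed') AS owner
FROM tickets;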

Stage 4: Type Rewrites

Snowflake data types in CREATE TABLE statements, ALTER TABLE ADD COLUMN, and CAST expressions are mapped to DuckDB types. See Data Types for the complete mapping table.
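
For instance, a CREATE TABLE statement using Snowflake types is rewritten roughly as follows (column names are illustrative; see the Type Mapping table below for the full mapping):

-- Input (illustrative columns)
CREATE TABLE events (
    id NUMBER,
    payload VARIANT,
    created_at TIMESTAMP_NTZ
);

-- After type rewrites (approximate rendering)
CREATE TABLE events (
    id NUMERIC,
    payload JSON,
    created_at TIMESTAMP
);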

Stage 5: Construct Rewrites

Snowflake-specific constructs such as FLATTEN, OBJECT_CONSTRUCT, GET_PATH, and CLONE are transformed to DuckDB equivalents. Unsupported constructs like COPY INTO and staged file references are rejected with descriptive error messages. See SQL Constructs for details.
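
As a sketch, a query combining OBJECT_CONSTRUCT and GET_PATH is transformed roughly like this (the orders table and columns are illustrative, and the rendered output may differ):

-- Input (illustrative columns)
SELECT GET_PATH(OBJECT_CONSTRUCT('id', order_id, 'total', amount), 'total')
FROM orders;

-- After construct rewrites (approximate rendering)
SELECT json_extract(json_object('id', order_id, 'total', amount), 'total')
FROM orders;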

Once all stages have run, the rewritten AST is rendered back to a SQL string and the result is cached.

Confidence Score

Each transpilation result includes a confidence score between 0.0 and 1.0:

  • 1.0 -- All rewrites applied cleanly, no warnings
  • 0.9 -- A function or type rewrite produced a warning but did not fail
  • < 0.9 -- Multiple rewrite warnings; review the output carefully

The confidence score decreases by 0.1 for each function or type rewrite warning. Snowflake construct errors (e.g., COPY INTO) are hard failures and do not produce a reduced confidence score -- they return an error instead.

Function Mapping

| Snowflake Function | DuckDB Equivalent | Notes |
|---|---|---|
| IFF(cond, true, false) | IF(cond, true, false) | Direct rename |
| NVL(expr, default) | COALESCE(expr, default) | Direct rename |
| NVL2(expr, not_null, null) | CASE WHEN expr IS NOT NULL THEN not_null ELSE null END | Expanded to CASE |
| DATEDIFF(part, start, end) | DATE_DIFF(part, start, end) | Direct rename |
| DATEADD(part, amount, date) | date + INTERVAL 'amount' part | Rewritten to interval arithmetic |
| LISTAGG(expr, delimiter) | STRING_AGG(expr, delimiter) | Direct rename |
| ARRAY_AGG(expr) | LIST(expr) | Direct rename |
| PARSE_JSON(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| TO_TIMESTAMP_NTZ(expr) | CAST(expr AS TIMESTAMP) | Rewritten to CAST |
| TO_VARIANT(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| OBJECT_CONSTRUCT(k, v, ...) | json_object(k, v, ...) | Renamed |
| GET_PATH(expr, path) | json_extract(expr, path) | Renamed |
| FLATTEN(input => expr) | UNNEST(expr) | Extracts input parameter |
| SPLIT_PART(str, delim, idx) | SPLIT_PART(str, delim, idx) | Passthrough (same in DuckDB) |
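
Two of the less direct mappings, NVL2 and DATEADD, change the shape of the expression rather than just the function name. A rough sketch (column names are illustrative, and the exact rendering may differ):

-- Input (illustrative columns)
SELECT NVL2(shipped_at, 'shipped', 'pending') AS status,
       DATEADD(day, 7, ordered_at) AS due_date
FROM orders;

-- After function rewrites (approximate rendering)
SELECT CASE WHEN shipped_at IS NOT NULL THEN 'shipped' ELSE 'pending' END AS status,
       ordered_at + INTERVAL '7' day AS due_date
FROM orders;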

For a comprehensive guide with examples, see the Function Reference.

Type Mapping

| Snowflake Type | DuckDB Type | Notes |
|---|---|---|
| VARIANT | JSON | Semi-structured data |
| OBJECT | JSON | Key-value objects |
| ARRAY (untyped) | JSON | Snowflake untyped arrays |
| NUMBER | NUMERIC | Arbitrary-precision decimal |
| TIMESTAMP_NTZ | TIMESTAMP | Timestamp without timezone |
| TIMESTAMP_LTZ | TIMESTAMPTZ | Timestamp with timezone |
| INTEGER | INTEGER | Passthrough |
| VARCHAR | VARCHAR | Passthrough |
| BOOLEAN | BOOLEAN | Passthrough |
| FLOAT | FLOAT | Passthrough |
| DOUBLE | DOUBLE | Passthrough |
| DATE | DATE | Passthrough |
| DECIMAL | DECIMAL | Passthrough |
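
The same mapping applies inside CAST expressions and ALTER TABLE ADD COLUMN. A rough sketch (table and column names are illustrative):

-- Input (illustrative tables and columns)
ALTER TABLE events ADD COLUMN metadata VARIANT;
SELECT CAST(raw AS VARIANT) AS payload, CAST(amount AS NUMBER) AS total FROM staging;

-- After type rewrites (approximate rendering)
ALTER TABLE events ADD COLUMN metadata JSON;
SELECT CAST(raw AS JSON) AS payload, CAST(amount AS NUMERIC) AS total FROM staging;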

For a comprehensive guide with precision handling details, see the Data Types reference.

Supported Statement Types

| Statement | Status |
|---|---|
| SELECT (with subqueries, CTEs, window functions) | Fully supported |
| INSERT INTO ... VALUES | Fully supported |
| INSERT INTO ... SELECT | Fully supported |
| CREATE TABLE | Fully supported (with type mapping) |
| CREATE TABLE AS SELECT | Fully supported |
| ALTER TABLE ADD COLUMN | Supported (with type mapping) |
| DROP TABLE | Fully supported |
| UPDATE | Fully supported |
| DELETE | Fully supported |
| EXPLAIN | Fully supported |
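
A single statement may combine several of these features. For example, a CREATE TABLE AS SELECT with a CTE and a window function is handled as one supported statement (the schema below is illustrative):

-- Illustrative: CTAS with a CTE and a window function
CREATE TABLE daily_ranked AS
WITH daily AS (
    SELECT user_id, CAST(created_at AS DATE) AS event_day, COUNT(*) AS event_count
    FROM events
    GROUP BY user_id, CAST(created_at AS DATE)
)
SELECT user_id, event_day, event_count,
       RANK() OVER (PARTITION BY event_day ORDER BY event_count DESC) AS day_rank
FROM daily;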

Snowflake Constructs

Colon Notation (JSON Access)

Snowflake's colon notation for semi-structured data is pre-parsed before the SQL parser runs:

-- Snowflake
SELECT payload:user:name FROM events;

-- Transpiled to DuckDB
SELECT json_extract_string(json_extract_string(payload, 'user'), 'name') FROM events;

FLATTEN

Snowflake's FLATTEN is rewritten to DuckDB's UNNEST. The transpiler extracts the input named parameter:

-- Snowflake
SELECT f.value FROM events, TABLE(FLATTEN(input => payload:items)) f;

-- Transpiled to DuckDB
SELECT f.value FROM events, UNNEST(json_extract(payload, 'items')) AS f;

CLONE

Snowflake's zero-copy CLONE is detected and flagged. Manual conversion to CREATE TABLE AS SELECT may be needed:

-- Snowflake
CREATE TABLE orders_backup CLONE orders;

-- Manual equivalent in HatiData
CREATE TABLE orders_backup AS SELECT * FROM orders;

For the complete list of construct transformations, see SQL Constructs.

Unsupported Constructs

The following Snowflake-specific constructs have no DuckDB equivalent and will return an error:

| Construct | Reason | Alternative |
|---|---|---|
| COPY INTO | Snowflake-specific data loading | Use DuckDB COPY, read_csv(), read_parquet() |
| Staged file refs (@stage, $1, $2) | Snowflake stages not supported | Use read_csv(), read_parquet(), read_json() |
| STREAMS | Change data capture is Snowflake-specific | Use audit log queries |
| TASKS | Scheduled SQL is Snowflake-specific | Use external schedulers (cron, Airflow) |
| PIPES | Continuous data loading is Snowflake-specific | Use INSERT INTO ... SELECT from file functions |
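
For instance, a COPY INTO load has to be expressed with DuckDB file functions instead. A rough equivalent (the stage name and file path are placeholders):

-- Snowflake (rejected by the transpiler)
COPY INTO orders FROM @my_stage/orders.csv;

-- Rough HatiData equivalent using DuckDB's read_csv()
INSERT INTO orders SELECT * FROM read_csv('orders.csv');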

Transpilation Cache

Transpilation results are cached by SQL hash using a concurrent DashMap in-memory cache. The cache key is a 64-bit hash of the input SQL string computed with Rust's DefaultHasher.

Repeated queries skip the entire parse-and-rewrite pipeline, reducing latency to sub-millisecond for cached queries. Each TranspilationResult includes a cache_hit boolean so callers can distinguish fresh transpilations from cached results.

You can monitor and manage the cache:

  • Cache size is available via the cache_len() method
  • Cache clearing is available via the clear_cache() method
  • Cache entries persist for the lifetime of the Transpiler instance

AI Healing

If a transpiled query fails at execution, HatiData can optionally call an external AI API with the error message and original SQL. The AI suggests a corrected query, which is retried automatically. This is step 10 of the 13-step query pipeline in hatidata-proxy.

AI healing is disabled by default and requires configuring the HATIDATA_AI_HEALER_ENDPOINT environment variable.

