
SQL Compatibility

HatiData accepts Snowflake-compatible SQL and automatically transpiles it to DuckDB SQL. The pipeline runs in five numbered stages (plus an optional vector-join pre-parse, Stage 1.5) and produces DuckDB-compatible output along with a confidence score that indicates how reliable the transpilation is.

Transpilation Pipeline

Every SQL statement passes through these stages in order:

Stage 1: Pre-parse (Colon Notation)

Snowflake colon notation (col:field) is rewritten to json_extract_string(col, 'field') before the SQL parser runs. This is necessary because sqlparser does not understand Snowflake's proprietary colon access syntax. The pre-parser uses regex matching and carefully avoids interfering with PostgreSQL-style :: cast operators.

-- Input
SELECT payload:user:name FROM events;

-- After pre-parse
SELECT json_extract_string(json_extract_string(payload, 'user'), 'name') FROM events;

Stage 1.5: Vector Join Pre-parse

If the SQL contains hybrid SQL keywords (JOIN_VECTOR, semantic_match, vector_match, semantic_rank), a second pre-parse stage rewrites them before parsing. This is necessary because sqlparser does not understand HatiData's vector join syntax.

-- Input
SELECT t.id, k.title
FROM tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject)
WHERE semantic_rank(k.embedding, t.subject) > 0.5;

-- After vector pre-parse and embedding resolution
SELECT t.id, k.title
FROM tickets t
INNER JOIN knowledge_base k ON list_cosine_similarity(k.embedding, [0.12, -0.34, ...]) > 0.7
WHERE list_cosine_similarity(k.embedding, [0.12, -0.34, ...]) > 0.5;

The proxy sends the search text to the configured embedding provider and replaces semantic_match / semantic_rank calls with list_cosine_similarity() using the resolved embedding vectors. See Hybrid Search for the full reference.

Stage 2: Parse

The pre-processed SQL is parsed using sqlparser's Snowflake dialect, producing an abstract syntax tree (AST). Parse errors are returned immediately as TranspilerError::ParseError.

Stage 3: Function Rewrites

Snowflake-specific functions are mapped to their DuckDB equivalents by walking the AST. See Function Reference for the complete mapping table.
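
For example, a query that combines IFF and NVL is rewritten roughly as shown below. The tickets table and its columns are illustrative, and the exact formatting of the rendered SQL may differ:

-- Input (illustrative columns)
SELECT IFF(status = 'open', NVL(assignee, 'unassigned'), 'closed') AS owner
FROM tickets;

-- After function rewrites (approximate rendering)
SELECT IF(status = 'open', COALESCE(assignee, 'unassigned'), 'closed') AS owner
FROM tickets;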

Stage 4: Type Rewrites

Snowflake data types in CREATE TABLE statements, ALTER TABLE ADD COLUMN, and CAST expressions are mapped to DuckDB types. See Data Types for the complete mapping table.
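
For instance, a CREATE TABLE statement using Snowflake types is rewritten roughly as follows (column names are illustrative; see the Type Mapping table below for the full mapping):

-- Input (illustrative columns)
CREATE TABLE events (
    id NUMBER,
    payload VARIANT,
    created_at TIMESTAMP_NTZ
);

-- After type rewrites (approximate rendering)
CREATE TABLE events (
    id NUMERIC,
    payload JSON,
    created_at TIMESTAMP
);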

Stage 5: Construct Rewrites

Snowflake-specific constructs such as FLATTEN, OBJECT_CONSTRUCT, GET_PATH, and CLONE are transformed to DuckDB equivalents. Unsupported constructs like COPY INTO and staged file references are rejected with descriptive error messages. See SQL Constructs for details.
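
As a sketch, a query combining OBJECT_CONSTRUCT and GET_PATH is transformed roughly like this (the orders table and columns are illustrative, and the rendered output may differ):

-- Input (illustrative columns)
SELECT GET_PATH(OBJECT_CONSTRUCT('id', order_id, 'total', amount), 'total')
FROM orders;

-- After construct rewrites (approximate rendering)
SELECT json_extract(json_object('id', order_id, 'total', amount), 'total')
FROM orders;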

Once all stages have run, the rewritten AST is rendered back to a SQL string and the result is cached.

Confidence Score

Each transpilation result includes a confidence score between 0.0 and 1.0:

  • 1.0 -- All rewrites applied cleanly, no warnings
  • 0.9 -- A function or type rewrite produced a warning but did not fail
  • < 0.9 -- Multiple rewrite warnings; review the output carefully

The confidence score decreases by 0.1 for each function or type rewrite warning. Snowflake construct errors (e.g., COPY INTO) are hard failures and do not produce a reduced confidence score -- they return an error instead.

Function Mapping

| Snowflake Function | DuckDB Equivalent | Notes |
|---|---|---|
| IFF(cond, true, false) | IF(cond, true, false) | Direct rename |
| NVL(expr, default) | COALESCE(expr, default) | Direct rename |
| NVL2(expr, not_null, null) | CASE WHEN expr IS NOT NULL THEN not_null ELSE null END | Expanded to CASE |
| DATEDIFF(part, start, end) | DATE_DIFF(part, start, end) | Direct rename |
| DATEADD(part, amount, date) | date + INTERVAL 'amount' part | Rewritten to interval arithmetic |
| LISTAGG(expr, delimiter) | STRING_AGG(expr, delimiter) | Direct rename |
| ARRAY_AGG(expr) | LIST(expr) | Direct rename |
| PARSE_JSON(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| TO_TIMESTAMP_NTZ(expr) | CAST(expr AS TIMESTAMP) | Rewritten to CAST |
| TO_VARIANT(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| OBJECT_CONSTRUCT(k, v, ...) | json_object(k, v, ...) | Renamed |
| GET_PATH(expr, path) | json_extract(expr, path) | Renamed |
| FLATTEN(input => expr) | UNNEST(expr) | Extracts input parameter |
| SPLIT_PART(str, delim, idx) | SPLIT_PART(str, delim, idx) | Passthrough (same in DuckDB) |
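
Two of the less direct mappings, NVL2 and DATEADD, change the shape of the expression rather than just the function name. A rough sketch (column names are illustrative, and the exact rendering may differ):

-- Input (illustrative columns)
SELECT NVL2(shipped_at, 'shipped', 'pending') AS status,
       DATEADD(day, 7, ordered_at) AS due_date
FROM orders;

-- After function rewrites (approximate rendering)
SELECT CASE WHEN shipped_at IS NOT NULL THEN 'shipped' ELSE 'pending' END AS status,
       ordered_at + INTERVAL '7' day AS due_date
FROM orders;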

For a comprehensive guide with examples, see the Function Reference.

Type Mapping

| Snowflake Type | DuckDB Type | Notes |
|---|---|---|
| VARIANT | JSON | Semi-structured data |
| OBJECT | JSON | Key-value objects |
| ARRAY (untyped) | JSON | Snowflake untyped arrays |
| NUMBER | NUMERIC | Arbitrary-precision decimal |
| TIMESTAMP_NTZ | TIMESTAMP | Timestamp without timezone |
| TIMESTAMP_LTZ | TIMESTAMPTZ | Timestamp with timezone |
| INTEGER | INTEGER | Passthrough |
| VARCHAR | VARCHAR | Passthrough |
| BOOLEAN | BOOLEAN | Passthrough |
| FLOAT | FLOAT | Passthrough |
| DOUBLE | DOUBLE | Passthrough |
| DATE | DATE | Passthrough |
| DECIMAL | DECIMAL | Passthrough |
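
The same mapping applies inside CAST expressions and ALTER TABLE ADD COLUMN. A rough sketch (table and column names are illustrative):

-- Input (illustrative tables and columns)
ALTER TABLE events ADD COLUMN metadata VARIANT;
SELECT CAST(raw AS VARIANT) AS payload, CAST(amount AS NUMBER) AS total FROM staging;

-- After type rewrites (approximate rendering)
ALTER TABLE events ADD COLUMN metadata JSON;
SELECT CAST(raw AS JSON) AS payload, CAST(amount AS NUMERIC) AS total FROM staging;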

For a comprehensive guide with precision handling details, see the Data Types reference.

Supported Statement Types

| Statement | Status |
|---|---|
| SELECT (with subqueries, CTEs, window functions) | Fully supported |
| INSERT INTO ... VALUES | Fully supported |
| INSERT INTO ... SELECT | Fully supported |
| CREATE TABLE | Fully supported (with type mapping) |
| CREATE TABLE AS SELECT | Fully supported |
| ALTER TABLE ADD COLUMN | Supported (with type mapping) |
| DROP TABLE | Fully supported |
| UPDATE | Fully supported |
| DELETE | Fully supported |
| EXPLAIN | Fully supported |
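
A single statement may combine several of these features. For example, a CREATE TABLE AS SELECT with a CTE and a window function is handled as one supported statement (the schema below is illustrative):

-- Illustrative: CTAS with a CTE and a window function
CREATE TABLE daily_ranked AS
WITH daily AS (
    SELECT user_id, CAST(created_at AS DATE) AS event_day, COUNT(*) AS event_count
    FROM events
    GROUP BY user_id, CAST(created_at AS DATE)
)
SELECT user_id, event_day, event_count,
       RANK() OVER (PARTITION BY event_day ORDER BY event_count DESC) AS day_rank
FROM daily;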

Snowflake Constructs

Colon Notation (JSON Access)

Snowflake's colon notation for semi-structured data is pre-parsed before the SQL parser runs:

-- Snowflake
SELECT payload:user:name FROM events;

-- Transpiled to DuckDB
SELECT json_extract_string(json_extract_string(payload, 'user'), 'name') FROM events;

FLATTEN

Snowflake's FLATTEN is rewritten to DuckDB's UNNEST. The transpiler extracts the input named parameter:

-- Snowflake
SELECT f.value FROM events, TABLE(FLATTEN(input => payload:items)) f;

-- Transpiled to DuckDB
SELECT f.value FROM events, UNNEST(json_extract(payload, 'items')) AS f;

CLONE

Snowflake's zero-copy CLONE is detected and flagged. Manual conversion to CREATE TABLE AS SELECT may be needed:

-- Snowflake
CREATE TABLE orders_backup CLONE orders;

-- Manual equivalent in HatiData
CREATE TABLE orders_backup AS SELECT * FROM orders;

For the complete list of construct transformations, see SQL Constructs.

Unsupported Constructs

The following Snowflake-specific constructs have no DuckDB equivalent and will return an error:

| Construct | Reason | Alternative |
|---|---|---|
| COPY INTO | Snowflake-specific data loading | Use DuckDB COPY, read_csv(), read_parquet() |
| Staged file refs (@stage, $1, $2) | Snowflake stages not supported | Use read_csv(), read_parquet(), read_json() |
| STREAMS | Change data capture is Snowflake-specific | Use audit log queries |
| TASKS | Scheduled SQL is Snowflake-specific | Use external schedulers (cron, Airflow) |
| PIPES | Continuous data loading is Snowflake-specific | Use INSERT INTO ... SELECT from file functions |
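
For instance, a COPY INTO load has to be expressed with DuckDB file functions instead. A rough equivalent (the stage name and file path are placeholders):

-- Snowflake (rejected by the transpiler)
COPY INTO orders FROM @my_stage/orders.csv;

-- Rough HatiData equivalent using DuckDB's read_csv()
INSERT INTO orders SELECT * FROM read_csv('orders.csv');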

Transpilation Cache

Transpilation results are cached by SQL hash using a concurrent DashMap in-memory cache. The cache key is a 64-bit hash of the input SQL string computed with Rust's DefaultHasher.

Repeated queries skip the entire parse-and-rewrite pipeline, reducing latency to sub-millisecond for cached queries. Each TranspilationResult includes a cache_hit boolean so callers can distinguish fresh transpilations from cached results.

You can monitor and manage the cache:

  • Cache size is available via the cache_len() method
  • Cache clearing is available via the clear_cache() method
  • Cache entries persist for the lifetime of the Transpiler instance

AI Healing

If a transpiled query fails at execution, HatiData can optionally call an external AI API with the error message and original SQL. The AI suggests a corrected query, which is retried automatically. This is step 10 of the 13-step query pipeline in hatidata-proxy.

AI healing is disabled by default and requires configuring the HATIDATA_AI_HEALER_ENDPOINT environment variable.

