SQL Functions & Types
HatiData accepts Snowflake-compatible SQL and automatically transpiles it for execution. Transpiled results are cached by SQL hash, making repeated queries sub-millisecond on cache hits.
Function Mapping
Snowflake functions are automatically rewritten during transpilation. Rewrites apply to all SQL contexts: SELECT projections, WHERE clauses, HAVING clauses, GROUP BY, and ORDER BY — including nested combinations.
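As an illustration of the simplest rewrite class (direct renames), here is a hypothetical Python sketch of a name-mapping pass. The real transpiler operates on a parsed AST rather than raw text, and only a few mappings from the tables below are shown:

```python
import re

# Illustrative only: a text-level rename pass. The actual transpiler
# rewrites the parsed AST, which also handles quoting and nesting.
RENAMES = {
    "IFF": "IF",
    "NVL": "COALESCE",
    "DATEDIFF": "DATE_DIFF",
    "LISTAGG": "STRING_AGG",
    "ARRAY_AGG": "LIST",
}

def rename_functions(sql: str) -> str:
    # Replace each mapped name only when it appears as a function call.
    pattern = r"\b(" + "|".join(RENAMES) + r")\s*\("
    return re.sub(pattern,
                  lambda m: RENAMES[m.group(1).upper()] + "(",
                  sql, flags=re.IGNORECASE)
```

Because the mapping is applied wherever a call appears, nested uses such as `IFF(NVL(...), ...)` are rewritten in one pass.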
Conditional Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| IFF(cond, true_val, false_val) | IF(cond, true_val, false_val) | Direct rename |
| NVL(expr, default) | COALESCE(expr, default) | Direct rename |
| NVL2(expr, not_null, null_val) | CASE WHEN expr IS NOT NULL THEN not_null ELSE null_val END | Expanded to CASE |
| COALESCE(...) | COALESCE(...) | Passthrough |
| DECODE(expr, s1, r1, ...) | DECODE(...) | Passthrough (native) |
```sql
-- Snowflake
SELECT IFF(revenue > 10000, 'enterprise', 'smb') AS tier,
       NVL(middle_name, '') AS middle_name
FROM customers;

-- HatiData output
SELECT IF(revenue > 10000, 'enterprise', 'smb') AS tier,
       COALESCE(middle_name, '') AS middle_name
FROM customers;
```
Nested functions are recursively rewritten:
```sql
-- Snowflake: nested
SELECT IFF(NVL(status, 'unknown') = 'active', DATEDIFF(day, created_at, CURRENT_DATE), 0)
FROM users;

-- HatiData output: all three rewritten
SELECT IF(COALESCE(status, 'unknown') = 'active', DATE_DIFF('day', created_at, CURRENT_DATE), 0)
FROM users;
```
Date and Time Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| DATEDIFF(part, start, end) | DATE_DIFF(part, start, end) | Direct rename (underscore added) |
| DATEADD(part, n, date) | date + INTERVAL 'n' part | Rewritten to interval arithmetic |
| TO_TIMESTAMP_NTZ(expr) | CAST(expr AS TIMESTAMP) | Rewritten to CAST |
| DATE_TRUNC('part', expr) | DATE_TRUNC('part', expr) | Passthrough |
```sql
-- Snowflake
SELECT DATEDIFF(day, created_at, CURRENT_DATE) AS age_days,
       DATEADD(day, 30, order_date) AS due_date
FROM orders;

-- HatiData output
SELECT DATE_DIFF('day', created_at, CURRENT_DATE) AS age_days,
       order_date + INTERVAL '30' DAY AS due_date
FROM orders;
```
String Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| SPLIT_PART(str, delim, n) | SPLIT_PART(str, delim, n) | Passthrough |
| REGEXP_REPLACE(str, pat, rep) | REGEXP_REPLACE(str, pat, rep) | Passthrough |
Aggregate Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| LISTAGG(expr, delim) | STRING_AGG(expr, delim) | Direct rename |
| ARRAY_AGG(expr) | LIST(expr) | Direct rename |
```sql
-- Snowflake
SELECT department,
       LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) AS members,
       ARRAY_AGG(DISTINCT product_id) AS products
FROM employees
GROUP BY department;

-- HatiData output
SELECT department,
       STRING_AGG(name, ', ' ORDER BY name) AS members,
       LIST(DISTINCT product_id) AS products
FROM employees
GROUP BY department;
```
Both LISTAGG and ARRAY_AGG forms (with and without WITHIN GROUP) are fully supported.
JSON Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| PARSE_JSON(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| TO_VARIANT(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| OBJECT_CONSTRUCT(k, v, ...) | json_object(k, v, ...) | Direct rename |
| GET_PATH(expr, 'path') | json_extract(expr, 'path') | Direct rename |
| FLATTEN(input => expr) | UNNEST(expr) | Extracts input parameter |
```sql
-- Snowflake
SELECT OBJECT_CONSTRUCT('name', name, 'email', email) AS user_json,
       GET_PATH(metadata, 'preferences.theme') AS theme,
       PARSE_JSON('{"x": 1}') AS data
FROM users;

-- HatiData output
SELECT json_object('name', name, 'email', email) AS user_json,
       json_extract(metadata, 'preferences.theme') AS theme,
       CAST('{"x": 1}' AS JSON) AS data
FROM users;
```
Type Conversion Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| TRY_CAST(expr AS type) | TRY_CAST(expr AS type) | Passthrough |
| TO_NUMBER(expr) | CAST(expr AS NUMERIC) | Manual rewrite needed (not auto-transpiled) |
| TO_VARCHAR(expr) | CAST(expr AS VARCHAR) | Manual rewrite needed (not auto-transpiled) |
Type Mapping
Type rewrites are applied automatically during transpilation. They affect CREATE TABLE column definitions, ALTER TABLE ADD COLUMN statements, and CAST expressions.
Rewritten Types
| Snowflake Type | HatiData Type | Category | Notes |
|---|---|---|---|
| VARIANT | JSON | Semi-structured | Universal semi-structured type |
| OBJECT | JSON | Semi-structured | Key-value object type |
| ARRAY (untyped) | JSON | Semi-structured | Typed ARRAY(T) is preserved |
| NUMBER | NUMERIC | Numeric | NUMBER(p,s) → NUMERIC (precision dropped) |
| TIMESTAMP_NTZ | TIMESTAMP | Date/Time | Timestamp without timezone |
| TIMESTAMP_LTZ | TIMESTAMPTZ | Date/Time | Timestamp with local timezone |
```sql
-- Snowflake CREATE TABLE
CREATE TABLE events (
    id INTEGER,
    payload VARIANT,
    metadata OBJECT,
    tags ARRAY,
    amount NUMBER,
    created_at TIMESTAMP_NTZ,
    updated_at TIMESTAMP_LTZ
);

-- HatiData output
CREATE TABLE events (
    id INTEGER,
    payload JSON,
    metadata JSON,
    tags JSON,
    amount NUMERIC,
    created_at TIMESTAMP,
    updated_at TIMESTAMPTZ
);
```
Passthrough Types
These types are identical in Snowflake and HatiData and pass through unchanged:
| Type | Category |
|---|---|
| INTEGER, INT, BIGINT, SMALLINT, TINYINT | Numeric |
| FLOAT, DOUBLE, DECIMAL(p,s) | Numeric |
| VARCHAR(n), CHAR, TEXT | String |
| BOOLEAN | Boolean |
| DATE, TIME, TIMESTAMP | Date/Time |
| BLOB, BYTEA | Binary |
Precision notes:
- NUMBER(p,s) precision is simplified to NUMERIC (arbitrary precision). Use DECIMAL(p,s) if explicit enforcement is required.
- VARCHAR(n) length constraints are preserved.
- Snowflake's TIMESTAMP_NTZ(9) nanosecond precision is not carried over (timestamps have microsecond precision).
SQL Constructs
These are structural transformations beyond simple function renaming, applied automatically during transpilation.
Colon Notation (JSON Field Access)
Snowflake's col:field syntax is rewritten during pre-parse normalization, before the SQL parser runs.
```sql
-- Single-level access
-- Snowflake: SELECT payload:action FROM events;
-- HatiData:  SELECT json_extract_string(payload, 'action') FROM events;

-- Multi-level access
-- Snowflake: SELECT payload:user:address:city FROM events;
-- HatiData:  SELECT json_extract_string(json_extract_string(json_extract_string(payload, 'user'), 'address'), 'city') FROM events;

-- With type casting
-- Snowflake: SELECT payload:amount::FLOAT AS amount FROM events;
-- HatiData:  SELECT CAST(json_extract_string(payload, 'amount') AS FLOAT) AS amount FROM events;
```
The pre-parser distinguishes single colons (JSON access) from double colons (PostgreSQL-style cast). In rare cases involving complex string literals with colons, use json_extract_string() directly.
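A minimal Python sketch of this normalization step, for illustration only: it captures the single-vs-double colon distinction and left-to-right nesting for chained access, but it is not the actual pre-parser and deliberately ignores the string-literal caveat above:

```python
import re

# Matches an identifier followed by one or more single-colon segments.
# The '(?!:)' guard skips the first colon of a '::' cast; '(?<![:\w])'
# prevents matching from the middle of a word or after a colon.
CHAIN = re.compile(r"(?<![:\w])(\w+)((?::(?!:)\w+)+)")

def rewrite_colons(sql: str) -> str:
    def fold(m: re.Match) -> str:
        expr = m.group(1)
        # Fold left so a:b:c nests as extract(extract(a, 'b'), 'c').
        for field in m.group(2).lstrip(":").split(":"):
            expr = f"json_extract_string({expr}, '{field}')"
        return expr
    return CHAIN.sub(fold, sql)
```

For example, `payload:amount::FLOAT` becomes `json_extract_string(payload, 'amount')::FLOAT`, leaving the cast for the parser to handle.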
FLATTEN to UNNEST
```sql
-- Basic
-- Snowflake:
SELECT value FROM TABLE(FLATTEN(input => ARRAY_CONSTRUCT(1, 2, 3)));
-- HatiData:
SELECT value FROM UNNEST(ARRAY_CONSTRUCT(1, 2, 3));

-- With table join
-- Snowflake:
SELECT e.event_id, f.value AS item FROM events e, TABLE(FLATTEN(input => e.items)) f;
-- HatiData:
SELECT e.event_id, f.value AS item FROM events e, UNNEST(e.items) AS f;
```
FLATTEN parameters path, outer, recursive, and mode are not supported. Pre-extract with GET_PATH and use LEFT JOIN UNNEST for outer behavior.
OBJECT_CONSTRUCT to json_object
```sql
-- Snowflake:
SELECT OBJECT_CONSTRUCT('name', name, 'id', id) AS payload FROM users;
-- HatiData:
SELECT json_object('name', name, 'id', id) AS payload FROM users;
```
GET_PATH to json_extract
```sql
-- Snowflake:
SELECT GET_PATH(payload, 'user.preferences.theme') AS theme FROM events;
-- HatiData:
SELECT json_extract(payload, 'user.preferences.theme') AS theme FROM events;
```
Note: For complex paths with array indexing, JSON Pointer syntax may differ from Snowflake's dot notation.
CLONE — Manual Conversion Required
Snowflake's zero-copy CLONE is not supported. The transpiler logs a warning; manual conversion is needed:
```sql
-- Snowflake (NOT SUPPORTED):
CREATE TABLE orders_backup CLONE orders;
-- HatiData equivalent (full data copy):
CREATE TABLE orders_backup AS SELECT * FROM orders;
```
Supported SQL Features
| Feature | Support |
|---|---|
| SELECT with subqueries | Full |
| Common Table Expressions (CTEs) | Full |
| Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.) | Full |
| QUALIFY clause | Full (native) |
| LATERAL joins | Full (native) |
| INSERT INTO ... VALUES | Full |
| INSERT INTO ... SELECT | Full |
| CREATE TABLE | Full (with type mapping) |
| CREATE TABLE AS SELECT | Full |
| ALTER TABLE ADD COLUMN | Full (with type mapping) |
| DROP TABLE | Full |
| UPDATE | Full |
| DELETE | Full |
| EXPLAIN | Full |
| TABLESAMPLE / SAMPLE | Supported with syntax differences |
QUALIFY Example
```sql
-- Works in both Snowflake and HatiData
SELECT customer_id, order_date,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
QUALIFY rn = 1;
```
LATERAL Join Example
```sql
-- Works in both Snowflake and HatiData
SELECT c.customer_id, c.name, recent.order_date, recent.total
FROM customers c,
     LATERAL (
       SELECT order_date, total FROM orders o
       WHERE o.customer_id = c.customer_id
       ORDER BY order_date DESC LIMIT 3
     ) recent;
```
Unsupported Constructs
The following Snowflake constructs are rejected with a TranspilerError::UnsupportedConstruct error:
| Construct | Reason | Alternative |
|---|---|---|
| COPY INTO | Snowflake-specific data loading | COPY my_table FROM 'file.csv' or read_parquet() |
| Staged file refs (@stage, $1) | Snowflake stages not supported | read_csv(), read_parquet(), read_json() |
| STREAMS | Change data capture (CDC) | Use audit log queries |
| TASKS | Scheduled SQL execution | Use cron, Airflow, or Prefect |
| PIPES | Continuous data loading | INSERT INTO ... SELECT from file functions |
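A hypothetical pre-flight check in Python, mirroring the table above. The real transpiler detects these constructs on the parsed SQL, and the error class name here is illustrative (the docs name the error `TranspilerError::UnsupportedConstruct`):

```python
import re

# Illustrative patterns for the rejected constructs listed above.
UNSUPPORTED = {
    r"\bCOPY\s+INTO\b": "COPY INTO",
    r"@\w+": "staged file reference",
    r"\bCREATE\s+STREAM\b": "STREAMS",
    r"\bCREATE\s+TASK\b": "TASKS",
    r"\bCREATE\s+PIPE\b": "PIPES",
}

class UnsupportedConstruct(ValueError):
    """Raised when a query uses a construct the transpiler rejects."""

def check_supported(sql: str) -> None:
    for pattern, name in UNSUPPORTED.items():
        if re.search(pattern, sql, flags=re.IGNORECASE):
            raise UnsupportedConstruct(f"unsupported construct: {name}")
```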
Hybrid SQL Functions
HatiData extends standard SQL with vector-aware functions for semantic search over structured data and agent memory. These are resolved at query time: the proxy embeds the search text and evaluates semantic similarity against the vector index.
semantic_match()
Returns TRUE if cosine similarity exceeds a threshold. Use in WHERE and JOIN ON clauses.
```sql
-- Filter rows semantically
SELECT ticket_id, subject, description
FROM support_tickets
WHERE semantic_match(description, 'billing dispute refund request', 0.8)
  AND status = 'open';

-- In a JOIN ON condition
SELECT o.order_id, m.content AS related_memory
FROM orders o
JOIN _hatidata_agent_memory m
  ON semantic_match(m.content, CAST(o.description AS VARCHAR), 0.75);
```
Signature: semantic_match(column, 'search text' [, threshold FLOAT = 0.7])
vector_match() is an alias for semantic_match.
semantic_rank()
Returns the cosine similarity score as a FLOAT (0.0 to 1.0). Use in SELECT and ORDER BY to rank by relevance.
```sql
SELECT content,
       semantic_rank(content, 'revenue growth Q4') AS relevance
FROM _hatidata_agent_memory
WHERE agent_id = 'analyst'
ORDER BY relevance DESC
LIMIT 10;
```
JOIN_VECTOR
Syntactic sugar combining a JOIN with semantic_match for readable intent:
```sql
SELECT t.ticket_id, t.subject,
       m.content AS related_memory,
       semantic_rank(m.content, t.subject) AS relevance
FROM support_tickets t
JOIN_VECTOR _hatidata_agent_memory m
  ON semantic_match(m.content, t.subject, 0.7)
WHERE t.status = 'open'
ORDER BY relevance DESC;
```
Threshold Tuning
| Threshold | Precision | Recall | Best For |
|---|---|---|---|
| 0.9+ | Very high | Low | Exact semantic match, deduplication |
| 0.8 | High | Medium | Targeted retrieval, specific facts |
| 0.7 (default) | Medium | Medium-High | General search, RAG context |
| 0.6 | Low-Medium | High | Exploratory search, broad context |
| 0.5 | Low | Very high | Catch-all, brainstorming |
Vector functions require an embedding provider configured on the proxy. Without one, queries using these functions return an error.
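For intuition on the scores these functions work with, cosine similarity can be sketched in a few lines of Python. The embedding step itself is provider-specific and omitted here; actual score ranges depend on the embedding space:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def matches(a: list[float], b: list[float], threshold: float = 0.7) -> bool:
    # Mirrors the semantic_match predicate: TRUE at or above the threshold.
    return cosine_similarity(a, b) >= threshold
```

Identical directions score 1.0, orthogonal vectors score 0.0, which is why the threshold table below trades precision against recall as it moves from 0.9 down to 0.5.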
Transpilation Cache and AI Healing
Transpilation results are cached by SQL hash. Cache hits skip the entire parse-and-rewrite pipeline — latency drops to sub-millisecond for repeated queries.
If a transpiled query fails at execution, AI healing can call an external AI API with the error and original SQL to suggest a corrected query for automatic retry. AI healing is disabled by default; enable it with the HATIDATA_HEALER_API_ENDPOINT environment variable.
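The cache behavior can be sketched as a hash-keyed dictionary. This is an illustrative Python model, not HatiData's implementation; the key derivation and structure are assumptions:

```python
import hashlib

class TranspileCache:
    """Sketch of a hash-keyed transpilation cache."""

    def __init__(self):
        self._cache: dict[str, str] = {}
        self.hits = 0

    def _key(self, sql: str) -> str:
        # Hash the SQL text; identical queries map to the same entry.
        return hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def transpile(self, sql: str, transpiler) -> str:
        key = self._key(sql)
        if key in self._cache:
            self.hits += 1  # cache hit: skip the parse-and-rewrite pipeline
        else:
            self._cache[key] = transpiler(sql)
        return self._cache[key]
```

On a hit, the stored output is returned directly, which is what makes repeated queries sub-millisecond.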