
SQL Functions & Types

HatiData accepts Snowflake-compatible SQL and automatically transpiles it for execution. Transpiled results are cached by SQL hash, making repeated queries sub-millisecond on cache hits.


Function Mapping

Snowflake functions are automatically rewritten during transpilation. Rewrites apply to all SQL contexts: SELECT projections, WHERE clauses, HAVING clauses, GROUP BY, and ORDER BY — including nested combinations.

Conditional Functions

| Snowflake | HatiData | Notes |
|---|---|---|
| IFF(cond, true_val, false_val) | IF(cond, true_val, false_val) | Direct rename |
| NVL(expr, default) | COALESCE(expr, default) | Direct rename |
| NVL2(expr, not_null, null_val) | CASE WHEN expr IS NOT NULL THEN not_null ELSE null_val END | Expanded to CASE |
| COALESCE(...) | COALESCE(...) | Passthrough |
| DECODE(expr, s1, r1, ...) | DECODE(...) | Passthrough (native) |
-- Snowflake
SELECT IFF(revenue > 10000, 'enterprise', 'smb') AS tier,
       NVL(middle_name, '') AS middle_name
FROM customers;

-- HatiData output
SELECT IF(revenue > 10000, 'enterprise', 'smb') AS tier,
       COALESCE(middle_name, '') AS middle_name
FROM customers;
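
NVL2 and DECODE follow the same pattern as the table above; a sketch of the NVL2-to-CASE expansion (the phone column is hypothetical):

```sql
-- Snowflake
SELECT NVL2(phone, 'has_phone', 'no_phone') AS contact_status
FROM customers;

-- HatiData output
SELECT CASE WHEN phone IS NOT NULL THEN 'has_phone' ELSE 'no_phone' END AS contact_status
FROM customers;
```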

Nested functions are recursively rewritten:

-- Snowflake: nested
SELECT IFF(NVL(status, 'unknown') = 'active', DATEDIFF(day, created_at, CURRENT_DATE), 0)
FROM users;

-- HatiData output: all three rewritten
SELECT IF(COALESCE(status, 'unknown') = 'active', DATE_DIFF('day', created_at, CURRENT_DATE), 0)
FROM users;

Date and Time Functions

| Snowflake | HatiData | Notes |
|---|---|---|
| DATEDIFF(part, start, end) | DATE_DIFF('part', start, end) | Direct rename (underscore added; date part quoted) |
| DATEADD(part, n, date) | date + INTERVAL 'n' part | Rewritten to interval arithmetic |
| TO_TIMESTAMP_NTZ(expr) | CAST(expr AS TIMESTAMP) | Rewritten to CAST |
| DATE_TRUNC('part', expr) | DATE_TRUNC('part', expr) | Passthrough |
-- Snowflake
SELECT DATEDIFF(day, created_at, CURRENT_DATE) AS age_days,
       DATEADD(day, 30, order_date) AS due_date
FROM orders;

-- HatiData output
SELECT DATE_DIFF('day', created_at, CURRENT_DATE) AS age_days,
       order_date + INTERVAL '30' DAY AS due_date
FROM orders;

String Functions

| Snowflake | HatiData | Notes |
|---|---|---|
| SPLIT_PART(str, delim, n) | SPLIT_PART(str, delim, n) | Passthrough |
| REGEXP_REPLACE(str, pat, rep) | REGEXP_REPLACE(str, pat, rep) | Passthrough |
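
Because both functions pass through unchanged, the same query text works on either side; an illustrative example (the contacts table and its columns are hypothetical):

```sql
-- Identical in Snowflake and HatiData
SELECT SPLIT_PART(email, '@', 2) AS domain,
       REGEXP_REPLACE(phone, '[^0-9]', '') AS digits_only
FROM contacts;
```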

Aggregate Functions

| Snowflake | HatiData | Notes |
|---|---|---|
| LISTAGG(expr, delim) | STRING_AGG(expr, delim) | Direct rename |
| ARRAY_AGG(expr) | LIST(expr) | Direct rename |
-- Snowflake
SELECT department,
       LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) AS members,
       ARRAY_AGG(DISTINCT product_id) AS products
FROM employees
GROUP BY department;

-- HatiData output
SELECT department,
       STRING_AGG(name, ', ' ORDER BY name) AS members,
       LIST(DISTINCT product_id) AS products
FROM employees
GROUP BY department;

Both LISTAGG and ARRAY_AGG forms (with and without WITHIN GROUP) are fully supported.

JSON Functions

| Snowflake | HatiData | Notes |
|---|---|---|
| PARSE_JSON(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| TO_VARIANT(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| OBJECT_CONSTRUCT(k, v, ...) | json_object(k, v, ...) | Direct rename |
| GET_PATH(expr, 'path') | json_extract(expr, 'path') | Direct rename |
| FLATTEN(input => expr) | UNNEST(expr) | Extracts input parameter |
FLATTEN(input => expr)UNNEST(expr)Extracts input parameter
-- Snowflake
SELECT OBJECT_CONSTRUCT('name', name, 'email', email) AS user_json,
       GET_PATH(metadata, 'preferences.theme') AS theme,
       PARSE_JSON('{"x": 1}') AS data
FROM users;

-- HatiData output
SELECT json_object('name', name, 'email', email) AS user_json,
       json_extract(metadata, 'preferences.theme') AS theme,
       CAST('{"x": 1}' AS JSON) AS data
FROM users;

Type Conversion Functions

| Snowflake | HatiData | Notes |
|---|---|---|
| TRY_CAST(expr AS type) | TRY_CAST(expr AS type) | Passthrough |
| TO_NUMBER(expr) | CAST(expr AS NUMERIC) | Manual rewrite needed |
| TO_VARCHAR(expr) | CAST(expr AS VARCHAR) | Manual rewrite needed |
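
Since TO_NUMBER and TO_VARCHAR are not rewritten automatically, queries using them need a hand edit before submission; a sketch of the equivalent CASTs (column names are illustrative):

```sql
-- Snowflake (rewrite by hand before sending to HatiData)
SELECT TO_NUMBER(amount_text) AS amount,
       TO_VARCHAR(order_id) AS order_key
FROM orders;

-- Equivalent HatiData query
SELECT CAST(amount_text AS NUMERIC) AS amount,
       CAST(order_id AS VARCHAR) AS order_key
FROM orders;
```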

Type Mapping

Type rewrites are applied automatically during transpilation. They affect CREATE TABLE column definitions, ALTER TABLE ADD COLUMN statements, and CAST expressions.

Rewritten Types

| Snowflake Type | HatiData Type | Category | Notes |
|---|---|---|---|
| VARIANT | JSON | Semi-structured | Universal semi-structured type |
| OBJECT | JSON | Semi-structured | Key-value object type |
| ARRAY (untyped) | JSON | Semi-structured | Typed ARRAY(T) is preserved |
| NUMBER | NUMERIC | Numeric | NUMBER(p,s) becomes NUMERIC (precision dropped) |
| TIMESTAMP_NTZ | TIMESTAMP | Date/Time | Timestamp without timezone |
| TIMESTAMP_LTZ | TIMESTAMPTZ | Date/Time | Timestamp with local timezone |
-- Snowflake CREATE TABLE
CREATE TABLE events (
    id INTEGER,
    payload VARIANT,
    metadata OBJECT,
    tags ARRAY,
    amount NUMBER,
    created_at TIMESTAMP_NTZ,
    updated_at TIMESTAMP_LTZ
);

-- HatiData output
CREATE TABLE events (
    id INTEGER,
    payload JSON,
    metadata JSON,
    tags JSON,
    amount NUMERIC,
    created_at TIMESTAMP,
    updated_at TIMESTAMPTZ
);

Passthrough Types

These types are identical in Snowflake and HatiData and pass through unchanged:

| Type | Category |
|---|---|
| INTEGER, INT, BIGINT, SMALLINT, TINYINT | Numeric |
| FLOAT, DOUBLE, DECIMAL(p,s) | Numeric |
| VARCHAR(n), CHAR, TEXT | String |
| BOOLEAN | Boolean |
| DATE, TIME, TIMESTAMP | Date/Time |
| BLOB, BYTEA | Binary |

Precision notes:

  • NUMBER(p,s) precision is simplified to NUMERIC (arbitrary precision). Use DECIMAL(p,s) if explicit enforcement is required.
  • VARCHAR(n) length constraints are preserved.
  • Snowflake's TIMESTAMP_NTZ(9) nanosecond precision is not carried over; timestamps are stored with microsecond precision.
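
Where exact precision matters, declaring DECIMAL(p,s) directly sidesteps the NUMBER simplification; a minimal sketch (table and columns are illustrative):

```sql
-- DECIMAL(p,s) passes through unchanged, preserving precision and scale
CREATE TABLE invoices (
    invoice_id INTEGER,
    amount DECIMAL(12, 2)  -- enforced, unlike NUMBER(12, 2), which becomes NUMERIC
);
```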

SQL Constructs

These are structural transformations beyond simple function renaming, applied automatically during transpilation.

Colon Notation (JSON Field Access)

Snowflake's col:field syntax is rewritten during pre-parse normalization, before the SQL parser runs.

-- Single-level access
-- Snowflake: SELECT payload:action FROM events;
-- HatiData: SELECT json_extract_string(payload, 'action') FROM events;

-- Multi-level access
-- Snowflake: SELECT payload:user:address:city FROM events;
-- HatiData: SELECT json_extract_string(json_extract_string(json_extract_string(payload, 'user'), 'address'), 'city') FROM events;

-- With type casting
-- Snowflake: SELECT payload:amount::FLOAT AS amount FROM events;
-- HatiData: SELECT CAST(json_extract_string(payload, 'amount') AS FLOAT) AS amount FROM events;

The pre-parser distinguishes single colons (JSON access) from double colons (PostgreSQL-style cast). In rare cases involving complex string literals with colons, use json_extract_string() directly.

FLATTEN to UNNEST

-- Basic
-- Snowflake:
SELECT value FROM TABLE(FLATTEN(input => ARRAY_CONSTRUCT(1, 2, 3)));
-- HatiData:
SELECT value FROM UNNEST(ARRAY_CONSTRUCT(1, 2, 3));

-- With table join
-- Snowflake:
SELECT e.event_id, f.value AS item FROM events e, TABLE(FLATTEN(input => e.items)) f;
-- HatiData:
SELECT e.event_id, f.value AS item FROM events e, UNNEST(e.items) AS f;

The FLATTEN parameters path, outer, recursive, and mode are not supported. To emulate path, pre-extract the target with GET_PATH; to emulate outer, use a LEFT JOIN with UNNEST.
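
A sketch of the outer workaround, assuming an events table whose items array may be NULL or empty; the LEFT JOIN keeps such rows with a NULL item, matching outer => TRUE behavior:

```sql
-- Emulating TABLE(FLATTEN(input => e.items, outer => TRUE))
SELECT e.event_id, f.value AS item
FROM events e
LEFT JOIN UNNEST(e.items) AS f ON TRUE;
```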

OBJECT_CONSTRUCT to json_object

-- Snowflake:
SELECT OBJECT_CONSTRUCT('name', name, 'id', id) AS payload FROM users;
-- HatiData:
SELECT json_object('name', name, 'id', id) AS payload FROM users;

GET_PATH to json_extract

-- Snowflake:
SELECT GET_PATH(payload, 'user.preferences.theme') AS theme FROM events;
-- HatiData:
SELECT json_extract(payload, 'user.preferences.theme') AS theme FROM events;

Note: For complex paths with array indexing, JSON Pointer syntax may differ from Snowflake's dot notation.

CLONE — Manual Conversion Required

Snowflake's zero-copy CLONE is not supported. The transpiler logs a warning; manual conversion is needed:

-- Snowflake (NOT SUPPORTED):
CREATE TABLE orders_backup CLONE orders;

-- HatiData equivalent (full data copy):
CREATE TABLE orders_backup AS SELECT * FROM orders;

Supported SQL Features

| Feature | Support |
|---|---|
| SELECT with subqueries | Full |
| Common Table Expressions (CTEs) | Full |
| Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.) | Full |
| QUALIFY clause | Full (native) |
| LATERAL joins | Full (native) |
| INSERT INTO ... VALUES | Full |
| INSERT INTO ... SELECT | Full |
| CREATE TABLE | Full (with type mapping) |
| CREATE TABLE AS SELECT | Full |
| ALTER TABLE ADD COLUMN | Full (with type mapping) |
| DROP TABLE | Full |
| UPDATE | Full |
| DELETE | Full |
| EXPLAIN | Full |
| TABLESAMPLE / SAMPLE | Supported with syntax differences |

QUALIFY Example

-- Works in both Snowflake and HatiData
SELECT customer_id, order_date,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
QUALIFY rn = 1;

LATERAL Join Example

-- Works in both Snowflake and HatiData
SELECT c.customer_id, c.name, recent.order_date, recent.total
FROM customers c,
LATERAL (
    SELECT order_date, total FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY order_date DESC LIMIT 3
) recent;

Unsupported Constructs

The following Snowflake constructs are rejected with a TranspilerError::UnsupportedConstruct error:

| Construct | Reason | Alternative |
|---|---|---|
| COPY INTO | Snowflake-specific data loading | COPY my_table FROM 'file.csv' or read_parquet() |
| Staged file refs (@stage, $1) | Snowflake stages not supported | read_csv(), read_parquet(), read_json() |
| STREAMS | Change data capture (CDC) | Use audit log queries |
| TASKS | Scheduled SQL execution | Use cron, Airflow, or Prefect |
| PIPES | Continuous data loading | INSERT INTO ... SELECT from file functions |
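
A sketch of the file-function alternatives from the table above (table name and file paths are hypothetical):

```sql
-- Instead of COPY INTO from a stage
COPY my_table FROM 'exports/data.csv';

-- Or load through a file-reading function
INSERT INTO my_table
SELECT * FROM read_parquet('exports/data.parquet');
```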

Hybrid SQL Functions

HatiData extends standard SQL with vector-aware functions for semantic search over structured data and agent memory. These are resolved at query time: the proxy embeds the search text and evaluates semantic similarity against the vector index.

semantic_match()

Returns TRUE if cosine similarity exceeds a threshold. Use in WHERE and JOIN ON clauses.

-- Filter rows semantically
SELECT ticket_id, subject, description
FROM support_tickets
WHERE semantic_match(description, 'billing dispute refund request', 0.8)
  AND status = 'open';

-- In a JOIN ON condition
SELECT o.order_id, m.content AS related_memory
FROM orders o
JOIN _hatidata_agent_memory m
  ON semantic_match(m.content, CAST(o.description AS VARCHAR), 0.75);

Signature: semantic_match(column, 'search text' [, threshold FLOAT = 0.7])

vector_match() is an alias for semantic_match.

semantic_rank()

Returns the cosine similarity score as a FLOAT (0.0 to 1.0). Use in SELECT and ORDER BY to rank by relevance.

SELECT content,
       semantic_rank(content, 'revenue growth Q4') AS relevance
FROM _hatidata_agent_memory
WHERE agent_id = 'analyst'
ORDER BY relevance DESC
LIMIT 10;

JOIN_VECTOR

Syntactic sugar combining a JOIN with semantic_match for readable intent:

SELECT t.ticket_id, t.subject,
       m.content AS related_memory,
       semantic_rank(m.content, t.subject) AS relevance
FROM support_tickets t
JOIN_VECTOR _hatidata_agent_memory m
  ON semantic_match(m.content, t.subject, 0.7)
WHERE t.status = 'open'
ORDER BY relevance DESC;

Threshold Tuning

| Threshold | Precision | Recall | Best For |
|---|---|---|---|
| 0.9+ | Very high | Low | Exact semantic match, deduplication |
| 0.8 | High | Medium | Targeted retrieval, specific facts |
| 0.7 (default) | Medium | Medium-High | General search, RAG context |
| 0.6 | Low-Medium | High | Exploratory search, broad context |
| 0.5 | Low | Very high | Catch-all, brainstorming |
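
For instance, a high threshold suits near-duplicate detection while a lower one widens the net for RAG-style context gathering; a sketch using the tables from the examples above (search strings are illustrative):

```sql
-- Strict: near-duplicate tickets only
SELECT ticket_id FROM support_tickets
WHERE semantic_match(description, 'password reset loop after enabling 2FA', 0.9);

-- Broad: gather context for an agent
SELECT content FROM _hatidata_agent_memory
WHERE semantic_match(content, 'customer churn signals', 0.6);
```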

Vector functions require an embedding provider configured on the proxy. Without one, queries using these functions return an error.


Transpilation Cache and AI Healing

Transpilation results are cached by SQL hash. Cache hits skip the entire parse-and-rewrite pipeline — latency drops to sub-millisecond for repeated queries.

If a transpiled query fails at execution, AI healing can call an external AI API with the error and original SQL to suggest a corrected query for automatic retry. AI healing is disabled by default; enable it with the HATIDATA_HEALER_API_ENDPOINT environment variable.
