SQL Functions & Types
HatiData accepts Snowflake-compatible SQL and automatically transpiles it for execution. Transpiled results are cached by SQL hash, making repeated queries sub-millisecond on cache hits.
Function Mapping
Snowflake functions are automatically rewritten during transpilation. Rewrites apply to all SQL contexts: SELECT projections, WHERE clauses, HAVING clauses, GROUP BY, and ORDER BY — including nested combinations.
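As an illustration of the simplest rewrite class (direct renames), here is a hypothetical Python sketch of a name-mapping pass. The real transpiler operates on a parsed AST rather than raw text, and only a few mappings from the tables below are shown:

```python
import re

# Illustrative only: a text-level rename pass. The actual transpiler
# rewrites the parsed AST, which also handles quoting and nesting.
RENAMES = {
    "IFF": "IF",
    "NVL": "COALESCE",
    "DATEDIFF": "DATE_DIFF",
    "LISTAGG": "STRING_AGG",
    "ARRAY_AGG": "LIST",
}

def rename_functions(sql: str) -> str:
    # Replace each mapped name only when it appears as a function call.
    pattern = r"\b(" + "|".join(RENAMES) + r")\s*\("
    return re.sub(pattern,
                  lambda m: RENAMES[m.group(1).upper()] + "(",
                  sql, flags=re.IGNORECASE)
```

Because the mapping is applied wherever a call appears, nested uses such as `IFF(NVL(...), ...)` are rewritten in one pass.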
Conditional Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| IFF(cond, true_val, false_val) | IF(cond, true_val, false_val) | Direct rename |
| NVL(expr, default) | COALESCE(expr, default) | Direct rename |
| NVL2(expr, not_null, null_val) | CASE WHEN expr IS NOT NULL THEN not_null ELSE null_val END | Expanded to CASE |
| COALESCE(...) | COALESCE(...) | Passthrough |
| DECODE(expr, s1, r1, ...) | DECODE(...) | Passthrough (native) |
```sql
-- Snowflake
SELECT IFF(revenue > 10000, 'enterprise', 'smb') AS tier,
       NVL(middle_name, '') AS middle_name
FROM customers;

-- HatiData output
SELECT IF(revenue > 10000, 'enterprise', 'smb') AS tier,
       COALESCE(middle_name, '') AS middle_name
FROM customers;
```
Nested functions are recursively rewritten:
```sql
-- Snowflake: nested
SELECT IFF(NVL(status, 'unknown') = 'active', DATEDIFF(day, created_at, CURRENT_DATE), 0)
FROM users;

-- HatiData output: all three rewritten
SELECT IF(COALESCE(status, 'unknown') = 'active', DATE_DIFF('day', created_at, CURRENT_DATE), 0)
FROM users;
```
Date and Time Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| DATEDIFF(part, start, end) | DATE_DIFF(part, start, end) | Direct rename (underscore added) |
| DATEADD(part, n, date) | date + INTERVAL 'n' part | Rewritten to interval arithmetic |
| TO_TIMESTAMP_NTZ(expr) | CAST(expr AS TIMESTAMP) | Rewritten to CAST |
| DATE_TRUNC('part', expr) | DATE_TRUNC('part', expr) | Passthrough |
```sql
-- Snowflake
SELECT DATEDIFF(day, created_at, CURRENT_DATE) AS age_days,
       DATEADD(day, 30, order_date) AS due_date
FROM orders;

-- HatiData output
SELECT DATE_DIFF('day', created_at, CURRENT_DATE) AS age_days,
       order_date + INTERVAL '30' DAY AS due_date
FROM orders;
```
String Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| SPLIT_PART(str, delim, n) | SPLIT_PART(str, delim, n) | Passthrough |
| REGEXP_REPLACE(str, pat, rep) | REGEXP_REPLACE(str, pat, rep) | Passthrough |
Aggregate Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| LISTAGG(expr, delim) | STRING_AGG(expr, delim) | Direct rename |
| ARRAY_AGG(expr) | LIST(expr) | Direct rename |
```sql
-- Snowflake
SELECT department,
       LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) AS members,
       ARRAY_AGG(DISTINCT product_id) AS products
FROM employees
GROUP BY department;

-- HatiData output
SELECT department,
       STRING_AGG(name, ', ' ORDER BY name) AS members,
       LIST(DISTINCT product_id) AS products
FROM employees
GROUP BY department;
```
Both LISTAGG and ARRAY_AGG forms (with and without WITHIN GROUP) are fully supported.
JSON Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| PARSE_JSON(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| TO_VARIANT(expr) | CAST(expr AS JSON) | Rewritten to CAST |
| OBJECT_CONSTRUCT(k, v, ...) | json_object(k, v, ...) | Direct rename |
| GET_PATH(expr, 'path') | json_extract(expr, 'path') | Direct rename |
| FLATTEN(input => expr) | UNNEST(expr) | Extracts input parameter |
```sql
-- Snowflake
SELECT OBJECT_CONSTRUCT('name', name, 'email', email) AS user_json,
       GET_PATH(metadata, 'preferences.theme') AS theme,
       PARSE_JSON('{"x": 1}') AS data
FROM users;

-- HatiData output
SELECT json_object('name', name, 'email', email) AS user_json,
       json_extract(metadata, 'preferences.theme') AS theme,
       CAST('{"x": 1}' AS JSON) AS data
FROM users;
```
Type Conversion Functions
| Snowflake | HatiData | Notes |
|---|---|---|
| TRY_CAST(expr AS type) | TRY_CAST(expr AS type) | Passthrough |
| TO_NUMBER(expr) | CAST(expr AS NUMERIC) | Manual rewrite needed (not auto-transpiled) |
| TO_VARCHAR(expr) | CAST(expr AS VARCHAR) | Manual rewrite needed (not auto-transpiled) |
Type Mapping
Type rewrites are applied automatically during transpilation. They affect CREATE TABLE column definitions, ALTER TABLE ADD COLUMN statements, and CAST expressions.
Rewritten Types
| Snowflake Type | HatiData Type | Category | Notes |
|---|---|---|---|
| VARIANT | JSON | Semi-structured | Universal semi-structured type |
| OBJECT | JSON | Semi-structured | Key-value object type |
| ARRAY (untyped) | JSON | Semi-structured | Typed ARRAY(T) is preserved |
| NUMBER | NUMERIC | Numeric | NUMBER(p,s) → NUMERIC (precision dropped) |
| TIMESTAMP_NTZ | TIMESTAMP | Date/Time | Timestamp without timezone |
| TIMESTAMP_LTZ | TIMESTAMPTZ | Date/Time | Timestamp with local timezone |
```sql
-- Snowflake CREATE TABLE
CREATE TABLE events (
    id INTEGER,
    payload VARIANT,
    metadata OBJECT,
    tags ARRAY,
    amount NUMBER,
    created_at TIMESTAMP_NTZ,
    updated_at TIMESTAMP_LTZ
);

-- HatiData output
CREATE TABLE events (
    id INTEGER,
    payload JSON,
    metadata JSON,
    tags JSON,
    amount NUMERIC,
    created_at TIMESTAMP,
    updated_at TIMESTAMPTZ
);
```
Passthrough Types
These types are identical in Snowflake and HatiData and pass through unchanged:
| Type | Category |
|---|---|
| INTEGER, INT, BIGINT, SMALLINT, TINYINT | Numeric |
| FLOAT, DOUBLE, DECIMAL(p,s) | Numeric |
| VARCHAR(n), CHAR, TEXT | String |
| BOOLEAN | Boolean |
| DATE, TIME, TIMESTAMP | Date/Time |
| BLOB, BYTEA | Binary |
Precision notes:
- NUMBER(p,s) precision is simplified to NUMERIC (arbitrary precision). Use DECIMAL(p,s) if explicit enforcement is required.
- VARCHAR(n) length constraints are preserved.
- Snowflake's TIMESTAMP_NTZ(9) nanosecond precision is not carried over (timestamps have microsecond precision).
SQL Constructs
These are structural transformations beyond simple function renaming, applied automatically during transpilation.
Colon Notation (JSON Field Access)
Snowflake's col:field syntax is rewritten during pre-parse normalization, before the SQL parser runs.
```sql
-- Single-level access
-- Snowflake: SELECT payload:action FROM events;
-- HatiData:  SELECT json_extract_string(payload, 'action') FROM events;

-- Multi-level access
-- Snowflake: SELECT payload:user:address:city FROM events;
-- HatiData:  SELECT json_extract_string(json_extract_string(json_extract_string(payload, 'user'), 'address'), 'city') FROM events;

-- With type casting
-- Snowflake: SELECT payload:amount::FLOAT AS amount FROM events;
-- HatiData:  SELECT CAST(json_extract_string(payload, 'amount') AS FLOAT) AS amount FROM events;
```
The pre-parser distinguishes single colons (JSON access) from double colons (PostgreSQL-style cast). In rare cases involving complex string literals with colons, use json_extract_string() directly.
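A minimal Python sketch of this normalization step, for illustration only: it captures the single-vs-double colon distinction and left-to-right nesting for chained access, but it is not the actual pre-parser and deliberately ignores the string-literal caveat above:

```python
import re

# Matches an identifier followed by one or more single-colon segments.
# The '(?!:)' guard skips the first colon of a '::' cast; '(?<![:\w])'
# prevents matching from the middle of a word or after a colon.
CHAIN = re.compile(r"(?<![:\w])(\w+)((?::(?!:)\w+)+)")

def rewrite_colons(sql: str) -> str:
    def fold(m: re.Match) -> str:
        expr = m.group(1)
        # Fold left so a:b:c nests as extract(extract(a, 'b'), 'c').
        for field in m.group(2).lstrip(":").split(":"):
            expr = f"json_extract_string({expr}, '{field}')"
        return expr
    return CHAIN.sub(fold, sql)
```

For example, `payload:amount::FLOAT` becomes `json_extract_string(payload, 'amount')::FLOAT`, leaving the cast for the parser to handle.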
FLATTEN to UNNEST
```sql
-- Basic
-- Snowflake:
SELECT value FROM TABLE(FLATTEN(input => ARRAY_CONSTRUCT(1, 2, 3)));
-- HatiData:
SELECT value FROM UNNEST(ARRAY_CONSTRUCT(1, 2, 3));

-- With table join
-- Snowflake:
SELECT e.event_id, f.value AS item FROM events e, TABLE(FLATTEN(input => e.items)) f;
-- HatiData:
SELECT e.event_id, f.value AS item FROM events e, UNNEST(e.items) AS f;
```
FLATTEN parameters path, outer, recursive, and mode are not supported. Pre-extract with GET_PATH and use LEFT JOIN UNNEST for outer behavior.
OBJECT_CONSTRUCT to json_object
```sql
-- Snowflake:
SELECT OBJECT_CONSTRUCT('name', name, 'id', id) AS payload FROM users;
-- HatiData:
SELECT json_object('name', name, 'id', id) AS payload FROM users;
```
GET_PATH to json_extract
```sql
-- Snowflake:
SELECT GET_PATH(payload, 'user.preferences.theme') AS theme FROM events;
-- HatiData:
SELECT json_extract(payload, 'user.preferences.theme') AS theme FROM events;
```
Note: For complex paths with array indexing, JSON Pointer syntax may differ from Snowflake's dot notation.
CLONE — Manual Conversion Required
Snowflake's zero-copy CLONE is not supported. The transpiler logs a warning; manual conversion is needed:
```sql
-- Snowflake (NOT SUPPORTED):
CREATE TABLE orders_backup CLONE orders;
-- HatiData equivalent (full data copy):
CREATE TABLE orders_backup AS SELECT * FROM orders;
```
Supported SQL Features
| Feature | Support |
|---|---|
| SELECT with subqueries | Full |
| Common Table Expressions (CTEs) | Full |
| Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.) | Full |
| QUALIFY clause | Full (native) |
| LATERAL joins | Full (native) |
| INSERT INTO ... VALUES | Full |
| INSERT INTO ... SELECT | Full |
| CREATE TABLE | Full (with type mapping) |
| CREATE TABLE AS SELECT | Full |
| ALTER TABLE ADD COLUMN | Full (with type mapping) |
| DROP TABLE | Full |
| UPDATE | Full |
| DELETE | Full |
| EXPLAIN | Full |
| TABLESAMPLE / SAMPLE | Supported with syntax differences |
QUALIFY Example
```sql
-- Works in both Snowflake and HatiData
SELECT customer_id, order_date,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
QUALIFY rn = 1;
```
LATERAL Join Example
```sql
-- Works in both Snowflake and HatiData
SELECT c.customer_id, c.name, recent.order_date, recent.total
FROM customers c,
     LATERAL (
       SELECT order_date, total FROM orders o
       WHERE o.customer_id = c.customer_id
       ORDER BY order_date DESC LIMIT 3
     ) recent;
```
Unsupported Constructs
The following Snowflake constructs are rejected with a TranspilerError::UnsupportedConstruct error:
| Construct | Reason | Alternative |
|---|---|---|
| COPY INTO | Snowflake-specific data loading | COPY my_table FROM 'file.csv' or read_parquet() |
| Staged file refs (@stage, $1) | Snowflake stages not supported | read_csv(), read_parquet(), read_json() |
| STREAMS | Change data capture (CDC) | Use audit log queries |
| TASKS | Scheduled SQL execution | Use cron, Airflow, or Prefect |
| PIPES | Continuous data loading | INSERT INTO ... SELECT from file functions |
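A hypothetical pre-flight check in Python, mirroring the table above. The real transpiler detects these constructs on the parsed SQL, and the error class name here is illustrative (the docs name the error `TranspilerError::UnsupportedConstruct`):

```python
import re

# Illustrative patterns for the rejected constructs listed above.
UNSUPPORTED = {
    r"\bCOPY\s+INTO\b": "COPY INTO",
    r"@\w+": "staged file reference",
    r"\bCREATE\s+STREAM\b": "STREAMS",
    r"\bCREATE\s+TASK\b": "TASKS",
    r"\bCREATE\s+PIPE\b": "PIPES",
}

class UnsupportedConstruct(ValueError):
    """Raised when a query uses a construct the transpiler rejects."""

def check_supported(sql: str) -> None:
    for pattern, name in UNSUPPORTED.items():
        if re.search(pattern, sql, flags=re.IGNORECASE):
            raise UnsupportedConstruct(f"unsupported construct: {name}")
```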
Hybrid SQL Functions
HatiData extends standard SQL with vector-aware functions for semantic search over structured data and agent memory. These are resolved at query time: the proxy embeds the search text and evaluates semantic similarity against the vector index.
semantic_match()
Returns TRUE if cosine similarity exceeds a threshold. Use in WHERE and JOIN ON clauses.
```sql
-- Filter rows semantically
SELECT ticket_id, subject, description
FROM support_tickets
WHERE semantic_match(description, 'billing dispute refund request', 0.8)
  AND status = 'open';

-- In a JOIN ON condition
SELECT o.order_id, m.content AS related_memory
FROM orders o
JOIN _hatidata_agent_memory m
  ON semantic_match(m.content, CAST(o.description AS VARCHAR), 0.75);
```
Signature: semantic_match(column, 'search text' [, threshold FLOAT = 0.7])
vector_match() is an alias for semantic_match.
semantic_rank()
Returns the cosine similarity score as a FLOAT (0.0 to 1.0). Use in SELECT and ORDER BY to rank by relevance.
```sql
SELECT content,
       semantic_rank(content, 'revenue growth Q4') AS relevance
FROM _hatidata_agent_memory
WHERE agent_id = 'analyst'
ORDER BY relevance DESC
LIMIT 10;
```
JOIN_VECTOR
Syntactic sugar combining a JOIN with semantic_match for readable intent:
```sql
SELECT t.ticket_id, t.subject,
       m.content AS related_memory,
       semantic_rank(m.content, t.subject) AS relevance
FROM support_tickets t
JOIN_VECTOR _hatidata_agent_memory m
  ON semantic_match(m.content, t.subject, 0.7)
WHERE t.status = 'open'
ORDER BY relevance DESC;
```
Threshold Tuning
| Threshold | Precision | Recall | Best For |
|---|---|---|---|
| 0.9+ | Very high | Low | Exact semantic match, deduplication |
| 0.8 | High | Medium | Targeted retrieval, specific facts |
| 0.7 (default) | Medium | Medium-High | General search, RAG context |
| 0.6 | Low-Medium | High | Exploratory search, broad context |
| 0.5 | Low | Very high | Catch-all, brainstorming |
Vector functions require an embedding provider configured on the proxy. Without one, queries using these functions return an error.
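For intuition on the scores these functions work with, cosine similarity can be sketched in a few lines of Python. The embedding step itself is provider-specific and omitted here; actual score ranges depend on the embedding space:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def matches(a: list[float], b: list[float], threshold: float = 0.7) -> bool:
    # Mirrors the semantic_match predicate: TRUE at or above the threshold.
    return cosine_similarity(a, b) >= threshold
```

Identical directions score 1.0, orthogonal vectors score 0.0, which is why the threshold table below trades precision against recall as it moves from 0.9 down to 0.5.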
Transpilation Cache and AI Healing
Transpilation results are cached by SQL hash. Cache hits skip the entire parse-and-rewrite pipeline — latency drops to sub-millisecond for repeated queries.
If a transpiled query fails at execution, AI healing can call an external AI API with the error and original SQL to suggest a corrected query for automatic retry. AI healing is disabled by default; enable it with the HATIDATA_HEALER_API_ENDPOINT environment variable.
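The cache behavior can be sketched as a hash-keyed dictionary. This is an illustrative Python model, not HatiData's implementation; the key derivation and structure are assumptions:

```python
import hashlib

class TranspileCache:
    """Sketch of a hash-keyed transpilation cache."""

    def __init__(self):
        self._cache: dict[str, str] = {}
        self.hits = 0

    def _key(self, sql: str) -> str:
        # Hash the SQL text; identical queries map to the same entry.
        return hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def transpile(self, sql: str, transpiler) -> str:
        key = self._key(sql)
        if key in self._cache:
            self.hits += 1  # cache hit: skip the parse-and-rewrite pipeline
        else:
            self._cache[key] = transpiler(sql)
        return self._cache[key]
```

On a hit, the stored output is returned directly, which is what makes repeated queries sub-millisecond.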