Hybrid SQL

Hybrid SQL lets agents combine traditional SQL queries with semantic vector search in a single statement — filtering by structured columns and ranking by meaning simultaneously.

Why Agents Need Hybrid SQL

Agents often need to answer questions that span structured data and unstructured meaning:

"Find open support tickets about billing disputes" — structured filter (status = 'open') + semantic match (billing disputes)
"Which knowledge base articles are relevant to this customer's question?" — semantic join between two tables
"Show me the top 10 most similar past cases to this one" — ranking by vector similarity

Traditional SQL handles the structured part. Vector databases handle the semantic part. Hybrid SQL handles both in a single query.

`semantic_match()`

The semantic_match() function filters rows by semantic similarity to a natural language query:

SELECT ticket_id, subject, body
FROM support_tickets
WHERE semantic_match(embedding, 'billing dispute refund')
ORDER BY semantic_rank(embedding, 'billing dispute refund') DESC
LIMIT 10

How It Works

The query text ('billing dispute refund') is embedded using the configured embedding provider
An approximate nearest-neighbor (ANN) search finds candidate rows
Exact cosine similarity is computed for ranking
Results are filtered by the similarity threshold (configurable, default 0.7)

Threshold Guide

Threshold	Behavior	Use Case
`0.9+`	Very strict — only near-exact semantic matches	Deduplication, exact concept matching
`0.8 - 0.9`	Strict — strong conceptual similarity	Compliance checks, PII detection
`0.7 - 0.8`	Moderate — related concepts included	General search, RAG retrieval
`0.5 - 0.7`	Loose — broad topic matching	Exploration, discovery
`< 0.5`	Very loose — tangentially related	Not recommended for most use cases

Override the default threshold per query:

SELECT * FROM documents
WHERE semantic_match(embedding, 'financial risk assessment', threshold := 0.85)

`JOIN_VECTOR`

JOIN_VECTOR enriches one table with semantically similar rows from another:

SELECT t.ticket_id, t.subject, k.article_title, k.solution
FROM support_tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject)
WHERE t.status = 'open'
LIMIT 5

How It Works

For each row in the left table, the join key column (e.g., t.subject) is embedded
ANN search finds the top-K matches in the right table's embedding column
Results are joined and returned with similarity scores

Top-K Control

By default, JOIN_VECTOR returns the top 3 matches per left-side row. Override with the top_k parameter:

SELECT t.ticket_id, k.article_title, k.similarity_score
FROM support_tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject, top_k := 5)
WHERE t.priority = 'high'

`semantic_rank()`

Use semantic_rank() in ORDER BY to sort results by similarity score:

SELECT memory_id, content, semantic_rank(embedding, 'revenue growth Q4') as relevance
FROM _hatidata_agent_memory
WHERE memory_type = 'fact'
ORDER BY relevance DESC
LIMIT 20

Combining with Standard SQL

Hybrid SQL composes naturally with all standard SQL features:

-- Hybrid + aggregation
SELECT department,
       COUNT(*) as ticket_count,
       AVG(semantic_rank(embedding, 'customer complaint')) as avg_relevance
FROM support_tickets
WHERE semantic_match(embedding, 'customer complaint')
GROUP BY department
ORDER BY avg_relevance DESC

-- Hybrid + CTE
WITH relevant_docs AS (
    SELECT doc_id, title, content
    FROM knowledge_base
    WHERE semantic_match(embedding, 'GDPR data deletion requirements')
)
SELECT d.title, d.content
FROM relevant_docs d
WHERE d.content LIKE '%right to erasure%'

-- Hybrid + window functions
SELECT ticket_id, subject,
       semantic_rank(embedding, 'billing') as score,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY semantic_rank(embedding, 'billing') DESC) as rank
FROM support_tickets
WHERE semantic_match(embedding, 'billing')

Configuration

Hybrid SQL behavior is configurable per deployment:

Setting	Default	Description
Hybrid SQL enabled	`true`	Enable/disable hybrid SQL
Default similarity threshold	0.7	Default `semantic_match` threshold
Default JOIN_VECTOR top-K	3	Default top-K matches for `JOIN_VECTOR`
Embedding model	`text-embedding-3-small`	Default embedding model

Persistent Memory — Memories searchable via hybrid SQL
SQL Functions — Full SQL compatibility reference
Semantic Triggers — Triggers use the same embedding infrastructure

Why Agents Need Hybrid SQL​

semantic_match()​

How It Works​

Threshold Guide​

JOIN_VECTOR​

How It Works​

Top-K Control​

semantic_rank()​

Combining with Standard SQL​

Configuration​

Related Concepts​

Stay in the loop

Why Agents Need Hybrid SQL

`semantic_match()`

How It Works

Threshold Guide

`JOIN_VECTOR`

How It Works

Top-K Control

`semantic_rank()`

Combining with Standard SQL

Configuration

Related Concepts