Hybrid SQL
Hybrid SQL lets agents combine traditional SQL queries with semantic vector search in a single statement — filtering by structured columns and ranking by meaning simultaneously.
Why Agents Need Hybrid SQL
Agents often need to answer questions that span structured data and unstructured meaning:
- "Find open support tickets about billing disputes" — structured filter (
status = 'open') + semantic match (billing disputes) - "Which knowledge base articles are relevant to this customer's question?" — semantic join between two tables
- "Show me the top 10 most similar past cases to this one" — ranking by vector similarity
Traditional SQL handles the structured part. Vector databases handle the semantic part. Hybrid SQL handles both in a single query.
semantic_match()
The semantic_match() function filters rows by semantic similarity to a natural language query:
SELECT ticket_id, subject, body
FROM support_tickets
WHERE semantic_match(embedding, 'billing dispute refund')
ORDER BY semantic_rank(embedding, 'billing dispute refund') DESC
LIMIT 10
How It Works
- The query text (
'billing dispute refund') is embedded using the configured embedding provider - An approximate nearest-neighbor (ANN) search finds candidate rows
- Exact cosine similarity is computed for ranking
- Results are filtered by the similarity threshold (configurable, default 0.7)
Threshold Guide
| Threshold | Behavior | Use Case |
|---|---|---|
0.9+ | Very strict — only near-exact semantic matches | Deduplication, exact concept matching |
0.8 - 0.9 | Strict — strong conceptual similarity | Compliance checks, PII detection |
0.7 - 0.8 | Moderate — related concepts included | General search, RAG retrieval |
0.5 - 0.7 | Loose — broad topic matching | Exploration, discovery |
< 0.5 | Very loose — tangentially related | Not recommended for most use cases |
Override the default threshold per query:
SELECT * FROM documents
WHERE semantic_match(embedding, 'financial risk assessment', threshold := 0.85)
JOIN_VECTOR
JOIN_VECTOR enriches one table with semantically similar rows from another:
SELECT t.ticket_id, t.subject, k.article_title, k.solution
FROM support_tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject)
WHERE t.status = 'open'
LIMIT 5
How It Works
- For each row in the left table, the join key column (e.g.,
t.subject) is embedded - ANN search finds the top-K matches in the right table's embedding column
- Results are joined and returned with similarity scores
Top-K Control
By default, JOIN_VECTOR returns the top 3 matches per left-side row. Override with the top_k parameter:
SELECT t.ticket_id, k.article_title, k.similarity_score
FROM support_tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject, top_k := 5)
WHERE t.priority = 'high'
semantic_rank()
Use semantic_rank() in ORDER BY to sort results by similarity score:
SELECT memory_id, content, semantic_rank(embedding, 'revenue growth Q4') as relevance
FROM _hatidata_agent_memory
WHERE memory_type = 'fact'
ORDER BY relevance DESC
LIMIT 20
Combining with Standard SQL
Hybrid SQL composes naturally with all standard SQL features:
-- Hybrid + aggregation
SELECT department,
COUNT(*) as ticket_count,
AVG(semantic_rank(embedding, 'customer complaint')) as avg_relevance
FROM support_tickets
WHERE semantic_match(embedding, 'customer complaint')
GROUP BY department
ORDER BY avg_relevance DESC
-- Hybrid + CTE
WITH relevant_docs AS (
SELECT doc_id, title, content
FROM knowledge_base
WHERE semantic_match(embedding, 'GDPR data deletion requirements')
)
SELECT d.title, d.content
FROM relevant_docs d
WHERE d.content LIKE '%right to erasure%'
-- Hybrid + window functions
SELECT ticket_id, subject,
semantic_rank(embedding, 'billing') as score,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY semantic_rank(embedding, 'billing') DESC) as rank
FROM support_tickets
WHERE semantic_match(embedding, 'billing')
Configuration
Hybrid SQL behavior is configurable per deployment:
| Setting | Default | Description |
|---|---|---|
| Hybrid SQL enabled | true | Enable/disable hybrid SQL |
| Default similarity threshold | 0.7 | Default semantic_match threshold |
| Default JOIN_VECTOR top-K | 3 | Default top-K matches for JOIN_VECTOR |
| Embedding model | text-embedding-3-small | Default embedding model |
Related Concepts
- Persistent Memory — Memories searchable via hybrid SQL
- SQL Functions — Full SQL compatibility reference
- Semantic Triggers — Triggers use the same embedding infrastructure