Skip to main content

Hybrid SQL

Hybrid SQL lets agents combine traditional SQL queries with semantic vector search in a single statement — filtering by structured columns and ranking by meaning simultaneously.

Why Agents Need Hybrid SQL

Agents often need to answer questions that span structured data and unstructured meaning:

  • "Find open support tickets about billing disputes" — structured filter (status = 'open') + semantic match (billing disputes)
  • "Which knowledge base articles are relevant to this customer's question?" — semantic join between two tables
  • "Show me the top 10 most similar past cases to this one" — ranking by vector similarity

Traditional SQL handles the structured part. Vector databases handle the semantic part. Hybrid SQL handles both in a single query.

semantic_match()

The semantic_match() function filters rows by semantic similarity to a natural language query:

SELECT ticket_id, subject, body
FROM support_tickets
WHERE semantic_match(embedding, 'billing dispute refund')
ORDER BY semantic_rank(embedding, 'billing dispute refund') DESC
LIMIT 10

How It Works

  1. The query text ('billing dispute refund') is embedded using the configured embedding provider
  2. An approximate nearest-neighbor (ANN) search finds candidate rows
  3. Exact cosine similarity is computed for ranking
  4. Results are filtered by the similarity threshold (configurable, default 0.7)

Threshold Guide

ThresholdBehaviorUse Case
0.9+Very strict — only near-exact semantic matchesDeduplication, exact concept matching
0.8 - 0.9Strict — strong conceptual similarityCompliance checks, PII detection
0.7 - 0.8Moderate — related concepts includedGeneral search, RAG retrieval
0.5 - 0.7Loose — broad topic matchingExploration, discovery
< 0.5Very loose — tangentially relatedNot recommended for most use cases

Override the default threshold per query:

SELECT * FROM documents
WHERE semantic_match(embedding, 'financial risk assessment', threshold := 0.85)

JOIN_VECTOR

JOIN_VECTOR enriches one table with semantically similar rows from another:

SELECT t.ticket_id, t.subject, k.article_title, k.solution
FROM support_tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject)
WHERE t.status = 'open'
LIMIT 5

How It Works

  1. For each row in the left table, the join key column (e.g., t.subject) is embedded
  2. ANN search finds the top-K matches in the right table's embedding column
  3. Results are joined and returned with similarity scores

Top-K Control

By default, JOIN_VECTOR returns the top 3 matches per left-side row. Override with the top_k parameter:

SELECT t.ticket_id, k.article_title, k.similarity_score
FROM support_tickets t
JOIN_VECTOR knowledge_base k ON semantic_match(k.embedding, t.subject, top_k := 5)
WHERE t.priority = 'high'

semantic_rank()

Use semantic_rank() in ORDER BY to sort results by similarity score:

SELECT memory_id, content, semantic_rank(embedding, 'revenue growth Q4') as relevance
FROM _hatidata_agent_memory
WHERE memory_type = 'fact'
ORDER BY relevance DESC
LIMIT 20

Combining with Standard SQL

Hybrid SQL composes naturally with all standard SQL features:

-- Hybrid + aggregation
SELECT department,
COUNT(*) as ticket_count,
AVG(semantic_rank(embedding, 'customer complaint')) as avg_relevance
FROM support_tickets
WHERE semantic_match(embedding, 'customer complaint')
GROUP BY department
ORDER BY avg_relevance DESC

-- Hybrid + CTE
WITH relevant_docs AS (
SELECT doc_id, title, content
FROM knowledge_base
WHERE semantic_match(embedding, 'GDPR data deletion requirements')
)
SELECT d.title, d.content
FROM relevant_docs d
WHERE d.content LIKE '%right to erasure%'

-- Hybrid + window functions
SELECT ticket_id, subject,
semantic_rank(embedding, 'billing') as score,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY semantic_rank(embedding, 'billing') DESC) as rank
FROM support_tickets
WHERE semantic_match(embedding, 'billing')

Configuration

Hybrid SQL behavior is configurable per deployment:

SettingDefaultDescription
Hybrid SQL enabledtrueEnable/disable hybrid SQL
Default similarity threshold0.7Default semantic_match threshold
Default JOIN_VECTOR top-K3Default top-K matches for JOIN_VECTOR
Embedding modeltext-embedding-3-smallDefault embedding model

Stay in the loop

Product updates, engineering deep-dives, and agent-native insights. No spam.