ANDI vs Vector-Only Databases
Vector databases solve one problem well: storing embeddings and returning approximate nearest neighbors. But AI agents need more than similarity search. They need structured data alongside vectors, reasoning audit trails, state isolation, access control, and billing -- all in a single system.
HatiData's Agent-Native Data Infrastructure (ANDI) combines SQL and vector search in one engine, so agents do not need a separate vector database plus a separate relational database plus a separate audit system. This page compares ANDI against popular vector-only databases on the capabilities that matter for production agent systems.
Feature Comparison
| Capability | HatiData (ANDI) | Pinecone | Weaviate | Qdrant | ChromaDB |
|---|---|---|---|---|---|
| Query language | Full SQL (Postgres wire protocol) | REST/gRPC API | GraphQL + REST | REST/gRPC API | Python/JS client |
| Structured data | Native SQL tables, joins, aggregations | Metadata key-value only | Properties (typed) | Payload key-value | Metadata dict |
| Vector search | semantic_match(), semantic_rank(), JOIN_VECTOR in SQL | ANN (cosine, dot, euclidean) | ANN (cosine, dot, L2, hamming) | ANN (cosine, dot, euclidean) | ANN (cosine, L2, IP) |
| Hybrid search | SQL WHERE + vector similarity in one query | Metadata filters + ANN | BM25 + ANN (hybrid) | Payload filters + ANN | WHERE filters + ANN |
| Agent memory | Built-in: SQL + vector, access tracking, TTL, per-agent isolation | Manual: store vectors, build memory layer yourself | Manual | Manual | Manual |
| Chain-of-thought logging | Built-in: immutable, hash-chained, append-only ledger | Not available | Not available | Not available | Not available |
| Semantic triggers | Built-in: ANN pre-filter + cosine verification, webhook/notify dispatch | Not available | Not available | Not available | Not available |
| Branch isolation | Built-in: schema-based branches, merge strategies, garbage collection | Not available | Not available | Not available | Not available |
| RBAC / row-level policies | Per-agent, per-org, column masking, row filtering | API key scoping | API key + OIDC | API key + JWT | None |
| Audit trail | Every query logged with agent ID, cost, policy decisions | API logs only | API logs only | API logs only | None |
| Per-agent billing | Native metering, Stripe integration, quota enforcement | Flat pricing per index | Flat pricing per cluster | Flat pricing per cluster | Free / self-hosted |
| Wire protocol | Postgres (any SQL client, BI tool, ORM) | REST | REST / GraphQL | REST / gRPC | HTTP |
| Deployment | Self-hosted (VPC), managed cloud, multi-cloud | Managed only | Self-hosted or managed | Self-hosted or managed | Self-hosted (embedded) |
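The "immutable, hash-chained, append-only" property in the chain-of-thought row can be illustrated with a minimal sketch. This is an illustration of the general hash-chaining technique, not HatiData's actual implementation: each entry's hash covers both its content and the previous entry's hash, so any retroactive edit invalidates every later link.

```python
import hashlib
import json

def chain_append(ledger, entry):
    """Append an entry whose hash covers its content plus the previous hash."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    payload = json.dumps(entry, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    ledger.append({"entry": entry, "prev_hash": prev_hash, "hash": entry_hash})
    return ledger

def chain_verify(ledger):
    """Recompute every hash; a tampered entry breaks the chain from that point on."""
    prev_hash = "0" * 64
    for record in ledger:
        payload = json.dumps(record["entry"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

ledger = []
chain_append(ledger, {"agent": "support-1", "thought": "classify ticket"})
chain_append(ledger, {"agent": "support-1", "thought": "search similar cases"})
assert chain_verify(ledger)

# Editing an earlier entry after the fact is detected on verification
ledger[0]["entry"]["thought"] = "something else"
assert not chain_verify(ledger)
```

The same idea underlies append-only audit ledgers generally: verification is cheap (one pass), and tampering cannot be hidden without rewriting every subsequent entry.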
When to Use a Vector-Only Database
Vector-only databases are a good fit when:
- You only need similarity search. Your application embeds documents and retrieves the top-K nearest neighbors. No joins, no aggregations, no complex filtering.
- You already have a separate relational database. Your structured data lives in Postgres or MySQL, and you are comfortable maintaining two systems with application-level consistency.
- You do not need agent-specific features. No CoT audit trails, no branch isolation, no semantic triggers, no per-agent billing.
- You want a managed vector index with minimal setup. Pinecone and Weaviate Cloud offer turnkey hosted vector search.
When to Use ANDI
ANDI is the better choice when:
- Your agents need both SQL and vector search. Querying structured business data and semantic similarity in the same connection, without glue code.
- You need audit trails for compliance. Financial services, healthcare, legal -- any domain where you must prove why an agent made a decision.
- You want branch isolation for safe exploration. Agents can modify data in branches without affecting production, then merge or discard.
- You need per-agent governance. Row-level policies, column masking, quota limits, and billing that vary by agent identity.
- You are building a multi-agent system. Shared memory, semantic triggers between agents, and CoT replay across the fleet.
Code Comparison: The Same Task, Two Approaches
Task: Store customer support interactions as memories, then find similar past interactions for a new ticket, filtered to the same product category.
Approach A: Pinecone + Postgres (Two Systems)
```python
import pinecone
import psycopg2
import psycopg2.extras
from openai import OpenAI

# --- System 1: Postgres for structured data ---
pg = psycopg2.connect("postgresql://user:pass@postgres-host:5432/support")
# RealDictCursor returns rows as dicts so they can be merged with scores below
pg_cursor = pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

# --- System 2: Pinecone for vectors ---
pc = pinecone.Pinecone(api_key="pk-...")
index = pc.Index("support-memories")
openai_client = OpenAI()

# Store a new interaction (must write to BOTH systems)
def store_interaction(ticket_id, agent_id, content, category, resolution):
    # Write structured data to Postgres
    pg_cursor.execute(
        """INSERT INTO interactions (ticket_id, agent_id, content, category, resolution, created_at)
           VALUES (%s, %s, %s, %s, %s, NOW())""",
        (ticket_id, agent_id, content, category, resolution),
    )
    pg.commit()

    # Generate embedding and write to Pinecone
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=content
    ).data[0].embedding
    index.upsert(vectors=[{
        "id": ticket_id,
        "values": embedding,
        "metadata": {"category": category, "agent_id": agent_id},
    }])

# Search for similar interactions (must query BOTH systems and join manually)
def find_similar(query_text, category, limit=5):
    # Get embedding for the query
    query_vec = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query_text
    ).data[0].embedding

    # Search Pinecone with metadata filter
    results = index.query(
        vector=query_vec,
        top_k=limit,
        filter={"category": {"$eq": category}},
        include_metadata=True,
    )

    # Fetch full records from Postgres (application-level join)
    ticket_ids = [m.id for m in results.matches]
    if not ticket_ids:
        return []
    pg_cursor.execute(
        "SELECT * FROM interactions WHERE ticket_id = ANY(%s)",
        (ticket_ids,),
    )
    rows = pg_cursor.fetchall()

    # Manually merge similarity scores with structured data
    score_map = {m.id: m.score for m in results.matches}
    enriched = [
        {**row, "similarity": score_map.get(row["ticket_id"], 0.0)}
        for row in rows
    ]
    return sorted(enriched, key=lambda x: x["similarity"], reverse=True)
```
Issues with this approach:
- Two connections to maintain (Postgres + Pinecone)
- Application-level consistency (what if one write fails?)
- Manual embedding generation
- Manual join between vector results and structured data
- No audit trail of which agent queried what
- No branch isolation, no CoT logging, no triggers
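The consistency problem in the list above can be made concrete with a toy illustration. These are in-memory stand-ins for the two stores, not real clients: if the second write fails, the stores silently diverge unless the application adds compensating logic of its own.

```python
# Toy stand-ins for the two systems
relational_store = {}   # stands in for Postgres
vector_store = {}       # stands in for Pinecone

def upsert_vector(ticket_id, embedding, fail=False):
    if fail:
        raise ConnectionError("vector store unavailable")
    vector_store[ticket_id] = embedding

def store_interaction(ticket_id, content, fail_vector=False):
    relational_store[ticket_id] = content  # write 1 succeeds
    try:
        upsert_vector(ticket_id, [0.1, 0.2], fail=fail_vector)  # write 2 may fail
    except ConnectionError:
        # Without this compensating delete, the row would exist in the
        # relational store but be invisible to similarity search.
        del relational_store[ticket_id]
        raise

store_interaction("T-1", "refund request")
assert "T-1" in relational_store and "T-1" in vector_store

try:
    store_interaction("T-2", "login issue", fail_vector=True)
except ConnectionError:
    pass
# The compensating delete kept the two stores consistent
assert "T-2" not in relational_store and "T-2" not in vector_store
```

Real systems also have to handle the compensating delete itself failing, which is why cross-system consistency usually ends up needing outbox tables or reconciliation jobs rather than try/except alone.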
Approach B: HatiData (Single System)
```python
from hatidata import HatiDataClient

client = HatiDataClient(
    host="localhost",
    port=5439,
    api_key="hd_live_your_api_key",
)

# Store a new interaction (single write, embedding is automatic)
def store_interaction(ticket_id, agent_id, content, category, resolution):
    client.memory.store(
        agent_id=agent_id,
        content=content,
        metadata={
            "ticket_id": ticket_id,
            "category": category,
            "resolution": resolution,
        },
    )

# Search with SQL + vector in one query (no manual join, no separate embedding call).
# Values are interpolated inline here for readability; with untrusted input,
# bind parameters instead of building SQL with f-strings.
def find_similar(query_text, category, limit=5):
    return client.query(f"""
        SELECT
            m.content,
            m.metadata->>'ticket_id'  AS ticket_id,
            m.metadata->>'resolution' AS resolution,
            m.metadata->>'category'   AS category,
            semantic_match(m.content, '{query_text}') AS similarity
        FROM _hatidata_agent_memory m
        WHERE m.metadata->>'category' = '{category}'
        ORDER BY semantic_rank(m.content, '{query_text}')
        LIMIT {limit}
    """)
```
What you get with ANDI:
- Single connection (Postgres wire protocol)
- Automatic embedding (built-in bge-small-en-v1.5)
- SQL + vector in one query -- no application-level join
- Every query is audit-logged with agent identity
- Add CoT logging, triggers, or branches without changing the data layer
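As a mental model for the similarity scores that `semantic_match` returns, embedding comparisons of this kind are conventionally cosine similarity between the query vector and each stored vector (an assumption here, not a statement about HatiData's internals; the table above does note cosine verification for semantic triggers). A minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real bge-small-en-v1.5 vectors have 384 dimensions)
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]
far = [0.0, 0.1, 0.9]

assert cosine_similarity(query, close) > cosine_similarity(query, far)
assert abs(cosine_similarity(query, query) - 1.0) < 1e-9
```

ANN indexes approximate exactly this ranking: they trade a small amount of recall for avoiding a full scan over every stored vector.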
Migration Path
If you are currently using a vector-only database and want to move to ANDI, see the dedicated migration guides. They cover exporting your existing vectors, importing them as HatiData memories, and rewriting your queries.
Related Concepts
- What is ANDI? -- Overview of the Agent-Native Data Infrastructure
- Persistent Memory -- How agent memory works
- Hybrid SQL -- SQL + vector search in one engine
- Embedding Pipeline -- Automatic embedding architecture
- When to Use HatiData -- Decision guide