
ANDI vs Vector-Only Databases

Vector databases solve one problem well: storing embeddings and returning approximate nearest neighbors. But AI agents need more than similarity search. They need structured data alongside vectors, reasoning audit trails, state isolation, access control, and billing -- all in a single system.

HatiData's Agent-Native Data Infrastructure (ANDI) combines SQL and vector search in one engine, so agents do not need a separate vector database plus a separate relational database plus a separate audit system. This page compares ANDI against popular vector-only databases on the capabilities that matter for production agent systems.


Feature Comparison

| Capability | HatiData (ANDI) | Pinecone | Weaviate | Qdrant | ChromaDB |
| --- | --- | --- | --- | --- | --- |
| Query language | Full SQL (Postgres wire protocol) | REST/gRPC API | GraphQL + REST | REST/gRPC API | Python/JS client |
| Structured data | Native SQL tables, joins, aggregations | Metadata key-value only | Properties (typed) | Payload key-value | Metadata dict |
| Vector search | semantic_match(), semantic_rank(), JOIN_VECTOR in SQL | ANN (cosine, dot, euclidean) | ANN (cosine, dot, L2, hamming) | ANN (cosine, dot, euclidean) | ANN (cosine, L2, IP) |
| Hybrid search | SQL WHERE + vector similarity in one query | Metadata filters + ANN | BM25 + ANN (hybrid) | Payload filters + ANN | WHERE filters + ANN |
| Agent memory | Built-in: SQL + vector, access tracking, TTL, per-agent isolation | Manual: store vectors, build memory layer yourself | Manual | Manual | Manual |
| Chain-of-thought logging | Built-in: immutable, hash-chained, append-only ledger | Not available | Not available | Not available | Not available |
| Semantic triggers | Built-in: ANN pre-filter + cosine verification, webhook/notify dispatch | Not available | Not available | Not available | Not available |
| Branch isolation | Built-in: schema-based branches, merge strategies, garbage collection | Not available | Not available | Not available | Not available |
| RBAC / row-level policies | Per-agent, per-org, column masking, row filtering | API key scoping | API key + OIDC | API key + JWT | None |
| Audit trail | Every query logged with agent ID, cost, policy decisions | API logs only | API logs only | API logs only | None |
| Per-agent billing | Native metering, Stripe integration, quota enforcement | Flat pricing per index | Flat pricing per cluster | Flat pricing per cluster | Free / self-hosted |
| Wire protocol | Postgres (any SQL client, BI tool, ORM) | REST | REST / GraphQL | REST / gRPC | HTTP |
| Deployment | Self-hosted (VPC), managed cloud, multi-cloud | Managed only | Self-hosted or managed | Self-hosted or managed | Self-hosted (embedded) |
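The "ANN pre-filter + cosine verification" pattern listed for semantic triggers can be sketched in a few lines of plain Python. This is a conceptual illustration only, not HatiData's implementation: a cheap approximate pass narrows the candidate set, then an exact cosine check confirms each match before a trigger would fire.

```python
import math

def cosine(a, b):
    """Exact cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def fire_trigger(query_vec, candidates, threshold=0.8):
    """candidates: (id, vector) pairs returned by a coarse ANN pre-filter.
    Only candidates that pass the exact cosine check fire the trigger."""
    fired = []
    for cid, vec in candidates:
        if cosine(query_vec, vec) >= threshold:
            fired.append(cid)  # in ANDI this step would dispatch a webhook/notify
    return fired

# An identical vector scores 1.0 and fires; an orthogonal one scores 0.0 and does not.
matches = fire_trigger([1.0, 0.0], [("a", [1.0, 0.0]), ("b", [0.0, 1.0])])
# matches == ["a"]
```

The two-stage shape is what makes triggers cheap to evaluate at scale: the approximate index does the heavy lifting, and the exact check keeps false positives out of the dispatch path.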

When to Use a Vector-Only Database

Vector-only databases are a good fit when:

  • You only need similarity search. Your application embeds documents and retrieves the top-K nearest neighbors. No joins, no aggregations, no complex filtering.
  • You already have a separate relational database. Your structured data lives in Postgres or MySQL, and you are comfortable maintaining two systems with application-level consistency.
  • You do not need agent-specific features. No CoT audit trails, no branch isolation, no semantic triggers, no per-agent billing.
  • You want a managed vector index with minimal setup. Pinecone and Weaviate Cloud offer turnkey hosted vector search.

When to Use ANDI

ANDI is the better choice when:

  • Your agents need both SQL and vector search. They query structured business data and compute semantic similarity over the same connection, with no glue code.
  • You need audit trails for compliance. Financial services, healthcare, legal -- any domain where you must prove why an agent made a decision.
  • You want branch isolation for safe exploration. Agents can modify data in branches without affecting production, then merge or discard.
  • You need per-agent governance. Row-level policies, column masking, quota limits, and billing that vary by agent identity.
  • You are building a multi-agent system. Shared memory, semantic triggers between agents, and CoT replay across the fleet.
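The hash-chained, append-only ledger behind CoT logging is worth a concrete illustration. The sketch below shows the general technique, not HatiData's actual ledger format: each entry's hash covers the previous entry's hash, so editing any entry invalidates everything after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def append_entry(ledger, payload):
    """Append an entry whose hash covers the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    ledger.append({"payload": payload, "prev": prev_hash,
                   "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify_chain(ledger):
    """Recompute every hash from the genesis; any edit breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        body = json.dumps({"payload": entry["payload"], "prev": prev_hash},
                          sort_keys=True)
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True

ledger = []
append_entry(ledger, "agent-7: chose refund path (confidence 0.92)")
append_entry(ledger, "agent-7: issued refund")
assert verify_chain(ledger)      # untampered chain verifies
ledger[0]["payload"] = "agent-7: chose escalation path"
assert not verify_chain(ledger)  # any after-the-fact edit is detectable
```

This is why a hash-chained ledger can serve as compliance evidence: proving an agent's reasoning was not rewritten after the fact reduces to re-running the verification pass.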

Code Comparison: The Same Task, Two Approaches

Task: Store customer support interactions as memories, then find similar past interactions for a new ticket, filtered to the same product category.

Approach A: Pinecone + Postgres (Two Systems)

```python
import pinecone
import psycopg2
import psycopg2.extras
from openai import OpenAI

# --- System 1: Postgres for structured data ---
pg = psycopg2.connect("postgresql://user:pass@postgres-host:5432/support")
pg_cursor = pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

# --- System 2: Pinecone for vectors ---
pc = pinecone.Pinecone(api_key="pk-...")
index = pc.Index("support-memories")

openai_client = OpenAI()

# Store a new interaction (must write to BOTH systems)
def store_interaction(ticket_id, agent_id, content, category, resolution):
    # Write structured data to Postgres
    pg_cursor.execute(
        """INSERT INTO interactions (ticket_id, agent_id, content, category, resolution, created_at)
           VALUES (%s, %s, %s, %s, %s, NOW())""",
        (ticket_id, agent_id, content, category, resolution),
    )
    pg.commit()

    # Generate embedding and write to Pinecone
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=content
    ).data[0].embedding

    index.upsert(vectors=[{
        "id": ticket_id,
        "values": embedding,
        "metadata": {"category": category, "agent_id": agent_id},
    }])


# Search for similar interactions (must query BOTH systems and join manually)
def find_similar(query_text, category, limit=5):
    # Get embedding for the query
    query_vec = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query_text
    ).data[0].embedding

    # Search Pinecone with metadata filter
    results = index.query(
        vector=query_vec,
        top_k=limit,
        filter={"category": {"$eq": category}},
        include_metadata=True,
    )

    # Fetch full records from Postgres (application-level join)
    ticket_ids = [m.id for m in results.matches]
    if not ticket_ids:
        return []

    pg_cursor.execute(
        "SELECT * FROM interactions WHERE ticket_id = ANY(%s)",
        (ticket_ids,),
    )
    rows = pg_cursor.fetchall()

    # Manually merge similarity scores with structured data
    score_map = {m.id: m.score for m in results.matches}
    enriched = [{**row, "similarity": score_map.get(row["ticket_id"], 0.0)}
                for row in rows]

    return sorted(enriched, key=lambda x: x["similarity"], reverse=True)
```

Issues with this approach:

  • Two connections to maintain (Postgres + Pinecone)
  • Application-level consistency (what if one write fails?)
  • Manual embedding generation
  • Manual join between vector results and structured data
  • No audit trail of which agent queried what
  • No branch isolation, no CoT logging, no triggers
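The "manual join" issue above deserves emphasis: every two-system setup ends up carrying glue code that stitches vector-store scores onto relational rows in the application layer. Extracted as a standalone sketch (with illustrative sample data), it looks like this:

```python
def merge_scores(matches, rows, key="ticket_id"):
    """matches: (id, score) pairs from the vector store;
    rows: row dicts fetched from the relational store.
    Returns rows enriched with similarity, best match first."""
    score_map = dict(matches)
    enriched = [{**row, "similarity": score_map.get(row[key], 0.0)}
                for row in rows]
    return sorted(enriched, key=lambda r: r["similarity"], reverse=True)

# Relational rows come back in arbitrary order; the merge restores ranking.
ranked = merge_scores(
    [("T-1", 0.91), ("T-2", 0.47)],
    [{"ticket_id": "T-2", "category": "billing"},
     {"ticket_id": "T-1", "category": "billing"}],
)
# ranked[0]["ticket_id"] == "T-1"
```

Harmless in isolation, but this code must stay consistent with two schemas, two failure modes, and two ID spaces, and it re-sorts in the application what a database ORDER BY would do natively.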

Approach B: HatiData (Single System)

```python
from hatidata import HatiDataClient

client = HatiDataClient(
    host="localhost",
    port=5439,
    api_key="hd_live_your_api_key",
)

# Store a new interaction (single write, embedding is automatic)
def store_interaction(ticket_id, agent_id, content, category, resolution):
    client.memory.store(
        agent_id=agent_id,
        content=content,
        metadata={
            "ticket_id": ticket_id,
            "category": category,
            "resolution": resolution,
        },
    )


# Search with SQL + vector in one query (no manual join, no separate embedding call).
# NOTE: string interpolation is shown for brevity; escape or parameterize
# query_text and category before using untrusted input in production.
def find_similar(query_text, category, limit=5):
    return client.query(f"""
        SELECT
            m.content,
            m.metadata->>'ticket_id'  AS ticket_id,
            m.metadata->>'resolution' AS resolution,
            m.metadata->>'category'   AS category,
            semantic_match(m.content, '{query_text}') AS similarity
        FROM _hatidata_agent_memory m
        WHERE m.metadata->>'category' = '{category}'
        ORDER BY semantic_rank(m.content, '{query_text}')
        LIMIT {limit}
    """)
```

What you get with ANDI:

  • Single connection (Postgres wire protocol)
  • Automatic embedding (built-in bge-small-en-v1.5)
  • SQL + vector in one query -- no application-level join
  • Every query is audit-logged with agent identity
  • Add CoT logging, triggers, or branches without changing the data layer

Migration Path

If you are currently using a vector-only database and want to move to ANDI, see the dedicated migration guides. They cover exporting your existing vectors, importing them as HatiData memories, and rewriting your queries.

