ANDI vs Vector-Only Databases
Vector databases solve one problem well: storing embeddings and returning approximate nearest neighbors. But AI agents need more than similarity search. They need structured data alongside vectors, reasoning audit trails, state isolation, access control, and billing -- all in a single system.
HatiData's Agent-Native Data Infrastructure (ANDI) combines SQL and vector search in one engine, so agents do not need a separate vector database plus a separate relational database plus a separate audit system. This page compares ANDI against popular vector-only databases on the capabilities that matter for production agent systems.
Feature Comparison
| Capability | HatiData (ANDI) | Pinecone | Weaviate | Qdrant | ChromaDB |
|---|---|---|---|---|---|
| Query language | Full SQL (Postgres wire protocol) | REST/gRPC API | GraphQL + REST | REST/gRPC API | Python/JS client |
| Structured data | Native SQL tables, joins, aggregations | Metadata key-value only | Properties (typed) | Payload key-value | Metadata dict |
| Vector search | semantic_match(), semantic_rank(), JOIN_VECTOR in SQL | ANN (cosine, dot, euclidean) | ANN (cosine, dot, L2, hamming) | ANN (cosine, dot, euclidean) | ANN (cosine, L2, IP) |
| Hybrid search | SQL WHERE + vector similarity in one query | Metadata filters + ANN | BM25 + ANN (hybrid) | Payload filters + ANN | WHERE filters + ANN |
| Agent memory | Built-in: SQL + vector, access tracking, TTL, per-agent isolation | Manual: store vectors, build memory layer yourself | Manual | Manual | Manual |
| Chain-of-thought logging | Built-in: immutable, hash-chained, append-only ledger | Not available | Not available | Not available | Not available |
| Semantic triggers | Built-in: ANN pre-filter + cosine verification, webhook/notify dispatch | Not available | Not available | Not available | Not available |
| Branch isolation | Built-in: schema-based branches, merge strategies, garbage collection | Not available | Not available | Not available | Not available |
| RBAC / row-level policies | Per-agent, per-org, column masking, row filtering | API key scoping | API key + OIDC | API key + JWT | None |
| Audit trail | Every query logged with agent ID, cost, policy decisions | API logs only | API logs only | API logs only | None |
| Per-agent billing | Native metering, Stripe integration, quota enforcement | Flat pricing per index | Flat pricing per cluster | Flat pricing per cluster | Free / self-hosted |
| Wire protocol | Postgres (any SQL client, BI tool, ORM) | REST | REST / GraphQL | REST / gRPC | HTTP |
| Deployment | Self-hosted (VPC), managed cloud, multi-cloud | Managed only | Self-hosted or managed | Self-hosted or managed | Self-hosted (embedded) |
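The "immutable, hash-chained, append-only" property in the chain-of-thought row can be illustrated with a minimal sketch. This is an illustration of the general hash-chaining technique, not HatiData's actual implementation: each entry's hash covers both its content and the previous entry's hash, so any retroactive edit invalidates every later link.

```python
import hashlib
import json

def chain_append(ledger, entry):
    """Append an entry whose hash covers its content plus the previous hash."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    payload = json.dumps(entry, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    ledger.append({"entry": entry, "prev_hash": prev_hash, "hash": entry_hash})
    return ledger

def chain_verify(ledger):
    """Recompute every hash; a tampered entry breaks the chain from that point on."""
    prev_hash = "0" * 64
    for record in ledger:
        payload = json.dumps(record["entry"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

ledger = []
chain_append(ledger, {"agent": "support-1", "thought": "classify ticket"})
chain_append(ledger, {"agent": "support-1", "thought": "search similar cases"})
assert chain_verify(ledger)

# Editing an earlier entry after the fact is detected on verification
ledger[0]["entry"]["thought"] = "something else"
assert not chain_verify(ledger)
```

The same idea underlies append-only audit ledgers generally: verification is cheap (one pass), and tampering cannot be hidden without rewriting every subsequent entry.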
When to Use a Vector-Only Database
Vector-only databases are a good fit when:
- You only need similarity search. Your application embeds documents and retrieves the top-K nearest neighbors. No joins, no aggregations, no complex filtering.
- You already have a separate relational database. Your structured data lives in Postgres or MySQL, and you are comfortable maintaining two systems with application-level consistency.
- You do not need agent-specific features. No CoT audit trails, no branch isolation, no semantic triggers, no per-agent billing.
- You want a managed vector index with minimal setup. Pinecone and Weaviate Cloud offer turnkey hosted vector search.
When to Use ANDI
ANDI is the better choice when:
- Your agents need both SQL and vector search. Querying structured business data and semantic similarity in the same connection, without glue code.
- You need audit trails for compliance. Financial services, healthcare, legal -- any domain where you must prove why an agent made a decision.
- You want branch isolation for safe exploration. Agents can modify data in branches without affecting production, then merge or discard.
- You need per-agent governance. Row-level policies, column masking, quota limits, and billing that vary by agent identity.
- You are building a multi-agent system. Shared memory, semantic triggers between agents, and CoT replay across the fleet.
Code Comparison: The Same Task, Two Approaches
Task: Store customer support interactions as memories, then find similar past interactions for a new ticket, filtered to the same product category.
Approach A: Pinecone + Postgres (Two Systems)
```python
import pinecone
import psycopg2
import psycopg2.extras
from openai import OpenAI

# --- System 1: Postgres for structured data ---
pg = psycopg2.connect("postgresql://user:pass@postgres-host:5432/support")
# RealDictCursor returns rows as dicts so they can be merged with scores below
pg_cursor = pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

# --- System 2: Pinecone for vectors ---
pc = pinecone.Pinecone(api_key="pk-...")
index = pc.Index("support-memories")
openai_client = OpenAI()

# Store a new interaction (must write to BOTH systems)
def store_interaction(ticket_id, agent_id, content, category, resolution):
    # Write structured data to Postgres
    pg_cursor.execute(
        """INSERT INTO interactions (ticket_id, agent_id, content, category, resolution, created_at)
           VALUES (%s, %s, %s, %s, %s, NOW())""",
        (ticket_id, agent_id, content, category, resolution),
    )
    pg.commit()

    # Generate embedding and write to Pinecone
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=content
    ).data[0].embedding
    index.upsert(vectors=[{
        "id": ticket_id,
        "values": embedding,
        "metadata": {"category": category, "agent_id": agent_id},
    }])

# Search for similar interactions (must query BOTH systems and join manually)
def find_similar(query_text, category, limit=5):
    # Get embedding for the query
    query_vec = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query_text
    ).data[0].embedding

    # Search Pinecone with metadata filter
    results = index.query(
        vector=query_vec,
        top_k=limit,
        filter={"category": {"$eq": category}},
        include_metadata=True,
    )

    # Fetch full records from Postgres (application-level join)
    ticket_ids = [m.id for m in results.matches]
    if not ticket_ids:
        return []
    pg_cursor.execute(
        "SELECT * FROM interactions WHERE ticket_id = ANY(%s)",
        (ticket_ids,),
    )
    rows = pg_cursor.fetchall()

    # Manually merge similarity scores with structured data
    score_map = {m.id: m.score for m in results.matches}
    enriched = [
        {**row, "similarity": score_map.get(row["ticket_id"], 0.0)}
        for row in rows
    ]
    return sorted(enriched, key=lambda x: x["similarity"], reverse=True)
```
Issues with this approach:
- Two connections to maintain (Postgres + Pinecone)
- Application-level consistency (what if one write fails?)
- Manual embedding generation
- Manual join between vector results and structured data
- No audit trail of which agent queried what
- No branch isolation, no CoT logging, no triggers
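The consistency problem in the list above can be made concrete with a toy illustration. These are in-memory stand-ins for the two stores, not real clients: if the second write fails, the stores silently diverge unless the application adds compensating logic of its own.

```python
# Toy stand-ins for the two systems
relational_store = {}   # stands in for Postgres
vector_store = {}       # stands in for Pinecone

def upsert_vector(ticket_id, embedding, fail=False):
    if fail:
        raise ConnectionError("vector store unavailable")
    vector_store[ticket_id] = embedding

def store_interaction(ticket_id, content, fail_vector=False):
    relational_store[ticket_id] = content  # write 1 succeeds
    try:
        upsert_vector(ticket_id, [0.1, 0.2], fail=fail_vector)  # write 2 may fail
    except ConnectionError:
        # Without this compensating delete, the row would exist in the
        # relational store but be invisible to similarity search.
        del relational_store[ticket_id]
        raise

store_interaction("T-1", "refund request")
assert "T-1" in relational_store and "T-1" in vector_store

try:
    store_interaction("T-2", "login issue", fail_vector=True)
except ConnectionError:
    pass
# The compensating delete kept the two stores consistent
assert "T-2" not in relational_store and "T-2" not in vector_store
```

Real systems also have to handle the compensating delete itself failing, which is why cross-system consistency usually ends up needing outbox tables or reconciliation jobs rather than try/except alone.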
Approach B: HatiData (Single System)
```python
from hatidata import HatiDataClient

client = HatiDataClient(
    host="localhost",
    port=5439,
    api_key="hd_live_your_api_key",
)

# Store a new interaction (single write, embedding is automatic)
def store_interaction(ticket_id, agent_id, content, category, resolution):
    client.memory.store(
        agent_id=agent_id,
        content=content,
        metadata={
            "ticket_id": ticket_id,
            "category": category,
            "resolution": resolution,
        },
    )

# Search with SQL + vector in one query (no manual join, no separate embedding call).
# Values are interpolated inline here for readability; with untrusted input,
# bind parameters instead of building SQL with f-strings.
def find_similar(query_text, category, limit=5):
    return client.query(f"""
        SELECT
            m.content,
            m.metadata->>'ticket_id'  AS ticket_id,
            m.metadata->>'resolution' AS resolution,
            m.metadata->>'category'   AS category,
            semantic_match(m.content, '{query_text}') AS similarity
        FROM _hatidata_agent_memory m
        WHERE m.metadata->>'category' = '{category}'
        ORDER BY semantic_rank(m.content, '{query_text}')
        LIMIT {limit}
    """)
```
What you get with ANDI:
- Single connection (Postgres wire protocol)
- Automatic embedding (built-in bge-small-en-v1.5)
- SQL + vector in one query -- no application-level join
- Every query is audit-logged with agent identity
- Add CoT logging, triggers, or branches without changing the data layer
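As a mental model for the similarity scores that `semantic_match` returns, embedding comparisons of this kind are conventionally cosine similarity between the query vector and each stored vector (an assumption here, not a statement about HatiData's internals; the table above does note cosine verification for semantic triggers). A minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real bge-small-en-v1.5 vectors have 384 dimensions)
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]
far = [0.0, 0.1, 0.9]

assert cosine_similarity(query, close) > cosine_similarity(query, far)
assert abs(cosine_similarity(query, query) - 1.0) < 1e-9
```

ANN indexes approximate exactly this ranking: they trade a small amount of recall for avoiding a full scan over every stored vector.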
Migration Path
If you are currently using a vector-only database and want to move to ANDI, see the dedicated migration guides. They cover exporting your existing vectors, importing them as HatiData memories, and rewriting your queries.
Related Concepts
- What is ANDI? -- Overview of the Agent-Native Data Infrastructure
- Persistent Memory -- How agent memory works
- Hybrid SQL -- SQL + vector search in one engine
- Embedding Pipeline -- Automatic embedding architecture
- When to Use HatiData -- Decision guide