
Migrate from Pinecone

Pinecone stores vectors and returns approximate nearest neighbors. HatiData stores vectors and SQL data together — you can filter by metadata, join with business tables, apply governance policies, and branch agent state, all in a single query. This guide covers what changes, how to migrate your vectors, and how to rewrite queries.

Vector-Only vs Hybrid Architecture

| Capability | Pinecone | HatiData |
| --- | --- | --- |
| Vector storage | Yes | Yes (built-in vector engine) |
| ANN similarity search | Yes | Yes (cosine, dot, L2) |
| SQL queries on vector data | No | Yes — semantic_match() in SQL |
| Metadata filtering | Limited key-value | Full SQL predicates |
| Join with business tables | Not supported | Native SQL join |
| Long-term agent memory | Not supported | SQL + vector hybrid |
| Chain-of-thought ledger | Not supported | Cryptographically hash-chained |
| Semantic triggers | Not supported | Built-in trigger evaluation |
| Branch isolation | Not supported | Per-agent schema branches |
| Governance + audit | Not supported | Row-level policies, audit trail |
| Per-agent billing | Not supported | Native |
| Wire protocol | REST API only | Postgres (any SQL client) |

Memory Migration

Export your Pinecone vectors and import them as HatiData memories with full metadata preservation.

Step 1: Export from Pinecone

import json

import pinecone

pc = pinecone.Pinecone(api_key="your-pinecone-key")
index = pc.Index("my-index")

# Fetch all vectors in batches
exported = []
for ids_batch in fetch_all_ids(index):  # your pagination logic
    result = index.fetch(ids=ids_batch)
    for vec_id, vec_data in result.vectors.items():
        exported.append({
            "id": vec_id,
            "values": vec_data.values,
            "metadata": vec_data.metadata,
        })

with open("pinecone-export.jsonl", "w") as f:
    for item in exported:
        f.write(json.dumps(item) + "\n")
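The fetch_all_ids pagination is left to you: on serverless indexes, Pinecone's index.list() yields pages of vector IDs, which you can re-chunk to stay within fetch batch limits. A minimal sketch of the chunking half, assuming a batch size of 100 (check your index's actual fetch limit):

```python
from itertools import islice

def chunked(ids, size=100):
    """Yield successive batches of at most `size` IDs from any iterable."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch

# Re-chunk a flat list of vector IDs into fetch-sized batches
batches = list(chunked([f"vec-{i}" for i in range(250)], size=100))
```

Because chunked() accepts any iterable, the same helper works whether your IDs come from index.list(), a metadata store, or a file.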

Step 2: Import as HatiData Memories

hati memory import \
  --source pinecone-export.jsonl \
  --agent-id my-agent \
  --org-id my-org \
  --format pinecone-jsonl

The importer maps Pinecone metadata keys to HatiData memory fields, re-indexes the vectors in the built-in vector engine, and mirrors all metadata into the data layer so it is queryable with full SQL.
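To illustrate the shape of that mapping, here is a sketch of how one exported record might translate to a memory row. The output field names and the "text" metadata convention are assumptions for illustration, not the importer's actual schema:

```python
def pinecone_record_to_memory(record, agent_id):
    """Sketch of an export-record -> memory-row mapping.

    Output field names are illustrative assumptions, not HatiData's
    actual internal schema.
    """
    metadata = dict(record.get("metadata") or {})
    # Many Pinecone setups keep the source text under a metadata key
    # such as "text"; fall back to empty content if it is absent.
    content = metadata.pop("text", "")
    return {
        "memory_id": record["id"],
        "agent_id": agent_id,
        "content": content,
        "embedding": record["values"],
        "metadata": metadata,  # remaining keys stay queryable metadata
    }

row = pinecone_record_to_memory(
    {"id": "v1", "values": [0.1, 0.2], "metadata": {"text": "hi", "source": "chat"}},
    agent_id="my-agent",
)
```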

Step 3: Verify Import

-- Confirm memory count
SELECT COUNT(*) FROM _hatidata_memories
WHERE agent_id = 'my-agent';

-- Spot-check a migrated memory
SELECT memory_id, content, metadata, created_at
FROM _hatidata_memories
WHERE agent_id = 'my-agent'
LIMIT 5;
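A quick cross-check is to compare the SQL COUNT(*) against the number of records in your export file. A minimal sketch (the sample file here stands in for your real pinecone-export.jsonl):

```python
import json

def count_jsonl_records(path):
    """Count non-empty lines in a JSONL export file."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())

# Write a tiny sample export, then count it; with your real file,
# this count should match the COUNT(*) returned by the query above.
sample = [
    {"id": "a", "values": [0.1], "metadata": {}},
    {"id": "b", "values": [0.2], "metadata": {}},
]
with open("sample-export.jsonl", "w") as f:
    for item in sample:
        f.write(json.dumps(item) + "\n")

n = count_jsonl_records("sample-export.jsonl")
```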

Query Migration

Pinecone Query API vs HatiData SQL

# Before: Pinecone REST query
result = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"source": {"$eq": "customer-support"}},
    include_metadata=True,
)

# After: HatiData SQL via any Postgres client
import psycopg2

conn = psycopg2.connect("postgresql://myuser:mypass@localhost:5439/mydb")
cur = conn.cursor()

cur.execute("""
    SELECT memory_id, content, metadata,
           semantic_match(content, %s) AS similarity
    FROM _hatidata_memories
    WHERE agent_id = 'my-agent'
      AND metadata->>'source' = 'customer-support'
    ORDER BY similarity DESC
    LIMIT 10
""", (query_embedding,))

Hybrid Search: SQL + Vector Together

HatiData's hybrid search combines vector ANN pre-filtering with exact SQL joins — you get vector recall with SQL precision.

-- Find memories similar to a query, filtered by recency and joined with events
SELECT
    m.memory_id,
    m.content,
    semantic_match(m.content, :query_embedding) AS similarity,
    e.event_type,
    e.occurred_at
FROM _hatidata_memories m
JOIN agent_events e USING (session_id)
WHERE m.agent_id = 'my-agent'
  AND m.created_at > NOW() - INTERVAL '7 days'
ORDER BY similarity DESC
LIMIT 20;

This query is not possible in a vector-only database.
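When binding the embedding from Python, the driver needs it serialized in a form the server accepts. A common convention among Postgres-compatible vector engines is a bracketed, comma-separated text literal (modeled here on the pgvector format; whether HatiData accepts exactly this form is an assumption to verify against your deployment):

```python
def to_vector_literal(embedding):
    """Serialize a list of floats as a pgvector-style text literal,
    e.g. [0.1,0.25,-1.0]. The target format is an assumption modeled
    on pgvector; confirm against your server's accepted input.
    """
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

lit = to_vector_literal([0.1, 0.25, -1.0])
```

You would then pass lit as the query parameter (the %s or :query_embedding placeholder) in the queries shown above.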

Using the Python SDK

from hatidata import HatiDataClient

client = HatiDataClient(
    connection="postgresql://myuser:mypass@localhost:5439/mydb",
    agent_id="my-agent",
)

# Store a memory (replaces index.upsert)
client.memory.store(
    content="User prefers concise responses",
    metadata={"source": "preference", "confidence": 0.9},
)

# Search memories (replaces index.query)
results = client.memory.search(
    query="response style preferences",
    top_k=5,
    filters={"source": "preference"},
)

What You Gain

Moving from a vector-only database to HatiData gives agents a complete cognitive infrastructure:

  • SQL queries on vector data — filter, join, aggregate alongside embeddings
  • Hybrid search — vector ANN pre-filter + exact cosine verification for high-recall, high-precision results
  • Governance — row-level policies, audit trails, and per-agent access control on every memory read
  • Chain-of-thought ledger — immutable, cryptographically hash-chained reasoning traces stored alongside memories
  • Semantic triggers — fire webhooks or agent notifications when stored content crosses a similarity threshold
  • Branch isolation — create a copy-on-write branch of agent state for safe experimentation
  • Single wire protocol — Postgres-compatible, works with psycopg2, asyncpg, SQLAlchemy, dbt, and any BI tool
