LangChain Integration
HatiData integrates with LangChain as an agent-native data layer — giving LangChain agents persistent memory that survives process restarts, a vector store backed by HatiData's hybrid search engine, and a toolkit of governed SQL tools. Every interaction is attributed to the agent, audited, and metered through HatiData's ABAC policy pipeline.
The langchain-hatidata package provides three drop-in components:
| Component | Class | Purpose |
|---|---|---|
| Memory | HatiDataMemory | Persistent conversation history backed by SQL |
| Vector Store | HatiDataVectorStore | RAG retrieval with built-in vector search |
| Toolkit | HatiDataToolkit | 4 tools for schema discovery and querying |
Installation
pip install langchain-hatidata
Dependencies: hatidata-agent, langchain-core >= 0.2.0
HatiDataMemory
Persistent conversation memory that stores message history in HatiData tables. Unlike LangChain's default in-memory ConversationBufferMemory, HatiDataMemory persists across sessions and supports multi-agent shared memory patterns.
Basic Usage
from langchain_hatidata import HatiDataMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
memory = HatiDataMemory(
host="your-org.proxy.hatidata.com",
port=5439,
agent_id="support-agent",
password="hd_live_your_api_key",
session_id="user-123-session", # Groups messages by session
memory_key="history", # Variable name in prompt template
return_messages=True, # Return ChatMessage objects
)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
chain = ConversationChain(llm=llm, memory=memory, verbose=True)
# First interaction — stored in HatiData
response = chain.predict(input="What products do we sell?")
# Second interaction — previous context loaded from HatiData
response = chain.predict(input="Which of those had the highest revenue last quarter?")
Storage Schema
HatiDataMemory stores each message as a row in _hatidata_agent_memory:
| Column | Type | Description |
|---|---|---|
| memory_id | UUID | Unique identifier for the memory entry |
| agent_id | VARCHAR | Agent that created the memory |
| session_id | VARCHAR | Conversation session grouping |
| role | VARCHAR | "human" or "ai" |
| content | TEXT | Message content |
| metadata | JSON | Timestamps, token counts, and other context |
| created_at | TIMESTAMP | When the message was stored |
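The load path implied by this schema can be sketched with an in-memory SQLite stand-in. The table and column names mirror the schema above; everything else (helper names, ordering details) is an illustrative assumption, not HatiData's actual implementation:

```python
import sqlite3
import uuid

# Minimal stand-in for _hatidata_agent_memory, mirroring the schema above
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE _hatidata_agent_memory (
        memory_id TEXT PRIMARY KEY,
        agent_id TEXT,
        session_id TEXT,
        role TEXT,          -- "human" or "ai"
        content TEXT,
        metadata TEXT,      -- JSON blob
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

def save_message(agent_id: str, session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO _hatidata_agent_memory "
        "(memory_id, agent_id, session_id, role, content, metadata) "
        "VALUES (?, ?, ?, ?, ?, '{}')",
        (str(uuid.uuid4()), agent_id, session_id, role, content),
    )

def load_history(session_id: str) -> list[tuple[str, str]]:
    # Messages are grouped by session and replayed in insertion order
    rows = conn.execute(
        "SELECT role, content FROM _hatidata_agent_memory "
        "WHERE session_id = ? ORDER BY created_at, rowid",
        (session_id,),
    )
    return list(rows)

save_message("support-agent", "user-123-session", "human", "What products do we sell?")
save_message("support-agent", "user-123-session", "ai", "We sell widgets and gadgets.")
print(load_history("user-123-session"))
```

Because history lives in a plain table, it is also queryable with ordinary SQL (for example, counting messages per session for auditing).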
Multi-Agent Shared Memory
Multiple agents can read from the same session to share context:
# Agent A writes to shared memory
agent_a_memory = HatiDataMemory(
host="your-org.proxy.hatidata.com",
agent_id="researcher",
password="hd_live_key_a",
session_id="shared-project-x",
)
# Agent B reads from the same session
agent_b_memory = HatiDataMemory(
host="your-org.proxy.hatidata.com",
agent_id="writer",
password="hd_live_key_b",
session_id="shared-project-x",
)
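The sharing model reduces to a single keyed log: any agent that knows the session_id appends to and reads from the same ordered message list, with per-agent attribution preserved on each entry. A toy sketch of that pattern (data structures and function names are illustrative):

```python
from collections import defaultdict

# session_id -> ordered list of (agent_id, role, content)
shared_memory: dict[str, list[tuple[str, str, str]]] = defaultdict(list)

def write(agent_id: str, session_id: str, role: str, content: str) -> None:
    shared_memory[session_id].append((agent_id, role, content))

def read(session_id: str) -> list[tuple[str, str, str]]:
    return list(shared_memory[session_id])

# Agent A (researcher) writes; Agent B (writer) sees the same context
write("researcher", "shared-project-x", "ai", "Found three candidate datasets.")
write("writer", "shared-project-x", "ai", "Drafting summary from those datasets.")
print(read("shared-project-x"))
```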
Memory Parameters
| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier for billing and audit |
| password | "" | API key |
| session_id | Auto-generated UUID | Groups messages into conversations |
| memory_key | "history" | Variable name in prompt templates |
| return_messages | True | Return ChatMessage objects vs. raw strings |
| max_messages | 100 | Maximum messages to load per session |
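One plausible reading of max_messages is a tail window: only the most recent N entries of a session are loaded, so prompts stay bounded as a conversation grows. A sketch of that behavior (the exact truncation strategy is an assumption):

```python
def load_window(messages: list[dict], max_messages: int = 100) -> list[dict]:
    """Keep only the most recent max_messages entries of a session."""
    return messages[-max_messages:]

session = [{"role": "human", "content": f"msg {i}"} for i in range(150)]
window = load_window(session, max_messages=100)
print(len(window), window[0]["content"])  # 100 "msg 50"
```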
HatiDataVectorStore
A LangChain-compatible vector store that uses HatiData's hybrid search engine (vector ANN + SQL metadata join) for approximate nearest-neighbor retrieval with structured metadata filtering. Designed for RAG workflows where agents need to retrieve relevant context before generating responses.
Basic Usage
from langchain_hatidata import HatiDataVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
vector_store = HatiDataVectorStore(
host="your-org.proxy.hatidata.com",
port=5439,
agent_id="rag-agent",
password="hd_live_your_api_key",
embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
collection_name="product_docs",
)
# Add documents
docs = [
Document(page_content="HatiData supports SQL queries...", metadata={"source": "docs"}),
Document(page_content="Agent memory is persistent...", metadata={"source": "guide"}),
]
vector_store.add_documents(docs)
# Similarity search
results = vector_store.similarity_search("How do I query data?", k=5)
for doc in results:
print(f"[{doc.metadata['source']}] {doc.page_content[:100]}...")
How Hybrid Search Works
- Documents are embedded using your chosen LangChain embedding model
- Embeddings are stored in HatiData's built-in vector engine for fast ANN retrieval
- Document metadata and content are stored in HatiData tables
- At query time: vector ANN search returns the top-K candidate documents, then the query engine joins in metadata and content by memory_id (UUID)
This provides sub-10ms p50 search latency while keeping document metadata fully SQL-queryable.
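The two-stage flow above can be illustrated with a brute-force stand-in for the ANN index. HatiData's actual index structure and scoring are not documented here; the join key, data, and similarity function below are only illustrative:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Stage 1: vector index maps document id -> embedding
index = {
    "doc-1": [1.0, 0.0],
    "doc-2": [0.9, 0.1],
    "doc-3": [0.0, 1.0],
}
# Stage 2: relational side holds metadata and content, keyed by the same id
metadata_table = {
    "doc-1": {"source": "docs", "content": "HatiData supports SQL queries..."},
    "doc-2": {"source": "docs", "content": "Query routing and transpilation..."},
    "doc-3": {"source": "guide", "content": "Agent memory is persistent..."},
}

def hybrid_search(query_vec: list[float], k: int = 2) -> list[dict]:
    # ANN search (brute force here) returns the top-k candidate ids...
    top = sorted(index, key=lambda d: cosine(index[d], query_vec), reverse=True)[:k]
    # ...then metadata and content are joined in by id
    return [{"id": d, **metadata_table[d]} for d in top]

print(hybrid_search([1.0, 0.05]))
```

The payoff of the split is that stage 2 is ordinary SQL, so metadata filters and joins compose with the vector results.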
Using as a Retriever
from langchain.chains import RetrievalQA
retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k": 5},
)
qa_chain = RetrievalQA.from_chain_type(
llm=ChatOpenAI(model="gpt-4o"),
retriever=retriever,
return_source_documents=True,
)
result = qa_chain.invoke({"query": "What are HatiData's security features?"})
print(result["result"])
Vector Store Parameters
| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier |
| password | "" | API key |
| embedding | (required) | LangChain Embeddings instance |
| collection_name | "default" | Vector collection name |
| distance_metric | "cosine" | Distance metric: cosine, dot, or euclid |
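The three distance_metric options correspond to standard formulas, shown here for reference (how HatiData normalizes or indexes them internally is not specified):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def dot_score(a: list[float], b: list[float]) -> float:
    # "dot": larger inner product = more similar (a similarity, not a distance)
    return sum(x * y for x, y in zip(a, b))

def euclid_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b), dot_score(a, b), euclid_distance(a, b))
```

Cosine is the usual default for text embeddings, since it ignores vector magnitude and compares direction only.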
HatiDataToolkit
The toolkit exposes four tools that give LangChain agents the ability to explore and query your data layer interactively. Each tool is a BaseTool subclass and can be passed directly to any LangChain agent executor.
Basic Usage
from langchain_hatidata import HatiDataToolkit
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
toolkit = HatiDataToolkit(
host="your-org.proxy.hatidata.com",
agent_id="analyst-agent",
password="hd_live_your_api_key",
)
tools = toolkit.get_tools()
prompt = PromptTemplate.from_template("""You are a data analyst with access to a SQL data layer.
Available tools: {tools}
Tool names: {tool_names}
Question: {input}
{agent_scratchpad}""")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = executor.invoke({"input": "What were our top 5 products by revenue last quarter?"})
print(result["output"])
Tool Reference
hatidata_query
Execute SQL and return results as a formatted string. Supports standard SQL as well as legacy dialect functions (NVL, IFF, DATEDIFF), which are transpiled automatically.
query_tool = toolkit.get_tools()[0]
result = query_tool.run("SELECT COUNT(*) as total FROM orders WHERE status = 'completed'")
# Returns: "total\n42857"
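A naive sketch of the kind of rewriting the auto-transpile step performs. The real transpiler presumably uses a proper SQL parser; these regex rules are only illustrative and handle simple, non-nested calls:

```python
import re

def transpile(sql: str) -> str:
    """Rewrite a few legacy-dialect functions into portable SQL (illustrative only)."""
    # NVL(a, b) -> COALESCE(a, b): identical two-argument semantics
    sql = re.sub(r"\bNVL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)
    # IFF(cond, t, f) -> CASE WHEN cond THEN t ELSE f END (no nested commas/parens)
    sql = re.sub(
        r"\bIFF\s*\(([^,()]+),([^,()]+),([^,()]+)\)",
        r"CASE WHEN \1 THEN \2 ELSE \3 END",
        sql,
        flags=re.IGNORECASE,
    )
    return sql

print(transpile("SELECT NVL(discount, 0), IFF(total > 100, 'big', 'small') FROM orders"))
```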
hatidata_list_tables
List all tables the agent has permission to access.
list_tool = toolkit.get_tools()[1]
result = list_tool.run("")
# Returns: "customers, orders, products, events, knowledge_base"
hatidata_describe_table
Get column names and types for a specific table.
describe_tool = toolkit.get_tools()[2]
result = describe_tool.run("orders")
# Returns: "id INTEGER NOT NULL\ncustomer_id INTEGER NOT NULL\ntotal DECIMAL(10,2)\n..."
hatidata_context_search
Full-text search over a table's text columns. Useful for RAG workflows where the agent needs to find relevant rows before writing a precise SQL query.
search_tool = toolkit.get_tools()[3]
result = search_tool.run({"table": "knowledge_base", "query": "pricing tiers"})
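Conceptually, the tool performs keyword-style retrieval over a table's text columns so the agent can locate relevant rows before writing precise SQL. A toy version of that ranking (the actual scoring function and result format are not documented):

```python
def context_search(rows: list[dict], query: str, text_cols: list[str], k: int = 3) -> list[dict]:
    """Rank rows by how many query terms appear in their text columns."""
    terms = query.lower().split()

    def score(row: dict) -> int:
        text = " ".join(str(row.get(c, "")) for c in text_cols).lower()
        return sum(term in text for term in terms)

    ranked = sorted(rows, key=score, reverse=True)
    return [r for r in ranked if score(r) > 0][:k]

knowledge_base = [
    {"id": 1, "title": "Pricing tiers", "body": "Free, Pro, and Enterprise tiers..."},
    {"id": 2, "title": "Security", "body": "ABAC policies govern every query."},
    {"id": 3, "title": "Billing", "body": "Pricing is metered per agent."},
]
hits = context_search(knowledge_base, "pricing tiers", ["title", "body"])
print([r["id"] for r in hits])
```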
Toolkit Parameters
| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier |
| database | "hatidata" | Database name |
| user | "agent" | Username |
| password | "" | API key |
Full Example: Memory + RAG + SQL Agent
Combining all three components for a persistent, data-aware agent:
from langchain_hatidata import HatiDataMemory, HatiDataVectorStore, HatiDataToolkit
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools.retriever import create_retriever_tool
# Shared connection config
config = {
"host": "your-org.proxy.hatidata.com",
"port": 5439,
"password": "hd_live_your_api_key",
}
# Persistent memory
memory = HatiDataMemory(**config, agent_id="full-agent", session_id="analysis-session-1")
# Vector store for documentation lookup
vector_store = HatiDataVectorStore(
**config,
agent_id="full-agent",
embedding=OpenAIEmbeddings(),
collection_name="internal_docs",
)
# SQL tools
toolkit = HatiDataToolkit(**config, agent_id="full-agent")
tools = toolkit.get_tools()
# Add retriever as a tool
retriever_tool = create_retriever_tool(
vector_store.as_retriever(search_kwargs={"k": 3}),
name="search_docs",
description="Search internal documentation for context",
)
tools.append(retriever_tool)
# Build the agent (reuses the ReAct prompt defined in the toolkit example above)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True)
result = executor.invoke({
"input": "Summarize Q4 revenue trends and relate them to our pricing changes"
})
Related Concepts
- Python SDK — Direct agent-aware queries without LangChain
- Core Concepts: Persistent Memory — How the memory system works under the hood
- MCP Tools Reference — All 24 MCP tools
- MCP Setup — Connect Claude and Cursor directly to your data layer
- CrewAI Integration — Multi-agent workflows with CrewAI