LangChain Integration
The langchain-hatidata package provides three components for building LangChain agents backed by HatiData: persistent conversation memory, a vector store for RAG workflows, and a toolkit of data warehouse tools.
Installation
pip install langchain-hatidata
Dependencies: hatidata-agent, langchain-core >= 0.2.0
Components
| Component | Class | Purpose |
|---|---|---|
| Memory | HatiDataMemory | Persistent conversation history backed by SQL |
| Vector Store | HatiDataVectorStore | RAG retrieval with built-in vector search |
| Toolkit | HatiDataToolkit | 4 tools for schema discovery and querying |
HatiDataMemory
Persistent conversation memory that stores message history in HatiData tables. Unlike LangChain's default in-memory ConversationBufferMemory, HatiDataMemory persists across sessions and supports multi-agent shared memory.
Basic Usage
from langchain_hatidata import HatiDataMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
memory = HatiDataMemory(
host="your-org.proxy.hatidata.com",
port=5439,
agent_id="support-agent",
password="hd_live_your_api_key",
session_id="user-123-session", # Groups messages by session
memory_key="history", # Variable name in prompt template
return_messages=True, # Return ChatMessage objects (not strings)
)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
chain = ConversationChain(llm=llm, memory=memory, verbose=True)
# First interaction -- stored in HatiData
response = chain.predict(input="What products do we sell?")
# Second interaction -- previous context is loaded from HatiData
response = chain.predict(input="Which of those had the highest revenue last quarter?")
How It Works
HatiDataMemory stores each message as a row in the _hatidata_agent_memory table:
| Column | Type | Description |
|---|---|---|
| memory_id | UUID | Unique identifier for the memory entry |
| agent_id | VARCHAR | Agent that created the memory |
| session_id | VARCHAR | Conversation session grouping |
| role | VARCHAR | "human" or "ai" |
| content | TEXT | Message content |
| metadata | JSON | Additional context (timestamps, token counts) |
| created_at | TIMESTAMP | When the message was stored |
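Because messages are ordinary rows, a session's history can be inspected or pruned with plain SQL. A minimal sketch, assuming the HatiData proxy accepts standard PostgreSQL wire-protocol connections (psycopg2 here; credentials mirror the examples above):

import psycopg2

# Assumes the proxy speaks the PostgreSQL wire protocol on port 5439
conn = psycopg2.connect(
    host="your-org.proxy.hatidata.com",
    port=5439,
    user="agent",
    password="hd_live_your_api_key",
    dbname="hatidata",
)
with conn.cursor() as cur:
    # Replay one session's conversation in chronological order
    cur.execute(
        "SELECT role, content, created_at "
        "FROM _hatidata_agent_memory "
        "WHERE session_id = %s "
        "ORDER BY created_at",
        ("user-123-session",),
    )
    for role, content, created_at in cur.fetchall():
        print(f"[{created_at}] {role}: {content}")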
Multi-Agent Shared Memory
Multiple agents can write to and read from the same session to share context:
# Agent A writes to shared memory
agent_a_memory = HatiDataMemory(
host="your-org.proxy.hatidata.com",
agent_id="researcher",
password="hd_live_key_a",
session_id="shared-project-x",
)
# Agent B reads Agent A's messages
agent_b_memory = HatiDataMemory(
host="your-org.proxy.hatidata.com",
agent_id="writer",
password="hd_live_key_b",
session_id="shared-project-x",
)
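Reads and writes go through LangChain's standard memory interface (save_context / load_memory_variables), which HatiDataMemory implements in order to work with chains. A sketch of the handoff:

# Agent A records an exchange in the shared session
agent_a_memory.save_context(
    {"input": "Key finding: churn spiked in March."},
    {"output": "Noted -- logging it for the report."},
)

# Agent B loads the shared history, including Agent A's messages
shared_context = agent_b_memory.load_memory_variables({})
print(shared_context["history"])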
Memory Parameters
| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier for billing and audit |
| password | "" | API key |
| session_id | Auto-generated UUID | Groups messages into conversations |
| memory_key | "history" | Variable name in prompt templates |
| return_messages | True | Return ChatMessage objects vs. raw strings |
| max_messages | 100 | Maximum messages to load per session |
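For long-running sessions, max_messages caps how much history is loaded on each turn (presumably the most recent messages). A sketch reusing the connection details from above:

# Cap the replayed history at 20 messages per load
memory = HatiDataMemory(
    host="your-org.proxy.hatidata.com",
    agent_id="support-agent",
    password="hd_live_your_api_key",
    session_id="user-123-session",
    max_messages=20,
)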
HatiDataVectorStore
A LangChain-compatible vector store that uses HatiData's built-in vector search for approximate nearest-neighbor retrieval with structured metadata. Designed for RAG workflows where agents need to retrieve relevant context before generating responses.
Basic Usage
from langchain_hatidata import HatiDataVectorStore
from langchain_openai import OpenAIEmbeddings
vector_store = HatiDataVectorStore(
host="your-org.proxy.hatidata.com",
port=5439,
agent_id="rag-agent",
password="hd_live_your_api_key",
embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
collection_name="product_docs",
)
# Add documents
from langchain_core.documents import Document
docs = [
Document(page_content="HatiData supports SQL queries...", metadata={"source": "docs"}),
Document(page_content="Agent memory is persistent...", metadata={"source": "guide"}),
]
vector_store.add_documents(docs)
# Similarity search
results = vector_store.similarity_search("How do I query data?", k=5)
for doc in results:
print(f"[{doc.metadata['source']}] {doc.page_content[:100]}...")
How It Works
HatiDataVectorStore implements a hybrid search architecture:
- Embedding generation -- Documents are embedded using your chosen LangChain embedding model
- Vector storage -- Embeddings are stored in HatiData's built-in vector search engine for fast ANN retrieval
- Metadata storage -- Document metadata and content are stored in HatiData tables
- Hybrid retrieval -- Vector search returns the top-K candidates by similarity, then document content and metadata are joined back in by ID
This architecture provides sub-10ms p50 search latency while maintaining full SQL queryability over document metadata.
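To scope retrieval to a metadata subset, many LangChain vector stores accept a filter keyword on similarity_search. A sketch assuming HatiDataVectorStore follows that convention (not confirmed by this package's docs):

# Hypothetical: assumes the common LangChain `filter` kwarg is supported
results = vector_store.similarity_search(
    "How do I query data?",
    k=5,
    filter={"source": "guide"},
)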
Using as a Retriever
from langchain.chains import RetrievalQA
retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k": 5},
)
qa_chain = RetrievalQA.from_chain_type(
llm=ChatOpenAI(model="gpt-4o"),
retriever=retriever,
return_source_documents=True,
)
result = qa_chain.invoke({"query": "What are HatiData's security features?"})
print(result["result"])
Vector Store Parameters
| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier |
| password | "" | API key |
| embedding | (required) | LangChain Embeddings instance |
| collection_name | "default" | Vector collection name |
| distance_metric | "cosine" | Vector distance metric (cosine, dot, euclid) |
HatiDataToolkit
The toolkit exposes four tools that give LangChain agents the ability to explore and query your data warehouse interactively.
Basic Usage
from langchain_hatidata import HatiDataToolkit
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
toolkit = HatiDataToolkit(
host="your-org.proxy.hatidata.com",
agent_id="analyst-agent",
password="hd_live_your_api_key",
)
tools = toolkit.get_tools()
prompt = PromptTemplate.from_template("""You are a data analyst with access to a SQL data warehouse.
Available tools: {tools}
Tool names: {tool_names}
Question: {input}
{agent_scratchpad}""")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = executor.invoke({"input": "What were our top 5 products by revenue last quarter?"})
print(result["output"])
Tool Reference
hatidata_query
Execute SQL against the warehouse and return results as a formatted string.
query_tool = toolkit.get_tools()[0] # HatiDataQueryTool
result = query_tool.run("SELECT COUNT(*) as total FROM orders WHERE status = 'completed'")
# Returns: "total\n42857"
Supports standard SQL as well as legacy warehouse dialects (NVL, IFF, DATEDIFF, etc.), which are auto-transpiled before execution.
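For instance, a query written in a legacy dialect should run unchanged (a sketch reusing the orders table from the examples above):

# NVL and IFF are transpiled to standard SQL before execution
result = query_tool.run(
    "SELECT NVL(status, 'unknown') AS status, "
    "IFF(total > 100, 'large', 'small') AS bucket "
    "FROM orders"
)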
hatidata_list_tables
List all tables the agent has access to.
list_tool = toolkit.get_tools()[1] # HatiDataListTablesTool
result = list_tool.run("")
# Returns: "customers, orders, products, events, knowledge_base"
hatidata_describe_table
Get the schema (column names and types) for a specific table.
describe_tool = toolkit.get_tools()[2] # HatiDataDescribeTableTool
result = describe_tool.run("orders")
# Returns: "id INTEGER NOT NULL\ncustomer_id INTEGER NOT NULL\ntotal DECIMAL(10,2)\n..."
hatidata_context_search
Full-text search over a table's text columns, for RAG-style retrieval. Useful for finding relevant rows before writing a precise SQL query.
search_tool = toolkit.get_tools()[3] # HatiDataContextSearchTool
result = search_tool.run({"table": "knowledge_base", "query": "pricing tiers"})
# Returns matching rows as formatted text
Toolkit Parameters
| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier |
| database | "hatidata" | Database name |
| user | "agent" | Username |
| password | "" | API key |
Full Example: RAG + SQL Agent
Combining all three components for a powerful data-aware agent:
from langchain_hatidata import HatiDataMemory, HatiDataVectorStore, HatiDataToolkit
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.agents import AgentExecutor, create_react_agent
# Shared connection config
config = {
"host": "your-org.proxy.hatidata.com",
"port": 5439,
"password": "hd_live_your_api_key",
}
# Persistent memory
memory = HatiDataMemory(
**config,
agent_id="full-agent",
session_id="analysis-session-1",
)
# Vector store for documentation lookup
vector_store = HatiDataVectorStore(
**config,
agent_id="full-agent",
embedding=OpenAIEmbeddings(),
collection_name="internal_docs",
)
# SQL tools
toolkit = HatiDataToolkit(**config, agent_id="full-agent")
tools = toolkit.get_tools()
# Add retriever as a tool
from langchain.tools.retriever import create_retriever_tool
retriever_tool = create_retriever_tool(
vector_store.as_retriever(search_kwargs={"k": 3}),
name="search_docs",
description="Search internal documentation for context",
)
tools.append(retriever_tool)
# Create the agent, reusing the ReAct prompt from the toolkit example above
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate.from_template("""You are a data analyst with access to a SQL data warehouse.
Available tools: {tools}
Tool names: {tool_names}
Question: {input}
{agent_scratchpad}""")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True)
# Agent has: SQL access, documentation retrieval, and persistent memory
result = executor.invoke({"input": "Summarize Q4 revenue trends and relate them to our pricing changes"})
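Because memory is persistent and keyed by session_id, a follow-up invocation in the same session builds on the earlier exchange:

# Prior context is loaded from HatiData automatically
result = executor.invoke({"input": "Now break those trends down by region"})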
Next Steps
- CrewAI Integration -- Multi-agent workflows with CrewAI
- MCP Server -- Expose HatiData to Claude and Cursor
- Agent Memory -- How HatiData's memory system works under the hood