LangChain Integration

The langchain-hatidata package provides three components for building LangChain agents backed by HatiData: persistent conversation memory, a vector store for RAG workflows, and a toolkit of data warehouse tools.

Installation

pip install langchain-hatidata

Dependencies: hatidata-agent, langchain-core >= 0.2.0

Components

| Component | Class | Purpose |
|---|---|---|
| Memory | HatiDataMemory | Persistent conversation history backed by SQL |
| Vector Store | HatiDataVectorStore | RAG retrieval with built-in vector search |
| Toolkit | HatiDataToolkit | 4 tools for schema discovery and querying |

HatiDataMemory

Persistent conversation memory that stores message history in HatiData tables. Unlike LangChain's default in-memory ConversationBufferMemory, HatiDataMemory persists across sessions and supports multi-agent shared memory.

Basic Usage

from langchain_hatidata import HatiDataMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

memory = HatiDataMemory(
    host="your-org.proxy.hatidata.com",
    port=5439,
    agent_id="support-agent",
    password="hd_live_your_api_key",
    session_id="user-123-session",  # Groups messages by session
    memory_key="history",           # Variable name in prompt template
    return_messages=True,           # Return ChatMessage objects (not strings)
)

llm = ChatOpenAI(model="gpt-4o", temperature=0)
chain = ConversationChain(llm=llm, memory=memory, verbose=True)

# First interaction -- stored in HatiData
response = chain.predict(input="What products do we sell?")

# Second interaction -- previous context is loaded from HatiData
response = chain.predict(input="Which of those had the highest revenue last quarter?")

How It Works

HatiDataMemory stores each message as a row in the _hatidata_agent_memory table:

| Column | Type | Description |
|---|---|---|
| memory_id | UUID | Unique identifier for the memory entry |
| agent_id | VARCHAR | Agent that created the memory |
| session_id | VARCHAR | Conversation session grouping |
| role | VARCHAR | "human" or "ai" |
| content | TEXT | Message content |
| metadata | JSON | Additional context (timestamps, token counts) |
| created_at | TIMESTAMP | When the message was stored |
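
Because memory rows live in an ordinary table, you can audit a conversation with plain SQL. A minimal sketch using the hatidata_query tool from the HatiDataToolkit described below, assuming the agent has read access to the _hatidata_agent_memory table (the session_id value comes from the earlier example):

from langchain_hatidata import HatiDataToolkit

toolkit = HatiDataToolkit(
    host="your-org.proxy.hatidata.com",
    agent_id="support-agent",
    password="hd_live_your_api_key",
)
query_tool = toolkit.get_tools()[0]  # HatiDataQueryTool

# Inspect the stored conversation for one session, oldest message first
result = query_tool.run(
    "SELECT role, content, created_at FROM _hatidata_agent_memory "
    "WHERE session_id = 'user-123-session' ORDER BY created_at"
)
print(result)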

Multi-Agent Shared Memory

Multiple agents can read from the same session to share context:

# Agent A writes to shared memory
agent_a_memory = HatiDataMemory(
    host="your-org.proxy.hatidata.com",
    agent_id="researcher",
    password="hd_live_key_a",
    session_id="shared-project-x",
)

# Agent B reads Agent A's messages
agent_b_memory = HatiDataMemory(
    host="your-org.proxy.hatidata.com",
    agent_id="writer",
    password="hd_live_key_b",
    session_id="shared-project-x",
)
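
Because both memories point at the same session_id, Agent B sees Agent A's messages as soon as it loads context. A sketch using LangChain's standard memory interface (save_context and load_memory_variables come from BaseMemory, which the ConversationChain usage above implies HatiDataMemory implements):

# Agent A records an exchange in the shared session
agent_a_memory.save_context(
    {"input": "Key finding: Q4 churn dropped 12%"},
    {"output": "Noted. Logged the churn finding."},
)

# Agent B loads the same session's history, including Agent A's messages
shared_context = agent_b_memory.load_memory_variables({})
print(shared_context["history"])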

Memory Parameters

| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier for billing and audit |
| password | "" | API key |
| session_id | Auto-generated UUID | Groups messages into conversations |
| memory_key | "history" | Variable name in prompt templates |
| return_messages | True | Return ChatMessage objects vs. raw strings |
| max_messages | 100 | Maximum messages to load per session |

HatiDataVectorStore

A LangChain-compatible vector store that uses HatiData's built-in vector search for approximate nearest-neighbor retrieval with structured metadata. Designed for RAG workflows where agents need to retrieve relevant context before generating responses.

Basic Usage

from langchain_hatidata import HatiDataVectorStore
from langchain_openai import OpenAIEmbeddings

vector_store = HatiDataVectorStore(
    host="your-org.proxy.hatidata.com",
    port=5439,
    agent_id="rag-agent",
    password="hd_live_your_api_key",
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="product_docs",
)

# Add documents
from langchain_core.documents import Document

docs = [
    Document(page_content="HatiData supports SQL queries...", metadata={"source": "docs"}),
    Document(page_content="Agent memory is persistent...", metadata={"source": "guide"}),
]
vector_store.add_documents(docs)

# Similarity search
results = vector_store.similarity_search("How do I query data?", k=5)
for doc in results:
    print(f"[{doc.metadata['source']}] {doc.page_content[:100]}...")

How It Works

HatiDataVectorStore implements a hybrid search architecture:

  1. Embedding generation -- Documents are embedded using your chosen LangChain embedding model
  2. Vector storage -- Embeddings are stored in HatiData's built-in vector search engine for fast ANN retrieval
  3. Metadata storage -- Document metadata and content are stored in HatiData tables
  4. Hybrid retrieval -- Vector search returns top-K candidates by similarity, then metadata is joined by memory_id UUID

This architecture provides sub-10ms p50 search latency while maintaining full SQL queryability over document metadata.
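
Since the store follows the standard LangChain VectorStore interface, you can also retrieve similarity scores alongside documents, which is a quick way to sanity-check the hybrid retrieval described above. A sketch, assuming similarity_search_with_score is implemented as it is in most LangChain vector stores (the score scale depends on the configured distance_metric):

# Retrieve the top-3 candidates together with their similarity scores
results = vector_store.similarity_search_with_score("How do I query data?", k=3)
for doc, score in results:
    print(f"{score:.3f}  [{doc.metadata['source']}] {doc.page_content[:80]}")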

Using as a Retriever

from langchain.chains import RetrievalQA

retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=retriever,
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "What are HatiData's security features?"})
print(result["result"])
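
as_retriever also accepts LangChain's other standard search types. If HatiDataVectorStore implements max_marginal_relevance_search (an assumption, not confirmed above), you can trade a little similarity for result diversity:

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20},  # rerank 20 candidates down to 5 diverse results
)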

Vector Store Parameters

| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier |
| password | "" | API key |
| embedding | (required) | LangChain Embeddings instance |
| collection_name | "default" | Vector collection name |
| distance_metric | "cosine" | Vector distance metric (cosine, dot, euclid) |

HatiDataToolkit

The toolkit exposes four tools that give LangChain agents the ability to explore and query your data warehouse interactively.

Basic Usage

from langchain_hatidata import HatiDataToolkit
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

toolkit = HatiDataToolkit(
    host="your-org.proxy.hatidata.com",
    agent_id="analyst-agent",
    password="hd_live_your_api_key",
)

tools = toolkit.get_tools()

prompt = PromptTemplate.from_template("""You are a data analyst with access to a SQL data warehouse.

Available tools: {tools}
Tool names: {tool_names}

Question: {input}
{agent_scratchpad}""")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "What were our top 5 products by revenue last quarter?"})
print(result["output"])

Tool Reference

hatidata_query

Execute SQL against the warehouse and return results as a formatted string.

query_tool = toolkit.get_tools()[0]  # HatiDataQueryTool
result = query_tool.run("SELECT COUNT(*) as total FROM orders WHERE status = 'completed'")
# Returns: "total\n42857"

Supports both standard SQL and legacy warehouse SQL syntax (NVL, IFF, DATEDIFF, etc.), which is auto-transpiled.
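
For example, a legacy-style query runs unchanged (the region, order_date, and shipped_date columns are hypothetical, for illustration only):

# NVL and DATEDIFF are legacy warehouse syntax; the proxy transpiles them automatically
result = query_tool.run(
    "SELECT NVL(region, 'unknown') AS region, "
    "DATEDIFF(day, order_date, shipped_date) AS days_to_ship "
    "FROM orders LIMIT 10"
)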

hatidata_list_tables

List all tables the agent has access to.

list_tool = toolkit.get_tools()[1]  # HatiDataListTablesTool
result = list_tool.run("")
# Returns: "customers, orders, products, events, knowledge_base"

hatidata_describe_table

Get the schema (column names and types) for a specific table.

describe_tool = toolkit.get_tools()[2]  # HatiDataDescribeTableTool
result = describe_tool.run("orders")
# Returns: "id INTEGER NOT NULL\ncustomer_id INTEGER NOT NULL\ntotal DECIMAL(10,2)\n..."

hatidata_context_search

Full-text search over a table's text columns, for RAG-style context retrieval. Useful for finding relevant rows before writing a precise SQL query.

search_tool = toolkit.get_tools()[3]  # HatiDataContextSearchTool
result = search_tool.run({"table": "knowledge_base", "query": "pricing tiers"})
# Returns matching rows as formatted text

Toolkit Parameters

| Parameter | Default | Description |
|---|---|---|
| host | "localhost" | HatiData proxy hostname |
| port | 5439 | Proxy port |
| agent_id | "langchain-agent" | Agent identifier |
| database | "hatidata" | Database name |
| user | "agent" | Username |
| password | "" | API key |

Full Example: RAG + SQL Agent

Combining all three components for a powerful data-aware agent:

from langchain_hatidata import HatiDataMemory, HatiDataVectorStore, HatiDataToolkit
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate

# Shared connection config
config = {
    "host": "your-org.proxy.hatidata.com",
    "port": 5439,
    "password": "hd_live_your_api_key",
}

# Persistent memory
memory = HatiDataMemory(
    **config,
    agent_id="full-agent",
    session_id="analysis-session-1",
)

# Vector store for documentation lookup
vector_store = HatiDataVectorStore(
    **config,
    agent_id="full-agent",
    embedding=OpenAIEmbeddings(),
    collection_name="internal_docs",
)

# SQL tools
toolkit = HatiDataToolkit(**config, agent_id="full-agent")
tools = toolkit.get_tools()

# Add retriever as a tool
from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    vector_store.as_retriever(search_kwargs={"k": 3}),
    name="search_docs",
    description="Search internal documentation for context",
)
tools.append(retriever_tool)

# Create the agent, reusing the ReAct prompt template from the toolkit example
# and adding {history} so HatiDataMemory context is injected by the executor
prompt = PromptTemplate.from_template("""You are a data analyst with access to a SQL data warehouse and internal documentation.

Conversation so far:
{history}

Available tools: {tools}
Tool names: {tool_names}

Question: {input}
{agent_scratchpad}""")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True)

# Agent has: SQL access, documentation retrieval, and persistent memory
result = executor.invoke({"input": "Summarize Q4 revenue trends and relate them to our pricing changes"})
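
Because memory persists in HatiData, a follow-up question in the same session picks up the earlier context automatically:

# The agent resolves "that" from the stored conversation history
result = executor.invoke({"input": "Now break that down by product line"})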
