Persistent Memory

Persistent memory gives AI agents the ability to store, search, and retrieve long-term context that survives restarts, spans sessions, and can be shared between agents.

Why Agents Need Persistent Memory

Without persistent memory, every agent session starts from scratch. The agent cannot remember:

  • What it learned in previous conversations
  • Customer preferences it discovered
  • Facts it extracted from data analysis
  • Decisions it made and why

HatiData's memory system solves this with a hybrid architecture that combines structured SQL metadata with vector similarity search — giving agents both precise filtering and semantic understanding.

How It Works

Agent → store_memory → HatiData → stored (metadata + vector embedding)

Agent → search_memory → HatiData → ranked results
  • Structured storage holds the authoritative records: content, metadata, agent identity, timestamps, and access counts
  • Built-in vector search stores embeddings for fast approximate nearest-neighbor (ANN) search
  • The two are joined by memory_id (UUID) for hybrid retrieval

Memory Schema

Each memory entry is stored in the _hatidata_agent_memory table:

| Column | Type | Description |
| --- | --- | --- |
| memory_id | UUID | Primary key; joins with vector embeddings |
| org_id | VARCHAR | Organization identifier |
| agent_id | VARCHAR | Agent that created the memory |
| session_id | VARCHAR | Conversation or task session grouping |
| memory_type | VARCHAR | Category: fact, episode, preference, context |
| content | TEXT | The memory content (natural language or structured data) |
| metadata | JSON | Arbitrary key-value metadata |
| importance | FLOAT | Importance score (0.0 to 1.0) |
| has_embedding | BOOLEAN | Whether a vector embedding has been generated |
| access_count | BIGINT | Number of times this memory has been retrieved |
| last_accessed | TIMESTAMP | When this memory was last retrieved |
| created_at | TIMESTAMP | When the memory was stored |
| updated_at | TIMESTAMP | Last modification timestamp |

Architecture

Embedding Service

HatiData uses a pluggable embedding service to generate vector embeddings. In production, configure a provider that calls your embedding API (OpenAI, Cohere, etc.). A mock provider is included for testing.

Write Path

Storing a new memory happens in two phases:

  1. Synchronous insert: the record is written immediately with has_embedding = false, so the call returns right away.
  2. Asynchronous embedding: the content is dispatched for embedding generation; once the embedding is ready, has_embedding is set to true and the vector is stored.

Embedding requests are processed in configurable batches to minimize API calls. Failures are handled gracefully — memories remain searchable by metadata even if embedding generation fails.
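The batching behavior can be sketched as follows. This is a simplified, single-threaded illustration: EmbeddingBatcher and embed_fn are hypothetical names, and the real service dispatches batches asynchronously rather than inline.

```python
import time

class EmbeddingBatcher:
    """Illustrative batcher: groups pending texts to cut embedding API calls.
    The 32-text default mirrors the documented batch size."""
    def __init__(self, embed_fn, batch_size=32, flush_interval=1.0):
        self.embed_fn = embed_fn          # callable: list[str] -> list[vector]
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.pending = []                 # (memory_id, content) awaiting embedding
        self.last_flush = time.monotonic()

    def submit(self, memory_id, content):
        """Queue one text; flush if the batch is full or the interval elapsed."""
        self.pending.append((memory_id, content))
        if (len(self.pending) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.flush_interval):
            return self.flush()
        return []

    def flush(self):
        """Embed everything pending in one API call; return (id, vector) pairs."""
        batch, self.pending = self.pending, []
        self.last_flush = time.monotonic()
        try:
            vectors = self.embed_fn([content for _, content in batch])
        except Exception:
            # Graceful failure: the records stay searchable by metadata,
            # and has_embedding simply remains false.
            return []
        return [(mid, vec) for (mid, _), vec in zip(batch, vectors)]
```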

Search Path

Search uses hybrid retrieval with graceful degradation:

Full hybrid mode (vector search + metadata):

  1. Embed the search query
  2. Approximate nearest-neighbor search for top-K candidate memory_id values
  3. Join candidates with metadata to apply filters (agent_id, session_id, memory_type)
  4. Return results ranked by vector similarity score

Metadata-only fallback (vector search unavailable):

  • Falls back to SQL LIKE matching and relevance heuristics
  • Less precise, but fully functional

Agents never lose access to their memories, even if vector search is temporarily unavailable.
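The two retrieval modes and the degradation logic can be sketched in Python. This is an in-memory toy, not the MCP tool: memories is a list of dicts shaped like the schema above, and ann_index is an assumed interface returning (memory_id, score) pairs.

```python
def search_memory(query, memories, ann_index=None, top_k=10, memory_type=None):
    """Illustrative hybrid retrieval with graceful degradation."""
    if ann_index is not None:
        # Full hybrid mode: ANN candidates first, then metadata filtering.
        candidates = dict(ann_index.search(query, top_k))
        hits = [
            {**m, "similarity_score": candidates[m["memory_id"]]}
            for m in memories
            if m["memory_id"] in candidates
            and (memory_type is None or m["memory_type"] == memory_type)
        ]
        # Rank by vector similarity score, highest first.
        return sorted(hits, key=lambda m: m["similarity_score"],
                      reverse=True)[:top_k]
    # Metadata-only fallback: crude substring match standing in for SQL LIKE.
    hits = [
        m for m in memories
        if query.lower() in m["content"].lower()
        and (memory_type is None or m["memory_type"] == memory_type)
    ]
    return hits[:top_k]
```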

Access Tracking

Real-time access counts are maintained efficiently:

  • Lock-free in-memory counters with periodic batch flush
  • Tracks both access_count and last_accessed
  • Access patterns inform importance scoring and eviction
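A simplified version of this pattern follows. For clarity it uses a plain lock and Counter where the production system uses lock-free counters; AccessTracker and flush_fn are illustrative names, not HatiData APIs.

```python
import threading
from collections import Counter

class AccessTracker:
    """Illustrative access tracker: cheap in-memory counting with a
    periodic batch flush to the database."""
    def __init__(self, flush_fn):
        self.flush_fn = flush_fn     # callable persisting {memory_id: delta}
        self.counts = Counter()
        self.lock = threading.Lock()  # a real implementation would avoid
                                      # locking entirely (atomic counters)

    def record_access(self, memory_id):
        with self.lock:
            self.counts[memory_id] += 1

    def flush(self):
        # Called on a timer (e.g. every 60 s) to persist deltas in one batch,
        # e.g. UPDATE ... SET access_count = access_count + delta.
        with self.lock:
            deltas, self.counts = dict(self.counts), Counter()
        if deltas:
            self.flush_fn(deltas)
```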

MCP Tools

store_memory

// Input
{
  "content": "The customer prefers email communication over phone calls",
  "memory_type": "preference",
  "metadata": { "customer_id": "cust-42", "source": "support-ticket-789" },
  "importance": 0.8
}

// Output
{
  "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "stored",
  "has_embedding": false
}

search_memory

// Input
{
  "query": "customer communication preferences",
  "top_k": 10,
  "memory_type": "preference",
  "min_importance": 0.5
}

// Output
[{
  "memory_id": "a1b2c3d4-...",
  "content": "The customer prefers email communication over phone calls",
  "memory_type": "preference",
  "importance": 0.8,
  "similarity_score": 0.94,
  "created_at": "2025-01-15T10:30:00Z"
}]

get_agent_state

// Input
{ "key": "current_task" }

// Output
{
  "key": "current_task",
  "value": "Analyzing Q4 revenue data for the board presentation",
  "updated_at": "2025-01-15T14:20:00Z"
}

Agent state is stored in a dedicated _hatidata_agent_state table, scoped by agent_id.

set_agent_state

{ "key": "current_task", "value": "Generating the final report" }

delete_memory

{ "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }

Deletes the memory and its vector embedding. Irreversible.

Usage Examples

Python SDK

from hatidata_agent import HatiDataAgent

agent = HatiDataAgent(
    host="your-org.proxy.hatidata.com",
    agent_id="research-agent",
    password="hd_live_your_api_key",
)

# Store a memory
agent.store_memory(
    content="Revenue grew 15% in Q4 driven by enterprise segment",
    memory_type="fact",
    importance=0.9,
    metadata={"quarter": "Q4", "metric": "revenue"},
)

# Search memories
results = agent.search_memory(
    query="revenue growth trends",
    top_k=5,
    memory_type="fact",
)
for memory in results:
    print(f"[{memory['importance']}] {memory['content']}")

# Agent state
agent.set_state("analysis_phase", "data_collection")
phase = agent.get_state("analysis_phase")

LangChain Integration

from langchain_hatidata import HatiDataMemory

memory = HatiDataMemory(
    host="your-org.proxy.hatidata.com",
    agent_id="langchain-agent",
    password="hd_live_your_api_key",
    session_id="conversation-123",
)

from langchain.chains import ConversationChain

chain = ConversationChain(llm=llm, memory=memory)

See the LangChain integration for full details.

Configuration

Memory behavior is configurable per deployment:

| Setting | Default | Description |
| --- | --- | --- |
| Memory enabled | true | Enable/disable the memory subsystem |
| Embedding batch size | 32 | Max texts per embedding API call |
| Embedding flush interval | 1 second | Max wait before flushing a partial batch |
| Max search results | 100 | Maximum results per search query |
| Access tracker flush interval | 60 seconds | How often access counts are persisted |
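As a rough illustration, these settings might be expressed in a deployment config like the following. The key names and structure here are assumptions for the example, not HatiData's actual configuration format:

```json
{
  "memory": {
    "enabled": true,
    "embedding_batch_size": 32,
    "embedding_flush_interval_ms": 1000,
    "max_search_results": 100,
    "access_tracker_flush_interval_s": 60
  }
}
```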
