Persistent Memory

Persistent memory gives AI agents the ability to store, search, and retrieve long-term context that survives restarts, spans sessions, and can be shared between agents.

Why Agents Need Persistent Memory

Without persistent memory, every agent session starts from scratch. The agent cannot remember:

  • What it learned in previous conversations
  • Customer preferences it discovered
  • Facts it extracted from data analysis
  • Decisions it made and why

HatiData's memory system solves this with a hybrid architecture that combines structured SQL metadata with vector similarity search — giving agents both precise filtering and semantic understanding.

How It Works

Agent → store_memory → HatiData → stored (metadata + vector embedding)

Agent → search_memory → HatiData → ranked results
  • Structured storage holds the authoritative records: content, metadata, agent identity, timestamps, and access counts
  • Built-in vector search stores embeddings for fast approximate nearest-neighbor (ANN) search
  • The two are joined by memory_id (UUID) for hybrid retrieval

Memory Schema

Each memory entry is stored in the _hatidata_agent_memory table:

| Column | Type | Description |
| --- | --- | --- |
| memory_id | UUID | Primary key; joins with vector embeddings |
| org_id | VARCHAR | Organization identifier |
| agent_id | VARCHAR | Agent that created the memory |
| session_id | VARCHAR | Conversation or task session grouping |
| memory_type | VARCHAR | Category: fact, episode, preference, context |
| content | TEXT | The memory content (natural language or structured data) |
| metadata | JSON | Arbitrary key-value metadata |
| importance | FLOAT | Importance score (0.0 to 1.0) |
| has_embedding | BOOLEAN | Whether a vector embedding has been generated |
| access_count | BIGINT | Number of times this memory has been retrieved |
| last_accessed | TIMESTAMP | When this memory was last retrieved |
| created_at | TIMESTAMP | When the memory was stored |
| updated_at | TIMESTAMP | Last modification timestamp |

Architecture

Embedding Service

HatiData uses a pluggable embedding service to generate vector embeddings. In production, configure a provider that calls your embedding API (OpenAI, Cohere, etc.). A mock provider is included for testing.

Write Path

Storing a new memory happens in two phases:

  1. Synchronous insert: the record is written immediately with has_embedding = false, so the call returns right away.
  2. Asynchronous embedding: the content is dispatched for embedding generation; once the embedding is ready, has_embedding is set to true and the vector is stored.

Embedding requests are processed in configurable batches to minimize API calls. Failures are handled gracefully — memories remain searchable by metadata even if embedding generation fails.
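The batching behavior can be sketched as follows. This is a simplified, single-threaded illustration: EmbeddingBatcher and embed_fn are hypothetical names, and the real service dispatches batches asynchronously rather than inline.

```python
import time

class EmbeddingBatcher:
    """Illustrative batcher: groups pending texts to cut embedding API calls.
    The 32-text default mirrors the documented batch size."""
    def __init__(self, embed_fn, batch_size=32, flush_interval=1.0):
        self.embed_fn = embed_fn          # callable: list[str] -> list[vector]
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.pending = []                 # (memory_id, content) awaiting embedding
        self.last_flush = time.monotonic()

    def submit(self, memory_id, content):
        """Queue one text; flush if the batch is full or the interval elapsed."""
        self.pending.append((memory_id, content))
        if (len(self.pending) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.flush_interval):
            return self.flush()
        return []

    def flush(self):
        """Embed everything pending in one API call; return (id, vector) pairs."""
        batch, self.pending = self.pending, []
        self.last_flush = time.monotonic()
        try:
            vectors = self.embed_fn([content for _, content in batch])
        except Exception:
            # Graceful failure: the records stay searchable by metadata,
            # and has_embedding simply remains false.
            return []
        return [(mid, vec) for (mid, _), vec in zip(batch, vectors)]
```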

Search Path

Search uses hybrid retrieval with graceful degradation:

Full hybrid mode (vector search + metadata):

  1. Embed the search query
  2. Approximate nearest-neighbor search for top-K candidate memory_id values
  3. Join candidates with metadata to apply filters (agent_id, session_id, memory_type)
  4. Return results ranked by vector similarity score

Metadata-only fallback (vector search unavailable):

  • Falls back to SQL LIKE matching and relevance heuristics
  • Less precise, but fully functional

Agents never lose access to their memories, even if vector search is temporarily unavailable.
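The two retrieval modes and the degradation logic can be sketched in Python. This is an in-memory toy, not the MCP tool: memories is a list of dicts shaped like the schema above, and ann_index is an assumed interface returning (memory_id, score) pairs.

```python
def search_memory(query, memories, ann_index=None, top_k=10, memory_type=None):
    """Illustrative hybrid retrieval with graceful degradation."""
    if ann_index is not None:
        # Full hybrid mode: ANN candidates first, then metadata filtering.
        candidates = dict(ann_index.search(query, top_k))
        hits = [
            {**m, "similarity_score": candidates[m["memory_id"]]}
            for m in memories
            if m["memory_id"] in candidates
            and (memory_type is None or m["memory_type"] == memory_type)
        ]
        # Rank by vector similarity score, highest first.
        return sorted(hits, key=lambda m: m["similarity_score"],
                      reverse=True)[:top_k]
    # Metadata-only fallback: crude substring match standing in for SQL LIKE.
    hits = [
        m for m in memories
        if query.lower() in m["content"].lower()
        and (memory_type is None or m["memory_type"] == memory_type)
    ]
    return hits[:top_k]
```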

Access Tracking

Real-time access counts are maintained efficiently:

  • Lock-free in-memory counters with periodic batch flush
  • Tracks both access_count and last_accessed
  • Access patterns inform importance scoring and eviction
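A simplified version of this pattern follows. For clarity it uses a plain lock and Counter where the production system uses lock-free counters; AccessTracker and flush_fn are illustrative names, not HatiData APIs.

```python
import threading
from collections import Counter

class AccessTracker:
    """Illustrative access tracker: cheap in-memory counting with a
    periodic batch flush to the database."""
    def __init__(self, flush_fn):
        self.flush_fn = flush_fn     # callable persisting {memory_id: delta}
        self.counts = Counter()
        self.lock = threading.Lock()  # a real implementation would avoid
                                      # locking entirely (atomic counters)

    def record_access(self, memory_id):
        with self.lock:
            self.counts[memory_id] += 1

    def flush(self):
        # Called on a timer (e.g. every 60 s) to persist deltas in one batch,
        # e.g. UPDATE ... SET access_count = access_count + delta.
        with self.lock:
            deltas, self.counts = dict(self.counts), Counter()
        if deltas:
            self.flush_fn(deltas)
```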

MCP Tools

store_memory

// Input
{
  "content": "The customer prefers email communication over phone calls",
  "memory_type": "preference",
  "metadata": { "customer_id": "cust-42", "source": "support-ticket-789" },
  "importance": 0.8
}

// Output
{
  "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "stored",
  "has_embedding": false
}

search_memory

// Input
{
  "query": "customer communication preferences",
  "top_k": 10,
  "memory_type": "preference",
  "min_importance": 0.5
}

// Output
[{
  "memory_id": "a1b2c3d4-...",
  "content": "The customer prefers email communication over phone calls",
  "memory_type": "preference",
  "importance": 0.8,
  "similarity_score": 0.94,
  "created_at": "2025-01-15T10:30:00Z"
}]

get_agent_state

// Input
{ "key": "current_task" }

// Output
{
  "key": "current_task",
  "value": "Analyzing Q4 revenue data for the board presentation",
  "updated_at": "2025-01-15T14:20:00Z"
}

Agent state is stored in a dedicated _hatidata_agent_state table, scoped by agent_id.

set_agent_state

{ "key": "current_task", "value": "Generating the final report" }

delete_memory

{ "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }

Deletes the memory and its vector embedding. Irreversible.

Usage Examples

Python SDK

from hatidata_agent import HatiDataAgent

agent = HatiDataAgent(
    host="your-org.proxy.hatidata.com",
    agent_id="research-agent",
    password="hd_live_your_api_key",
)

# Store a memory
agent.store_memory(
    content="Revenue grew 15% in Q4 driven by enterprise segment",
    memory_type="fact",
    importance=0.9,
    metadata={"quarter": "Q4", "metric": "revenue"},
)

# Search memories
results = agent.search_memory(
    query="revenue growth trends",
    top_k=5,
    memory_type="fact",
)
for memory in results:
    print(f"[{memory['importance']}] {memory['content']}")

# Agent state
agent.set_state("analysis_phase", "data_collection")
phase = agent.get_state("analysis_phase")

LangChain Integration

from langchain_hatidata import HatiDataMemory

memory = HatiDataMemory(
    host="your-org.proxy.hatidata.com",
    agent_id="langchain-agent",
    password="hd_live_your_api_key",
    session_id="conversation-123",
)

from langchain.chains import ConversationChain

chain = ConversationChain(llm=llm, memory=memory)

See the LangChain integration for full details.

Configuration

Memory behavior is configurable per deployment:

| Setting | Default | Description |
| --- | --- | --- |
| Memory enabled | true | Enable/disable the memory subsystem |
| Embedding batch size | 32 | Max texts per embedding API call |
| Embedding flush interval | 1 second | Max wait before flushing a partial batch |
| Max search results | 100 | Maximum results per search query |
| Access tracker flush interval | 60 seconds | How often access counts are persisted |
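As a rough illustration, these settings might be expressed in a deployment config like the following. The key names and structure here are assumptions for the example, not HatiData's actual configuration format:

```json
{
  "memory": {
    "enabled": true,
    "embedding_batch_size": 32,
    "embedding_flush_interval_ms": 1000,
    "max_search_results": 100,
    "access_tracker_flush_interval_s": 60
  }
}
```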
