Skip to main content

MCP Tools Reference

HatiData's MCP server exposes 25 tools in six categories: Core Query (7), Memory (5), Chain-of-Thought (3), Triggers (4), Branching (5), and Identity (1). Every tool call passes through the full security pipeline — authentication, ABAC policy evaluation, audit logging, and metering.

Two MCP Servers

HatiData ships two distinct MCP server implementations:

  • Proxy MCP Server (port 5440, HTTP/SSE): Exposes all 25 tools listed on this page. Used by agents connecting directly to HatiData's data plane. Supports the full security pipeline including ABAC, row-level security, column masking, metering, and audit logging.
  • SDK MCP Server (hatidata-mcp-server CLI): Exposes 4 basic tools (query, list_tables, describe_table, get_context) via stdio transport. Used with Claude Desktop, Cursor, and Claude Code. See MCP Setup for configuration.

The tool reference below documents the Proxy MCP Server tools.

For MCP server setup, see MCP Setup.


Core Query (7)

Tools for executing SQL and exploring schemas within the data layer.


run_sql

Execute DDL, DML, or DQL SQL statements. Supports both standard SQL and Snowflake-compatible syntax (NVL, IFF, DATEDIFF, VARIANT, etc.), which is auto-transpiled. Passes through the full multi-stage pipeline: policy evaluation, cost estimation, quota checks, row-level security, transpilation, execution, AI healing on failure, column masking, and audit logging.

Scope: Requires QueryWrite scope for DDL/DML statements, QueryRead for SELECT queries.

Parameters:

NameTypeRequiredDescription
sqlstringYesSQL statement to execute
timeout_msintegerNoQuery timeout in milliseconds (default: 30000)

Returns: Array of row objects (for queries) or execution confirmation (for DDL/DML).

// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5", "timeout_ms": 15000 }

// Output
[
{ "customer_id": 42, "revenue": 125000.00 },
{ "customer_id": 17, "revenue": 98500.00 },
{ "customer_id": 8, "revenue": 87200.00 }
]

run_sql_readonly

Execute SELECT-only queries. Any DDL or DML statements are rejected. Useful for agents that should only read data without modifying it.

Scope: Requires QueryRead scope.

Parameters:

NameTypeRequiredDescription
sqlstringYesSELECT query to execute
max_rowsintegerNoMaximum rows to return (default: 1000)
timeout_msintegerNoQuery timeout in milliseconds (default: 30000)

Returns: Array of row objects.

// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1 ORDER BY 2 DESC", "max_rows": 5 }

// Output
[
{ "customer_id": 42, "revenue": 125000.00 },
{ "customer_id": 17, "revenue": 98500.00 },
{ "customer_id": 8, "revenue": 87200.00 }
]

list_schemas

Enumerate all schemas the authenticated agent has permission to access. Results are filtered by ABAC policies.

Scope: Requires QueryRead scope.

Parameters: None.

Returns: Array of schema name strings.

// Input
{}

// Output
["main", "analytics", "staging"]

list_tables

List all tables within a specific schema that the authenticated agent has permission to access. Results are filtered by ABAC policies — agents only see tables they are authorized to query.

Scope: Requires QueryRead scope.

Parameters:

NameTypeRequiredDescription
schemastringNoSchema to list tables from (default: "main")

Returns: Array of table name strings.

// Input
{ "schema": "main" }

// Output
["customers", "orders", "products", "events", "knowledge_base"]

describe_table

Get column names, data types, and nullability for a table. Queries information_schema.columns. Columns masked by policy are excluded from the result. Supports schema.table format for tables outside the default schema.

Scope: Requires QueryRead scope.

Parameters:

NameTypeRequiredDescription
tablestringYesName of the table to describe (supports schema.table format)

Returns: Array of column descriptor objects.

// Input
{ "table": "orders" }

// Output
[
{ "column_name": "id", "data_type": "INTEGER", "is_nullable": "NO" },
{ "column_name": "customer_id", "data_type": "INTEGER", "is_nullable": "NO" },
{ "column_name": "total", "data_type": "DECIMAL(10,2)", "is_nullable": "YES" },
{ "column_name": "status", "data_type": "VARCHAR", "is_nullable": "YES" },
{ "column_name": "created_at", "data_type": "TIMESTAMP", "is_nullable": "YES" }
]

run_sql_arrow

Execute a read-only SQL query and return results in Apache Arrow IPC format (base64-encoded). Designed for high-performance data transfer to agents that can process Arrow natively.

Scope: Requires QueryRead scope.

Parameters:

NameTypeRequiredDescription
sqlstringYesSELECT query to execute
max_rowsintegerNoMaximum rows to return (default: 10000)

Returns: Base64-encoded Apache Arrow IPC stream.

// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1", "max_rows": 5000 }

// Output
{
"format": "arrow-ipc",
"data": "QVJST1cxAA...",
"row_count": 1250,
"schema": [
{ "name": "customer_id", "type": "Int32" },
{ "name": "revenue", "type": "Decimal128(10,2)" }
]
}

get_usage_stats

Retrieve usage statistics for the current agent, including query counts, memory writes, branch operations, and resource consumption.

Scope: Requires QueryRead scope.

Parameters: None.

Returns: Usage statistics object.

// Input
{}

// Output
{
"queries_executed": 142,
"memory_writes": 37,
"memory_reads": 85,
"branches_created": 3,
"resources_used_mb": 256.5,
"period_start": "2025-12-01T00:00:00Z",
"period_end": "2025-12-15T23:59:59Z"
}

Memory (5)

Tools for storing, retrieving, and managing long-term agent memory. Memory uses a hybrid architecture: vector ANN search for semantic recall, with structured metadata filtering and storage. See Persistent Memory for architecture details.


store_memory

Store a new memory entry with optional tags, metadata, and importance scoring. Embedding is generated asynchronously after storage.

Parameters:

NameTypeRequiredDescription
contentstringYesMemory content
tagsstring[]NoTags for categorical filtering
metadataobjectNoArbitrary key-value metadata
importancefloatNoImportance score 0.0–1.0 (default: 0.5)

Returns: Memory creation confirmation with ID.

// Input
{
"content": "Enterprise customers have a 95% renewal rate in Q4",
"tags": ["revenue", "enterprise", "retention"],
"metadata": { "source": "quarterly-review", "quarter": "Q4-2025" },
"importance": 0.9
}

// Output
{
"memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "stored",
"has_embedding": false
}

has_embedding becomes true once the async embedding worker processes the entry (typically within seconds).


search_memory

Search memories using natural language queries with optional tag filters. Uses vector ANN for semantic search, then joins with structured metadata for filtering. Falls back to heuristic scoring if the vector index is unavailable.

Parameters:

NameTypeRequiredDescription
querystringYesNatural language search query
limitintegerNoMaximum results (default: 10)
min_scorefloatNoMinimum similarity score threshold (default: 0.7)
tagsstring[]NoFilter results to memories with any of these tags

Returns: Array of memory objects ranked by similarity score.

// Input
{ "query": "customer renewal rates", "limit": 3, "tags": ["revenue", "retention"] }

// Output
[
{
"memory_id": "a1b2c3d4-...",
"content": "Enterprise customers have a 95% renewal rate in Q4",
"tags": ["revenue", "enterprise", "retention"],
"importance": 0.9,
"similarity_score": 0.94,
"metadata": { "source": "quarterly-review" },
"created_at": "2025-12-15T10:30:00Z"
}
]

get_agent_state

Retrieve a named state value for the current agent. State is stored in _hatidata_agent_state, scoped by agent_id.

Parameters:

NameTypeRequiredDescription
keystringYesState key name

Returns: State key-value pair with timestamp.

// Input
{ "key": "current_task" }

// Output
{
"key": "current_task",
"value": "Analyzing Q4 revenue data for the board presentation",
"updated_at": "2025-12-15T14:20:00Z"
}

set_agent_state

Set or update a named state value for the current agent. Values can be any JSON type (string, number, boolean, object, array).

Parameters:

NameTypeRequiredDescription
keystringYesState key name
valueanyYesState value (any valid JSON)

Returns: Confirmation.

// Input
{ "key": "current_task", "value": { "name": "Generating the final report", "progress": 0.75, "sections_done": ["intro", "analysis"] } }

// Output
{ "status": "ok", "key": "current_task" }

delete_memory

Delete a specific memory entry by ID. This operation removes both the metadata record and the vector embedding. Irreversible.

Parameters:

NameTypeRequiredDescription
memory_idstringYesUUID of the memory to delete

Returns: Deletion confirmation.

// Input
{ "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }

// Output
{ "status": "deleted", "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }

Chain-of-Thought (3)

Tools for recording and replaying agent reasoning traces. Steps are cryptographic hash-chained per session for tamper detection. See Chain-of-Thought Ledger for architecture details.


log_reasoning_step

Log a single step in an agent's reasoning chain. The proxy auto-manages session IDs — each agent connection maintains its own session, so there is no need to pass a session_id parameter.

Parameters:

NameTypeRequiredDescription
step_typestringYesOne of: observation, hypothesis, decision, action, reflection
contentstringYesContent of the reasoning step
confidencefloatNoConfidence score for this step (0.0–1.0)
evidencestring[]NoSupporting evidence or references for this step

Returns: Step confirmation with hash chain data.

// Input
{
"step_type": "hypothesis",
"content": "Revenue is up 15% but concentrated in enterprise. Need to check SMB trends.",
"confidence": 0.85,
"evidence": ["Q4 revenue report showed 15% YoY growth", "Enterprise segment accounts for 78% of total"]
}

// Output
{
"trace_id": "f1e2d3c4-...",
"step_number": 3,
"hash": "a7b8c9d0e1f2...",
"prev_hash": "1234abcd5678...",
"session_id": "auto-session-a1b2c3d4"
}

Steps with step_type of decision are always embedded for semantic search, regardless of the configured sampling rate.


replay_decision

Replay the full reasoning chain for a session, including cryptographic hash chain verification.

Parameters:

NameTypeRequiredDescription
session_idstringYesSession identifier to replay
verify_chainbooleanNoVerify hash chain integrity (default: true)

Returns: Array of reasoning steps with chain verification status.

// Input
{ "session_id": "analysis-2025-12-15", "verify_chain": true }

// Output
{
"session_id": "analysis-2025-12-15",
"chain_valid": true,
"step_count": 8,
"steps": [
{
"step_number": 0,
"step_type": "observation",
"content": "User asked about Q4 revenue trends",
"hash": "1234abcd...",
"prev_hash": null,
"timestamp": "2025-12-15T10:00:00Z"
},
{
"step_number": 1,
"step_type": "planning",
"content": "Plan: 1) Query revenue by segment 2) Compare Q3 vs Q4 3) Identify drivers",
"hash": "5678efgh...",
"prev_hash": "1234abcd...",
"timestamp": "2025-12-15T10:00:01Z"
}
]
}

If chain_valid is false, the response indicates which step has an invalid hash.


get_session_history

List reasoning sessions for the current agent with optional filtering.

Parameters:

NameTypeRequiredDescription
limitintegerNoMaximum sessions to return (default: 20)
agent_idstringNoFilter by agent (default: current agent)
sincestringNoISO 8601 timestamp — only sessions after this time

Returns: Array of session summaries.

// Input
{ "limit": 5 }

// Output
[
{
"session_id": "analysis-2025-12-15",
"step_count": 8,
"first_step_at": "2025-12-15T10:00:00Z",
"last_step_at": "2025-12-15T10:05:30Z",
"step_types": ["observation", "planning", "tool_call", "tool_result", "thought", "decision", "summary"]
},
{
"session_id": "report-2025-12-14",
"step_count": 12,
"first_step_at": "2025-12-14T15:30:00Z",
"last_step_at": "2025-12-14T15:42:00Z",
"step_types": ["observation", "thought", "action", "result", "reflection", "summary"]
}
]

Triggers (4)

Tools for managing semantic triggers — event-driven automations that fire when content matches a concept via cosine similarity. Evaluation uses a two-stage pipeline: vector ANN pre-filter (top-K) followed by exact cosine verification. See Semantic Triggers for architecture details.


register_trigger

Register a new semantic trigger.

Parameters:

NameTypeRequiredDescription
namestringYesHuman-readable trigger name
conceptstringYesNatural language concept to match against
thresholdfloatNoCosine similarity threshold (default: 0.8)
actionobjectYesAction to take when triggered
cooldown_secondsintegerNoMinimum seconds between firings (default: 60)

Action types:

TypeRequired FieldsDescription
webhookurl, headersHMAC-SHA256 signed POST to URL
agent_notifyagent_id, messageAdd to agent's offline notification inbox
write_eventevent_type, payloadWrite a structured event to the audit log
flag_for_reviewreason, priorityFlag content for human review in the dashboard

Returns: Trigger registration confirmation.

// Input
{
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": {
"type": "webhook",
"url": "https://alerts.example.com/fraud",
"headers": { "X-Source": "hatidata-trigger" }
},
"cooldown_seconds": 300
}

// Output
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"status": "active"
}

list_triggers

List all registered triggers for the current organization.

Parameters:

NameTypeRequiredDescription
statusstringNoactive, paused, or all (default: active)

Returns: Array of trigger objects.

// Input
{ "status": "active" }

// Output
[
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": { "type": "webhook", "url": "https://alerts.example.com/fraud" },
"cooldown_seconds": 300,
"status": "active",
"last_fired_at": "2025-12-14T08:30:00Z",
"fire_count": 7
}
]

delete_trigger

Delete a registered trigger.

Parameters:

NameTypeRequiredDescription
trigger_idstringYesID of the trigger to delete

Returns: Deletion confirmation.

// Input
{ "trigger_id": "trig_a1b2c3d4" }

// Output
{ "status": "deleted", "trigger_id": "trig_a1b2c3d4" }

test_trigger

Test a trigger against sample content without firing the action. Useful for threshold calibration.

Parameters:

NameTypeRequiredDescription
trigger_idstringYesID of the trigger to test
contentstringYesSample content to evaluate

Returns: Evaluation result with similarity score.

// Input
{
"trigger_id": "trig_a1b2c3d4",
"content": "Multiple high-value wire transfers to newly created offshore accounts within 24 hours"
}

// Output
{
"trigger_id": "trig_a1b2c3d4",
"would_fire": true,
"similarity_score": 0.92,
"threshold": 0.85,
"action_type": "webhook"
}

Branching (5)

Tools for creating isolated data branches for speculative agent operations. Branches use schema-based isolation with zero-copy views on creation and copy-on-write materialization on first write. See Branch Isolation for architecture details.


branch_create

Create a new isolated branch from the main data layer.

Parameters:

NameTypeRequiredDescription
namestringYesHuman-readable branch name
descriptionstringNoPurpose of the branch
ttl_hoursintegerNoAuto-expire after N hours (default: 24)

Returns: Branch creation confirmation.

// Input
{
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"ttl_hours": 48
}

// Output
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"schema": "branch_a1b2c3d4",
"status": "active",
"expires_at": "2025-12-17T10:00:00Z"
}

branch_query

Execute a SQL query within a branch's isolated schema. The first write to a table in a branch materializes a copy from main; subsequent operations see modified data without affecting the main schema.

Parameters:

NameTypeRequiredDescription
branch_idstringYesBranch to query
sqlstringYesSQL query to execute

Returns: Array of row objects (same format as query).

// Input
{
"branch_id": "br_a1b2c3d4",
"sql": "UPDATE products SET price = price * 1.10 WHERE tier = 'enterprise'; SELECT tier, AVG(price) AS avg_price FROM products GROUP BY tier;"
}

// Output
[
{ "tier": "enterprise", "avg_price": 1100.00 },
{ "tier": "growth", "avg_price": 500.00 },
{ "tier": "free", "avg_price": 0.00 }
]

branch_merge

Merge a branch's changes back into the main schema.

Parameters:

NameTypeRequiredDescription
branch_idstringYesBranch to merge
strategystringNoConflict resolution: ours (branch changes win), theirs (main data wins), or fail_on_conflict (abort if conflicts detected, default)

Returns: Merge result with conflict details.

// Input
{ "branch_id": "br_a1b2c3d4", "strategy": "ours" }

// Output
{
"status": "merged",
"branch_id": "br_a1b2c3d4",
"tables_merged": ["products"],
"conflicts": 0
}

If conflicts are detected and the strategy is fail_on_conflict, no changes are applied and the conflicts are returned for review.

SDK aliases

The SDK uses different names for the same strategies: branch_wins (= ours), main_wins (= theirs), abort (= fail_on_conflict). Both naming conventions are accepted by the proxy.


branch_discard

Discard a branch and clean up its isolated schema.

Parameters:

NameTypeRequiredDescription
branch_idstringYesBranch to discard

Returns: Discard confirmation.

// Input
{ "branch_id": "br_a1b2c3d4" }

// Output
{ "status": "discarded", "branch_id": "br_a1b2c3d4" }

branch_list

List all branches for the current agent or organization.

Parameters:

NameTypeRequiredDescription
statusstringNoactive, merged, discarded, or all (default: active)

Returns: Array of branch summaries.

// Input
{ "status": "active" }

// Output
[
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"status": "active",
"materialized_tables": ["products"],
"created_at": "2025-12-15T10:00:00Z",
"expires_at": "2025-12-15T12:00:00Z"
}
]

Identity (1)

Tools for agent self-registration. See Agent Registry for the full registration model.


register_agent

Register a new agent identity. Returns a stable fingerprint (hd_agt_xxx) and initial trust level. Most agents register automatically on first connection — this tool is for MCP-native agents (Claude, Cursor, AutoGen) that want to explicitly identify themselves.

Parameters:

NameTypeRequiredDescription
namestringYesDisplay name for the agent
frameworkstringNoAgent framework (default: "custom")
scopesstring[]NoRequested scopes (default: ["agent"])
session_boundbooleanNoEphemeral session-only agent (default: false)

Returns: Registration confirmation with fingerprint and trust level.

// Input
{
"name": "ResearchAssistant",
"framework": "mcp",
"scopes": ["query:read", "memory:read", "memory:write"],
"session_bound": true
}

// Output
{
"fingerprint": "hd_agt_c8e5f2a3",
"trust_level": "provisional",
"expires_at": "session_end"
}

When session_bound is true, the agent identity is ephemeral and expires when the MCP session ends. The fingerprint is not reusable across sessions.


spawn_agent (coming soon)

Register a sub-agent on behalf of the current agent. Requires Orchestrator trust level. The parent-child relationship is stored in the registry for lineage tracking.

Parameters:

NameTypeRequiredDescription
namestringYesDisplay name for the sub-agent
frameworkstringNoAgent framework (default: "custom")
scopesstring[]NoScopes for the sub-agent (must be subset of parent's scopes)
allowed_tablesstring[]NoTable allowlist for the sub-agent
max_queries_hrintegerNoQuery rate limit per hour

Returns: Sub-agent fingerprint with parent lineage.

// Input
{
"name": "DataValidator",
"framework": "custom",
"scopes": ["query:read", "memory:write"],
"allowed_tables": ["agent_memories"],
"max_queries_hr": 100
}

// Output
{
"fingerprint": "hd_agt_b7d4e2f1",
"trust_level": "provisional",
"parent_fingerprint": "hd_agt_orch_01"
}

Error Handling

All tools return errors in a consistent format:

{
"error": {
"code": "POLICY_VIOLATION",
"message": "Agent 'analyst' is not permitted to execute DDL statements",
"request_id": "req_a1b2c3d4"
}
}
CodeDescription
QUERY_ERRORSQL syntax error or execution failure
POLICY_VIOLATIONABAC policy blocked the operation
QUOTA_EXCEEDEDAgent has exceeded its credit quota
NOT_FOUNDRequested resource (memory, branch, trigger) does not exist
BRANCH_EXPIREDBranch TTL has expired and the schema has been cleaned up
CHAIN_INVALIDCoT hash chain verification failed — chain may have been tampered with

Stay in the loop

Product updates, engineering deep-dives, and agent-native insights. No spam.