MCP Tools Reference
HatiData's MCP server exposes 25 tools in six categories: Core Query (7), Memory (5), Chain-of-Thought (3), Triggers (4), Branching (5), and Identity (1). Every tool call passes through the full security pipeline — authentication, ABAC policy evaluation, audit logging, and metering.
HatiData ships two distinct MCP server implementations:
- Proxy MCP Server (port 5440, HTTP/SSE): Exposes all 25 tools listed on this page. Used by agents connecting directly to HatiData's data plane. Supports the full security pipeline including ABAC, row-level security, column masking, metering, and audit logging.
- SDK MCP Server (hatidata-mcp-server CLI): Exposes 4 basic tools (query, list_tables, describe_table, get_context) via stdio transport. Used with Claude Desktop, Cursor, and Claude Code. See MCP Setup for configuration.
The reference below documents the Proxy MCP Server tools; for setup instructions for either server, see MCP Setup.
Core Query (7)
Tools for executing SQL and exploring schemas within the data layer.
run_sql
Execute DDL, DML, or DQL SQL statements. Supports both standard SQL and Snowflake-compatible syntax (NVL, IFF, DATEDIFF, VARIANT, etc.), which is auto-transpiled. Passes through the full multi-stage pipeline: policy evaluation, cost estimation, quota checks, row-level security, transpilation, execution, AI healing on failure, column masking, and audit logging.
Scope: Requires QueryWrite scope for DDL/DML statements, QueryRead for SELECT queries.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
sql | string | Yes | SQL statement to execute |
timeout_ms | integer | No | Query timeout in milliseconds (default: 30000) |
Returns: Array of row objects (for queries) or execution confirmation (for DDL/DML).
// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5", "timeout_ms": 15000 }
// Output
[
{ "customer_id": 42, "revenue": 125000.00 },
{ "customer_id": 17, "revenue": 98500.00 },
{ "customer_id": 8, "revenue": 87200.00 }
]
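Agents that speak MCP directly never build these payloads by hand, but it can be useful to see the wire shape. The sketch below constructs a tools/call request body following the JSON-RPC 2.0 framing that the MCP specification defines; the transport details (HTTP/SSE endpoint on port 5440) are covered in MCP Setup and are not shown here.

```python
import json

def make_tool_call(tool, arguments, request_id=1):
    """Build an MCP tools/call JSON-RPC 2.0 request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

body = make_tool_call("run_sql", {
    "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1",
    "timeout_ms": 15000,
})
print(body)
```

The params object carries the tool name and the arguments documented in each parameter table on this page.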
run_sql_readonly
Execute SELECT-only queries; DDL and DML statements are rejected. Useful for agents that should read data but never modify it.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
sql | string | Yes | SELECT query to execute |
max_rows | integer | No | Maximum rows to return (default: 1000) |
timeout_ms | integer | No | Query timeout in milliseconds (default: 30000) |
Returns: Array of row objects.
// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1 ORDER BY 2 DESC", "max_rows": 5 }
// Output
[
{ "customer_id": 42, "revenue": 125000.00 },
{ "customer_id": 17, "revenue": 98500.00 },
{ "customer_id": 8, "revenue": 87200.00 }
]
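The server's rejection logic parses the statement; a client that wants to fail fast before the round trip might add a naive prefix guard like the sketch below. This is only an illustration of the intent, not the real pipeline's check (a WITH clause can wrap data-modifying statements in some dialects, which is exactly why proper parsing is needed server-side).

```python
def looks_readonly(sql):
    """Naive client-side guard: accept only statements that start with
    SELECT or WITH. The server performs real SQL parsing; this is a sketch."""
    first = sql.lstrip().split(None, 1)[0].lower().rstrip(";")
    return first in ("select", "with")

print(looks_readonly("SELECT customer_id FROM orders"))  # True
print(looks_readonly("DROP TABLE orders"))               # False
```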
list_schemas
Enumerate all schemas the authenticated agent has permission to access. Results are filtered by ABAC policies.
Scope: Requires QueryRead scope.
Parameters: None.
Returns: Array of schema name strings.
// Input
{}
// Output
["main", "analytics", "staging"]
list_tables
List all tables within a specific schema that the authenticated agent has permission to access. Results are filtered by ABAC policies — agents only see tables they are authorized to query.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
schema | string | No | Schema to list tables from (default: "main") |
Returns: Array of table name strings.
// Input
{ "schema": "main" }
// Output
["customers", "orders", "products", "events", "knowledge_base"]
describe_table
Get column names, data types, and nullability for a table. Queries information_schema.columns. Columns masked by policy are excluded from the result. Supports schema.table format for tables outside the default schema.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
table | string | Yes | Name of the table to describe (supports schema.table format) |
Returns: Array of column descriptor objects.
// Input
{ "table": "orders" }
// Output
[
{ "column_name": "id", "data_type": "INTEGER", "is_nullable": "NO" },
{ "column_name": "customer_id", "data_type": "INTEGER", "is_nullable": "NO" },
{ "column_name": "total", "data_type": "DECIMAL(10,2)", "is_nullable": "YES" },
{ "column_name": "status", "data_type": "VARCHAR", "is_nullable": "YES" },
{ "column_name": "created_at", "data_type": "TIMESTAMP", "is_nullable": "YES" }
]
run_sql_arrow
Execute a read-only SQL query and return results in Apache Arrow IPC format (base64-encoded). Designed for high-performance data transfer to agents that can process Arrow natively.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
sql | string | Yes | SELECT query to execute |
max_rows | integer | No | Maximum rows to return (default: 10000) |
Returns: Base64-encoded Apache Arrow IPC stream.
// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1", "max_rows": 5000 }
// Output
{
"format": "arrow-ipc",
"data": "QVJST1cxAA...",
"row_count": 1250,
"schema": [
{ "name": "customer_id", "type": "Int32" },
{ "name": "revenue", "type": "Decimal128(10,2)" }
]
}
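The sample payload above begins with the Arrow file-format magic bytes ("ARROW1" base64-encodes to "QVJST1cx"), so a client can sanity-check the payload before handing it to an Arrow reader (e.g. pyarrow's IPC reader, which is outside the standard library and not shown here). A minimal decode-and-check sketch:

```python
import base64

def decode_arrow_payload(b64_data):
    """Base64-decode the data field and verify the Arrow file magic."""
    raw = base64.b64decode(b64_data)
    if not raw.startswith(b"ARROW1"):
        raise ValueError("payload does not start with Arrow file magic")
    return raw

raw = decode_arrow_payload("QVJST1cxAA==")  # a tiny 7-byte sample payload
print(len(raw))  # 7
```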
get_usage_stats
Retrieve usage statistics for the current agent, including query counts, memory writes, branch operations, and resource consumption.
Scope: Requires QueryRead scope.
Parameters: None.
Returns: Usage statistics object.
// Input
{}
// Output
{
"queries_executed": 142,
"memory_writes": 37,
"memory_reads": 85,
"branches_created": 3,
"resources_used_mb": 256.5,
"period_start": "2025-12-01T00:00:00Z",
"period_end": "2025-12-15T23:59:59Z"
}
Memory (5)
Tools for storing, retrieving, and managing long-term agent memory. Memory uses a hybrid architecture: vector ANN search for semantic recall, combined with structured metadata for filtering and storage. See Persistent Memory for architecture details.
store_memory
Store a new memory entry with optional tags, metadata, and importance scoring. Embedding is generated asynchronously after storage.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
content | string | Yes | Memory content |
tags | string[] | No | Tags for categorical filtering |
metadata | object | No | Arbitrary key-value metadata |
importance | float | No | Importance score 0.0–1.0 (default: 0.5) |
Returns: Memory creation confirmation with ID.
// Input
{
"content": "Enterprise customers have a 95% renewal rate in Q4",
"tags": ["revenue", "enterprise", "retention"],
"metadata": { "source": "quarterly-review", "quarter": "Q4-2025" },
"importance": 0.9
}
// Output
{
"memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "stored",
"has_embedding": false
}
has_embedding becomes true once the async embedding worker processes the entry (typically within seconds).
search_memory
Search memories using natural language queries with optional tag filters. Uses vector ANN for semantic search, then joins with structured metadata for filtering. Falls back to heuristic scoring if the vector index is unavailable.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
query | string | Yes | Natural language search query |
limit | integer | No | Maximum results (default: 10) |
min_score | float | No | Minimum similarity score threshold (default: 0.7) |
tags | string[] | No | Filter results to memories with any of these tags |
Returns: Array of memory objects ranked by similarity score.
// Input
{ "query": "customer renewal rates", "limit": 3, "tags": ["revenue", "retention"] }
// Output
[
{
"memory_id": "a1b2c3d4-...",
"content": "Enterprise customers have a 95% renewal rate in Q4",
"tags": ["revenue", "enterprise", "retention"],
"importance": 0.9,
"similarity_score": 0.94,
"metadata": { "source": "quarterly-review" },
"created_at": "2025-12-15T10:30:00Z"
}
]
get_agent_state
Retrieve a named state value for the current agent. State is stored in _hatidata_agent_state, scoped by agent_id.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
key | string | Yes | State key name |
Returns: State key-value pair with timestamp.
// Input
{ "key": "current_task" }
// Output
{
"key": "current_task",
"value": "Analyzing Q4 revenue data for the board presentation",
"updated_at": "2025-12-15T14:20:00Z"
}
set_agent_state
Set or update a named state value for the current agent. Values can be any JSON type (string, number, boolean, object, array).
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
key | string | Yes | State key name |
value | any | Yes | State value (any valid JSON) |
Returns: Confirmation.
// Input
{ "key": "current_task", "value": { "name": "Generating the final report", "progress": 0.75, "sections_done": ["intro", "analysis"] } }
// Output
{ "status": "ok", "key": "current_task" }
delete_memory
Delete a specific memory entry by ID. This operation removes both the metadata record and the vector embedding. Irreversible.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
memory_id | string | Yes | UUID of the memory to delete |
Returns: Deletion confirmation.
// Input
{ "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }
// Output
{ "status": "deleted", "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }
Chain-of-Thought (3)
Tools for recording and replaying agent reasoning traces. Steps are cryptographic hash-chained per session for tamper detection. See Chain-of-Thought Ledger for architecture details.
log_reasoning_step
Log a single step in an agent's reasoning chain. The proxy auto-manages session IDs — each agent connection maintains its own session, so there is no need to pass a session_id parameter.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
step_type | string | Yes | Step type, e.g. observation, hypothesis, planning, decision, action, reflection |
content | string | Yes | Content of the reasoning step |
confidence | float | No | Confidence score for this step (0.0–1.0) |
evidence | string[] | No | Supporting evidence or references for this step |
Returns: Step confirmation with hash chain data.
// Input
{
"step_type": "hypothesis",
"content": "Revenue is up 15% but concentrated in enterprise. Need to check SMB trends.",
"confidence": 0.85,
"evidence": ["Q4 revenue report showed 15% YoY growth", "Enterprise segment accounts for 78% of total"]
}
// Output
{
"trace_id": "f1e2d3c4-...",
"step_number": 3,
"hash": "a7b8c9d0e1f2...",
"prev_hash": "1234abcd5678...",
"session_id": "auto-session-a1b2c3d4"
}
Steps with step_type of decision are always embedded for semantic search, regardless of the configured sampling rate.
replay_decision
Replay the full reasoning chain for a session, including cryptographic hash chain verification.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
session_id | string | Yes | Session identifier to replay |
verify_chain | boolean | No | Verify hash chain integrity (default: true) |
Returns: Array of reasoning steps with chain verification status.
// Input
{ "session_id": "analysis-2025-12-15", "verify_chain": true }
// Output
{
"session_id": "analysis-2025-12-15",
"chain_valid": true,
"step_count": 8,
"steps": [
{
"step_number": 0,
"step_type": "observation",
"content": "User asked about Q4 revenue trends",
"hash": "1234abcd...",
"prev_hash": null,
"timestamp": "2025-12-15T10:00:00Z"
},
{
"step_number": 1,
"step_type": "planning",
"content": "Plan: 1) Query revenue by segment 2) Compare Q3 vs Q4 3) Identify drivers",
"hash": "5678efgh...",
"prev_hash": "1234abcd...",
"timestamp": "2025-12-15T10:00:01Z"
}
]
}
If chain_valid is false, the response indicates which step has an invalid hash.
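The verification described above can be sketched client-side. The exact byte layout the ledger hashes is not documented on this page, so the step_hash construction below (SHA-256 over prev hash, step number, and content) is an assumption for illustration only; the walking-and-checking logic is the part that carries over.

```python
import hashlib

def step_hash(prev_hash, step_number, content):
    """One possible hash construction; the ledger's actual input layout is an assumption here."""
    material = "{}|{}|{}".format(prev_hash or "", step_number, content).encode()
    return hashlib.sha256(material).hexdigest()

def verify_chain(steps):
    """Return the step_number of the first invalid step, or None if the chain holds."""
    prev = None
    for s in steps:
        if s["prev_hash"] != prev:
            return s["step_number"]
        if s["hash"] != step_hash(prev, s["step_number"], s["content"]):
            return s["step_number"]
        prev = s["hash"]
    return None

# Build a small valid chain, then tamper with one step.
steps, prev = [], None
for i, content in enumerate(["User asked about Q4 revenue trends",
                             "Plan: query revenue by segment",
                             "Decision: drill into SMB numbers"]):
    h = step_hash(prev, i, content)
    steps.append({"step_number": i, "content": content, "prev_hash": prev, "hash": h})
    prev = h

print(verify_chain(steps))        # None: chain holds
steps[1]["content"] = "tampered"
print(verify_chain(steps))        # 1: first step whose hash no longer matches
```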
get_session_history
List reasoning sessions for the current agent with optional filtering.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
limit | integer | No | Maximum sessions to return (default: 20) |
agent_id | string | No | Filter by agent (default: current agent) |
since | string | No | ISO 8601 timestamp — only sessions after this time |
Returns: Array of session summaries.
// Input
{ "limit": 5 }
// Output
[
{
"session_id": "analysis-2025-12-15",
"step_count": 8,
"first_step_at": "2025-12-15T10:00:00Z",
"last_step_at": "2025-12-15T10:05:30Z",
"step_types": ["observation", "planning", "tool_call", "tool_result", "thought", "decision", "summary"]
},
{
"session_id": "report-2025-12-14",
"step_count": 12,
"first_step_at": "2025-12-14T15:30:00Z",
"last_step_at": "2025-12-14T15:42:00Z",
"step_types": ["observation", "thought", "action", "result", "reflection", "summary"]
}
]
Triggers (4)
Tools for managing semantic triggers — event-driven automations that fire when content matches a concept via cosine similarity. Evaluation uses a two-stage pipeline: vector ANN pre-filter (top-K) followed by exact cosine verification. See Semantic Triggers for architecture details.
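The two-stage evaluation can be sketched in miniature. In the sketch below, exact top-K search stands in for the ANN pre-filter (a real ANN index approximates this step), and the toy 3-dimensional vectors and trigger set are made up for illustration:

```python
import math

def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def evaluate_triggers(content_vec, triggers, top_k=2):
    # Stage 1: pre-filter to the top-K nearest triggers.
    candidates = sorted(triggers, key=lambda t: cosine(content_vec, t["vec"]),
                        reverse=True)[:top_k]
    # Stage 2: exact cosine check against each candidate's threshold.
    return [t["name"] for t in candidates
            if cosine(content_vec, t["vec"]) >= t["threshold"]]

triggers = [
    {"name": "fraud-pattern-detector", "vec": [0.9, 0.1, 0.0], "threshold": 0.85},
    {"name": "churn-risk",             "vec": [0.0, 1.0, 0.0], "threshold": 0.80},
]
print(evaluate_triggers([1.0, 0.1, 0.0], triggers))  # ['fraud-pattern-detector']
```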
register_trigger
Register a new semantic trigger.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Human-readable trigger name |
concept | string | Yes | Natural language concept to match against |
threshold | float | No | Cosine similarity threshold (default: 0.8) |
action | object | Yes | Action to take when triggered |
cooldown_seconds | integer | No | Minimum seconds between firings (default: 60) |
Action types:
| Type | Required Fields | Description |
|---|---|---|
webhook | url, headers | HMAC-SHA256 signed POST to URL |
agent_notify | agent_id, message | Add to agent's offline notification inbox |
write_event | event_type, payload | Write a structured event to the audit log |
flag_for_review | reason, priority | Flag content for human review in the dashboard |
Returns: Trigger registration confirmation.
// Input
{
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": {
"type": "webhook",
"url": "https://alerts.example.com/fraud",
"headers": { "X-Source": "hatidata-trigger" }
},
"cooldown_seconds": 300
}
// Output
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"status": "active"
}
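Since webhook actions are HMAC-SHA256 signed (per the action-types table above), the receiving endpoint should verify the signature before trusting the payload. The sketch below shows standard HMAC verification; the secret value and the header the signature arrives in (e.g. something like X-HatiData-Signature) are assumptions, as this page does not document them.

```python
import hashlib
import hmac

SECRET = b"shared-webhook-secret"  # hypothetical; provisioned out of band

def sign(body):
    """HMAC-SHA256 over the raw request body, hex-encoded."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body, signature):
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(sign(body), signature)

body = b'{"trigger_id": "trig_a1b2c3d4", "similarity_score": 0.92}'
sig = sign(body)
print(verify(body, sig))         # True
print(verify(body + b" ", sig))  # False: payload altered in transit
```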
list_triggers
List all registered triggers for the current organization.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
status | string | No | active, paused, or all (default: active) |
Returns: Array of trigger objects.
// Input
{ "status": "active" }
// Output
[
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": { "type": "webhook", "url": "https://alerts.example.com/fraud" },
"cooldown_seconds": 300,
"status": "active",
"last_fired_at": "2025-12-14T08:30:00Z",
"fire_count": 7
}
]
delete_trigger
Delete a registered trigger.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
trigger_id | string | Yes | ID of the trigger to delete |
Returns: Deletion confirmation.
// Input
{ "trigger_id": "trig_a1b2c3d4" }
// Output
{ "status": "deleted", "trigger_id": "trig_a1b2c3d4" }
test_trigger
Test a trigger against sample content without firing the action. Useful for threshold calibration.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
trigger_id | string | Yes | ID of the trigger to test |
content | string | Yes | Sample content to evaluate |
Returns: Evaluation result with similarity score.
// Input
{
"trigger_id": "trig_a1b2c3d4",
"content": "Multiple high-value wire transfers to newly created offshore accounts within 24 hours"
}
// Output
{
"trigger_id": "trig_a1b2c3d4",
"would_fire": true,
"similarity_score": 0.92,
"threshold": 0.85,
"action_type": "webhook"
}
Branching (5)
Tools for creating isolated data branches for speculative agent operations. Branches use schema-based isolation with zero-copy views on creation and copy-on-write materialization on first write. See Branch Isolation for architecture details.
branch_create
Create a new isolated branch from the main data layer.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Human-readable branch name |
description | string | No | Purpose of the branch |
ttl_hours | integer | No | Auto-expire after N hours (default: 24) |
Returns: Branch creation confirmation.
// Input
{
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"ttl_hours": 48
}
// Output
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"schema": "branch_a1b2c3d4",
"status": "active",
"expires_at": "2025-12-17T10:00:00Z"
}
branch_query
Execute a SQL query within a branch's isolated schema. The first write to a table in a branch materializes a copy from main; subsequent operations see modified data without affecting the main schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
branch_id | string | Yes | Branch to query |
sql | string | Yes | SQL query to execute |
Returns: Array of row objects (same format as run_sql).
// Input
{
"branch_id": "br_a1b2c3d4",
"sql": "UPDATE products SET price = price * 1.10 WHERE tier = 'enterprise'; SELECT tier, AVG(price) AS avg_price FROM products GROUP BY tier;"
}
// Output
[
{ "tier": "enterprise", "avg_price": 1100.00 },
{ "tier": "growth", "avg_price": 500.00 },
{ "tier": "free", "avg_price": 0.00 }
]
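The copy-on-write behavior described above can be modeled conceptually: reads fall through to main until a table receives its first write, at which point the branch materializes its own copy. The sketch below is a toy in-memory model of that lifecycle, not the actual schema-based implementation (integer prices keep the arithmetic exact):

```python
class Branch:
    """Conceptual model of branch isolation: zero-copy reads from main,
    copy-on-write materialization on first write."""

    def __init__(self, main_tables):
        self.main = main_tables   # shared; never mutated through the branch
        self.materialized = {}    # branch-local copies, created on first write

    def read(self, table):
        return self.materialized.get(table, self.main[table])

    def write(self, table, mutate):
        if table not in self.materialized:
            # First write: copy the table out of main into the branch schema.
            self.materialized[table] = [dict(row) for row in self.main[table]]
        mutate(self.materialized[table])

main = {"products": [{"tier": "enterprise", "price": 1000},
                     {"tier": "growth", "price": 500}]}
br = Branch(main)
br.write("products", lambda rows: [r.update(price=r["price"] + r["price"] // 10)
                                   for r in rows if r["tier"] == "enterprise"])
print(br.read("products")[0]["price"])  # 1100 inside the branch
print(main["products"][0]["price"])     # 1000: main is untouched
```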
branch_merge
Merge a branch's changes back into the main schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
branch_id | string | Yes | Branch to merge |
strategy | string | No | Conflict resolution: ours (branch changes win), theirs (main data wins), or fail_on_conflict (abort if conflicts detected, default) |
Returns: Merge result with conflict details.
// Input
{ "branch_id": "br_a1b2c3d4", "strategy": "ours" }
// Output
{
"status": "merged",
"branch_id": "br_a1b2c3d4",
"tables_merged": ["products"],
"conflicts": 0
}
If conflicts are detected and the strategy is fail_on_conflict, no changes are applied and the conflicts are returned for review.
The SDK uses different names for the same strategies: branch_wins (= ours), main_wins (= theirs), abort (= fail_on_conflict). Both naming conventions are accepted by the proxy.
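A client that accepts either naming convention can normalize to the proxy names before calling branch_merge. A minimal sketch of that mapping, using exactly the aliases listed above:

```python
PROXY_STRATEGIES = {"ours", "theirs", "fail_on_conflict"}
SDK_ALIASES = {"branch_wins": "ours", "main_wins": "theirs", "abort": "fail_on_conflict"}

def normalize_strategy(strategy):
    """Accept either naming convention and return the proxy name."""
    if strategy in PROXY_STRATEGIES:
        return strategy
    if strategy in SDK_ALIASES:
        return SDK_ALIASES[strategy]
    raise ValueError("unknown merge strategy: {!r}".format(strategy))

print(normalize_strategy("branch_wins"))  # ours
```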
branch_discard
Discard a branch and clean up its isolated schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
branch_id | string | Yes | Branch to discard |
Returns: Discard confirmation.
// Input
{ "branch_id": "br_a1b2c3d4" }
// Output
{ "status": "discarded", "branch_id": "br_a1b2c3d4" }
branch_list
List all branches for the current agent or organization.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
status | string | No | active, merged, discarded, or all (default: active) |
Returns: Array of branch summaries.
// Input
{ "status": "active" }
// Output
[
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"status": "active",
"materialized_tables": ["products"],
"created_at": "2025-12-15T10:00:00Z",
"expires_at": "2025-12-17T10:00:00Z"
}
]
Identity (1)
Tools for agent self-registration. See Agent Registry for the full registration model.
register_agent
Register a new agent identity. Returns a stable fingerprint (hd_agt_xxx) and initial trust level. Most agents register automatically on first connection — this tool is for MCP-native agents (Claude, Cursor, AutoGen) that want to explicitly identify themselves.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Display name for the agent |
framework | string | No | Agent framework (default: "custom") |
scopes | string[] | No | Requested scopes (default: ["agent"]) |
session_bound | boolean | No | Ephemeral session-only agent (default: false) |
Returns: Registration confirmation with fingerprint and trust level.
// Input
{
"name": "ResearchAssistant",
"framework": "mcp",
"scopes": ["query:read", "memory:read", "memory:write"],
"session_bound": true
}
// Output
{
"fingerprint": "hd_agt_c8e5f2a3",
"trust_level": "provisional",
"expires_at": "session_end"
}
When session_bound is true, the agent identity is ephemeral and expires when the MCP session ends. The fingerprint is not reusable across sessions.
spawn_agent (coming soon)
Register a sub-agent on behalf of the current agent. Requires Orchestrator trust level. The parent-child relationship is stored in the registry for lineage tracking.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Display name for the sub-agent |
framework | string | No | Agent framework (default: "custom") |
scopes | string[] | No | Scopes for the sub-agent (must be subset of parent's scopes) |
allowed_tables | string[] | No | Table allowlist for the sub-agent |
max_queries_hr | integer | No | Query rate limit per hour |
Returns: Sub-agent fingerprint with parent lineage.
// Input
{
"name": "DataValidator",
"framework": "custom",
"scopes": ["query:read", "memory:write"],
"allowed_tables": ["agent_memories"],
"max_queries_hr": 100
}
// Output
{
"fingerprint": "hd_agt_b7d4e2f1",
"trust_level": "provisional",
"parent_fingerprint": "hd_agt_orch_01"
}
Error Handling
All tools return errors in a consistent format:
{
"error": {
"code": "POLICY_VIOLATION",
"message": "Agent 'analyst' is not permitted to execute DDL statements",
"request_id": "req_a1b2c3d4"
}
}
| Code | Description |
|---|---|
QUERY_ERROR | SQL syntax error or execution failure |
POLICY_VIOLATION | ABAC policy blocked the operation |
QUOTA_EXCEEDED | Agent has exceeded its credit quota |
NOT_FOUND | Requested resource (memory, branch, trigger) does not exist |
BRANCH_EXPIRED | Branch TTL has expired and the schema has been cleaned up |
CHAIN_INVALID | CoT hash chain verification failed — chain may have been tampered with |
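A caller can route on these codes in a straightforward way. The retry classification below is an assumption for illustration (quota errors may clear when the metering period resets, while policy and not-found errors will not); the request_id is worth logging in either case, since it ties the failure back to the audit trail.

```python
RETRYABLE = {"QUOTA_EXCEEDED"}  # assumption: may succeed once the quota period resets

def classify_response(response):
    """Map a tool response to 'ok', 'retry', or 'fail' using the error format above."""
    err = response.get("error")
    if err is None:
        return "ok"
    return "retry" if err["code"] in RETRYABLE else "fail"

print(classify_response({"rows": []}))  # ok
print(classify_response({"error": {
    "code": "POLICY_VIOLATION",
    "message": "Agent 'analyst' is not permitted to execute DDL statements",
    "request_id": "req_a1b2c3d4",
}}))  # fail
```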