MCP Tools Reference
HatiData's MCP server exposes 24 tools organized into five categories: SQL, Memory, Chain-of-Thought, Triggers, and Branches. Every tool call goes through HatiData's full security pipeline (authentication, ABAC policy evaluation, audit logging, metering).
For MCP server setup and configuration, see MCP Server.
SQL Tools (7)
Tools for querying the data warehouse and exploring schemas.
query
Execute a SQL query against the warehouse and return results as JSON.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| sql | string | Yes | SQL query to execute |
Returns: Array of row objects.
Example:
// Input
{
"sql": "SELECT customer_id, SUM(total) as revenue FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5"
}
// Output
[
{"customer_id": 42, "revenue": 125000.00},
{"customer_id": 17, "revenue": 98500.00},
{"customer_id": 8, "revenue": 87200.00},
{"customer_id": 103, "revenue": 76400.00},
{"customer_id": 55, "revenue": 71100.00}
]
Supports both standard SQL and legacy warehouse SQL syntax (NVL, IFF, DATEDIFF, etc.), which is auto-transpiled. Queries pass through the full 13-step pipeline, including policy evaluation, cost estimation, quota checks, row-level security, transpilation, DuckDB execution, AI healing on failure, column masking, and audit logging.
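As a rough illustration of what auto-transpilation means, NVL(a, b) becomes COALESCE(a, b) and IFF(cond, a, b) becomes a CASE expression. The toy rewrite below is illustrative only; HatiData's actual transpiler parses the full SQL grammar rather than applying regexes:

```python
import re

def transpile_legacy(sql: str) -> str:
    """Toy rewrite of two legacy functions into standard SQL.

    Illustrative only -- the real transpiler parses the statement.
    """
    # NVL(a, b) -> COALESCE(a, b)
    sql = re.sub(r"\bNVL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)

    # IFF(cond, a, b) -> CASE WHEN cond THEN a ELSE b END
    def iff(match: re.Match) -> str:
        cond, a, b = [p.strip() for p in match.group(1).split(",")]
        return f"CASE WHEN {cond} THEN {a} ELSE {b} END"

    return re.sub(r"\bIFF\s*\(([^()]*)\)", iff, sql, flags=re.IGNORECASE)

print(transpile_legacy("SELECT NVL(total, 0), IFF(status = 'paid', 1, 0) FROM orders"))
# SELECT COALESCE(total, 0), CASE WHEN status = 'paid' THEN 1 ELSE 0 END FROM orders
```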
list_tables
List all tables the authenticated agent has permission to access.
Parameters: None.
Returns: Array of table name strings.
Example:
// Input
{}
// Output
["customers", "orders", "products", "events", "knowledge_base"]
Results are filtered by ABAC policies -- agents only see tables they are authorized to access.
describe_table
Get column names, data types, and nullability for a specific table.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| table_name | string | Yes | Name of the table to describe |
Returns: Array of column descriptor objects.
Example:
// Input
{"table_name": "orders"}
// Output
[
{"column_name": "id", "data_type": "INTEGER", "is_nullable": "NO"},
{"column_name": "customer_id", "data_type": "INTEGER", "is_nullable": "NO"},
{"column_name": "total", "data_type": "DECIMAL(10,2)", "is_nullable": "YES"},
{"column_name": "status", "data_type": "VARCHAR", "is_nullable": "YES"},
{"column_name": "created_at", "data_type": "TIMESTAMP", "is_nullable": "YES"}
]
Queries information_schema.columns under the hood. Columns masked by policy are excluded from the result.
list_schemas
List all schemas in the database.
Parameters: None.
Returns: Array of schema name strings.
Example:
// Input
{}
// Output
["main", "staging", "analytics"]
read_query
Execute a read-only SQL query against the warehouse. Identical to query but enforces that the statement is a SELECT (no INSERT, UPDATE, DELETE, or DDL). Useful for agents that should only have read access.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| sql | string | Yes | SELECT query to execute |
Returns: Array of row objects.
Example:
// Input
{"sql": "SELECT segment, COUNT(*) as customer_count FROM customers GROUP BY segment"}
// Output
[
{"segment": "enterprise", "customer_count": 142},
{"segment": "mid_market", "customer_count": 891},
{"segment": "smb", "customer_count": 3204}
]
Non-SELECT statements are rejected with a POLICY_VIOLATION error. This tool is intended for agents configured with read-only access.
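The shape of such a read-only guard can be sketched in a few lines. This is a client-side approximation for intuition only; the server-side check is likely stricter (e.g. based on the parsed statement rather than a string prefix):

```python
# WITH ... SELECT CTEs are read-only as well, hence the second prefix.
READ_ONLY_PREFIXES = ("SELECT", "WITH")

def assert_read_only(sql: str) -> None:
    """Reject anything that is not a single read-only statement.

    Sketch only -- HatiData's actual enforcement is not string-based.
    """
    statements = [s.strip() for s in sql.strip().rstrip(";").split(";") if s.strip()]
    if len(statements) != 1:
        raise ValueError("POLICY_VIOLATION: exactly one statement allowed")
    if not statements[0].upper().startswith(READ_ONLY_PREFIXES):
        raise ValueError("POLICY_VIOLATION: only SELECT statements are permitted")

assert_read_only("SELECT 1")  # passes silently
try:
    assert_read_only("DELETE FROM orders")
except ValueError as e:
    print(e)  # POLICY_VIOLATION: only SELECT statements are permitted
```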
get_usage_stats
Get query usage statistics and credit consumption for the current agent.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| period | string | No | Time period: "today", "week", "month" (default: "month") |
Returns: Usage summary object.
Example:
// Input
{"period": "month"}
// Output
{
"agent_id": "analyst-agent",
"period": "2025-12",
"query_count": 1247,
"credits_used": 3891.5,
"credits_remaining": 96108.5,
"avg_duration_ms": 38,
"tables_accessed": ["customers", "orders", "products", "events"]
}
Returns usage metrics scoped to the authenticated agent. Organization-level usage requires audit:read scope.
run_sql_arrow
Execute a read-only SQL query and return results as base64-encoded Apache Arrow IPC. Designed for efficient data transfer to pandas, polars, and other columnar data frameworks. Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| sql | string | Yes | SELECT query to execute |
| max_rows | integer | No | Maximum rows to return (default: 10000) |
Returns: Object containing base64-encoded Arrow IPC stream data.
Example:
// Input
{
"sql": "SELECT customer_id, SUM(total) as revenue, COUNT(*) as order_count FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 1000"
}
// Output
{
"format": "arrow_ipc_base64",
"data": "QVJST1cxAAAAAAoAAAAAAAD...",
"rows": 1000,
"execution_time_ms": 12
}
The data field contains a base64-encoded Apache Arrow IPC stream. Decode and load directly into pandas or polars:
import base64
import io
import pyarrow.ipc as ipc
import polars as pl
# Decode the base64-encoded Arrow IPC response
ipc_bytes = base64.b64decode(result["data"])
reader = ipc.open_stream(io.BytesIO(ipc_bytes))
table = reader.read_all()
# Convert to pandas or polars
df_pandas = table.to_pandas()
df_polars = pl.from_arrow(table)
Arrow IPC preserves full type fidelity (integers, floats, timestamps, booleans, nulls) without the string parsing overhead of JSON results. This makes run_sql_arrow significantly faster than query for large result sets consumed by data analysis code.
For the HTTP endpoint equivalent, see Arrow Queries.
Memory Tools (5)
Tools for storing, searching, and managing long-term agent memory. See Agent Memory for architecture details.
store_memory
Store a new memory entry with optional metadata and importance scoring.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The memory content (natural language or structured data) |
| memory_type | string | No | Category: "fact", "episode", "preference", "context" (default: "fact") |
| metadata | object | No | Arbitrary key-value metadata |
| importance | float | No | Importance score from 0.0 to 1.0 (default: 0.5) |
Returns: Memory creation confirmation with ID.
Example:
// Input
{
"content": "Enterprise customers have a 95% renewal rate in Q4",
"memory_type": "fact",
"metadata": {
"source": "quarterly-review",
"quarter": "Q4-2025"
},
"importance": 0.9
}
// Output
{
"memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "stored",
"has_embedding": false
}
The has_embedding field becomes true once the async embedding worker processes the entry (typically within seconds).
search_memory
Search memories using natural language or keyword queries with optional filters.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural language search query |
| top_k | integer | No | Maximum results to return (default: 10) |
| memory_type | string | No | Filter by memory type |
| min_importance | float | No | Minimum importance score filter |
| agent_id | string | No | Filter by agent (default: current agent) |
Returns: Array of memory objects ranked by similarity.
Example:
// Input
{
"query": "customer renewal rates",
"top_k": 5,
"memory_type": "fact",
"min_importance": 0.5
}
// Output
[
{
"memory_id": "a1b2c3d4-...",
"content": "Enterprise customers have a 95% renewal rate in Q4",
"memory_type": "fact",
"importance": 0.9,
"similarity_score": 0.94,
"metadata": {"source": "quarterly-review", "quarter": "Q4-2025"},
"created_at": "2025-12-15T10:30:00Z"
},
{
"memory_id": "b2c3d4e5-...",
"content": "SMB segment renewal rate dropped to 72% in Q3",
"memory_type": "fact",
"importance": 0.8,
"similarity_score": 0.87,
"metadata": {"source": "churn-analysis"},
"created_at": "2025-10-01T14:20:00Z"
}
]
When vector search is available, similarity_score is cosine similarity. In fallback mode, it is a heuristic relevance score.
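Cosine similarity measures the angle between the query embedding and each stored embedding, independent of vector magnitude. A minimal reference implementation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity([1.0, 0.0, 1.0], [0.5, 0.5, 1.0]), 4))  # 0.866
```

In production the vectors are high-dimensional model embeddings, but the scoring formula is the same.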
get_agent_state
Retrieve a named state value for the current agent.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | State key name |
Returns: State key-value pair with timestamp.
Example:
// Input
{"key": "current_task"}
// Output
{
"key": "current_task",
"value": "Analyzing Q4 revenue data for the board presentation",
"updated_at": "2025-12-15T14:20:00Z"
}
Agent state is stored in the _hatidata_agent_state table, scoped by agent_id.
set_agent_state
Set or update a named state value for the current agent.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | State key name |
| value | string | Yes | State value |
Returns: Confirmation.
Example:
// Input
{
"key": "current_task",
"value": "Generating the final report"
}
// Output
{"status": "ok", "key": "current_task"}
delete_memory
Delete a specific memory entry by ID. This operation is irreversible -- it removes both the metadata record and the vector embedding.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| memory_id | string | Yes | UUID of the memory to delete |
Returns: Deletion confirmation.
Example:
// Input
{"memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"}
// Output
{"status": "deleted", "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"}
Chain-of-Thought Tools (3)
Tools for recording and replaying agent reasoning traces. See Chain-of-Thought Ledger for architecture details.
log_reasoning_step
Log a single step in an agent's reasoning chain. Steps are hash-chained (SHA-256) per session for tamper detection.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | Session identifier for the reasoning chain |
| step_type | string | Yes | One of: observation, thought, action, result, reflection, planning, tool_call, tool_result, decision, error, correction, summary |
| content | string | Yes | Content of the reasoning step |
| metadata | object | No | Additional context (tool names, confidence scores, etc.) |
| importance | float | No | Step importance for embedding sampling (default: 0.5) |
Returns: Step confirmation with hash chain data.
Example:
// Input
{
"session_id": "analysis-2025-12-15",
"step_type": "thought",
"content": "Revenue is up 15% but concentrated in enterprise segment. Need to check SMB trends.",
"metadata": {"confidence": 0.85},
"importance": 0.7
}
// Output
{
"trace_id": "f1e2d3c4-...",
"step_number": 3,
"hash": "a7b8c9d0e1f2...",
"prev_hash": "1234abcd5678...",
"session_id": "analysis-2025-12-15"
}
Steps with step_type of decision, error, or correction are always embedded for semantic search regardless of the sampling rate.
replay_decision
Replay the full reasoning chain for a specific session, including hash verification.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | Session identifier to replay |
| verify_chain | boolean | No | Verify SHA-256 hash chain integrity (default: true) |
Returns: Array of reasoning steps with chain verification status.
Example:
// Input
{"session_id": "analysis-2025-12-15", "verify_chain": true}
// Output
{
"session_id": "analysis-2025-12-15",
"chain_valid": true,
"step_count": 8,
"steps": [
{
"step_number": 0,
"step_type": "observation",
"content": "User asked about Q4 revenue trends",
"hash": "1234abcd...",
"prev_hash": null,
"timestamp": "2025-12-15T10:00:00Z"
},
{
"step_number": 1,
"step_type": "planning",
"content": "Plan: 1) Query revenue by segment 2) Compare Q3 vs Q4 3) Identify drivers",
"hash": "5678efgh...",
"prev_hash": "1234abcd...",
"timestamp": "2025-12-15T10:00:01Z"
}
]
}
If chain_valid is false, the hash chain has been tampered with. The response will indicate which step has an invalid hash.
get_session_history
List reasoning sessions for the current agent, with optional filtering.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Maximum sessions to return (default: 20) |
| agent_id | string | No | Filter by agent (default: current agent) |
| since | string | No | ISO 8601 timestamp to filter sessions after |
Returns: Array of session summaries.
Example:
// Input
{"limit": 5}
// Output
[
{
"session_id": "analysis-2025-12-15",
"step_count": 8,
"first_step_at": "2025-12-15T10:00:00Z",
"last_step_at": "2025-12-15T10:05:30Z",
"step_types": ["observation", "planning", "tool_call", "tool_result", "thought", "decision", "summary"]
},
{
"session_id": "report-2025-12-14",
"step_count": 12,
"first_step_at": "2025-12-14T15:30:00Z",
"last_step_at": "2025-12-14T15:42:00Z",
"step_types": ["observation", "thought", "action", "result", "reflection", "summary"]
}
]
Trigger Tools (4)
Tools for managing semantic triggers -- event-driven automations that fire when content matches a concept. See Semantic Triggers for architecture details.
register_trigger
Register a new semantic trigger that fires when content exceeds a similarity threshold to a concept.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Human-readable trigger name |
| concept | string | Yes | Natural language concept to match against |
| threshold | float | No | Cosine similarity threshold (default: 0.8) |
| action | object | Yes | Action to take when triggered |
| cooldown_seconds | integer | No | Minimum seconds between firings (default: 60) |
Action types:
| Action Type | Fields | Description |
|---|---|---|
| webhook | url, headers | Send HMAC-SHA256 signed POST to URL |
| agent_notify | agent_id, message | Add to agent's offline notification inbox |
| write_event | event_type, payload | Write a structured event to the audit log |
| flag_for_review | reason, priority | Flag content for human review in the dashboard |
Returns: Trigger registration confirmation.
Example:
// Input
{
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": {
"type": "webhook",
"url": "https://alerts.example.com/fraud",
"headers": {"X-Source": "hatidata-trigger"}
},
"cooldown_seconds": 300
}
// Output
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"status": "active"
}
list_triggers
List all registered triggers for the current organization.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| status | string | No | Filter by status: "active", "paused", "all" (default: "active") |
Returns: Array of trigger objects.
Example:
// Input
{"status": "active"}
// Output
[
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": {"type": "webhook", "url": "https://alerts.example.com/fraud"},
"cooldown_seconds": 300,
"status": "active",
"last_fired_at": "2025-12-14T08:30:00Z",
"fire_count": 7
}
]
delete_trigger
Delete a registered trigger.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| trigger_id | string | Yes | ID of the trigger to delete |
Returns: Deletion confirmation.
Example:
// Input
{"trigger_id": "trig_a1b2c3d4"}
// Output
{"status": "deleted", "trigger_id": "trig_a1b2c3d4"}
test_trigger
Test a trigger against sample content without actually firing the action.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| trigger_id | string | Yes | ID of the trigger to test |
| content | string | Yes | Sample content to evaluate |
Returns: Evaluation result with similarity score.
Example:
// Input
{
"trigger_id": "trig_a1b2c3d4",
"content": "Multiple high-value wire transfers to newly created offshore accounts within 24 hours"
}
// Output
{
"trigger_id": "trig_a1b2c3d4",
"would_fire": true,
"similarity_score": 0.92,
"threshold": 0.85,
"action_type": "webhook"
}
Branch Tools (5)
Tools for creating isolated data branches for speculative agent operations. See State Branching for architecture details.
branch_create
Create a new isolated branch from the main schema. Branches use DuckDB schema-based isolation with zero-copy views on creation.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | No | Human-readable branch name |
| description | string | No | Purpose of the branch |
| ttl_seconds | integer | No | Auto-expire after N seconds (default: 3600) |
Returns: Branch creation confirmation.
Example:
// Input
{
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"ttl_seconds": 7200
}
// Output
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"schema": "branch_a1b2c3d4",
"status": "active",
"expires_at": "2025-12-15T12:00:00Z"
}
branch_query
Execute a SQL query within a branch's isolated schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| branch_id | string | Yes | Branch to query |
| sql | string | Yes | SQL query to execute |
Returns: Array of row objects (same format as query).
Example:
// Input
{
"branch_id": "br_a1b2c3d4",
"sql": "UPDATE products SET price = price * 1.10 WHERE tier = 'enterprise'; SELECT tier, AVG(price) as avg_price FROM products GROUP BY tier;"
}
// Output
[
{"tier": "enterprise", "avg_price": 1100.00},
{"tier": "growth", "avg_price": 500.00},
{"tier": "free", "avg_price": 0.00}
]
Writes within a branch are copy-on-write -- the first write to a table materializes a copy from main. Subsequent queries see the modified data without affecting the main schema.
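The copy-on-write behavior can be modeled with a toy in-memory version: reads fall through to main until the first write materializes a private copy of that table. This is a conceptual model only, not how the DuckDB schema isolation is implemented:

```python
class Branch:
    """Toy copy-on-write branch over a dict of table -> rows."""

    def __init__(self, main: dict):
        self.main = main   # shared baseline, never mutated through the branch
        self.local = {}    # tables materialized by a write

    def read(self, table: str):
        # Reads see the branch's copy if one exists, else fall through to main.
        return self.local.get(table, self.main[table])

    def write(self, table: str, rows):
        if table not in self.local:
            self.local[table] = list(self.main[table])  # materialize on first write
        self.local[table] = rows

main = {"products": [("enterprise", 1000.0)]}
br = Branch(main)
br.write("products", [("enterprise", 1100.0)])
print(br.read("products"))   # [('enterprise', 1100.0)]
print(main["products"])      # [('enterprise', 1000.0)] -- main is untouched
```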
branch_merge
Merge a branch's changes back into the main schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| branch_id | string | Yes | Branch to merge |
| strategy | string | No | Conflict resolution: "branch_wins", "main_wins", "abort" (default: "branch_wins") |
Returns: Merge result with conflict details.
Example:
// Input
{
"branch_id": "br_a1b2c3d4",
"strategy": "branch_wins"
}
// Output
{
"status": "merged",
"branch_id": "br_a1b2c3d4",
"tables_merged": ["products"],
"conflicts": 0
}
If conflicts are detected and the strategy is "abort", no changes are applied and the conflicts are returned for review.
branch_discard
Discard a branch and clean up its isolated schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| branch_id | string | Yes | Branch to discard |
Returns: Discard confirmation.
Example:
// Input
{"branch_id": "br_a1b2c3d4"}
// Output
{"status": "discarded", "branch_id": "br_a1b2c3d4"}
branch_list
List all active branches for the current agent/organization.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| status | string | No | Filter: "active", "merged", "discarded", "all" (default: "active") |
Returns: Array of branch summaries.
Example:
// Input
{"status": "active"}
// Output
[
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"status": "active",
"materialized_tables": ["products"],
"created_at": "2025-12-15T10:00:00Z",
"expires_at": "2025-12-15T12:00:00Z"
}
]
Error Handling
All tools return errors in a consistent format:
{
"error": {
"code": "POLICY_VIOLATION",
"message": "Agent 'analyst' is not permitted to execute DDL statements",
"request_id": "req_a1b2c3d4"
}
}
Common error codes:
| Code | Description |
|---|---|
| QUERY_ERROR | SQL syntax error or execution failure |
| POLICY_VIOLATION | ABAC policy blocked the operation |
| QUOTA_EXCEEDED | Agent has exceeded its credit quota |
| NOT_FOUND | Requested resource (memory, branch, trigger) does not exist |
| BRANCH_EXPIRED | Branch TTL has expired |
| CHAIN_INVALID | CoT hash chain verification failed |
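Because the error envelope is consistent across tools, a client can detect and surface failures with one helper. The sketch below is a client-side convenience, not part of any HatiData SDK:

```python
class HatiDataToolError(Exception):
    """Raised when a tool response carries the documented error envelope."""

    def __init__(self, code: str, message: str, request_id: str):
        super().__init__(f"{code}: {message} (request {request_id})")
        self.code = code
        self.request_id = request_id

def raise_for_error(response):
    """Pass successful responses through; raise on the error envelope.

    Successful results may be arrays (query) or objects (store_memory),
    hence the isinstance check before looking for the "error" key.
    """
    if isinstance(response, dict) and "error" in response:
        err = response["error"]
        raise HatiDataToolError(err["code"], err["message"], err["request_id"])
    return response

try:
    raise_for_error({"error": {
        "code": "QUOTA_EXCEEDED",
        "message": "Agent has exceeded its credit quota",
        "request_id": "req_a1b2c3d4",
    }})
except HatiDataToolError as e:
    print(e.code)  # QUOTA_EXCEEDED
```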
Next Steps
- MCP Server -- Installation and configuration
- Agent Memory -- Memory architecture
- Chain-of-Thought Ledger -- Reasoning trace architecture
- Semantic Triggers -- Trigger evaluation pipeline
- State Branching -- Branch isolation and merging