MCP Tools Reference
HatiData's MCP server exposes 25 tools in six categories: Core Query (7), Memory (5), Chain-of-Thought (3), Triggers (4), Branching (5), and Identity (1). Every tool call passes through the full security pipeline — authentication, ABAC policy evaluation, audit logging, and metering.
HatiData ships two distinct MCP server implementations:
- Proxy MCP Server (port 5440, HTTP/SSE): Exposes all 25 tools listed on this page. Used by agents connecting directly to HatiData's data plane. Supports the full security pipeline including ABAC, row-level security, column masking, metering, and audit logging.
- SDK MCP Server (hatidata-mcp-server CLI): Exposes 4 basic tools (query, list_tables, describe_table, get_context) via stdio transport. Used with Claude Desktop, Cursor, and Claude Code. See MCP Setup for configuration.
The reference below documents the Proxy MCP Server tools; for setup instructions for either server, see MCP Setup.
Core Query (7)
Tools for executing SQL and exploring schemas within the data layer.
run_sql
Execute DDL, DML, or DQL SQL statements. Supports both standard SQL and Snowflake-compatible syntax (NVL, IFF, DATEDIFF, VARIANT, etc.), which is auto-transpiled. Passes through the full multi-stage pipeline: policy evaluation, cost estimation, quota checks, row-level security, transpilation, execution, AI healing on failure, column masking, and audit logging.
Scope: Requires QueryWrite scope for DDL/DML statements, QueryRead for SELECT queries.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
sql | string | Yes | SQL statement to execute |
timeout_ms | integer | No | Query timeout in milliseconds (default: 30000) |
Returns: Array of row objects (for queries) or execution confirmation (for DDL/DML).
// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5", "timeout_ms": 15000 }
// Output
[
{ "customer_id": 42, "revenue": 125000.00 },
{ "customer_id": 17, "revenue": 98500.00 },
{ "customer_id": 8, "revenue": 87200.00 }
]
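Agents that speak MCP directly never build these payloads by hand, but it can be useful to see the wire shape. The sketch below constructs a tools/call request body following the JSON-RPC 2.0 framing that the MCP specification defines; the transport details (HTTP/SSE endpoint on port 5440) are covered in MCP Setup and are not shown here.

```python
import json

def make_tool_call(tool, arguments, request_id=1):
    """Build an MCP tools/call JSON-RPC 2.0 request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

body = make_tool_call("run_sql", {
    "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1",
    "timeout_ms": 15000,
})
print(body)
```

The params object carries the tool name and the arguments documented in each parameter table on this page.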
run_sql_readonly
Execute SELECT-only queries; DDL and DML statements are rejected. Useful for agents that should read data but never modify it.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
sql | string | Yes | SELECT query to execute |
max_rows | integer | No | Maximum rows to return (default: 1000) |
timeout_ms | integer | No | Query timeout in milliseconds (default: 30000) |
Returns: Array of row objects.
// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1 ORDER BY 2 DESC", "max_rows": 5 }
// Output
[
{ "customer_id": 42, "revenue": 125000.00 },
{ "customer_id": 17, "revenue": 98500.00 },
{ "customer_id": 8, "revenue": 87200.00 }
]
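The server's rejection logic parses the statement; a client that wants to fail fast before the round trip might add a naive prefix guard like the sketch below. This is only an illustration of the intent, not the real pipeline's check (a WITH clause can wrap data-modifying statements in some dialects, which is exactly why proper parsing is needed server-side).

```python
def looks_readonly(sql):
    """Naive client-side guard: accept only statements that start with
    SELECT or WITH. The server performs real SQL parsing; this is a sketch."""
    first = sql.lstrip().split(None, 1)[0].lower().rstrip(";")
    return first in ("select", "with")

print(looks_readonly("SELECT customer_id FROM orders"))  # True
print(looks_readonly("DROP TABLE orders"))               # False
```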
list_schemas
Enumerate all schemas the authenticated agent has permission to access. Results are filtered by ABAC policies.
Scope: Requires QueryRead scope.
Parameters: None.
Returns: Array of schema name strings.
// Input
{}
// Output
["main", "analytics", "staging"]
list_tables
List all tables within a specific schema that the authenticated agent has permission to access. Results are filtered by ABAC policies — agents only see tables they are authorized to query.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
schema | string | No | Schema to list tables from (default: "main") |
Returns: Array of table name strings.
// Input
{ "schema": "main" }
// Output
["customers", "orders", "products", "events", "knowledge_base"]
describe_table
Get column names, data types, and nullability for a table. Queries information_schema.columns. Columns masked by policy are excluded from the result. Supports schema.table format for tables outside the default schema.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
table | string | Yes | Name of the table to describe (supports schema.table format) |
Returns: Array of column descriptor objects.
// Input
{ "table": "orders" }
// Output
[
{ "column_name": "id", "data_type": "INTEGER", "is_nullable": "NO" },
{ "column_name": "customer_id", "data_type": "INTEGER", "is_nullable": "NO" },
{ "column_name": "total", "data_type": "DECIMAL(10,2)", "is_nullable": "YES" },
{ "column_name": "status", "data_type": "VARCHAR", "is_nullable": "YES" },
{ "column_name": "created_at", "data_type": "TIMESTAMP", "is_nullable": "YES" }
]
run_sql_arrow
Execute a read-only SQL query and return results in Apache Arrow IPC format (base64-encoded). Designed for high-performance data transfer to agents that can process Arrow natively.
Scope: Requires QueryRead scope.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
sql | string | Yes | SELECT query to execute |
max_rows | integer | No | Maximum rows to return (default: 10000) |
Returns: Base64-encoded Apache Arrow IPC stream.
// Input
{ "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY 1", "max_rows": 5000 }
// Output
{
"format": "arrow-ipc",
"data": "QVJST1cxAA...",
"row_count": 1250,
"schema": [
{ "name": "customer_id", "type": "Int32" },
{ "name": "revenue", "type": "Decimal128(10,2)" }
]
}
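The sample payload above begins with the Arrow file-format magic bytes ("ARROW1" base64-encodes to "QVJST1cx"), so a client can sanity-check the payload before handing it to an Arrow reader (e.g. pyarrow's IPC reader, which is outside the standard library and not shown here). A minimal decode-and-check sketch:

```python
import base64

def decode_arrow_payload(b64_data):
    """Base64-decode the data field and verify the Arrow file magic."""
    raw = base64.b64decode(b64_data)
    if not raw.startswith(b"ARROW1"):
        raise ValueError("payload does not start with Arrow file magic")
    return raw

raw = decode_arrow_payload("QVJST1cxAA==")  # a tiny 7-byte sample payload
print(len(raw))  # 7
```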
get_usage_stats
Retrieve usage statistics for the current agent, including query counts, memory writes, branch operations, and resource consumption.
Scope: Requires QueryRead scope.
Parameters: None.
Returns: Usage statistics object.
// Input
{}
// Output
{
"queries_executed": 142,
"memory_writes": 37,
"memory_reads": 85,
"branches_created": 3,
"resources_used_mb": 256.5,
"period_start": "2025-12-01T00:00:00Z",
"period_end": "2025-12-15T23:59:59Z"
}
Memory (5)
Tools for storing, retrieving, and managing long-term agent memory. Memory uses a hybrid architecture: vector ANN search for semantic recall, combined with structured metadata for filtering and storage. See Persistent Memory for architecture details.
store_memory
Store a new memory entry with optional tags, metadata, and importance scoring. Embedding is generated asynchronously after storage.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
content | string | Yes | Memory content |
tags | string[] | No | Tags for categorical filtering |
metadata | object | No | Arbitrary key-value metadata |
importance | float | No | Importance score 0.0–1.0 (default: 0.5) |
Returns: Memory creation confirmation with ID.
// Input
{
"content": "Enterprise customers have a 95% renewal rate in Q4",
"tags": ["revenue", "enterprise", "retention"],
"metadata": { "source": "quarterly-review", "quarter": "Q4-2025" },
"importance": 0.9
}
// Output
{
"memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "stored",
"has_embedding": false
}
has_embedding becomes true once the async embedding worker processes the entry (typically within seconds).
search_memory
Search memories using natural language queries with optional tag filters. Uses vector ANN for semantic search, then joins with structured metadata for filtering. Falls back to heuristic scoring if the vector index is unavailable.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
query | string | Yes | Natural language search query |
limit | integer | No | Maximum results (default: 10) |
min_score | float | No | Minimum similarity score threshold (default: 0.7) |
tags | string[] | No | Filter results to memories with any of these tags |
Returns: Array of memory objects ranked by similarity score.
// Input
{ "query": "customer renewal rates", "limit": 3, "tags": ["revenue", "retention"] }
// Output
[
{
"memory_id": "a1b2c3d4-...",
"content": "Enterprise customers have a 95% renewal rate in Q4",
"tags": ["revenue", "enterprise", "retention"],
"importance": 0.9,
"similarity_score": 0.94,
"metadata": { "source": "quarterly-review" },
"created_at": "2025-12-15T10:30:00Z"
}
]
get_agent_state
Retrieve a named state value for the current agent. State is stored in _hatidata_agent_state, scoped by agent_id.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
key | string | Yes | State key name |
Returns: State key-value pair with timestamp.
// Input
{ "key": "current_task" }
// Output
{
"key": "current_task",
"value": "Analyzing Q4 revenue data for the board presentation",
"updated_at": "2025-12-15T14:20:00Z"
}
set_agent_state
Set or update a named state value for the current agent. Values can be any JSON type (string, number, boolean, object, array).
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
key | string | Yes | State key name |
value | any | Yes | State value (any valid JSON) |
Returns: Confirmation.
// Input
{ "key": "current_task", "value": { "name": "Generating the final report", "progress": 0.75, "sections_done": ["intro", "analysis"] } }
// Output
{ "status": "ok", "key": "current_task" }
delete_memory
Delete a specific memory entry by ID. This operation removes both the metadata record and the vector embedding. Irreversible.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
memory_id | string | Yes | UUID of the memory to delete |
Returns: Deletion confirmation.
// Input
{ "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }
// Output
{ "status": "deleted", "memory_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }
Chain-of-Thought (3)
Tools for recording and replaying agent reasoning traces. Steps are cryptographic hash-chained per session for tamper detection. See Chain-of-Thought Ledger for architecture details.
log_reasoning_step
Log a single step in an agent's reasoning chain. The proxy auto-manages session IDs — each agent connection maintains its own session, so there is no need to pass a session_id parameter.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
step_type | string | Yes | Step type, e.g. observation, hypothesis, planning, decision, action, reflection |
content | string | Yes | Content of the reasoning step |
confidence | float | No | Confidence score for this step (0.0–1.0) |
evidence | string[] | No | Supporting evidence or references for this step |
Returns: Step confirmation with hash chain data.
// Input
{
"step_type": "hypothesis",
"content": "Revenue is up 15% but concentrated in enterprise. Need to check SMB trends.",
"confidence": 0.85,
"evidence": ["Q4 revenue report showed 15% YoY growth", "Enterprise segment accounts for 78% of total"]
}
// Output
{
"trace_id": "f1e2d3c4-...",
"step_number": 3,
"hash": "a7b8c9d0e1f2...",
"prev_hash": "1234abcd5678...",
"session_id": "auto-session-a1b2c3d4"
}
Steps with step_type of decision are always embedded for semantic search, regardless of the configured sampling rate.
replay_decision
Replay the full reasoning chain for a session, including cryptographic hash chain verification.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
session_id | string | Yes | Session identifier to replay |
verify_chain | boolean | No | Verify hash chain integrity (default: true) |
Returns: Array of reasoning steps with chain verification status.
// Input
{ "session_id": "analysis-2025-12-15", "verify_chain": true }
// Output
{
"session_id": "analysis-2025-12-15",
"chain_valid": true,
"step_count": 8,
"steps": [
{
"step_number": 0,
"step_type": "observation",
"content": "User asked about Q4 revenue trends",
"hash": "1234abcd...",
"prev_hash": null,
"timestamp": "2025-12-15T10:00:00Z"
},
{
"step_number": 1,
"step_type": "planning",
"content": "Plan: 1) Query revenue by segment 2) Compare Q3 vs Q4 3) Identify drivers",
"hash": "5678efgh...",
"prev_hash": "1234abcd...",
"timestamp": "2025-12-15T10:00:01Z"
}
]
}
If chain_valid is false, the response indicates which step has an invalid hash.
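The verification described above can be sketched client-side. The exact byte layout the ledger hashes is not documented on this page, so the step_hash construction below (SHA-256 over prev hash, step number, and content) is an assumption for illustration only; the walking-and-checking logic is the part that carries over.

```python
import hashlib

def step_hash(prev_hash, step_number, content):
    """One possible hash construction; the ledger's actual input layout is an assumption here."""
    material = "{}|{}|{}".format(prev_hash or "", step_number, content).encode()
    return hashlib.sha256(material).hexdigest()

def verify_chain(steps):
    """Return the step_number of the first invalid step, or None if the chain holds."""
    prev = None
    for s in steps:
        if s["prev_hash"] != prev:
            return s["step_number"]
        if s["hash"] != step_hash(prev, s["step_number"], s["content"]):
            return s["step_number"]
        prev = s["hash"]
    return None

# Build a small valid chain, then tamper with one step.
steps, prev = [], None
for i, content in enumerate(["User asked about Q4 revenue trends",
                             "Plan: query revenue by segment",
                             "Decision: drill into SMB numbers"]):
    h = step_hash(prev, i, content)
    steps.append({"step_number": i, "content": content, "prev_hash": prev, "hash": h})
    prev = h

print(verify_chain(steps))        # None: chain holds
steps[1]["content"] = "tampered"
print(verify_chain(steps))        # 1: first step whose hash no longer matches
```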
get_session_history
List reasoning sessions for the current agent with optional filtering.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
limit | integer | No | Maximum sessions to return (default: 20) |
agent_id | string | No | Filter by agent (default: current agent) |
since | string | No | ISO 8601 timestamp — only sessions after this time |
Returns: Array of session summaries.
// Input
{ "limit": 5 }
// Output
[
{
"session_id": "analysis-2025-12-15",
"step_count": 8,
"first_step_at": "2025-12-15T10:00:00Z",
"last_step_at": "2025-12-15T10:05:30Z",
"step_types": ["observation", "planning", "tool_call", "tool_result", "thought", "decision", "summary"]
},
{
"session_id": "report-2025-12-14",
"step_count": 12,
"first_step_at": "2025-12-14T15:30:00Z",
"last_step_at": "2025-12-14T15:42:00Z",
"step_types": ["observation", "thought", "action", "result", "reflection", "summary"]
}
]
Triggers (4)
Tools for managing semantic triggers — event-driven automations that fire when content matches a concept via cosine similarity. Evaluation uses a two-stage pipeline: vector ANN pre-filter (top-K) followed by exact cosine verification. See Semantic Triggers for architecture details.
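The two-stage evaluation can be sketched in miniature. In the sketch below, exact top-K search stands in for the ANN pre-filter (a real ANN index approximates this step), and the toy 3-dimensional vectors and trigger set are made up for illustration:

```python
import math

def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def evaluate_triggers(content_vec, triggers, top_k=2):
    # Stage 1: pre-filter to the top-K nearest triggers.
    candidates = sorted(triggers, key=lambda t: cosine(content_vec, t["vec"]),
                        reverse=True)[:top_k]
    # Stage 2: exact cosine check against each candidate's threshold.
    return [t["name"] for t in candidates
            if cosine(content_vec, t["vec"]) >= t["threshold"]]

triggers = [
    {"name": "fraud-pattern-detector", "vec": [0.9, 0.1, 0.0], "threshold": 0.85},
    {"name": "churn-risk",             "vec": [0.0, 1.0, 0.0], "threshold": 0.80},
]
print(evaluate_triggers([1.0, 0.1, 0.0], triggers))  # ['fraud-pattern-detector']
```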
register_trigger
Register a new semantic trigger.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Human-readable trigger name |
concept | string | Yes | Natural language concept to match against |
threshold | float | No | Cosine similarity threshold (default: 0.8) |
action | object | Yes | Action to take when triggered |
cooldown_seconds | integer | No | Minimum seconds between firings (default: 60) |
Action types:
| Type | Required Fields | Description |
|---|---|---|
webhook | url, headers | HMAC-SHA256 signed POST to URL |
agent_notify | agent_id, message | Add to agent's offline notification inbox |
write_event | event_type, payload | Write a structured event to the audit log |
flag_for_review | reason, priority | Flag content for human review in the dashboard |
Returns: Trigger registration confirmation.
// Input
{
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": {
"type": "webhook",
"url": "https://alerts.example.com/fraud",
"headers": { "X-Source": "hatidata-trigger" }
},
"cooldown_seconds": 300
}
// Output
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"status": "active"
}
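Since webhook actions are HMAC-SHA256 signed (per the action-types table above), the receiving endpoint should verify the signature before trusting the payload. The sketch below shows standard HMAC verification; the secret value and the header the signature arrives in (e.g. something like X-HatiData-Signature) are assumptions, as this page does not document them.

```python
import hashlib
import hmac

SECRET = b"shared-webhook-secret"  # hypothetical; provisioned out of band

def sign(body):
    """HMAC-SHA256 over the raw request body, hex-encoded."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body, signature):
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(sign(body), signature)

body = b'{"trigger_id": "trig_a1b2c3d4", "similarity_score": 0.92}'
sig = sign(body)
print(verify(body, sig))         # True
print(verify(body + b" ", sig))  # False: payload altered in transit
```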
list_triggers
List all registered triggers for the current organization.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
status | string | No | active, paused, or all (default: active) |
Returns: Array of trigger objects.
// Input
{ "status": "active" }
// Output
[
{
"trigger_id": "trig_a1b2c3d4",
"name": "fraud-pattern-detector",
"concept": "unusual transaction pattern indicative of fraud or money laundering",
"threshold": 0.85,
"action": { "type": "webhook", "url": "https://alerts.example.com/fraud" },
"cooldown_seconds": 300,
"status": "active",
"last_fired_at": "2025-12-14T08:30:00Z",
"fire_count": 7
}
]
delete_trigger
Delete a registered trigger.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
trigger_id | string | Yes | ID of the trigger to delete |
Returns: Deletion confirmation.
// Input
{ "trigger_id": "trig_a1b2c3d4" }
// Output
{ "status": "deleted", "trigger_id": "trig_a1b2c3d4" }
test_trigger
Test a trigger against sample content without firing the action. Useful for threshold calibration.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
trigger_id | string | Yes | ID of the trigger to test |
content | string | Yes | Sample content to evaluate |
Returns: Evaluation result with similarity score.
// Input
{
"trigger_id": "trig_a1b2c3d4",
"content": "Multiple high-value wire transfers to newly created offshore accounts within 24 hours"
}
// Output
{
"trigger_id": "trig_a1b2c3d4",
"would_fire": true,
"similarity_score": 0.92,
"threshold": 0.85,
"action_type": "webhook"
}
Branching (5)
Tools for creating isolated data branches for speculative agent operations. Branches use schema-based isolation with zero-copy views on creation and copy-on-write materialization on first write. See Branch Isolation for architecture details.
branch_create
Create a new isolated branch from the main data layer.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Human-readable branch name |
description | string | No | Purpose of the branch |
ttl_hours | integer | No | Auto-expire after N hours (default: 24) |
Returns: Branch creation confirmation.
// Input
{
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"ttl_hours": 48
}
// Output
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"schema": "branch_a1b2c3d4",
"status": "active",
"expires_at": "2025-12-17T10:00:00Z"
}
branch_query
Execute a SQL query within a branch's isolated schema. The first write to a table in a branch materializes a copy from main; subsequent operations see modified data without affecting the main schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
branch_id | string | Yes | Branch to query |
sql | string | Yes | SQL query to execute |
Returns: Array of row objects (same format as run_sql).
// Input
{
"branch_id": "br_a1b2c3d4",
"sql": "UPDATE products SET price = price * 1.10 WHERE tier = 'enterprise'; SELECT tier, AVG(price) AS avg_price FROM products GROUP BY tier;"
}
// Output
[
{ "tier": "enterprise", "avg_price": 1100.00 },
{ "tier": "growth", "avg_price": 500.00 },
{ "tier": "free", "avg_price": 0.00 }
]
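The copy-on-write behavior described above can be modeled conceptually: reads fall through to main until a table receives its first write, at which point the branch materializes its own copy. The sketch below is a toy in-memory model of that lifecycle, not the actual schema-based implementation (integer prices keep the arithmetic exact):

```python
class Branch:
    """Conceptual model of branch isolation: zero-copy reads from main,
    copy-on-write materialization on first write."""

    def __init__(self, main_tables):
        self.main = main_tables   # shared; never mutated through the branch
        self.materialized = {}    # branch-local copies, created on first write

    def read(self, table):
        return self.materialized.get(table, self.main[table])

    def write(self, table, mutate):
        if table not in self.materialized:
            # First write: copy the table out of main into the branch schema.
            self.materialized[table] = [dict(row) for row in self.main[table]]
        mutate(self.materialized[table])

main = {"products": [{"tier": "enterprise", "price": 1000},
                     {"tier": "growth", "price": 500}]}
br = Branch(main)
br.write("products", lambda rows: [r.update(price=r["price"] + r["price"] // 10)
                                   for r in rows if r["tier"] == "enterprise"])
print(br.read("products")[0]["price"])  # 1100 inside the branch
print(main["products"][0]["price"])     # 1000: main is untouched
```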
branch_merge
Merge a branch's changes back into the main schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
branch_id | string | Yes | Branch to merge |
strategy | string | No | Conflict resolution: ours (branch changes win), theirs (main data wins), or fail_on_conflict (abort if conflicts detected, default) |
Returns: Merge result with conflict details.
// Input
{ "branch_id": "br_a1b2c3d4", "strategy": "ours" }
// Output
{
"status": "merged",
"branch_id": "br_a1b2c3d4",
"tables_merged": ["products"],
"conflicts": 0
}
If conflicts are detected and the strategy is fail_on_conflict, no changes are applied and the conflicts are returned for review.
The SDK uses different names for the same strategies: branch_wins (= ours), main_wins (= theirs), abort (= fail_on_conflict). Both naming conventions are accepted by the proxy.
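A client that accepts either naming convention can normalize to the proxy names before calling branch_merge. A minimal sketch of that mapping, using exactly the aliases listed above:

```python
PROXY_STRATEGIES = {"ours", "theirs", "fail_on_conflict"}
SDK_ALIASES = {"branch_wins": "ours", "main_wins": "theirs", "abort": "fail_on_conflict"}

def normalize_strategy(strategy):
    """Accept either naming convention and return the proxy name."""
    if strategy in PROXY_STRATEGIES:
        return strategy
    if strategy in SDK_ALIASES:
        return SDK_ALIASES[strategy]
    raise ValueError("unknown merge strategy: {!r}".format(strategy))

print(normalize_strategy("branch_wins"))  # ours
```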
branch_discard
Discard a branch and clean up its isolated schema.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
branch_id | string | Yes | Branch to discard |
Returns: Discard confirmation.
// Input
{ "branch_id": "br_a1b2c3d4" }
// Output
{ "status": "discarded", "branch_id": "br_a1b2c3d4" }
branch_list
List all branches for the current agent or organization.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
status | string | No | active, merged, discarded, or all (default: active) |
Returns: Array of branch summaries.
// Input
{ "status": "active" }
// Output
[
{
"branch_id": "br_a1b2c3d4",
"name": "what-if-pricing-change",
"description": "Simulate the effect of a 10% price increase on enterprise tier",
"status": "active",
"materialized_tables": ["products"],
"created_at": "2025-12-15T10:00:00Z",
"expires_at": "2025-12-17T10:00:00Z"
}
]
Identity (1)
Tools for agent self-registration. See Agent Registry for the full registration model.
register_agent
Register a new agent identity. Returns a stable fingerprint (hd_agt_xxx) and initial trust level. Most agents register automatically on first connection — this tool is for MCP-native agents (Claude, Cursor, AutoGen) that want to explicitly identify themselves.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Display name for the agent |
framework | string | No | Agent framework (default: "custom") |
scopes | string[] | No | Requested scopes (default: ["agent"]) |
session_bound | boolean | No | Ephemeral session-only agent (default: false) |
Returns: Registration confirmation with fingerprint and trust level.
// Input
{
"name": "ResearchAssistant",
"framework": "mcp",
"scopes": ["query:read", "memory:read", "memory:write"],
"session_bound": true
}
// Output
{
"fingerprint": "hd_agt_c8e5f2a3",
"trust_level": "provisional",
"expires_at": "session_end"
}
When session_bound is true, the agent identity is ephemeral and expires when the MCP session ends. The fingerprint is not reusable across sessions.
spawn_agent (coming soon)
Register a sub-agent on behalf of the current agent. Requires Orchestrator trust level. The parent-child relationship is stored in the registry for lineage tracking.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Display name for the sub-agent |
framework | string | No | Agent framework (default: "custom") |
scopes | string[] | No | Scopes for the sub-agent (must be subset of parent's scopes) |
allowed_tables | string[] | No | Table allowlist for the sub-agent |
max_queries_hr | integer | No | Query rate limit per hour |
Returns: Sub-agent fingerprint with parent lineage.
// Input
{
"name": "DataValidator",
"framework": "custom",
"scopes": ["query:read", "memory:write"],
"allowed_tables": ["agent_memories"],
"max_queries_hr": 100
}
// Output
{
"fingerprint": "hd_agt_b7d4e2f1",
"trust_level": "provisional",
"parent_fingerprint": "hd_agt_orch_01"
}
Error Handling
All tools return errors in a consistent format:
{
"error": {
"code": "POLICY_VIOLATION",
"message": "Agent 'analyst' is not permitted to execute DDL statements",
"request_id": "req_a1b2c3d4"
}
}
| Code | Description |
|---|---|
QUERY_ERROR | SQL syntax error or execution failure |
POLICY_VIOLATION | ABAC policy blocked the operation |
QUOTA_EXCEEDED | Agent has exceeded its credit quota |
NOT_FOUND | Requested resource (memory, branch, trigger) does not exist |
BRANCH_EXPIRED | Branch TTL has expired and the schema has been cleaned up |
CHAIN_INVALID | CoT hash chain verification failed — chain may have been tampered with |
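A caller can route on these codes in a straightforward way. The retry classification below is an assumption for illustration (quota errors may clear when the metering period resets, while policy and not-found errors will not); the request_id is worth logging in either case, since it ties the failure back to the audit trail.

```python
RETRYABLE = {"QUOTA_EXCEEDED"}  # assumption: may succeed once the quota period resets

def classify_response(response):
    """Map a tool response to 'ok', 'retry', or 'fail' using the error format above."""
    err = response.get("error")
    if err is None:
        return "ok"
    return "retry" if err["code"] in RETRYABLE else "fail"

print(classify_response({"rows": []}))  # ok
print(classify_response({"error": {
    "code": "POLICY_VIOLATION",
    "message": "Agent 'analyst' is not permitted to execute DDL statements",
    "request_id": "req_a1b2c3d4",
}}))  # fail
```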