State Branching

State branching lets agents create isolated copies of their data environment, run speculative queries or write operations, and then either merge the changes back or discard them. It is analogous to Git branches, but for database state: agents can explore "what if" scenarios without affecting production data.

How It Works

HatiData implements branching using DuckDB's schema isolation. Each branch is a separate DuckDB schema (branch_{uuid}) that starts as a set of zero-copy views pointing to the main schema's tables. On the first write to any table in the branch, that table is materialized (copy-on-write), creating a real copy that can be modified independently.

Main Schema (production data)
├── customers (real table)
├── orders (real table)
└── products (real table)

Branch branch_abc123 (created for "what-if" analysis)
├── customers → VIEW → main.customers (zero-copy, read-only)
├── orders → MATERIALIZED COPY (written to, now independent)
└── products → VIEW → main.products (zero-copy, read-only)

Zero-Copy Views

When a branch is created, HatiData creates a new DuckDB schema and populates it with views that point to the main schema's tables:

CREATE SCHEMA branch_abc123;
CREATE VIEW branch_abc123.customers AS SELECT * FROM main.customers;
CREATE VIEW branch_abc123.orders AS SELECT * FROM main.orders;
CREATE VIEW branch_abc123.products AS SELECT * FROM main.products;

Branch creation time does not depend on table size, because no data is copied; only the view definitions are created. Until a table is materialized, reads from the branch return the same data as the main schema.
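
The view-creation step above can be sketched as a small SQL generator. This is a minimal illustration, not HatiData's implementation: the helper names `quote_ident` and `branch_view_statements` are hypothetical, and the list of main tables is assumed to be known already.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def branch_view_statements(branch_schema: str, tables: list[str]) -> list[str]:
    """Build the DDL that links every main table into a new branch schema.

    Hypothetical helper: one CREATE SCHEMA, then one view per main table.
    """
    schema = quote_ident(branch_schema)
    statements = [f"CREATE SCHEMA {schema};"]
    for table in tables:
        t = quote_ident(table)
        statements.append(f"CREATE VIEW {schema}.{t} AS SELECT * FROM main.{t};")
    return statements


stmts = branch_view_statements("branch_abc123", ["customers", "orders", "products"])
print("\n".join(stmts))
```

The number of statements grows with the number of tables, not with the amount of data in them, which is why branch creation stays cheap.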

Copy-on-Write Materialization

When an agent writes to a table in a branch (INSERT, UPDATE, DELETE), HatiData transparently materializes that table:

  1. Drop the view
  2. CREATE TABLE branch_abc123.orders AS SELECT * FROM main.orders
  3. Apply the write operation to the materialized copy
  4. All subsequent reads and writes go to the materialized copy

Only tables that are actually modified are materialized. Tables that are only read remain as zero-copy views, keeping memory usage minimal.
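
To make the copy-on-write behavior concrete, here is a toy in-memory model in Python. It is only a sketch under stated assumptions: the `Branch` class is hypothetical, tables are plain lists of dicts, and a deep copy stands in for `CREATE TABLE ... AS SELECT`.

```python
import copy


class Branch:
    """Toy in-memory model of a copy-on-write branch (not HatiData's code)."""

    def __init__(self, main: dict[str, list[dict]]):
        self._main = main  # shared reference, playing the role of the views
        self._materialized: dict[str, list[dict]] = {}

    def read(self, table: str) -> list[dict]:
        # Materialized copies shadow main; untouched tables read through.
        return self._materialized.get(table, self._main[table])

    def write(self, table: str, row: dict) -> None:
        if table not in self._materialized:
            # First write: replace the "view" with a private deep copy.
            self._materialized[table] = copy.deepcopy(self._main[table])
        self._materialized[table].append(row)

    @property
    def materialized_tables(self) -> list[str]:
        return sorted(self._materialized)


main = {"customers": [{"id": 1}], "orders": [{"id": 10, "total": 100}]}
branch = Branch(main)
branch.write("orders", {"id": 11, "total": 250})

print(len(branch.read("orders")))   # 2: the branch sees its own copy
print(len(main["orders"]))          # 1: main is untouched
print(branch.materialized_tables)   # ['orders']
```

Only `orders` is copied; `customers` is still served directly from the shared main data.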

Core Components

BranchStorageEngine

The BranchStorageEngine handles the low-level DuckDB schema operations:

Method                           Description
create(branch_id)                Create a new schema with zero-copy views for all main tables
materialize(branch_id, table)    Convert a view to a real table (copy-on-write)
query(branch_id, sql)            Execute a read query within the branch schema
write(branch_id, sql)            Execute a write query, materializing the target table if needed
discard(branch_id)               Drop the branch schema and all its contents

All operations use SET search_path = 'branch_{uuid},main' to ensure the branch schema takes precedence over main for materialized tables, while unmaterialized tables fall through to main via the views.
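
The lookup order implied by the search path can be illustrated with a small resolver. This is a simplified model of name resolution, not DuckDB's actual binder: `resolve_table` and the `catalog` mapping are hypothetical, and the `returns` table is an invented example of a table added to main after the branch was created.

```python
def resolve_table(name: str, search_path: list[str], catalog: dict[str, set[str]]) -> str:
    """Return the first schema-qualified match for an unqualified table name."""
    for schema in search_path:
        if name in catalog.get(schema, set()):
            return f"{schema}.{name}"
    raise LookupError(f"table {name!r} not found in {search_path}")


# Hypothetical catalog: the branch holds one entry (view or materialized
# table) per main table that existed at branch creation; "returns" was
# added to main afterwards, so only main knows about it.
catalog = {
    "branch_abc123": {"customers", "orders", "products"},
    "main": {"customers", "orders", "products", "returns"},
}
search_path = "branch_abc123,main".split(",")

print(resolve_table("orders", search_path, catalog))   # branch_abc123.orders
print(resolve_table("returns", search_path, catalog))  # main.returns
```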

BranchManager

The BranchManager coordinates the branch lifecycle:

create → query/write → merge OR discard

It tracks:

  • Which branches exist and their current state (active, merging, discarded)
  • Which tables have been materialized in each branch
  • Reference counts for concurrent access
  • TTL expiration for abandoned branches
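
The lifecycle can be modeled as a small state machine. The sketch below uses the three states named above; the allowed transitions (including a merge returning to active when it does not complete) are assumptions made for illustration, and `BranchState` and `transition` are hypothetical names.

```python
from enum import Enum


class BranchState(Enum):
    ACTIVE = "active"
    MERGING = "merging"
    DISCARDED = "discarded"


# Assumed transition table; the real BranchManager may differ.
ALLOWED = {
    BranchState.ACTIVE: {BranchState.MERGING, BranchState.DISCARDED},
    # Assumption: an unfinished merge returns to ACTIVE, a finished one
    # ends with the branch being dropped.
    BranchState.MERGING: {BranchState.ACTIVE, BranchState.DISCARDED},
    BranchState.DISCARDED: set(),  # terminal state
}


def transition(current: BranchState, target: BranchState) -> BranchState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = BranchState.ACTIVE
state = transition(state, BranchState.MERGING)
state = transition(state, BranchState.DISCARDED)
print(state.value)
```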

MergeEngine

When an agent is satisfied with the results of a branch exploration, they can merge changes back to the main schema. The MergeEngine handles conflict detection and resolution:

Conflict Detection

Before merging, the engine checks whether any materialized tables in the branch have also been modified in the main schema since the branch was created:

For each materialized table in branch:
    main_hash = hash(main.table at current time)
    base_hash = hash(main.table at branch creation time)
    if main_hash != base_hash:
        conflict detected on this table
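
The pseudocode above can be turned into a runnable sketch. The hash function is not specified in this page, so SHA-256 over canonically serialized rows is used here as an assumption; `table_hash` and `detect_conflicts` are hypothetical helper names, and tables are modeled as lists of dicts.

```python
import copy
import hashlib
import json


def table_hash(rows: list[dict]) -> str:
    """Order-insensitive content hash of a table's rows (assumed scheme)."""
    encoded = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256()
    for line in encoded:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def detect_conflicts(base_hashes: dict[str, str],
                     main_now: dict[str, list[dict]],
                     materialized: list[str]) -> list[str]:
    """Return materialized tables whose main copy changed since branching."""
    return [t for t in materialized if table_hash(main_now[t]) != base_hashes[t]]


# Snapshot the hashes when the branch is created.
main_at_creation = {
    "orders": [{"id": 1, "total": 100}],
    "products": [{"id": 7, "price": 5}],
}
base_hashes = {t: table_hash(rows) for t, rows in main_at_creation.items()}

# Later, main.orders receives a new row while the branch is still open.
main_now = copy.deepcopy(main_at_creation)
main_now["orders"].append({"id": 2, "total": 40})

conflicts = detect_conflicts(base_hashes, main_now, ["orders", "products"])
print(conflicts)  # ['orders']
```

Storing the base hash at branch creation avoids keeping a full snapshot of main just for comparison.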

Merge Strategies

Four strategies are available for handling conflicts:

Strategy      Behavior
BranchWins    Branch data overwrites main data for conflicting tables
MainWins      Main data is preserved; branch changes to conflicting tables are discarded
Manual        Returns the list of conflicts for the agent or human to resolve manually
Abort         Cancels the merge entirely if any conflicts exist

The strategy is specified per merge operation, giving agents (or their human operators) control over how conflicts are resolved.
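
One way to picture how the four strategies differ is a planning function that decides, per table, what a merge would do. This is a sketch, not the MergeEngine itself: `plan_merge` is a hypothetical name, the `"aborted"` status string is invented for illustration, and it is assumed that Manual and Abort apply nothing when conflicts exist.

```python
from enum import Enum


class MergeStrategy(Enum):
    BRANCH_WINS = "BranchWins"
    MAIN_WINS = "MainWins"
    MANUAL = "Manual"
    ABORT = "Abort"


def plan_merge(materialized: list[str], conflicts: list[str],
               strategy: MergeStrategy) -> dict:
    """Decide which materialized tables to apply to main (assumed semantics)."""
    conflicted = [t for t in materialized if t in set(conflicts)]
    clean = [t for t in materialized if t not in set(conflicts)]
    if not conflicted or strategy is MergeStrategy.BRANCH_WINS:
        # No conflicts, or the branch overwrites main everywhere.
        return {"status": "merged", "apply": clean + conflicted, "conflicts": []}
    if strategy is MergeStrategy.MAIN_WINS:
        # Keep main's version of conflicting tables; merge the rest.
        return {"status": "merged", "apply": clean, "conflicts": []}
    if strategy is MergeStrategy.MANUAL:
        # Hand the conflict list back to the caller; apply nothing (assumed).
        return {"status": "conflicts_detected", "apply": [], "conflicts": conflicted}
    # ABORT: cancel the whole merge ("aborted" is a made-up status name).
    return {"status": "aborted", "apply": [], "conflicts": conflicted}


plan = plan_merge(["orders", "products"], ["orders"], MergeStrategy.MAIN_WINS)
print(plan)
```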

Merge Execution

For non-conflicting tables (or when BranchWins is selected):

DROP TABLE main.orders;
-- then either move the branch table into main:
ALTER TABLE branch_abc123.orders SET SCHEMA main;
-- or, equivalently, copy it:
CREATE TABLE main.orders AS SELECT * FROM branch_abc123.orders;

BranchGarbageCollector

Branches that are no longer needed must be cleaned up to free DuckDB memory. The BranchGarbageCollector handles this automatically:

  • Reference counting: Each branch has an AtomicU64 reference count. Active queries increment the count; completion decrements it.
  • TTL expiration: Branches have a configurable time-to-live (default: 1 hour). Branches older than their TTL with zero references are eligible for cleanup.
  • Periodic cleanup: A background task runs at a configurable interval (default: every 5 minutes) to discard expired branches.
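
The eligibility rule described above (TTL exceeded and no active references) can be sketched as follows. The `BranchInfo` record and `eligible_for_cleanup` function are hypothetical, and timestamps are plain seconds so the example is deterministic; the real collector uses an atomic counter and a background task.

```python
from dataclasses import dataclass


@dataclass
class BranchInfo:
    branch_id: str
    created_at: float        # seconds since some epoch
    ttl_secs: float = 3600.0  # default TTL of 1 hour, per the text above
    ref_count: int = 0        # number of in-flight queries


def eligible_for_cleanup(branches: list[BranchInfo], now: float) -> list[str]:
    """Return IDs of branches past their TTL that nothing is using."""
    return [
        b.branch_id
        for b in branches
        if now - b.created_at > b.ttl_secs and b.ref_count == 0
    ]


now = 10_000.0
branches = [
    BranchInfo("branch_a", created_at=0.0),               # expired, idle
    BranchInfo("branch_b", created_at=0.0, ref_count=2),  # expired, in use
    BranchInfo("branch_c", created_at=9_000.0),           # still within TTL
]
print(eligible_for_cleanup(branches, now))  # ['branch_a']
```

Checking the reference count alongside the TTL keeps the collector from dropping a schema out from under a running query.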

MCP Tools

branch_create

Create a new branch from the current main schema state.

Input:

{
  "name": "q4-revenue-simulation",
  "description": "Simulate the effect of a 10% price increase on Q4 revenue"
}

Output:

{
  "branch_id": "branch_a1b2c3d4",
  "schema_name": "branch_a1b2c3d4",
  "tables_linked": 12,
  "created_at": "2025-01-15T10:30:00Z"
}

The tables_linked count shows how many zero-copy views were created.

branch_query

Execute a read query within a branch's isolated environment.

Input:

{
  "branch_id": "branch_a1b2c3d4",
  "sql": "SELECT segment, SUM(revenue * 1.10) as projected_revenue FROM orders GROUP BY 1"
}

Output:

{
  "columns": ["segment", "projected_revenue"],
  "rows": [
    ["enterprise", 4950000.00],
    ["mid_market", 2310000.00],
    ["smb", 979000.00]
  ]
}

Queries within a branch see the branch's materialized tables (if any) overlaid on top of the main schema's data.

branch_merge

Merge a branch's changes back into the main schema.

Input:

{
  "branch_id": "branch_a1b2c3d4",
  "strategy": "BranchWins"
}

Output (no conflicts):

{
  "status": "merged",
  "tables_merged": 1,
  "tables_skipped": 11,
  "conflicts": []
}

Output (conflicts detected with Manual strategy):

{
  "status": "conflicts_detected",
  "conflicts": [
    {
      "table": "orders",
      "branch_rows": 15420,
      "main_rows": 15380,
      "rows_diverged": 40
    }
  ]
}
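
A caller will typically branch on the `status` field of the merge response. The sketch below handles the two response shapes shown above; `summarize_merge` is a hypothetical helper, and any other status value is treated as unexpected because this page does not list more.

```python
import json


def summarize_merge(response: dict) -> str:
    """Turn a branch_merge response into a one-line summary."""
    status = response["status"]
    if status == "merged":
        return (f"merged {response['tables_merged']} table(s), "
                f"skipped {response['tables_skipped']}")
    if status == "conflicts_detected":
        parts = [f"{c['table']} ({c['rows_diverged']} rows diverged)"
                 for c in response["conflicts"]]
        return "conflicts: " + ", ".join(parts)
    return f"unexpected status: {status}"


merged = json.loads(
    '{"status": "merged", "tables_merged": 1, "tables_skipped": 11, "conflicts": []}'
)
conflicted = json.loads(
    '{"status": "conflicts_detected", "conflicts": [{"table": "orders", '
    '"branch_rows": 15420, "main_rows": 15380, "rows_diverged": 40}]}'
)
print(summarize_merge(merged))
print(summarize_merge(conflicted))
```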

branch_discard

Discard a branch and free all resources.

Input:

{
  "branch_id": "branch_a1b2c3d4"
}

Output:

{
  "status": "discarded",
  "tables_dropped": 1,
  "views_dropped": 11
}

branch_list

List all active branches.

Input:

{
  "include_expired": false
}

Output:

[
  {
    "branch_id": "branch_a1b2c3d4",
    "name": "q4-revenue-simulation",
    "materialized_tables": ["orders"],
    "ref_count": 0,
    "created_at": "2025-01-15T10:30:00Z",
    "expires_at": "2025-01-15T11:30:00Z"
  }
]

Usage Example

from hatidata_agent import HatiDataAgent

agent = HatiDataAgent(
    host="your-org.proxy.hatidata.com",
    agent_id="simulation-agent",
    password="hd_live_your_api_key",
)

# Create a branch for "what-if" analysis
branch = agent.branch_create(
    name="pricing-experiment",
    description="Test effect of 15% price increase on enterprise segment",
)
branch_id = branch["branch_id"]

# Modify data in the branch (triggers copy-on-write of the orders table)
agent.branch_query(branch_id, """
    UPDATE orders
    SET total = total * 1.15
    WHERE segment = 'enterprise'
      AND quarter = 'Q4'
""")

# Analyze the results within the branch
result = agent.branch_query(branch_id, """
    SELECT segment,
           SUM(total) AS total_revenue,
           AVG(total) AS avg_order_value
    FROM orders
    WHERE quarter = 'Q4'
    GROUP BY segment
    ORDER BY total_revenue DESC
""")
print("Projected revenue after 15% increase:", result)

# Compare with main (unmodified) data
main_result = agent.query("""
    SELECT segment,
           SUM(total) AS total_revenue
    FROM orders
    WHERE quarter = 'Q4'
    GROUP BY segment
""")
print("Current revenue:", main_result)

# This run was for analysis only, so discard the branch.
# To keep the changes instead, merge the branch (see branch_merge above).
agent.branch_discard(branch_id)

Configuration

Variable                               Default   Description
HATIDATA_BRANCHING_ENABLED             true      Enable/disable state branching
HATIDATA_BRANCH_TTL_SECS               3600      Default branch time-to-live (1 hour)
HATIDATA_BRANCH_MAX_PER_ORG            50        Maximum concurrent branches per organization
HATIDATA_BRANCH_GC_INTERVAL_SECS       300       Garbage collector run interval (5 minutes)
HATIDATA_BRANCH_MAX_MATERIALIZED_MB    4096      Max materialized data per branch (4 GB)
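
For tooling that needs to mirror these settings, the variables and defaults above can be read into a typed object. This is a sketch only: `BranchConfig` and `load_branch_config` are hypothetical names, and treating any value other than "true" (case-insensitive) as false is an assumption about how the boolean is parsed.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchConfig:
    enabled: bool
    ttl_secs: int
    max_per_org: int
    gc_interval_secs: int
    max_materialized_mb: int


def load_branch_config(env=None) -> BranchConfig:
    """Read branching settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return BranchConfig(
        # Assumption: only "true" (any case) enables the feature.
        enabled=env.get("HATIDATA_BRANCHING_ENABLED", "true").strip().lower() == "true",
        ttl_secs=int(env.get("HATIDATA_BRANCH_TTL_SECS", "3600")),
        max_per_org=int(env.get("HATIDATA_BRANCH_MAX_PER_ORG", "50")),
        gc_interval_secs=int(env.get("HATIDATA_BRANCH_GC_INTERVAL_SECS", "300")),
        max_materialized_mb=int(env.get("HATIDATA_BRANCH_MAX_MATERIALIZED_MB", "4096")),
    )


defaults = load_branch_config({})
custom = load_branch_config({"HATIDATA_BRANCH_TTL_SECS": "600"})
print(defaults)
print(custom.ttl_secs)
```

Passing a dict instead of reading `os.environ` directly keeps the loader easy to test.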

Use Cases

Scenario planning: Agents can create branches to model different business scenarios (pricing changes, market shifts) and compare outcomes before recommending actions.

Safe experimentation: Agents can write experimental transformations to data without risk. If the results are wrong, they simply discard the branch.

A/B testing queries: Run the same analytical query against two different data states (original vs. modified) to measure the impact of a change.

Multi-agent collaboration: One agent creates a branch, another agent reviews the changes, and a third agent approves the merge, with full isolation between stages.
