State Branching

State branching lets agents create isolated copies of their data environment, run speculative queries or write operations, and then either merge the changes back or discard them. It is analogous to Git branches, but for database state: agents can explore "what if" scenarios without affecting production data.

How It Works

HatiData implements branching using DuckDB's schema isolation. Each branch is a separate DuckDB schema (branch_{uuid}) that starts as a set of zero-copy views pointing to the main schema's tables. On the first write to any table in the branch, that table is materialized (copy-on-write), creating a real copy that can be modified independently.

Main Schema (production data)
  ├── customers (real table)
  ├── orders (real table)
  └── products (real table)

Branch branch_abc123 (created for "what-if" analysis)
  ├── customers → VIEW → main.customers   (zero-copy, read-only)
  ├── orders → MATERIALIZED COPY           (written to, now independent)
  └── products → VIEW → main.products     (zero-copy, read-only)

Zero-Copy Views

When a branch is created, HatiData creates a new DuckDB schema and populates it with views that point to the main schema's tables:

CREATE SCHEMA branch_abc123;
CREATE VIEW branch_abc123.customers AS SELECT * FROM main.customers;
CREATE VIEW branch_abc123.orders AS SELECT * FROM main.orders;
CREATE VIEW branch_abc123.products AS SELECT * FROM main.products;

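Branch creation takes the same time regardless of table size because no data is copied; the cost depends only on the number of tables. Until a table is written to in the branch, reads from the branch return the same data as the main schema.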
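
As a rough sketch, the statements above could be generated from a list of main tables. The helper names `quote_ident` and `branch_view_statements` are hypothetical and only illustrate the idea; they are not part of HatiData's API.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def branch_view_statements(branch_id: str, tables: list[str]) -> list[str]:
    """Build the DDL that links every main table into a new branch schema.

    Hypothetical helper: one CREATE SCHEMA plus one CREATE VIEW per table.
    """
    schema = quote_ident(branch_id)
    statements = [f"CREATE SCHEMA {schema};"]
    for table in tables:
        t = quote_ident(table)
        statements.append(f"CREATE VIEW {schema}.{t} AS SELECT * FROM main.{t};")
    return statements


if __name__ == "__main__":
    for stmt in branch_view_statements("branch_abc123", ["customers", "orders", "products"]):
        print(stmt)
```

The number of statements grows with the number of tables, not with the amount of data in them.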

Copy-on-Write Materialization

When an agent writes to a table in a branch (INSERT, UPDATE, DELETE), HatiData transparently materializes that table:

1. Drop the view.
2. Run CREATE TABLE branch_abc123.orders AS SELECT * FROM main.orders.
3. Apply the write operation to the materialized copy.
4. Route all subsequent reads and writes to the materialized copy.

Only tables that are actually modified are materialized. Tables that are only read remain as zero-copy views, keeping memory usage minimal.
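
The copy-on-write behavior can be modeled in a few lines of plain Python. This is a toy in-memory sketch, not HatiData's implementation: the `Branch` class and its methods are hypothetical, and tables are represented as lists of dicts instead of DuckDB tables.

```python
import copy


class Branch:
    """Toy model of a branch: reads fall through, the first write copies."""

    def __init__(self, main: dict[str, list[dict]]):
        self._main = main  # shared reference to main; never mutated here
        self._materialized: dict[str, list[dict]] = {}

    def read(self, table: str) -> list[dict]:
        # A materialized copy shadows main; otherwise read through to main.
        return self._materialized.get(table, self._main[table])

    def write(self, table: str, row: dict) -> None:
        # First write to a table: copy it once, then modify only the copy.
        if table not in self._materialized:
            self._materialized[table] = copy.deepcopy(self._main[table])
        self._materialized[table].append(row)

    @property
    def materialized_tables(self) -> list[str]:
        return sorted(self._materialized)


main = {"customers": [{"id": 1}], "orders": [{"id": 10, "total": 100}]}
branch = Branch(main)
branch.write("orders", {"id": 11, "total": 250})

print(len(branch.read("orders")))  # 2
print(len(main["orders"]))         # 1
print(branch.materialized_tables)  # ['orders']
```

Only `orders` is copied; `customers` is still served directly from main.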

Core Components

BranchStorageEngine

The BranchStorageEngine handles the low-level DuckDB schema operations:

Method	Description
`create(branch_id)`	Create a new schema with zero-copy views for all main tables
`materialize(branch_id, table)`	Convert a view to a real table (copy-on-write)
`query(branch_id, sql)`	Execute a read query within the branch schema
`write(branch_id, sql)`	Execute a write query, materializing the target table if needed
`discard(branch_id)`	Drop the branch schema and all its contents

All operations run with SET search_path = 'branch_{uuid},main', so unqualified table names resolve in the branch schema first. Materialized tables are read from the branch's own copy, while unmaterialized tables resolve to the branch's views, which read through to main.

BranchManager

The BranchManager coordinates the branch lifecycle:

create → query/write → merge OR discard

It tracks:

Which branches exist and their current state (active, merging, discarded)
Which tables have been materialized in each branch
Reference counts for concurrent access
TTL expiration for abandoned branches
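
The bookkeeping above can be sketched as a small state machine. The state names come from the list; the transition table, the `BranchRecord` class, and its fields are assumptions made for illustration, since the allowed transitions are not specified here.

```python
from enum import Enum


class BranchState(Enum):
    ACTIVE = "active"
    MERGING = "merging"
    DISCARDED = "discarded"


# Assumed transition table; the text names the states, not the allowed moves.
ALLOWED = {
    BranchState.ACTIVE: {BranchState.MERGING, BranchState.DISCARDED},
    BranchState.MERGING: {BranchState.ACTIVE, BranchState.DISCARDED},
    BranchState.DISCARDED: set(),
}


class BranchRecord:
    """Per-branch bookkeeping: state, materialized tables, reference count."""

    def __init__(self, branch_id: str):
        self.branch_id = branch_id
        self.state = BranchState.ACTIVE
        self.materialized: set[str] = set()
        self.ref_count = 0

    def transition(self, new_state: BranchState) -> None:
        # Reject any move that the assumed table does not list.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state


record = BranchRecord("branch_abc123")
record.materialized.add("orders")         # recorded on the first write to orders
record.transition(BranchState.MERGING)    # a merge has started
record.transition(BranchState.DISCARDED)  # branch cleaned up after the merge
```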

MergeEngine

When an agent is satisfied with the results of a branch exploration, it can merge the changes back into the main schema. The MergeEngine handles conflict detection and resolution:

Conflict Detection

Before merging, the engine checks whether any materialized tables in the branch have also been modified in the main schema since the branch was created:

For each materialized table in branch:
  main_hash = hash(main.table at current time)
  base_hash = hash(main.table at branch creation time)
  if main_hash != base_hash:
    conflict detected on this table
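
A minimal runnable version of this check is sketched below. It assumes that a hash of each main table was recorded when the branch was created, and it hashes in-memory rows with SHA-256. The function names and the hashing scheme are illustrative assumptions; this page does not specify how HatiData computes table hashes.

```python
import hashlib
import json


def table_hash(rows: list[dict]) -> str:
    """Order-independent content hash of a table's rows."""
    encoded = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256()
    for line in encoded:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def detect_conflicts(materialized: list[str],
                     base_hashes: dict[str, str],
                     main_now: dict[str, list[dict]]) -> list[str]:
    """Return materialized tables whose main copy changed since branch creation."""
    return [
        table for table in materialized
        if table_hash(main_now[table]) != base_hashes[table]
    ]


# Hashes recorded at branch creation time.
main_at_creation = {"orders": [{"id": 1, "total": 100}], "customers": [{"id": 7}]}
base_hashes = {name: table_hash(rows) for name, rows in main_at_creation.items()}

# Main gains an order while the branch is open.
main_now = {
    "orders": [{"id": 1, "total": 100}, {"id": 2, "total": 50}],
    "customers": [{"id": 7}],
}

print(detect_conflicts(["orders"], base_hashes, main_now))     # ['orders']
print(detect_conflicts(["customers"], base_hashes, main_now))  # []
```

Because the comparison works on whole tables, any change to a main table counts as a conflict, even if it touches rows that the branch never modified.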

Merge Strategies

Four strategies are available for handling conflicts:

Strategy	Behavior
`BranchWins`	Branch data overwrites main data for conflicting tables
`MainWins`	Main data is preserved; branch changes to conflicting tables are discarded
`Manual`	Returns the list of conflicts for the agent or human to resolve manually
`Abort`	Cancels the merge entirely if any conflicts exist

The strategy is specified per merge operation, giving agents (or their human operators) control over how conflicts are resolved.
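
The table can be read as a small decision function. The sketch below is an assumption about how the strategies might map to outcomes. In particular, the "aborted" status and the choice to merge nothing under Manual are guesses; only the strategy names and the "merged" and "conflicts_detected" statuses appear in this document.

```python
from enum import Enum


class MergeStrategy(Enum):
    BRANCH_WINS = "BranchWins"
    MAIN_WINS = "MainWins"
    MANUAL = "Manual"
    ABORT = "Abort"


def plan_merge(materialized: list[str], conflicts: set[str], strategy: MergeStrategy):
    """Return (status, tables_to_merge, reported_conflicts) for a merge request."""
    conflicting = [t for t in materialized if t in conflicts]
    clean = [t for t in materialized if t not in conflicts]
    if not conflicting:
        return "merged", clean, []
    if strategy is MergeStrategy.BRANCH_WINS:
        # Branch data overwrites main, including the conflicting tables.
        return "merged", list(materialized), []
    if strategy is MergeStrategy.MAIN_WINS:
        # Branch changes to conflicting tables are dropped.
        return "merged", clean, []
    if strategy is MergeStrategy.MANUAL:
        # Assumed: nothing is merged until the conflicts are resolved.
        return "conflicts_detected", [], conflicting
    # Abort: cancel the merge entirely ("aborted" is an assumed status name).
    return "aborted", [], conflicting


tables = ["orders", "products"]
print(plan_merge(tables, {"orders"}, MergeStrategy.MAIN_WINS))
# ('merged', ['products'], [])
```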

Merge Execution

For non-conflicting tables (or for conflicting tables when BranchWins is selected), the branch's materialized table replaces the corresponding table in main:

DROP TABLE main.orders;
ALTER TABLE branch_abc123.orders SET SCHEMA main;
-- or equivalently:
CREATE TABLE main.orders AS SELECT * FROM branch_abc123.orders;

BranchGarbageCollector

Branches that are no longer needed must be cleaned up to free DuckDB memory. The BranchGarbageCollector handles this automatically:

Reference counting: Each branch has an AtomicU64 reference count. Active queries increment the count; completion decrements it.
TTL expiration: Branches have a configurable time-to-live (default: 1 hour). Branches older than their TTL with zero references are eligible for cleanup.
Periodic cleanup: A background task runs at a configurable interval (default: every 5 minutes) to discard expired branches.
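
The eligibility rule (TTL exceeded and zero references) is simple enough to sketch directly. `BranchInfo` and `expired_branches` are hypothetical names; the real collector is described as using an AtomicU64 counter and a background task, which this single-threaded sketch does not model.

```python
import time
from dataclasses import dataclass


@dataclass
class BranchInfo:
    branch_id: str
    created_at: float        # seconds since the epoch
    ttl_secs: float = 3600   # default TTL from the text (1 hour)
    ref_count: int = 0       # number of in-flight queries


def expired_branches(branches, now=None):
    """Return IDs of branches past their TTL that no query is using."""
    now = time.time() if now is None else now
    return [
        b.branch_id for b in branches
        if b.ref_count == 0 and now - b.created_at > b.ttl_secs
    ]


branches = [
    BranchInfo("old_idle", created_at=0.0),
    BranchInfo("old_busy", created_at=0.0, ref_count=2),
    BranchInfo("fresh", created_at=4000.0),
]
print(expired_branches(branches, now=5000.0))  # ['old_idle']
```

A branch that is past its TTL but still referenced is skipped and picked up by a later pass once its queries finish.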

MCP Tools

`branch_create`

Create a new branch from the current main schema state.

Input:

{
  "name": "q4-revenue-simulation",
  "description": "Simulate the effect of a 10% price increase on Q4 revenue"
}

Output:

{
  "branch_id": "branch_a1b2c3d4",
  "schema_name": "branch_a1b2c3d4",
  "tables_linked": 12,
  "created_at": "2025-01-15T10:30:00Z"
}

The tables_linked count shows how many zero-copy views were created.

`branch_query`

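Execute a query within a branch's isolated environment. Writes issued through this tool (as in the usage example, which runs an UPDATE) trigger copy-on-write materialization of the target table.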

Input:

{
  "branch_id": "branch_a1b2c3d4",
  "sql": "SELECT segment, SUM(revenue * 1.10) as projected_revenue FROM orders GROUP BY 1"
}

Output:

{
  "columns": ["segment", "projected_revenue"],
  "rows": [
    ["enterprise", 4950000.00],
    ["mid_market", 2310000.00],
    ["smb", 979000.00]
  ]
}

Queries within a branch see the branch's materialized tables (if any) overlaid on top of the main schema's data.

`branch_merge`

Merge a branch's changes back into the main schema.

Input:

{
  "branch_id": "branch_a1b2c3d4",
  "strategy": "BranchWins"
}

Output (no conflicts):

{
  "status": "merged",
  "tables_merged": 1,
  "tables_skipped": 11,
  "conflicts": []
}

Output (conflicts detected with Manual strategy):

{
  "status": "conflicts_detected",
  "conflicts": [
    {
      "table": "orders",
      "branch_rows": 15420,
      "main_rows": 15380,
      "rows_diverged": 40
    }
  ]
}
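
A caller might condense either response shape into a single log line. The helper below is a hypothetical sketch that relies only on the fields shown in the two example outputs above.

```python
def summarize_merge(response: dict) -> str:
    """Turn a branch_merge response into a one-line, human-readable summary."""
    if response["status"] == "merged":
        return (f"merged {response['tables_merged']} table(s), "
                f"skipped {response['tables_skipped']}")
    parts = [
        f"{c['table']}: {c['rows_diverged']} diverged row(s) "
        f"(branch {c['branch_rows']}, main {c['main_rows']})"
        for c in response["conflicts"]
    ]
    return "conflicts: " + "; ".join(parts)


merged = {"status": "merged", "tables_merged": 1, "tables_skipped": 11, "conflicts": []}
conflicted = {
    "status": "conflicts_detected",
    "conflicts": [
        {"table": "orders", "branch_rows": 15420, "main_rows": 15380, "rows_diverged": 40}
    ],
}

print(summarize_merge(merged))      # merged 1 table(s), skipped 11
print(summarize_merge(conflicted))  # conflicts: orders: 40 diverged row(s) (branch 15420, main 15380)
```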

`branch_discard`

Discard a branch and free all resources.

Input:

{
  "branch_id": "branch_a1b2c3d4"
}

Output:

{
  "status": "discarded",
  "tables_dropped": 1,
  "views_dropped": 11
}

`branch_list`

List all active branches.

Input:

{
  "include_expired": false
}

Output:

[
  {
    "branch_id": "branch_a1b2c3d4",
    "name": "q4-revenue-simulation",
    "materialized_tables": ["orders"],
    "ref_count": 0,
    "created_at": "2025-01-15T10:30:00Z",
    "expires_at": "2025-01-15T11:30:00Z"
  }
]
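
The created_at and expires_at fields make it possible to work out a branch's TTL and remaining lifetime on the client side. The sketch below assumes the timestamps are always ISO 8601 in UTC with a trailing Z, as in the example output; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def ttl_seconds(branch: dict) -> float:
    """Total lifetime of the branch, from creation to expiry."""
    return (parse_ts(branch["expires_at"]) - parse_ts(branch["created_at"])).total_seconds()


def seconds_remaining(branch: dict, now: datetime) -> float:
    """Seconds until expiry; zero if the branch has already expired."""
    return max(0.0, (parse_ts(branch["expires_at"]) - now).total_seconds())


branch = {
    "branch_id": "branch_a1b2c3d4",
    "created_at": "2025-01-15T10:30:00Z",
    "expires_at": "2025-01-15T11:30:00Z",
}
now = datetime(2025, 1, 15, 11, 0, 0, tzinfo=timezone.utc)

print(ttl_seconds(branch))             # 3600.0
print(seconds_remaining(branch, now))  # 1800.0
```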

Usage Example

from hatidata_agent import HatiDataAgent

agent = HatiDataAgent(
    host="your-org.proxy.hatidata.com",
    agent_id="simulation-agent",
    password="hd_live_your_api_key",
)

# Create a branch for "what-if" analysis
branch = agent.branch_create(
    name="pricing-experiment",
    description="Test effect of 15% price increase on enterprise segment",
)
branch_id = branch["branch_id"]

# Modify data in the branch (triggers copy-on-write)
agent.branch_query(branch_id, """
    UPDATE orders
    SET total = total * 1.15
    WHERE segment = 'enterprise'
    AND quarter = 'Q4'
""")

# Analyze the results within the branch
result = agent.branch_query(branch_id, """
    SELECT segment,
           SUM(total) as total_revenue,
           AVG(total) as avg_order_value
    FROM orders
    WHERE quarter = 'Q4'
    GROUP BY segment
    ORDER BY total_revenue DESC
""")
print("Projected revenue after 15% increase:", result)

# Compare with main (unmodified) data
main_result = agent.query("""
    SELECT segment,
           SUM(total) as total_revenue
    FROM orders
    WHERE quarter = 'Q4'
    GROUP BY segment
""")
print("Current revenue:", main_result)

# If the simulation looks good, merge the branch (see branch_merge above).
# This run was only for analysis, so discard the branch instead.
agent.branch_discard(branch_id)

Configuration

Variable	Default	Description
`HATIDATA_BRANCHING_ENABLED`	`true`	Enable/disable state branching
`HATIDATA_BRANCH_TTL_SECS`	`3600`	Default branch time-to-live (1 hour)
`HATIDATA_BRANCH_MAX_PER_ORG`	`50`	Maximum concurrent branches per organization
`HATIDATA_BRANCH_GC_INTERVAL_SECS`	`300`	Garbage collector run interval (5 minutes)
`HATIDATA_BRANCH_MAX_MATERIALIZED_MB`	`4096`	Max materialized data per branch (4 GB)
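
For tooling that needs the same settings, the variables can be read with the defaults from the table. This is a sketch only: `BranchConfig` and `load_branch_config` are hypothetical names, and the boolean parsing rule (case-insensitive "true") is an assumption about how HatiData interprets the value.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchConfig:
    enabled: bool
    ttl_secs: int
    max_per_org: int
    gc_interval_secs: int
    max_materialized_mb: int


def load_branch_config(env=None) -> BranchConfig:
    """Read branching settings from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    return BranchConfig(
        # Assumed parsing rule: only a case-insensitive "true" enables branching.
        enabled=env.get("HATIDATA_BRANCHING_ENABLED", "true").strip().lower() == "true",
        ttl_secs=int(env.get("HATIDATA_BRANCH_TTL_SECS", "3600")),
        max_per_org=int(env.get("HATIDATA_BRANCH_MAX_PER_ORG", "50")),
        gc_interval_secs=int(env.get("HATIDATA_BRANCH_GC_INTERVAL_SECS", "300")),
        max_materialized_mb=int(env.get("HATIDATA_BRANCH_MAX_MATERIALIZED_MB", "4096")),
    )


# An explicit mapping stands in for the process environment in this example.
config = load_branch_config({"HATIDATA_BRANCH_TTL_SECS": "600"})
print(config.ttl_secs)     # 600
print(config.max_per_org)  # 50
```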

Use Cases

Scenario planning: Agents can create branches to model different business scenarios (pricing changes, market shifts) and compare outcomes before recommending actions.

Safe experimentation: Agents can write experimental transformations to data without risk. If the results are wrong, they simply discard the branch.

A/B testing queries: Run the same analytical query against two different data states (original vs. modified) to measure the impact of a change.

Multi-agent collaboration: One agent creates a branch, another agent reviews the changes, and a third agent approves the merge, all with full isolation between stages.

Next Steps

Agent Memory -- Persistent context that branches can reference
Chain-of-Thought Ledger -- Track reasoning across branch operations
Agent API Keys -- Control which agents can create and merge branches
