State Branching
State branching allows agents to create isolated copies of their data environment, run speculative queries or write operations, and then either merge changes back or discard them. This is analogous to Git branches but for database state -- agents can explore "what if" scenarios without affecting production data.
How It Works
HatiData implements branching using DuckDB's schema isolation. Each branch is a separate DuckDB schema (branch_{uuid}) that starts as a set of zero-copy views pointing to the main schema's tables. On the first write to any table in the branch, that table is materialized (copy-on-write), creating a real copy that can be modified independently.
Main Schema (production data)
├── customers (real table)
├── orders (real table)
└── products (real table)
Branch branch_abc123 (created for "what-if" analysis)
├── customers → VIEW → main.customers (zero-copy, read-only)
├── orders → MATERIALIZED COPY (written to, now independent)
└── products → VIEW → main.products (zero-copy, read-only)
Zero-Copy Views
When a branch is created, HatiData creates a new DuckDB schema and populates it with views that point to the main schema's tables:
CREATE SCHEMA branch_abc123;
CREATE VIEW branch_abc123.customers AS SELECT * FROM main.customers;
CREATE VIEW branch_abc123.orders AS SELECT * FROM main.orders;
CREATE VIEW branch_abc123.products AS SELECT * FROM main.products;
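Branch creation takes the same time regardless of table size, because only view definitions are created and no data is copied. Until a table is materialized, reads from the branch return the same data as the corresponding table in the main schema.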
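The view-creation step above can be sketched as a small statement generator. This is an illustration only: `branch_view_ddl` is a hypothetical helper, not part of HatiData, and it assumes table names are trusted identifiers that need no quoting.

```python
def branch_view_ddl(branch_id: str, tables: list[str]) -> list[str]:
    """Build the DDL that links a new branch schema to the main tables."""
    schema = f"branch_{branch_id}"
    statements = [f"CREATE SCHEMA {schema};"]
    for table in tables:
        # One view per main table; no rows are copied at this point.
        statements.append(
            f"CREATE VIEW {schema}.{table} AS SELECT * FROM main.{table};"
        )
    return statements


ddl = branch_view_ddl("abc123", ["customers", "orders", "products"])
print("\n".join(ddl))
```

The number of statements grows with the number of tables, not with the number of rows.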
Copy-on-Write Materialization
When an agent writes to a table in a branch (INSERT, UPDATE, DELETE), HatiData transparently materializes that table:
- Drop the view
- CREATE TABLE branch_abc123.orders AS SELECT * FROM main.orders
- Apply the write operation to the materialized copy
- All subsequent reads and writes go to the materialized copy
Only tables that are actually modified are materialized. Tables that are only read remain as zero-copy views, keeping memory usage minimal.
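The copy-on-write behavior can be modeled in a few lines of plain Python. This is a toy model, not HatiData's implementation: tables are lists of dict rows, a "view" is a reference to the main table, and the `Branch` class is a hypothetical name.

```python
import copy


class Branch:
    """Toy copy-on-write branch over an in-memory 'main' schema."""

    def __init__(self, main: dict[str, list[dict]]):
        self.main = main
        self.materialized: dict[str, list[dict]] = {}

    def read(self, table: str) -> list[dict]:
        # A materialized copy wins; otherwise read through to main (the "view").
        return self.materialized.get(table, self.main[table])

    def write(self, table: str, row: dict) -> None:
        # The first write copies the table; later writes reuse that copy.
        if table not in self.materialized:
            self.materialized[table] = copy.deepcopy(self.main[table])
        self.materialized[table].append(row)


main = {"orders": [{"id": 1, "total": 100}], "customers": [{"id": 7}]}
branch = Branch(main)
branch.write("orders", {"id": 2, "total": 50})
```

After the write, only `orders` has been copied; `customers` is still served directly from main.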
Core Components
BranchStorageEngine
The BranchStorageEngine handles the low-level DuckDB schema operations:
| Method | Description |
|---|---|
| create(branch_id) | Create a new schema with zero-copy views for all main tables |
| materialize(branch_id, table) | Convert a view to a real table (copy-on-write) |
| query(branch_id, sql) | Execute a read query within the branch schema |
| write(branch_id, sql) | Execute a write query, materializing the target table if needed |
| discard(branch_id) | Drop the branch schema and all its contents |
All operations use SET search_path = 'branch_{uuid},main' to ensure the branch schema takes precedence over main for materialized tables, while unmaterialized tables fall through to main via the views.
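The precedence rule behind the search path can be illustrated with a simplified resolver: the first schema on the path that defines a name wins. This sketch is an assumption-laden model of name resolution, not DuckDB's actual catalog code; `resolve` and the `catalog` dictionary are hypothetical.

```python
def resolve(table: str, search_path: str, catalog: dict[str, dict[str, str]]) -> tuple[str, str]:
    """Return (schema, object_kind) for the first schema on the path defining the table."""
    for schema in (part.strip() for part in search_path.split(",")):
        kind = catalog.get(schema, {}).get(table)
        if kind is not None:
            return schema, kind
    raise KeyError(f"{table} not found on search path {search_path!r}")


# Hypothetical catalog: orders has been materialized in the branch.
catalog = {
    "branch_abc123": {"customers": "view", "orders": "table", "products": "view"},
    "main": {"customers": "table", "orders": "table", "products": "table"},
}
orders_hit = resolve("orders", "branch_abc123,main", catalog)
customers_hit = resolve("customers", "branch_abc123,main", catalog)
```

Here `orders` resolves to the branch's own table, while `customers` resolves to the branch view, which in turn reads from main.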
BranchManager
The BranchManager coordinates the branch lifecycle:
create → query/write → merge OR discard
It tracks:
- Which branches exist and their current state (active, merging, discarded)
- Which tables have been materialized in each branch
- Reference counts for concurrent access
- TTL expiration for abandoned branches
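The lifecycle can be read as a small state machine. The sketch below uses the three states named above; the set of allowed transitions (for example, a failed merge returning the branch to active) is an assumption for illustration, and `BranchState` and `transition` are hypothetical names.

```python
from enum import Enum


class BranchState(Enum):
    ACTIVE = "active"
    MERGING = "merging"
    DISCARDED = "discarded"


# Assumed transition table: a merge may fall back to active (e.g. on conflict)
# or end with the branch being discarded; discarded is terminal.
ALLOWED = {
    BranchState.ACTIVE: {BranchState.MERGING, BranchState.DISCARDED},
    BranchState.MERGING: {BranchState.ACTIVE, BranchState.DISCARDED},
    BranchState.DISCARDED: set(),
}


def transition(current: BranchState, target: BranchState) -> BranchState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = transition(BranchState.ACTIVE, BranchState.MERGING)
```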
MergeEngine
When an agent is satisfied with the results of a branch exploration, it can merge the changes back into the main schema. The MergeEngine handles conflict detection and resolution.
Conflict Detection
Before merging, the engine checks whether any materialized tables in the branch have also been modified in the main schema since the branch was created:
For each materialized table in branch:
    main_hash = hash(main.table at current time)
    base_hash = hash(main.table at branch creation time)
    if main_hash != base_hash:
        conflict detected on this table
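A runnable version of this check might look like the following. The hashing scheme (SHA-256 over sorted JSON-encoded rows) is an assumption chosen for the example; the source does not say how HatiData hashes tables. `table_hash` and `detect_conflicts` are hypothetical helpers.

```python
import hashlib
import json


def table_hash(rows: list[dict]) -> str:
    """Order-insensitive content hash of a table's rows."""
    encoded = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(encoded).encode("utf-8")).hexdigest()


def detect_conflicts(base_hashes: dict[str, str],
                     main_tables: dict[str, list[dict]],
                     materialized: list[str]) -> list[str]:
    """Return materialized tables whose main copy changed since branch creation."""
    return [t for t in materialized if table_hash(main_tables[t]) != base_hashes[t]]


main_tables = {
    "orders": [{"id": 1, "total": 100}],
    "customers": [{"id": 7, "name": "Acme"}],
}
# Snapshot the hashes at "branch creation time".
base_hashes = {name: table_hash(rows) for name, rows in main_tables.items()}

# Main changes after the branch was created.
main_tables["orders"].append({"id": 2, "total": 40})

conflicts = detect_conflicts(base_hashes, main_tables, ["orders", "customers"])
print(conflicts)  # ['orders']
```

Storing only the base hash, rather than a full copy of the base table, keeps the bookkeeping cost small.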
Merge Strategies
Four strategies are available for handling conflicts:
| Strategy | Behavior |
|---|---|
| BranchWins | Branch data overwrites main data for conflicting tables |
| MainWins | Main data is preserved; branch changes to conflicting tables are discarded |
| Manual | Returns the list of conflicts for the agent or human to resolve manually |
| Abort | Cancels the merge entirely if any conflicts exist |
The strategy is specified per merge operation, giving agents (or their human operators) control over how conflicts are resolved.
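The effect of each strategy can be sketched as a planning function that decides which materialized tables get applied to main. This is an interpretation of the table above, not the MergeEngine's code: `plan_merge`, the `apply` field, and the `"aborted"` status string are assumptions made for the example.

```python
def plan_merge(materialized: list[str], conflicts: set[str], strategy: str) -> dict:
    """Decide which branch tables to apply to main under a given strategy."""
    conflicting = [t for t in materialized if t in conflicts]
    clean = [t for t in materialized if t not in conflicts]

    if strategy == "BranchWins":
        # Branch data overwrites main, including conflicting tables.
        return {"status": "merged", "apply": clean + conflicting, "conflicts": []}
    if strategy == "MainWins":
        # Conflicting tables keep main's data; only clean tables are applied.
        return {"status": "merged", "apply": clean, "conflicts": []}
    if strategy in ("Manual", "Abort"):
        if not conflicting:
            return {"status": "merged", "apply": clean, "conflicts": []}
        status = "conflicts_detected" if strategy == "Manual" else "aborted"
        return {"status": status, "apply": [], "conflicts": conflicting}
    raise ValueError(f"unknown strategy: {strategy}")


plan = plan_merge(["orders", "customers"], {"orders"}, "MainWins")
```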
Merge Execution
For non-conflicting tables (or when BranchWins is selected):
DROP TABLE main.orders;
ALTER TABLE branch_abc123.orders SET SCHEMA main;
-- or equivalently:
CREATE TABLE main.orders AS SELECT * FROM branch_abc123.orders;
BranchGarbageCollector
Branches that are no longer needed must be cleaned up to free DuckDB memory. The BranchGarbageCollector handles this automatically:
- Reference counting: Each branch has an AtomicU64 reference count. Active queries increment the count; completion decrements it.
- TTL expiration: Branches have a configurable time-to-live (default: 1 hour). Branches older than their TTL with zero references are eligible for cleanup.
- Periodic cleanup: A background task runs at a configurable interval (default: every 5 minutes) to discard expired branches.
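The eligibility rule (older than the TTL and zero references) can be expressed as a single filter. The sketch below is a simplified Python model with hypothetical names (`BranchInfo`, `expired_branches`); the actual collector uses an atomic counter and a background task, which are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class BranchInfo:
    branch_id: str
    created_at: float          # seconds since some epoch
    ttl_secs: float = 3600.0   # default TTL of 1 hour
    ref_count: int = 0         # number of in-flight queries


def expired_branches(branches: list[BranchInfo], now: float) -> list[str]:
    """Return IDs of branches past their TTL that no query is using."""
    return [
        b.branch_id
        for b in branches
        if b.ref_count == 0 and now - b.created_at > b.ttl_secs
    ]


branches = [
    BranchInfo("b1", created_at=0.0),               # old, idle
    BranchInfo("b2", created_at=0.0, ref_count=2),  # old, still in use
    BranchInfo("b3", created_at=9000.0),            # recent
]
eligible = expired_branches(branches, now=10000.0)
print(eligible)  # ['b1']
```

Checking the reference count first means a long-running query keeps its branch alive even after the TTL has passed.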
MCP Tools
branch_create
Create a new branch from the current main schema state.
Input:
{
  "name": "q4-revenue-simulation",
  "description": "Simulate the effect of a 10% price increase on Q4 revenue"
}
Output:
{
  "branch_id": "branch_a1b2c3d4",
  "schema_name": "branch_a1b2c3d4",
  "tables_linked": 12,
  "created_at": "2025-01-15T10:30:00Z"
}
The tables_linked count shows how many zero-copy views were created.
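For clients that talk to the MCP server directly, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `branch_create`; `tool_call_request` is a hypothetical helper, and how the request is transported to HatiData (stdio, HTTP, or otherwise) is not covered here.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


payload = tool_call_request(
    1,
    "branch_create",
    {
        "name": "q4-revenue-simulation",
        "description": "Simulate the effect of a 10% price increase on Q4 revenue",
    },
)
print(payload)
```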
branch_query
Execute a read query within a branch's isolated environment.
Input:
{
  "branch_id": "branch_a1b2c3d4",
  "sql": "SELECT segment, SUM(revenue * 1.10) as projected_revenue FROM orders GROUP BY 1"
}
Output:
{
  "columns": ["segment", "projected_revenue"],
  "rows": [
    ["enterprise", 4950000.00],
    ["mid_market", 2310000.00],
    ["smb", 979000.00]
  ]
}
Queries within a branch see the branch's materialized tables (if any) overlaid on top of the main schema's data.
branch_merge
Merge a branch's changes back into the main schema.
Input:
{
  "branch_id": "branch_a1b2c3d4",
  "strategy": "BranchWins"
}
Output (no conflicts):
{
  "status": "merged",
  "tables_merged": 1,
  "tables_skipped": 11,
  "conflicts": []
}
Output (conflicts detected with Manual strategy):
{
  "status": "conflicts_detected",
  "conflicts": [
    {
      "table": "orders",
      "branch_rows": 15420,
      "main_rows": 15380,
      "rows_diverged": 40
    }
  ]
}
branch_discard
Discard a branch and free all resources.
Input:
{
  "branch_id": "branch_a1b2c3d4"
}
Output:
{
  "status": "discarded",
  "tables_dropped": 1,
  "views_dropped": 11
}
branch_list
List all active branches.
Input:
{
  "include_expired": false
}
Output:
[
  {
    "branch_id": "branch_a1b2c3d4",
    "name": "q4-revenue-simulation",
    "materialized_tables": ["orders"],
    "ref_count": 0,
    "created_at": "2025-01-15T10:30:00Z",
    "expires_at": "2025-01-15T11:30:00Z"
  }
]
Usage Example
from hatidata_agent import HatiDataAgent

agent = HatiDataAgent(
    host="your-org.proxy.hatidata.com",
    agent_id="simulation-agent",
    password="hd_live_your_api_key",
)

# Create a branch for "what-if" analysis
branch = agent.branch_create(
    name="pricing-experiment",
    description="Test effect of 15% price increase on enterprise segment",
)
branch_id = branch["branch_id"]

# Modify data in the branch (triggers copy-on-write)
agent.branch_query(branch_id, """
    UPDATE orders
    SET total = total * 1.15
    WHERE segment = 'enterprise'
      AND quarter = 'Q4'
""")

# Analyze the results within the branch
result = agent.branch_query(branch_id, """
    SELECT segment,
           SUM(total) as total_revenue,
           AVG(total) as avg_order_value
    FROM orders
    WHERE quarter = 'Q4'
    GROUP BY segment
    ORDER BY total_revenue DESC
""")
print("Projected revenue after 15% increase:", result)

# Compare with main (unmodified) data
main_result = agent.query("""
    SELECT segment,
           SUM(total) as total_revenue
    FROM orders
    WHERE quarter = 'Q4'
    GROUP BY segment
""")
print("Current revenue:", main_result)

# If the simulation looks good, merge the branch
# Or discard if it was just for analysis
agent.branch_discard(branch_id)
Configuration
| Variable | Default | Description |
|---|---|---|
| HATIDATA_BRANCHING_ENABLED | true | Enable/disable state branching |
| HATIDATA_BRANCH_TTL_SECS | 3600 | Default branch time-to-live (1 hour) |
| HATIDATA_BRANCH_MAX_PER_ORG | 50 | Maximum concurrent branches per organization |
| HATIDATA_BRANCH_GC_INTERVAL_SECS | 300 | Garbage collector run interval (5 minutes) |
| HATIDATA_BRANCH_MAX_MATERIALIZED_MB | 4096 | Max materialized data per branch (4 GB) |
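As an illustration of how these variables and their defaults fit together, the sketch below reads them from an environment mapping. `BranchConfig` and `load_branch_config` are hypothetical names, and the boolean parsing rule (only the string `true`, case-insensitive, enables the feature) is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BranchConfig:
    enabled: bool
    ttl_secs: int
    max_per_org: int
    gc_interval_secs: int
    max_materialized_mb: int


def load_branch_config(env: Optional[Mapping[str, str]] = None) -> BranchConfig:
    """Read branching settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return BranchConfig(
        # Assumed parsing: anything other than "true" disables branching.
        enabled=env.get("HATIDATA_BRANCHING_ENABLED", "true").strip().lower() == "true",
        ttl_secs=int(env.get("HATIDATA_BRANCH_TTL_SECS", "3600")),
        max_per_org=int(env.get("HATIDATA_BRANCH_MAX_PER_ORG", "50")),
        gc_interval_secs=int(env.get("HATIDATA_BRANCH_GC_INTERVAL_SECS", "300")),
        max_materialized_mb=int(env.get("HATIDATA_BRANCH_MAX_MATERIALIZED_MB", "4096")),
    )


defaults = load_branch_config({})
custom = load_branch_config({
    "HATIDATA_BRANCH_TTL_SECS": "600",
    "HATIDATA_BRANCHING_ENABLED": "false",
})
```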
Use Cases
Scenario planning: Agents can create branches to model different business scenarios (pricing changes, market shifts) and compare outcomes before recommending actions.
Safe experimentation: Agents can write experimental transformations to data without risk. If the results are wrong, they simply discard the branch.
A/B testing queries: Run the same analytical query against two different data states (original vs. modified) to measure the impact of a change.
Multi-agent collaboration: One agent creates a branch, another agent reviews the changes, and a third agent approves the merge -- all with full isolation between stages.
Next Steps
- Agent Memory -- Persistent context that branches can reference
- Chain-of-Thought Ledger -- Track reasoning across branch operations
- Agent API Keys -- Control which agents can create and merge branches