Branch-Based Exploration
In this tutorial you will use HatiData's branch isolation to let agents explore data safely without affecting production state. Branches provide schema-level isolation with copy-on-write semantics: creating a branch copies no data, and a table is duplicated only when the branch first writes to it.
By the end you will have:
- Created branches for safe agent exploration
- Run queries and writes within a branch
- Merged branch results back to main
- Handled merge conflicts with different strategies
Prerequisites
- Python 3.10+
- HatiData proxy running locally or in the cloud
- hatidata SDK installed
pip install hatidata
export HATIDATA_API_KEY="hd_live_your_api_key"
export HATIDATA_HOST="localhost"
Step 1: Understand Branch Architecture
HatiData branches use schema-based isolation:
main (default schema)
├── customers
├── orders
└── products
branch_abc123 (exploration branch)
├── customers → VIEW pointing to main.customers (zero-copy)
├── orders → VIEW pointing to main.orders (zero-copy)
└── products → MATERIALIZED (copy-on-write, modified by agent)
When a branch is created, every table in main gets a zero-copy view in the branch schema. The first time the agent writes to a table within the branch, that table is materialized as a full copy. Reads from unmodified tables still reference the original data.
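The copy-on-write behavior described above can be sketched with a small in-memory model. This is a toy illustration only, not HatiData's implementation: the `ToyBranch` class and its methods are hypothetical names, and plain Python dictionaries stand in for schemas and tables.

```python
import copy


class ToyBranch:
    """In-memory model of a copy-on-write branch (illustrative only)."""

    def __init__(self, main_tables):
        self._main = main_tables   # shared reference, nothing is copied
        self._materialized = {}    # tables this branch has written to

    def read(self, table):
        # Modified tables come from the branch; everything else from main.
        return self._materialized.get(table, self._main[table])

    def write(self, table, row_index, column, value):
        # The first write to a table copies it into the branch.
        if table not in self._materialized:
            self._materialized[table] = copy.deepcopy(self._main[table])
        self._materialized[table][row_index][column] = value

    @property
    def modified_tables(self):
        return sorted(self._materialized)


main = {
    "customers": [{"id": 1, "name": "Acme"}],
    "products": [{"id": 10, "price": 100}],
}
toy = ToyBranch(main)
toy.write("products", 0, "price", 115)

print(toy.read("products")[0]["price"])            # the branch sees its own copy
print(main["products"][0]["price"])                # main is untouched
print(toy.read("customers") is main["customers"])  # unmodified table is still shared
```

Only `products` is duplicated here; `customers` remains the same object in both places, which mirrors the view-versus-materialized split in the diagram.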
Step 2: Create a Branch
import os
from hatidata import HatiDataClient
client = HatiDataClient(
    host=os.environ["HATIDATA_HOST"],
    port=5439,
    api_key=os.environ["HATIDATA_API_KEY"],
)
# Create a branch for exploration
branch = client.branches.create(
    name="pricing-experiment",
    description="Testing new pricing tiers before applying to production",
    agent_id="pricing-agent",
    ttl_hours=24,  # Auto-cleanup after 24 hours if not merged
)
print(f"Branch created: {branch.branch_id}")
print(f"Schema: {branch.schema_name}")
Step 3: Query Within a Branch
Queries within a branch see the branch's data (modified tables) plus main data (unmodified tables):
# Read from the branch -- this reads from main (zero-copy view)
products = client.branches.query(
    branch_id=branch.branch_id,
    sql="SELECT product_id, name, price, tier FROM products ORDER BY price",
)
for row in products:
    print(f" {row['name']}: ${row['price']} ({row['tier']})")
Step 4: Write Within a Branch
Writes are isolated to the branch. The first write to a table triggers copy-on-write materialization.
# Modify pricing in the branch -- this triggers copy-on-write for the products table
client.branches.write(
    branch_id=branch.branch_id,
    sql="""
        UPDATE products
        SET price = price * 1.15,
            tier = 'premium'
        WHERE category = 'enterprise'
    """,
)
# Verify the change in the branch
branch_products = client.branches.query(
    branch_id=branch.branch_id,
    sql="SELECT name, price, tier FROM products WHERE category = 'enterprise'",
)
# Verify main is unchanged
main_products = client.query(
    "SELECT name, price, tier FROM products WHERE category = 'enterprise'",
)
print("Branch prices (after 15% increase):")
for row in branch_products:
    print(f" {row['name']}: ${row['price']:.2f}")
print("\nMain prices (unchanged):")
for row in main_products:
    print(f" {row['name']}: ${row['price']:.2f}")
Step 5: Run Analysis on the Branch
Perform analytical queries on the branch to evaluate the impact of changes:
# Simulate revenue impact with new pricing
impact = client.branches.query(
    branch_id=branch.branch_id,
    sql="""
        SELECT
            p.category,
            COUNT(o.order_id) AS order_count,
            SUM(p.price * o.quantity) AS projected_revenue,
            SUM(main_p.price * o.quantity) AS current_revenue
        FROM products p
        JOIN orders o ON p.product_id = o.product_id
        JOIN main.products main_p ON p.product_id = main_p.product_id
        WHERE o.order_date >= '2025-10-01'
        GROUP BY p.category
        ORDER BY projected_revenue DESC
    """,
)
print("Revenue Impact Analysis:")
for row in impact:
    delta = row["projected_revenue"] - row["current_revenue"]
    print(f" {row['category']}: ${row['projected_revenue']:,.0f} "
          f"(+${delta:,.0f} vs current)")
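To make the arithmetic behind the query concrete, here is a small worked example of the same projected-versus-current comparison. The order lines are made-up figures, and `Decimal` is used only so that the 15% uplift does not pick up binary floating-point rounding; the column types HatiData returns are not assumed here.

```python
from decimal import Decimal

# Hypothetical order lines: (current unit price, quantity ordered).
order_lines = [
    (Decimal("100.00"), 3),
    (Decimal("250.00"), 2),
]
increase = Decimal("1.15")  # the 15% uplift applied in the branch

# Current revenue uses main prices; projected revenue uses branch prices.
current_revenue = sum(price * qty for price, qty in order_lines)
projected_revenue = sum(price * increase * qty for price, qty in order_lines)
delta = projected_revenue - current_revenue

print(f"current:   {current_revenue:.2f}")
print(f"projected: {projected_revenue:.2f}")
print(f"delta:     {delta:+.2f}")
```

With these figures the current revenue is 800, the projected revenue is 920, and the difference is 120, which is exactly 15% of the current total.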
Step 6: Merge or Discard
Merge the Branch
If the experiment is successful, merge the branch changes back to main:
merge_result = client.branches.merge(
    branch_id=branch.branch_id,
    strategy="branch_wins",  # Branch changes overwrite main
)
print(f"Merge status: {merge_result.status}")
print(f"Tables merged: {merge_result.tables_merged}")
print(f"Rows affected: {merge_result.rows_affected}")
Discard the Branch
If the experiment is not worth keeping, discard it:
client.branches.discard(branch_id=branch.branch_id)
print("Branch discarded. Main data unchanged.")
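One way to make sure an abandoned experiment never lingers is to wrap the branch in a context manager that discards it unless the caller opts to keep it. The sketch below assumes only the `branches.create(...)` and `branches.discard(branch_id=...)` calls shown in this tutorial; `exploration_branch`, the `state` dictionary, and `StubBranches` are hypothetical helpers, and the stub exists solely so the example runs without a proxy.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def exploration_branch(client, **create_kwargs):
    """Create a branch and discard it on exit unless the caller keeps it."""
    branch = client.branches.create(**create_kwargs)
    state = {"keep": False}  # set state["keep"] = True after a successful merge
    try:
        yield branch, state
    finally:
        if not state["keep"]:
            client.branches.discard(branch_id=branch.branch_id)


class StubBranches:
    """Stand-in for client.branches so the sketch runs without a proxy."""

    def __init__(self):
        self.discarded = []

    def create(self, **kwargs):
        return SimpleNamespace(branch_id="branch_demo", **kwargs)

    def discard(self, branch_id):
        self.discarded.append(branch_id)


stub_client = SimpleNamespace(branches=StubBranches())

# Experiment abandoned: the branch is discarded when the block exits.
with exploration_branch(stub_client, name="pricing-experiment") as (demo, state):
    pass

print(stub_client.branches.discarded)
```

Because the discard sits in a `finally` clause, it also runs when the exploration code raises, so a failed agent run does not leave a branch behind until the TTL expires.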
Step 7: Handle Merge Conflicts
Conflicts occur when both main and the branch have modified the same rows. HatiData provides four merge strategies:
| Strategy | Behavior |
|---|---|
| branch_wins | Branch changes overwrite main for conflicting rows |
| main_wins | Main values are kept; branch changes for conflicting rows are discarded |
| manual | Returns a conflict report for manual resolution |
| abort | Rolls back the merge if any conflicts are detected |
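The four strategies can be compared side by side with a toy row-level resolver. This is an illustrative model under stated assumptions, not HatiData's merge engine: `resolve_conflicts` is a hypothetical function, rows are reduced to a single value per primary key, and non-conflicting branch changes are assumed to apply under every strategy.

```python
def resolve_conflicts(main_rows, branch_rows, conflict_keys, strategy):
    """Toy resolution of conflicting rows for the four strategies.

    main_rows and branch_rows map primary key -> row value.
    Returns (merged_rows, report); report is non-empty only for "manual".
    """
    if strategy not in {"branch_wins", "main_wins", "manual", "abort"}:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "abort" and conflict_keys:
        raise RuntimeError(f"merge aborted: {len(conflict_keys)} conflict(s)")

    merged = dict(main_rows)
    report = []
    for pk, branch_value in branch_rows.items():
        if pk not in conflict_keys:
            merged[pk] = branch_value  # non-conflicting change applies
        elif strategy == "branch_wins":
            merged[pk] = branch_value
        elif strategy == "manual":
            report.append({"pk": pk, "main_value": main_rows[pk],
                           "branch_value": branch_value})
        # "main_wins": the main value already in merged is kept
    return merged, report


main_rows = {1: 100, 2: 200}
branch_rows = {1: 115, 2: 230}
conflicts = {2}  # row 2 was changed on both sides

for name in ("branch_wins", "main_wins", "manual"):
    print(name, resolve_conflicts(main_rows, branch_rows, conflicts, name))
```

The report entries reuse the `pk`, `main_value`, and `branch_value` keys that appear in the conflict report printed later in this step.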
Detecting Conflicts
# Attempt merge with conflict detection
merge_result = client.branches.merge(
    branch_id=branch.branch_id,
    strategy="manual",
)
if merge_result.has_conflicts:
    print(f"Conflicts detected in {len(merge_result.conflicts)} tables:")
    for conflict in merge_result.conflicts:
        print(f"\n Table: {conflict.table}")
        print(f" Conflicting rows: {conflict.row_count}")
        for row in conflict.rows[:5]:
            print(f" PK={row['pk']}: main={row['main_value']} vs branch={row['branch_value']}")
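The definition of a conflict used in this step (the same row modified in both main and the branch) can be expressed as a three-way comparison against the state at branch creation. The sketch below is a simplified model: `find_conflicts` is a hypothetical helper, each row is a single value keyed by primary key, and treating identical changes on both sides as non-conflicting is an assumption rather than documented HatiData behavior.

```python
def find_conflicts(base, main, branch):
    """Return primary keys changed on both sides since the branch point.

    Each argument maps primary key -> row value; `base` is the table as it
    looked when the branch was created.
    """
    keys = set(base) | set(main) | set(branch)
    conflicts = []
    for pk in sorted(keys):
        main_changed = main.get(pk) != base.get(pk)
        branch_changed = branch.get(pk) != base.get(pk)
        # If both sides made the identical change there is nothing to resolve.
        if main_changed and branch_changed and main.get(pk) != branch.get(pk):
            conflicts.append(pk)
    return conflicts


base = {1: 100, 2: 200, 3: 300}
main = {1: 100, 2: 210, 3: 330}    # main changed rows 2 and 3
branch = {1: 115, 2: 230, 3: 300}  # the branch changed rows 1 and 2

print(find_conflicts(base, main, branch))
```

Only row 2 is reported: row 1 changed in the branch alone and row 3 changed in main alone, so neither needs a resolution strategy.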
Resolving Conflicts
# After reviewing, resolve with a specific strategy
if merge_result.has_conflicts:
    resolution = client.branches.resolve(
        branch_id=branch.branch_id,
        merge_id=merge_result.merge_id,
        resolutions={
            "products": "branch_wins",  # Keep branch pricing changes
            "customers": "main_wins",  # Keep main customer data
        },
    )
    # resolution only exists when there were conflicts to resolve
    print(f"Conflicts resolved. Final status: {resolution.status}")
Step 8: List and Monitor Branches
# List all active branches
branches = client.branches.list()
for b in branches:
    print(f" {b.name} ({b.branch_id}): created {b.created_at}, "
          f"tables modified: {b.modified_table_count}")
# Check branch details
detail = client.branches.get(branch_id=branch.branch_id)
print(f"Branch: {detail.name}")
print(f"Agent: {detail.agent_id}")
print(f"Modified tables: {detail.modified_tables}")
print(f"TTL expires: {detail.expires_at}")
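When monitoring branches it can help to turn the expiry timestamp into the time remaining before TTL cleanup. The helper below is a minimal sketch that assumes `expires_at` is a timezone-aware `datetime`; the tutorial does not state the type the SDK returns, and `hours_until_expiry` is a hypothetical name. The timestamps are made up and use the `ttl_hours=24` value from Step 2.

```python
from datetime import datetime, timedelta, timezone


def hours_until_expiry(expires_at, now=None):
    """Hours left before a branch's TTL cleanup; negative once expired."""
    now = now or datetime.now(timezone.utc)
    return (expires_at - now) / timedelta(hours=1)


created = datetime(2025, 10, 1, 9, 0, tzinfo=timezone.utc)
expires = created + timedelta(hours=24)  # ttl_hours=24 from Step 2
check_time = datetime(2025, 10, 2, 3, 0, tzinfo=timezone.utc)

remaining = hours_until_expiry(expires, now=check_time)
print(f"{remaining:.1f} hours left")
```

A scheduler could use a value like this to merge or extend branches that are close to expiry instead of losing their changes.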
SQL Monitoring
-- Active branches with their sizes
SELECT
    branch_id,
    name,
    agent_id,
    created_at,
    modified_table_count,
    total_size_bytes
FROM _hatidata_branches
WHERE status = 'active'
ORDER BY created_at DESC;
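To see what the monitoring query returns without a running proxy, the same statement can be tried against a stand-in table. In the sketch below an in-memory SQLite table imitates `_hatidata_branches` using the columns named in the query; the sample rows, the column types, and the `merged` status value are invented for illustration and are not taken from HatiData.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Stand-in for the _hatidata_branches table (column types are assumptions).
conn.execute("""
    CREATE TABLE _hatidata_branches (
        branch_id TEXT, name TEXT, agent_id TEXT, created_at TEXT,
        modified_table_count INTEGER, total_size_bytes INTEGER, status TEXT
    )
""")
conn.executemany(
    "INSERT INTO _hatidata_branches VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("b1", "pricing-experiment", "pricing-agent", "2025-10-01T09:00:00", 1, 4096, "active"),
        ("b2", "old-experiment", "pricing-agent", "2025-09-20T09:00:00", 2, 8192, "merged"),
        ("b3", "catalog-cleanup", "catalog-agent", "2025-10-02T09:00:00", 0, 0, "active"),
    ],
)

# The monitoring query from this section, unchanged apart from layout.
rows = conn.execute("""
    SELECT branch_id, name, agent_id, created_at,
           modified_table_count, total_size_bytes
    FROM _hatidata_branches
    WHERE status = 'active'
    ORDER BY created_at DESC
""").fetchall()

for row in rows:
    print(f"{row['name']}: {row['modified_table_count']} modified, "
          f"{row['total_size_bytes']} bytes")
```

The non-active row is filtered out and the remaining branches come back newest first.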
What You Built
| Capability | HatiData Feature |
|---|---|
| Safe exploration | branches.create() with schema isolation |
| Zero-copy branch creation | Schema views (no data duplication on create) |
| Copy-on-write writes | Automatic materialization on first write |
| Revenue impact analysis | Cross-schema queries (main.table references) |
| Merge with conflict handling | 4 merge strategies |
| Auto-cleanup | TTL-based garbage collection |
Related Concepts
- Branch Isolation -- Full architecture reference
- Branch Recipes -- Advanced branching patterns
- Research Agent Tutorial -- Branching for research agents
- MCP Tools Reference -- branch_create, branch_merge tools
- Concurrency Model -- How branches handle concurrent access