CoT Replay & Compliance
In this tutorial you will use HatiData's Chain-of-Thought (CoT) ledger to build a compliance-auditable AI system. Every reasoning step is recorded in an immutable, cryptographically hash-chained ledger that can be replayed and verified at any time.
By the end you will have:
- Logged reasoning steps across multiple step types
- Replayed a complete decision session
- Verified hash chain integrity programmatically
- Exported audit reports for compliance review
Prerequisites
- Python 3.10+
- HatiData proxy running locally or in the cloud
- HatiData SDK installed:
pip install hatidata
export HATIDATA_API_KEY="hd_live_your_api_key"
export HATIDATA_HOST="localhost"
Step 1: Understand the CoT Ledger
The CoT ledger stores each reasoning step as an immutable record with the following properties:
| Field | Description |
|---|---|
| trace_id | Unique ID for the step |
| session_id | Groups steps into a decision session |
| agent_id | The agent that produced the step |
| step_type | One of 12 types (observation, reasoning, tool_call, conclusion, etc.) |
| content | The reasoning text |
| prev_hash | Cryptographic hash of the previous step in this session |
| hash | Cryptographic hash of this step (includes prev_hash, creating a chain) |
| metadata | Arbitrary JSON metadata |
| created_at | Timestamp |
Each step's hash is computed as:
hash = cryptographic_hash(session_id + step_number + step_type + content + prev_hash)
This creates a tamper-evident chain: modifying any step invalidates all subsequent hashes.
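The chaining property can be sketched locally with Python's hashlib. This is an illustrative model only: it assumes SHA-256 and a simple `|`-joined field encoding, which may differ from HatiData's actual implementation, but the tamper-evidence mechanics are the same.

```python
import hashlib

def step_hash(session_id: str, step_number: int, step_type: str,
              content: str, prev_hash: str) -> str:
    """Hash a step over the same fields the ledger chains together.
    Assumes SHA-256 and '|'-joined fields; the real encoding may differ."""
    payload = "|".join([session_id, str(step_number), step_type, content, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(session_id, steps):
    """Return (step_type, content, hash) tuples, each hash linked to the last."""
    chain, prev = [], ""
    for n, (step_type, content) in enumerate(steps, 1):
        h = step_hash(session_id, n, step_type, content, prev)
        chain.append((step_type, content, h))
        prev = h
    return chain

def chain_is_valid(session_id, chain):
    """Recompute every hash; a modified step breaks its own link and all later ones."""
    prev = ""
    for n, (step_type, content, recorded) in enumerate(chain, 1):
        if step_hash(session_id, n, step_type, content, prev) != recorded:
            return False
        prev = recorded
    return True

chain = build_chain("loan-review-app-2847", [
    ("observation", "Received loan application #2847."),
    ("reasoning", "Credit score 720 exceeds threshold."),
    ("conclusion", "APPROVED."),
])
assert chain_is_valid("loan-review-app-2847", chain)

# Tamper with step 2's content: verification now fails, even though
# step 3's recorded hash is untouched.
tampered = list(chain)
tampered[1] = ("reasoning", "Credit score 500 exceeds threshold.", tampered[1][2])
assert not chain_is_valid("loan-review-app-2847", tampered)
```

Because each hash folds in `prev_hash`, rewriting any step forces an attacker to recompute every subsequent hash as well, which the verification step detects.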
Step 2: Log Reasoning Steps
import os
from hatidata import HatiDataClient
from hatidata.cot import CotClient, StepType
client = HatiDataClient(
host=os.environ["HATIDATA_HOST"],
port=5439,
api_key=os.environ["HATIDATA_API_KEY"],
)
cot = CotClient(client)
AGENT_ID = "compliance-agent"
SESSION_ID = "loan-review-app-2847"
# Step 1: Observation - what the agent sees
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.OBSERVATION,
content="Received loan application #2847. Applicant: John D., amount: $50,000, purpose: business expansion.",
metadata={"application_id": "app-2847", "amount": 50000},
)
# Step 2: Reasoning - agent's analysis
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.REASONING,
content=(
"Evaluating credit score (720), debt-to-income ratio (0.32), and business revenue ($180K/yr). "
"Credit score exceeds minimum threshold of 680. DTI ratio is within acceptable range (<0.40). "
"Business revenue supports the requested loan amount."
),
metadata={"credit_score": 720, "dti_ratio": 0.32},
)
# Step 3: Tool call - agent queries external data
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.TOOL_CALL,
content="Queried risk assessment model. Risk score: 0.12 (low). No fraud indicators detected.",
metadata={"tool": "risk_model_v3", "risk_score": 0.12},
)
# Step 4: Reasoning - weighing factors
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.REASONING,
content=(
"All three criteria (credit score, DTI, risk model) are within approved thresholds. "
"Recommending approval with standard terms. No escalation required."
),
)
# Step 5: Conclusion - final decision
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.CONCLUSION,
content="APPROVED. Loan application #2847 approved for $50,000 at standard rate. No conditions.",
metadata={"decision": "approved", "conditions": None},
)
print(f"Logged 5 reasoning steps for session: {SESSION_ID}")
Step 3: Replay a Decision Session
Replay the entire session to see every step the agent took in order:
def replay_session(session_id: str):
"""Replay and display a complete decision session."""
trace = cot.replay_session(
agent_id=AGENT_ID,
session_id=session_id,
)
print(f"Session: {session_id}")
print(f"Steps: {len(trace.steps)}")
print(f"Chain: {'VALID' if trace.chain_valid else 'BROKEN'}")
print("-" * 60)
for i, step in enumerate(trace.steps, 1):
print(f"\nStep {i}: [{step.step_type}] at {step.timestamp}")
print(f" Content: {step.content}")
print(f" Hash: {step.hash[:32]}...")
if step.metadata:
print(f" Meta: {step.metadata}")
replay_session(SESSION_ID)
Expected output:
Session: loan-review-app-2847
Steps: 5
Chain: VALID
------------------------------------------------------------
Step 1: [observation] at 2025-12-15T10:30:00Z
Content: Received loan application #2847. Applicant: John D., amount: $50,000...
Hash: a1b2c3d4e5f6789012345678...
Step 2: [reasoning] at 2025-12-15T10:30:01Z
Content: Evaluating credit score (720), debt-to-income ratio (0.32)...
Hash: f8e7d6c5b4a39281726354...
Meta: {'credit_score': 720, 'dti_ratio': 0.32}
...
Step 4: Verify Hash Chain Integrity
Programmatically verify that no steps have been tampered with:
def verify_chain(session_id: str) -> bool:
"""Verify the cryptographic hash chain for a session."""
trace = cot.replay_session(
agent_id=AGENT_ID,
session_id=session_id,
)
if not trace.chain_valid:
print(f"CHAIN BROKEN at step {trace.break_index + 1}")
broken_step = trace.steps[trace.break_index]
print(f" Expected hash: {trace.expected_hash}")
print(f" Actual hash: {broken_step.hash}")
return False
print(f"Chain verified: {len(trace.steps)} steps, all hashes valid.")
return True
is_valid = verify_chain(SESSION_ID)
The CoT ledger table (_hatidata_cot) has an append-only enforcer that blocks UPDATE, DELETE, TRUNCATE, and DROP operations. Even if someone bypasses the enforcer and modifies a row, the hash chain verification will detect the tampering.
Step 5: Export Audit Reports
Generate compliance reports by querying the CoT table directly:
def export_audit_report(session_id: str) -> dict:
"""Export a structured audit report for a decision session."""
trace = cot.replay_session(
agent_id=AGENT_ID,
session_id=session_id,
)
report = {
"session_id": session_id,
"agent_id": AGENT_ID,
"total_steps": len(trace.steps),
"chain_integrity": "VALID" if trace.chain_valid else "BROKEN",
"first_step_at": trace.steps[0].timestamp if trace.steps else None,
"last_step_at": trace.steps[-1].timestamp if trace.steps else None,
"decision": None,
"steps": [],
}
for step in trace.steps:
report["steps"].append({
"step_number": step.step_number,
"type": step.step_type,
"content": step.content,
"hash": step.hash,
"timestamp": step.timestamp,
"metadata": step.metadata,
})
if step.step_type == "conclusion":
report["decision"] = step.content
return report
report = export_audit_report(SESSION_ID)
print(f"Decision: {report['decision']}")
print(f"Integrity: {report['chain_integrity']}")
SQL-Based Audit Queries
-- All decisions made by an agent in the last 30 days
SELECT session_id, content AS decision, created_at
FROM _hatidata_cot
WHERE agent_id = 'compliance-agent'
AND step_type = 'conclusion'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY created_at DESC;
-- Sessions with more than 10 reasoning steps (complex decisions)
SELECT session_id, COUNT(*) AS step_count
FROM _hatidata_cot
WHERE agent_id = 'compliance-agent'
GROUP BY session_id
HAVING COUNT(*) > 10
ORDER BY step_count DESC;
-- All tool calls made during a specific session
SELECT step_number, content, metadata, created_at
FROM _hatidata_cot
WHERE session_id = 'loan-review-app-2847'
AND step_type = 'tool_call'
ORDER BY step_number;
Step 6: Compliance Best Practices
Always Log Critical Step Types
For compliance-sensitive agents, ensure these step types are always logged:
| Step Type | When to Log |
|---|---|
| OBSERVATION | Every external input the agent receives |
| REASONING | Every evaluation or analysis the agent performs |
| TOOL_CALL | Every external system the agent queries |
| CONCLUSION | Every decision or output the agent produces |
| ESCALATION | Whenever the agent defers to a human |
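A lightweight pre-closeout check can enforce this coverage before a session is finalized. The sketch below is illustrative (the helper name and dict shape are assumptions, not part of the SDK): it assumes each step is a dict with a step_type key, as returned by a replay.

```python
# Step types that a compliance-sensitive session must always contain
# (extend with "tool_call" / "escalation" as your policy requires).
REQUIRED_STEP_TYPES = {"observation", "reasoning", "conclusion"}

def missing_step_types(steps, required=REQUIRED_STEP_TYPES):
    """Return the required step types that never appear in a session's steps."""
    present = {step["step_type"] for step in steps}
    return sorted(required - present)

# A session that jumped straight from input to decision:
session_steps = [
    {"step_type": "observation"},
    {"step_type": "conclusion"},
]
print(missing_step_types(session_steps))  # ['reasoning']
```

Running this check before logging the final CONCLUSION step (or as a periodic audit query) surfaces sessions that skipped a mandatory reasoning or observation stage.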
Embedding Sampling
The CoT ledger supports configurable embedding sampling. For compliance, set the sampling rate to 100% on critical step types so that all conclusions and escalations are semantically searchable:
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.CONCLUSION,
content="DENIED. Application does not meet minimum credit requirements.",
metadata={"force_embed": True}, # Always embed conclusions
)
Step 7: Debugging Rogue Agents with CoT Replay
When an agent starts making unexpected decisions -- hallucinating facts, ignoring policy constraints, or looping without progress -- the CoT ledger is your primary diagnostic tool. This section walks through a systematic debugging workflow.
Scenario
Your compliance-agent has been approving loan applications that should have been flagged for manual review. You need to find out where the reasoning went wrong, whether it is an isolated incident, and how to prevent recurrence.
Step 7.1: List Recent Sessions, Sorted by Step Count
Sessions with unusually high step counts often indicate circular reasoning or repeated tool calls. Start by identifying outliers:
def find_suspicious_sessions(agent_id: str, lookback_days: int = 7, min_steps: int = 15):
"""List recent sessions sorted by step count to find outliers."""
    # NOTE: values are interpolated into the SQL for brevity; prefer
    # parameterized queries if your client supports them.
    rows = client.query(f"""
SELECT
session_id,
COUNT(*) AS step_count,
MIN(created_at) AS started_at,
MAX(created_at) AS ended_at,
EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at))) AS duration_secs,
COUNT(*) FILTER (WHERE step_type = 'tool_call') AS tool_calls,
COUNT(*) FILTER (WHERE step_type = 'reasoning') AS reasoning_steps
FROM _hatidata_cot
WHERE agent_id = '{agent_id}'
AND created_at > NOW() - INTERVAL '{lookback_days} days'
GROUP BY session_id
HAVING COUNT(*) >= {min_steps}
ORDER BY step_count DESC
LIMIT 20
""")
print(f"{'Session ID':<30} {'Steps':>6} {'Tools':>6} {'Reason':>7} {'Duration':>10}")
print("-" * 65)
for r in rows:
dur = f"{r['duration_secs']:.0f}s" if r['duration_secs'] else "N/A"
print(f"{r['session_id']:<30} {r['step_count']:>6} {r['tool_calls']:>6} "
f"{r['reasoning_steps']:>7} {dur:>10}")
return rows
suspicious = find_suspicious_sessions("compliance-agent")
Step 7.2: Replay the Suspicious Session and Identify Divergence
Once you have identified a session with an abnormal step count or unexpected outcome, replay it and look for the exact step where reasoning diverged:
def debug_session(agent_id: str, session_id: str):
"""Replay a session and flag potential issues at each step."""
trace = cot.replay_session(agent_id=agent_id, session_id=session_id)
print(f"Session: {session_id}")
print(f"Steps: {len(trace.steps)}")
print(f"Chain: {'VALID' if trace.chain_valid else 'BROKEN'}")
print("=" * 70)
prev_step_type = None
consecutive_same = 0
for i, step in enumerate(trace.steps, 1):
flags = []
# Flag: consecutive steps of the same type (potential loop)
if step.step_type == prev_step_type:
consecutive_same += 1
if consecutive_same >= 3:
flags.append("LOOP_DETECTED")
else:
consecutive_same = 0
        # Flag: consecutive tool_calls with no observation logged in between
        if prev_step_type == "tool_call" and step.step_type == "tool_call":
            flags.append("TOOL_WITHOUT_OBSERVATION")
# Flag: conclusion without reasoning
if step.step_type == "conclusion" and prev_step_type == "observation":
flags.append("NO_REASONING_BEFORE_CONCLUSION")
# Flag: very short reasoning content
if step.step_type == "reasoning" and len(step.content) < 20:
flags.append("SHALLOW_REASONING")
flag_str = f" ** {', '.join(flags)} **" if flags else ""
print(f"\nStep {i}: [{step.step_type}]{flag_str}")
print(f" {step.content[:200]}")
prev_step_type = step.step_type
debug_session("compliance-agent", suspicious[0]["session_id"])
Step 7.3: Search CoT by Semantic Similarity
Find similar reasoning patterns across other sessions. This helps determine whether the issue is systemic or a one-off:
def find_similar_reasoning(query_text: str, agent_id: str, limit: int = 10):
"""Search CoT steps by semantic similarity to find recurring patterns."""
rows = client.query(f"""
SELECT
session_id,
step_type,
content,
semantic_match(content, '{query_text}') AS similarity,
created_at
FROM _hatidata_cot
WHERE agent_id = '{agent_id}'
AND step_type IN ('reasoning', 'conclusion')
ORDER BY semantic_rank(content, '{query_text}')
LIMIT {limit}
""")
for r in rows:
print(f"[{r['similarity']:.3f}] {r['session_id']} ({r['step_type']})")
print(f" {r['content'][:150]}")
print()
return rows
# Search for the problematic reasoning pattern
similar = find_similar_reasoning(
"credit score exceeds minimum threshold, recommending approval without review",
agent_id="compliance-agent",
)
Step 7.4: Use Branch Isolation to Re-Run with Corrected Inputs
Once you have identified the issue, create a branch to test corrected behavior without affecting production data:
def rerun_in_branch(agent_id: str, original_session_id: str, corrected_inputs: dict):
"""Re-run an agent task in a branch with corrected inputs."""
branch = client.branches.create(
name=f"debug-{original_session_id}",
agent_id=agent_id,
ttl_hours=4,
)
# Insert corrected reference data into the branch
if "policy_threshold" in corrected_inputs:
client.branches.write(
branch_id=branch.branch_id,
sql=f"""
UPDATE policy_config
SET manual_review_threshold = {corrected_inputs['policy_threshold']}
WHERE policy_name = 'loan_approval'
""",
)
# Query the branch to verify the corrected policy is in place
result = client.branches.query(
branch_id=branch.branch_id,
sql="SELECT * FROM policy_config WHERE policy_name = 'loan_approval'",
)
print(f"Branch {branch.branch_id}: policy threshold set to {result[0]['manual_review_threshold']}")
# The agent can now be re-invoked against this branch
# to verify it produces the correct outcome
return branch
branch = rerun_in_branch(
agent_id="compliance-agent",
original_session_id="loan-review-app-2847",
corrected_inputs={"policy_threshold": 650},
)
print(f"Re-run branch ready: {branch.branch_id}")
print("Point the agent at this branch and re-execute the task.")
Red Flags in CoT Replay
Use this table as a checklist when reviewing CoT traces for problematic behavior:
| Red Flag | What It Looks Like | Likely Cause | Remediation |
|---|---|---|---|
| Excessive tool calls | 10+ consecutive tool_call steps without observation | Agent retrying a failing external call | Add retry limits and circuit breakers in agent logic |
| Circular reasoning | Repeated reasoning steps with near-identical content | Agent stuck in a loop, re-evaluating the same data | Add loop detection (hash deduplication on reasoning content) |
| Missing observations | conclusion immediately after tool_call with no observation | Agent skipping input validation | Enforce observation logging after every tool call |
| Shallow reasoning | reasoning steps with fewer than 20 characters | Agent not actually analyzing inputs | Require minimum content length in CoT logging |
| Hash chain break | trace.chain_valid is False | Data tampering or system error | Investigate immediately; restore from audit backup |
| Abnormal session duration | Session lasting 10x longer than average | Agent blocked on external resource or in retry loop | Set session timeout policies |
| No escalation on edge cases | conclusion reached without escalation for borderline inputs | Agent policy does not enforce escalation thresholds | Update policy rules to require escalation for borderline cases |
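The circular-reasoning remediation above suggests hash deduplication on reasoning content. A minimal sketch of that heuristic follows; it is a client-side normalization-and-count check, not a built-in HatiData feature, and the normalization rule (lowercase, collapsed whitespace) is an assumption you should tune.

```python
import hashlib
from collections import Counter

def has_circular_reasoning(contents, threshold=3):
    """True if any normalized reasoning content repeats `threshold`+ times.

    Normalization (lowercase, collapsed whitespace) makes trivially
    reworded repeats hash to the same digest.
    """
    digests = Counter(
        hashlib.sha256(" ".join(c.lower().split()).encode("utf-8")).hexdigest()
        for c in contents
    )
    return any(count >= threshold for count in digests.values())

# Three reasoning steps that differ only in case and spacing:
looping = [
    "Re-evaluating credit score.",
    "re-evaluating  credit score.",
    "RE-EVALUATING CREDIT SCORE.",
]
print(has_circular_reasoning(looping))          # True
print(has_circular_reasoning(["a", "b", "c"]))  # False
```

Feeding this the content of a session's reasoning steps (e.g. from replay_session) gives a cheap loop detector you can run before an agent is allowed to log its conclusion.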
Related Concepts
- Chain-of-Thought Ledger -- Full CoT architecture
- Audit Guarantees -- Enterprise audit features
- SOC 2 Architecture -- Compliance controls
- MCP Tools Reference -- log_reasoning_step, replay_decision tools
- Build a Support Agent -- Tutorial with CoT logging