
CoT Replay & Compliance

In this tutorial you will use HatiData's Chain-of-Thought (CoT) ledger to build a compliance-auditable AI system. Every reasoning step is recorded in an immutable, cryptographically hash-chained ledger that can be replayed and verified at any time.

By the end you will have:

  • Logged reasoning steps across multiple step types
  • Replayed a complete decision session
  • Verified hash chain integrity programmatically
  • Exported audit reports for compliance review

Prerequisites

  • Python 3.10+
  • HatiData proxy running locally or in the cloud
  • hatidata SDK installed
pip install hatidata
export HATIDATA_API_KEY="hd_live_your_api_key"
export HATIDATA_HOST="localhost"

Step 1: Understand the CoT Ledger

The CoT ledger stores each reasoning step as an immutable record with the following properties:

| Field | Description |
| --- | --- |
| trace_id | Unique ID for the step |
| session_id | Groups steps into a decision session |
| agent_id | The agent that produced the step |
| step_type | One of 12 types (observation, reasoning, tool_call, conclusion, etc.) |
| content | The reasoning text |
| prev_hash | Cryptographic hash of the previous step in this session |
| hash | Cryptographic hash of this step (includes prev_hash, creating a chain) |
| metadata | Arbitrary JSON metadata |
| created_at | Timestamp |

Each step's hash is computed as:

hash = cryptographic_hash(session_id + step_number + step_type + content + prev_hash)

This creates a tamper-evident chain: modifying any step invalidates all subsequent hashes.
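The chaining can be sketched in plain Python. This is an illustration only: SHA-256 and the `|` field separator are assumptions, and the ledger's actual algorithm and encoding may differ.

```python
import hashlib

def compute_step_hash(session_id: str, step_number: int, step_type: str,
                      content: str, prev_hash: str) -> str:
    """Hash a step over the fields listed above. SHA-256 and the '|'
    separator are illustrative; the ledger's real scheme may differ."""
    payload = f"{session_id}|{step_number}|{step_type}|{content}|{prev_hash}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Step 2's hash commits to step 1's hash via prev_hash...
h1 = compute_step_hash("loan-review-app-2847", 1, "observation",
                       "Received application #2847", "")
h2 = compute_step_hash("loan-review-app-2847", 2, "reasoning",
                       "Credit score 720 > 680", h1)

# ...so editing step 1 changes h1, and step 2's stored prev_hash
# no longer matches, breaking every hash downstream.
h1_tampered = compute_step_hash("loan-review-app-2847", 1, "observation",
                                "Received application #9999", "")
assert h1_tampered != h1
```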


Step 2: Log Reasoning Steps

import os
from hatidata import HatiDataClient
from hatidata.cot import CotClient, StepType

client = HatiDataClient(
    host=os.environ["HATIDATA_HOST"],
    port=5439,
    api_key=os.environ["HATIDATA_API_KEY"],
)
cot = CotClient(client)

AGENT_ID = "compliance-agent"
SESSION_ID = "loan-review-app-2847"

# Step 1: Observation - what the agent sees
cot.log_step(
    agent_id=AGENT_ID,
    session_id=SESSION_ID,
    step_type=StepType.OBSERVATION,
    content="Received loan application #2847. Applicant: John D., amount: $50,000, purpose: business expansion.",
    metadata={"application_id": "app-2847", "amount": 50000},
)

# Step 2: Reasoning - agent's analysis
cot.log_step(
    agent_id=AGENT_ID,
    session_id=SESSION_ID,
    step_type=StepType.REASONING,
    content=(
        "Evaluating credit score (720), debt-to-income ratio (0.32), and business revenue ($180K/yr). "
        "Credit score exceeds minimum threshold of 680. DTI ratio is within acceptable range (<0.40). "
        "Business revenue supports the requested loan amount."
    ),
    metadata={"credit_score": 720, "dti_ratio": 0.32},
)

# Step 3: Tool call - agent queries external data
cot.log_step(
    agent_id=AGENT_ID,
    session_id=SESSION_ID,
    step_type=StepType.TOOL_CALL,
    content="Queried risk assessment model. Risk score: 0.12 (low). No fraud indicators detected.",
    metadata={"tool": "risk_model_v3", "risk_score": 0.12},
)

# Step 4: Reasoning - weighing factors
cot.log_step(
    agent_id=AGENT_ID,
    session_id=SESSION_ID,
    step_type=StepType.REASONING,
    content=(
        "All three criteria (credit score, DTI, risk model) are within approved thresholds. "
        "Recommending approval with standard terms. No escalation required."
    ),
)

# Step 5: Conclusion - final decision
cot.log_step(
    agent_id=AGENT_ID,
    session_id=SESSION_ID,
    step_type=StepType.CONCLUSION,
    content="APPROVED. Loan application #2847 approved for $50,000 at standard rate. No conditions.",
    metadata={"decision": "approved", "conditions": None},
)

print(f"Logged 5 reasoning steps for session: {SESSION_ID}")

Step 3: Replay a Decision Session

Replay the entire session to see every step the agent took in order:

def replay_session(session_id: str):
    """Replay and display a complete decision session."""
    trace = cot.replay_session(
        agent_id=AGENT_ID,
        session_id=session_id,
    )

    print(f"Session: {session_id}")
    print(f"Steps: {len(trace.steps)}")
    print(f"Chain: {'VALID' if trace.chain_valid else 'BROKEN'}")
    print("-" * 60)

    for i, step in enumerate(trace.steps, 1):
        print(f"\nStep {i}: [{step.step_type}] at {step.timestamp}")
        print(f"  Content: {step.content}")
        print(f"  Hash: {step.hash[:32]}...")
        if step.metadata:
            print(f"  Meta: {step.metadata}")

replay_session(SESSION_ID)

Expected output:

Session: loan-review-app-2847
Steps: 5
Chain: VALID
------------------------------------------------------------

Step 1: [observation] at 2025-12-15T10:30:00Z
  Content: Received loan application #2847. Applicant: John D., amount: $50,000...
  Hash: a1b2c3d4e5f6789012345678...

Step 2: [reasoning] at 2025-12-15T10:30:01Z
  Content: Evaluating credit score (720), debt-to-income ratio (0.32)...
  Hash: f8e7d6c5b4a39281726354...
  Meta: {'credit_score': 720, 'dti_ratio': 0.32}
...

Step 4: Verify Hash Chain Integrity

Programmatically verify that no steps have been tampered with:

def verify_chain(session_id: str) -> bool:
    """Verify the cryptographic hash chain for a session."""
    trace = cot.replay_session(
        agent_id=AGENT_ID,
        session_id=session_id,
    )

    if not trace.chain_valid:
        print(f"CHAIN BROKEN at step {trace.break_index + 1}")
        broken_step = trace.steps[trace.break_index]
        print(f"  Expected hash: {trace.expected_hash}")
        print(f"  Actual hash: {broken_step.hash}")
        return False

    print(f"Chain verified: {len(trace.steps)} steps, all hashes valid.")
    return True

is_valid = verify_chain(SESSION_ID)

The CoT ledger table (_hatidata_cot) has an append-only enforcer that blocks UPDATE, DELETE, TRUNCATE, and DROP operations. Even if someone bypasses the enforcer and modifies a row, the hash chain verification will detect the tampering.
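The detection logic amounts to re-computing each hash from the step's fields plus the previous hash and comparing against the stored value. A local sketch of that walk, using dicts that mirror the ledger schema (the SHA-256 algorithm and `|` encoding here are assumptions, not the ledger's actual scheme):

```python
import hashlib

def verify_local_chain(steps):
    """Re-compute each step's hash and compare to the stored value.
    Returns (True, None) for an intact chain, or (False, break_index)."""
    prev_hash = ""
    for i, s in enumerate(steps):
        expected = hashlib.sha256(
            f"{s['session_id']}|{s['step_number']}|{s['step_type']}|"
            f"{s['content']}|{prev_hash}".encode()
        ).hexdigest()
        if s["hash"] != expected:
            return False, i
        prev_hash = expected
    return True, None

# Build a well-formed three-step chain locally
steps, prev = [], ""
for n, (stype, text) in enumerate(
    [("observation", "app received"), ("reasoning", "score ok"),
     ("conclusion", "approved")], 1
):
    h = hashlib.sha256(f"demo|{n}|{stype}|{text}|{prev}".encode()).hexdigest()
    steps.append({"session_id": "demo", "step_number": n,
                  "step_type": stype, "content": text, "hash": h})
    prev = h

assert verify_local_chain(steps) == (True, None)

# Tamper with step 2's content: verification now breaks at index 1
steps[1]["content"] = "score forged"
assert verify_local_chain(steps) == (False, 1)
```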


Step 5: Export Audit Reports

Generate structured audit reports from a replayed session (SQL-based queries against the CoT table follow below):

def export_audit_report(session_id: str) -> dict:
    """Export a structured audit report for a decision session."""
    trace = cot.replay_session(
        agent_id=AGENT_ID,
        session_id=session_id,
    )

    report = {
        "session_id": session_id,
        "agent_id": AGENT_ID,
        "total_steps": len(trace.steps),
        "chain_integrity": "VALID" if trace.chain_valid else "BROKEN",
        "first_step_at": trace.steps[0].timestamp if trace.steps else None,
        "last_step_at": trace.steps[-1].timestamp if trace.steps else None,
        "decision": None,
        "steps": [],
    }

    for step in trace.steps:
        report["steps"].append({
            "step_number": step.step_number,
            "type": step.step_type,
            "content": step.content,
            "hash": step.hash,
            "timestamp": step.timestamp,
            "metadata": step.metadata,
        })
        if step.step_type == "conclusion":
            report["decision"] = step.content

    return report

report = export_audit_report(SESSION_ID)
print(f"Decision: {report['decision']}")
print(f"Integrity: {report['chain_integrity']}")
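For archival, the report dict can be written to disk as JSON. A minimal sketch: the `default` hook covers `datetime` timestamps (assuming that is what the SDK returns for `timestamp` fields), with a `str` fallback for any other opaque value.

```python
import json
from datetime import datetime

def save_audit_report(report: dict, path: str) -> None:
    """Write an audit report as pretty-printed JSON. Non-JSON values
    such as datetime timestamps are serialized via the default hook."""
    def _encode(obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return str(obj)  # fallback for any other non-serializable type

    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, default=_encode)
```

For example: `save_audit_report(report, f"audit-{SESSION_ID}.json")`.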

SQL-Based Audit Queries

-- All decisions made by an agent in the last 30 days
SELECT session_id, content AS decision, created_at
FROM _hatidata_cot
WHERE agent_id = 'compliance-agent'
AND step_type = 'conclusion'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY created_at DESC;

-- Sessions with more than 10 reasoning steps (complex decisions)
SELECT session_id, COUNT(*) AS step_count
FROM _hatidata_cot
WHERE agent_id = 'compliance-agent'
GROUP BY session_id
HAVING COUNT(*) > 10
ORDER BY step_count DESC;

-- All tool calls made during a specific session
SELECT step_number, content, metadata, created_at
FROM _hatidata_cot
WHERE session_id = 'loan-review-app-2847'
AND step_type = 'tool_call'
ORDER BY step_number;

Step 6: Compliance Best Practices

Always Log Critical Step Types

For compliance-sensitive agents, ensure these step types are always logged:

| Step Type | When to Log |
| --- | --- |
| OBSERVATION | Every external input the agent receives |
| REASONING | Every evaluation or analysis the agent performs |
| TOOL_CALL | Every external system the agent queries |
| CONCLUSION | Every decision or output the agent produces |
| ESCALATION | Whenever the agent defers to a human |
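One way to enforce this policy is a completeness check before a session is closed. The helper below is hypothetical (not part of the SDK) and is written against plain dicts with a `step_type` key; adapt the accessor if your steps are SDK objects.

```python
# Minimum step types a compliance-sensitive session must contain.
# TOOL_CALL and ESCALATION are conditional, so they are not required here.
REQUIRED_STEP_TYPES = {"observation", "reasoning", "conclusion"}

def missing_critical_steps(steps, required=REQUIRED_STEP_TYPES):
    """Return the set of required step types absent from a session.
    An empty set means the session satisfies the logging policy."""
    logged = {s["step_type"] for s in steps}
    return required - logged
```

A session that only logged an observation would fail the check: `missing_critical_steps([{"step_type": "observation"}])` returns `{"reasoning", "conclusion"}`.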

Embedding Sampling

The CoT ledger supports configurable embedding sampling. For compliance, set the sampling rate to 100% on critical step types so that all conclusions and escalations are semantically searchable:

cot.log_step(
    agent_id=AGENT_ID,
    session_id=SESSION_ID,
    step_type=StepType.CONCLUSION,
    content="DENIED. Application does not meet minimum credit requirements.",
    metadata={"force_embed": True},  # Always embed conclusions
)

Step 7: Debugging Rogue Agents with CoT Replay

When an agent starts making unexpected decisions -- hallucinating facts, ignoring policy constraints, or looping without progress -- the CoT ledger is your primary diagnostic tool. This section walks through a systematic debugging workflow.

Scenario

Your compliance-agent has been approving loan applications that should have been flagged for manual review. You need to find out where the reasoning went wrong, whether it is an isolated incident, and how to prevent recurrence.

Step 7.1: List Recent Sessions, Sorted by Step Count

Sessions with unusually high step counts often indicate circular reasoning or repeated tool calls. Start by identifying outliers:

def find_suspicious_sessions(agent_id: str, lookback_days: int = 7, min_steps: int = 15):
    """List recent sessions sorted by step count to find outliers."""
    rows = client.query(f"""
        SELECT
            session_id,
            COUNT(*) AS step_count,
            MIN(created_at) AS started_at,
            MAX(created_at) AS ended_at,
            EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at))) AS duration_secs,
            COUNT(*) FILTER (WHERE step_type = 'tool_call') AS tool_calls,
            COUNT(*) FILTER (WHERE step_type = 'reasoning') AS reasoning_steps
        FROM _hatidata_cot
        WHERE agent_id = '{agent_id}'
          AND created_at > NOW() - INTERVAL '{lookback_days} days'
        GROUP BY session_id
        HAVING COUNT(*) >= {min_steps}
        ORDER BY step_count DESC
        LIMIT 20
    """)

    print(f"{'Session ID':<30} {'Steps':>6} {'Tools':>6} {'Reason':>7} {'Duration':>10}")
    print("-" * 65)
    for r in rows:
        dur = f"{r['duration_secs']:.0f}s" if r['duration_secs'] else "N/A"
        print(f"{r['session_id']:<30} {r['step_count']:>6} {r['tool_calls']:>6} "
              f"{r['reasoning_steps']:>7} {dur:>10}")

    return rows

suspicious = find_suspicious_sessions("compliance-agent")

Step 7.2: Replay the Suspicious Session and Identify Divergence

Once you have identified a session with an abnormal step count or unexpected outcome, replay it and look for the exact step where reasoning diverged:

def debug_session(agent_id: str, session_id: str):
    """Replay a session and flag potential issues at each step."""
    trace = cot.replay_session(agent_id=agent_id, session_id=session_id)

    print(f"Session: {session_id}")
    print(f"Steps: {len(trace.steps)}")
    print(f"Chain: {'VALID' if trace.chain_valid else 'BROKEN'}")
    print("=" * 70)

    prev_step_type = None
    consecutive_same = 0

    for i, step in enumerate(trace.steps, 1):
        flags = []

        # Flag: consecutive steps of the same type (potential loop)
        if step.step_type == prev_step_type:
            consecutive_same += 1
            if consecutive_same >= 3:
                flags.append("LOOP_DETECTED")
        else:
            consecutive_same = 0

        # Flag: tool_call without a subsequent observation
        if prev_step_type == "tool_call" and step.step_type == "tool_call":
            flags.append("TOOL_WITHOUT_OBSERVATION")

        # Flag: conclusion without reasoning
        if step.step_type == "conclusion" and prev_step_type == "observation":
            flags.append("NO_REASONING_BEFORE_CONCLUSION")

        # Flag: very short reasoning content
        if step.step_type == "reasoning" and len(step.content) < 20:
            flags.append("SHALLOW_REASONING")

        flag_str = f"  ** {', '.join(flags)} **" if flags else ""
        print(f"\nStep {i}: [{step.step_type}]{flag_str}")
        print(f"  {step.content[:200]}")

        prev_step_type = step.step_type

debug_session("compliance-agent", suspicious[0]["session_id"])

Step 7.3: Search CoT by Semantic Similarity

Find similar reasoning patterns across other sessions. This helps determine whether the issue is systemic or a one-off:

def find_similar_reasoning(query_text: str, agent_id: str, limit: int = 10):
    """Search CoT steps by semantic similarity to find recurring patterns."""
    rows = client.query(f"""
        SELECT
            session_id,
            step_type,
            content,
            semantic_match(content, '{query_text}') AS similarity,
            created_at
        FROM _hatidata_cot
        WHERE agent_id = '{agent_id}'
          AND step_type IN ('reasoning', 'conclusion')
        ORDER BY semantic_rank(content, '{query_text}')
        LIMIT {limit}
    """)

    for r in rows:
        print(f"[{r['similarity']:.3f}] {r['session_id']} ({r['step_type']})")
        print(f"  {r['content'][:150]}")
        print()

    return rows

# Search for the problematic reasoning pattern
similar = find_similar_reasoning(
    "credit score exceeds minimum threshold, recommending approval without review",
    agent_id="compliance-agent",
)

Step 7.4: Use Branch Isolation to Re-Run with Corrected Inputs

Once you have identified the issue, create a branch to test corrected behavior without affecting production data:

def rerun_in_branch(agent_id: str, original_session_id: str, corrected_inputs: dict):
    """Re-run an agent task in a branch with corrected inputs."""
    branch = client.branches.create(
        name=f"debug-{original_session_id}",
        agent_id=agent_id,
        ttl_hours=4,
    )

    # Insert corrected reference data into the branch
    if "policy_threshold" in corrected_inputs:
        client.branches.write(
            branch_id=branch.branch_id,
            sql=f"""
                UPDATE policy_config
                SET manual_review_threshold = {corrected_inputs['policy_threshold']}
                WHERE policy_name = 'loan_approval'
            """,
        )

    # Query the branch to verify the corrected policy is in place
    result = client.branches.query(
        branch_id=branch.branch_id,
        sql="SELECT * FROM policy_config WHERE policy_name = 'loan_approval'",
    )
    print(f"Branch {branch.branch_id}: policy threshold set to {result[0]['manual_review_threshold']}")

    # The agent can now be re-invoked against this branch
    # to verify it produces the correct outcome
    return branch

branch = rerun_in_branch(
    agent_id="compliance-agent",
    original_session_id="loan-review-app-2847",
    corrected_inputs={"policy_threshold": 650},
)
print(f"Re-run branch ready: {branch.branch_id}")
print("Point the agent at this branch and re-execute the task.")

Red Flags in CoT Replay

Use this table as a checklist when reviewing CoT traces for problematic behavior:

| Red Flag | What It Looks Like | Likely Cause | Remediation |
| --- | --- | --- | --- |
| Excessive tool calls | 10+ consecutive tool_call steps without observation | Agent retrying a failing external call | Add retry limits and circuit breakers in agent logic |
| Circular reasoning | Repeated reasoning steps with near-identical content | Agent stuck in a loop, re-evaluating the same data | Add loop detection (hash deduplication on reasoning content) |
| Missing observations | conclusion immediately after tool_call with no observation | Agent skipping input validation | Enforce observation logging after every tool call |
| Shallow reasoning | reasoning steps with fewer than 20 characters | Agent not actually analyzing inputs | Require minimum content length in CoT logging |
| Hash chain break | trace.chain_valid is False | Data tampering or system error | Investigate immediately; restore from audit backup |
| Abnormal session duration | Session lasting 10x longer than average | Agent blocked on external resource or in retry loop | Set session timeout policies |
| No escalation on edge cases | conclusion reached without escalation for borderline inputs | Agent policy does not enforce escalation thresholds | Update policy rules to require escalation for borderline cases |
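The hash-deduplication remediation for circular reasoning can be sketched as a post-replay check. This helper is illustrative (not part of the SDK), operates on plain dicts with `step_type` and `content` keys, and uses a deliberately simple normalization (strip plus lowercase); adapt both to your trace objects and tolerance for near-duplicates.

```python
import hashlib
from collections import Counter

def repeated_reasoning(steps, threshold: int = 2):
    """Detect circular reasoning by hashing normalized reasoning content
    and returning any digest that appears `threshold` or more times."""
    counts = Counter(
        hashlib.sha256(s["content"].strip().lower().encode()).hexdigest()
        for s in steps
        if s["step_type"] == "reasoning"
    )
    return [digest for digest, n in counts.items() if n >= threshold]
```

An empty result means no reasoning step repeated; anything else is a candidate loop worth replaying in full.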
