CoT Replay & Compliance
In this tutorial you will use HatiData's Chain-of-Thought (CoT) ledger to build a compliance-auditable AI system. Every reasoning step is recorded in an immutable, cryptographically hash-chained ledger that can be replayed and verified at any time.
By the end you will have:
- Logged reasoning steps across multiple step types
- Replayed a complete decision session
- Verified hash chain integrity programmatically
- Exported audit reports for compliance review
Prerequisites
- Python 3.10+
- HatiData proxy running locally or in the cloud
- HatiData SDK installed:
pip install hatidata
export HATIDATA_API_KEY="hd_live_your_api_key"
export HATIDATA_HOST="localhost"
Step 1: Understand the CoT Ledger
The CoT ledger stores each reasoning step as an immutable record with the following properties:
| Field | Description |
|---|---|
| trace_id | Unique ID for the step |
| session_id | Groups steps into a decision session |
| agent_id | The agent that produced the step |
| step_type | One of 12 types (observation, reasoning, tool_call, conclusion, etc.) |
| content | The reasoning text |
| prev_hash | Cryptographic hash of the previous step in this session |
| hash | Cryptographic hash of this step (includes prev_hash, creating a chain) |
| metadata | Arbitrary JSON metadata |
| created_at | Timestamp |
Each step's hash is computed as:
hash = cryptographic_hash(session_id + step_number + step_type + content + prev_hash)
This creates a tamper-evident chain: modifying any step invalidates all subsequent hashes.
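The chaining property can be sketched locally with Python's hashlib. This is an illustrative model only: it assumes SHA-256 and a simple `|`-joined field encoding, which may differ from HatiData's actual implementation, but the tamper-evidence mechanics are the same.

```python
import hashlib

def step_hash(session_id: str, step_number: int, step_type: str,
              content: str, prev_hash: str) -> str:
    """Hash a step over the same fields the ledger chains together.
    Assumes SHA-256 and '|'-joined fields; the real encoding may differ."""
    payload = "|".join([session_id, str(step_number), step_type, content, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(session_id, steps):
    """Return (step_type, content, hash) tuples, each hash linked to the last."""
    chain, prev = [], ""
    for n, (step_type, content) in enumerate(steps, 1):
        h = step_hash(session_id, n, step_type, content, prev)
        chain.append((step_type, content, h))
        prev = h
    return chain

def chain_is_valid(session_id, chain):
    """Recompute every hash; a modified step breaks its own link and all later ones."""
    prev = ""
    for n, (step_type, content, recorded) in enumerate(chain, 1):
        if step_hash(session_id, n, step_type, content, prev) != recorded:
            return False
        prev = recorded
    return True

chain = build_chain("loan-review-app-2847", [
    ("observation", "Received loan application #2847."),
    ("reasoning", "Credit score 720 exceeds threshold."),
    ("conclusion", "APPROVED."),
])
assert chain_is_valid("loan-review-app-2847", chain)

# Tamper with step 2's content: verification now fails, even though
# step 3's recorded hash is untouched.
tampered = list(chain)
tampered[1] = ("reasoning", "Credit score 500 exceeds threshold.", tampered[1][2])
assert not chain_is_valid("loan-review-app-2847", tampered)
```

Because each hash folds in `prev_hash`, rewriting any step forces an attacker to recompute every subsequent hash as well, which the verification step detects.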
Step 2: Log Reasoning Steps
import os
from hatidata import HatiDataClient
from hatidata.cot import CotClient, StepType
client = HatiDataClient(
host=os.environ["HATIDATA_HOST"],
port=5439,
api_key=os.environ["HATIDATA_API_KEY"],
)
cot = CotClient(client)
AGENT_ID = "compliance-agent"
SESSION_ID = "loan-review-app-2847"
# Step 1: Observation - what the agent sees
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.OBSERVATION,
content="Received loan application #2847. Applicant: John D., amount: $50,000, purpose: business expansion.",
metadata={"application_id": "app-2847", "amount": 50000},
)
# Step 2: Reasoning - agent's analysis
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.REASONING,
content=(
"Evaluating credit score (720), debt-to-income ratio (0.32), and business revenue ($180K/yr). "
"Credit score exceeds minimum threshold of 680. DTI ratio is within acceptable range (<0.40). "
"Business revenue supports the requested loan amount."
),
metadata={"credit_score": 720, "dti_ratio": 0.32},
)
# Step 3: Tool call - agent queries external data
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.TOOL_CALL,
content="Queried risk assessment model. Risk score: 0.12 (low). No fraud indicators detected.",
metadata={"tool": "risk_model_v3", "risk_score": 0.12},
)
# Step 4: Reasoning - weighing factors
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.REASONING,
content=(
"All three criteria (credit score, DTI, risk model) are within approved thresholds. "
"Recommending approval with standard terms. No escalation required."
),
)
# Step 5: Conclusion - final decision
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.CONCLUSION,
content="APPROVED. Loan application #2847 approved for $50,000 at standard rate. No conditions.",
metadata={"decision": "approved", "conditions": None},
)
print(f"Logged 5 reasoning steps for session: {SESSION_ID}")
Step 3: Replay a Decision Session
Replay the entire session to see every step the agent took in order:
def replay_session(session_id: str):
"""Replay and display a complete decision session."""
trace = cot.replay_session(
agent_id=AGENT_ID,
session_id=session_id,
)
print(f"Session: {session_id}")
print(f"Steps: {len(trace.steps)}")
print(f"Chain: {'VALID' if trace.chain_valid else 'BROKEN'}")
print("-" * 60)
for i, step in enumerate(trace.steps, 1):
print(f"\nStep {i}: [{step.step_type}] at {step.timestamp}")
print(f" Content: {step.content}")
print(f" Hash: {step.hash[:32]}...")
if step.metadata:
print(f" Meta: {step.metadata}")
replay_session(SESSION_ID)
Expected output:
Session: loan-review-app-2847
Steps: 5
Chain: VALID
------------------------------------------------------------
Step 1: [observation] at 2025-12-15T10:30:00Z
Content: Received loan application #2847. Applicant: John D., amount: $50,000...
Hash: a1b2c3d4e5f6789012345678...
Step 2: [reasoning] at 2025-12-15T10:30:01Z
Content: Evaluating credit score (720), debt-to-income ratio (0.32)...
Hash: f8e7d6c5b4a39281726354...
Meta: {'credit_score': 720, 'dti_ratio': 0.32}
...
Step 4: Verify Hash Chain Integrity
Programmatically verify that no steps have been tampered with:
def verify_chain(session_id: str) -> bool:
"""Verify the cryptographic hash chain for a session."""
trace = cot.replay_session(
agent_id=AGENT_ID,
session_id=session_id,
)
if not trace.chain_valid:
print(f"CHAIN BROKEN at step {trace.break_index + 1}")
broken_step = trace.steps[trace.break_index]
print(f" Expected hash: {trace.expected_hash}")
print(f" Actual hash: {broken_step.hash}")
return False
print(f"Chain verified: {len(trace.steps)} steps, all hashes valid.")
return True
is_valid = verify_chain(SESSION_ID)
The CoT ledger table (_hatidata_cot) has an append-only enforcer that blocks UPDATE, DELETE, TRUNCATE, and DROP operations. Even if someone bypasses the enforcer and modifies a row, the hash chain verification will detect the tampering.
Step 5: Export Audit Reports
Generate compliance reports by querying the CoT table directly:
def export_audit_report(session_id: str) -> dict:
"""Export a structured audit report for a decision session."""
trace = cot.replay_session(
agent_id=AGENT_ID,
session_id=session_id,
)
report = {
"session_id": session_id,
"agent_id": AGENT_ID,
"total_steps": len(trace.steps),
"chain_integrity": "VALID" if trace.chain_valid else "BROKEN",
"first_step_at": trace.steps[0].timestamp if trace.steps else None,
"last_step_at": trace.steps[-1].timestamp if trace.steps else None,
"decision": None,
"steps": [],
}
for step in trace.steps:
report["steps"].append({
"step_number": step.step_number,
"type": step.step_type,
"content": step.content,
"hash": step.hash,
"timestamp": step.timestamp,
"metadata": step.metadata,
})
if step.step_type == "conclusion":
report["decision"] = step.content
return report
report = export_audit_report(SESSION_ID)
print(f"Decision: {report['decision']}")
print(f"Integrity: {report['chain_integrity']}")
SQL-Based Audit Queries
-- All decisions made by an agent in the last 30 days
SELECT session_id, content AS decision, created_at
FROM _hatidata_cot
WHERE agent_id = 'compliance-agent'
AND step_type = 'conclusion'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY created_at DESC;
-- Sessions with more than 10 reasoning steps (complex decisions)
SELECT session_id, COUNT(*) AS step_count
FROM _hatidata_cot
WHERE agent_id = 'compliance-agent'
GROUP BY session_id
HAVING COUNT(*) > 10
ORDER BY step_count DESC;
-- All tool calls made during a specific session
SELECT step_number, content, metadata, created_at
FROM _hatidata_cot
WHERE session_id = 'loan-review-app-2847'
AND step_type = 'tool_call'
ORDER BY step_number;
Step 6: Compliance Best Practices
Always Log Critical Step Types
For compliance-sensitive agents, ensure these step types are always logged:
| Step Type | When to Log |
|---|---|
| OBSERVATION | Every external input the agent receives |
| REASONING | Every evaluation or analysis the agent performs |
| TOOL_CALL | Every external system the agent queries |
| CONCLUSION | Every decision or output the agent produces |
| ESCALATION | Whenever the agent defers to a human |
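A lightweight pre-closeout check can enforce this coverage before a session is finalized. The sketch below is illustrative (the helper name and dict shape are assumptions, not part of the SDK): it assumes each step is a dict with a step_type key, as returned by a replay.

```python
# Step types that a compliance-sensitive session must always contain
# (extend with "tool_call" / "escalation" as your policy requires).
REQUIRED_STEP_TYPES = {"observation", "reasoning", "conclusion"}

def missing_step_types(steps, required=REQUIRED_STEP_TYPES):
    """Return the required step types that never appear in a session's steps."""
    present = {step["step_type"] for step in steps}
    return sorted(required - present)

# A session that jumped straight from input to decision:
session_steps = [
    {"step_type": "observation"},
    {"step_type": "conclusion"},
]
print(missing_step_types(session_steps))  # ['reasoning']
```

Running this check before logging the final CONCLUSION step (or as a periodic audit query) surfaces sessions that skipped a mandatory reasoning or observation stage.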
Embedding Sampling
The CoT ledger supports configurable embedding sampling. For compliance, set the sampling rate to 100% on critical step types so that all conclusions and escalations are semantically searchable:
cot.log_step(
agent_id=AGENT_ID,
session_id=SESSION_ID,
step_type=StepType.CONCLUSION,
content="DENIED. Application does not meet minimum credit requirements.",
metadata={"force_embed": True}, # Always embed conclusions
)
Step 7: Debugging Rogue Agents with CoT Replay
When an agent starts making unexpected decisions -- hallucinating facts, ignoring policy constraints, or looping without progress -- the CoT ledger is your primary diagnostic tool. This section walks through a systematic debugging workflow.
Scenario
Your compliance-agent has been approving loan applications that should have been flagged for manual review. You need to find out where the reasoning went wrong, whether it is an isolated incident, and how to prevent recurrence.
Step 7.1: List Recent Sessions, Sorted by Step Count
Sessions with unusually high step counts often indicate circular reasoning or repeated tool calls. Start by identifying outliers:
def find_suspicious_sessions(agent_id: str, lookback_days: int = 7, min_steps: int = 15):
"""List recent sessions sorted by step count to find outliers."""
    # NOTE: values are interpolated into the SQL for brevity; prefer
    # parameterized queries if your client supports them.
    rows = client.query(f"""
SELECT
session_id,
COUNT(*) AS step_count,
MIN(created_at) AS started_at,
MAX(created_at) AS ended_at,
EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at))) AS duration_secs,
COUNT(*) FILTER (WHERE step_type = 'tool_call') AS tool_calls,
COUNT(*) FILTER (WHERE step_type = 'reasoning') AS reasoning_steps
FROM _hatidata_cot
WHERE agent_id = '{agent_id}'
AND created_at > NOW() - INTERVAL '{lookback_days} days'
GROUP BY session_id
HAVING COUNT(*) >= {min_steps}
ORDER BY step_count DESC
LIMIT 20
""")
print(f"{'Session ID':<30} {'Steps':>6} {'Tools':>6} {'Reason':>7} {'Duration':>10}")
print("-" * 65)
for r in rows:
dur = f"{r['duration_secs']:.0f}s" if r['duration_secs'] else "N/A"
print(f"{r['session_id']:<30} {r['step_count']:>6} {r['tool_calls']:>6} "
f"{r['reasoning_steps']:>7} {dur:>10}")
return rows
suspicious = find_suspicious_sessions("compliance-agent")
Step 7.2: Replay the Suspicious Session and Identify Divergence
Once you have identified a session with an abnormal step count or unexpected outcome, replay it and look for the exact step where reasoning diverged:
def debug_session(agent_id: str, session_id: str):
"""Replay a session and flag potential issues at each step."""
trace = cot.replay_session(agent_id=agent_id, session_id=session_id)
print(f"Session: {session_id}")
print(f"Steps: {len(trace.steps)}")
print(f"Chain: {'VALID' if trace.chain_valid else 'BROKEN'}")
print("=" * 70)
prev_step_type = None
consecutive_same = 0
for i, step in enumerate(trace.steps, 1):
flags = []
# Flag: consecutive steps of the same type (potential loop)
if step.step_type == prev_step_type:
consecutive_same += 1
if consecutive_same >= 3:
flags.append("LOOP_DETECTED")
else:
consecutive_same = 0
        # Flag: consecutive tool_calls with no observation logged in between
        if prev_step_type == "tool_call" and step.step_type == "tool_call":
            flags.append("TOOL_WITHOUT_OBSERVATION")
# Flag: conclusion without reasoning
if step.step_type == "conclusion" and prev_step_type == "observation":
flags.append("NO_REASONING_BEFORE_CONCLUSION")
# Flag: very short reasoning content
if step.step_type == "reasoning" and len(step.content) < 20:
flags.append("SHALLOW_REASONING")
flag_str = f" ** {', '.join(flags)} **" if flags else ""
print(f"\nStep {i}: [{step.step_type}]{flag_str}")
print(f" {step.content[:200]}")
prev_step_type = step.step_type
debug_session("compliance-agent", suspicious[0]["session_id"])
Step 7.3: Search CoT by Semantic Similarity
Find similar reasoning patterns across other sessions. This helps determine whether the issue is systemic or a one-off:
def find_similar_reasoning(query_text: str, agent_id: str, limit: int = 10):
"""Search CoT steps by semantic similarity to find recurring patterns."""
rows = client.query(f"""
SELECT
session_id,
step_type,
content,
semantic_match(content, '{query_text}') AS similarity,
created_at
FROM _hatidata_cot
WHERE agent_id = '{agent_id}'
AND step_type IN ('reasoning', 'conclusion')
ORDER BY semantic_rank(content, '{query_text}')
LIMIT {limit}
""")
for r in rows:
print(f"[{r['similarity']:.3f}] {r['session_id']} ({r['step_type']})")
print(f" {r['content'][:150]}")
print()
return rows
# Search for the problematic reasoning pattern
similar = find_similar_reasoning(
"credit score exceeds minimum threshold, recommending approval without review",
agent_id="compliance-agent",
)
Step 7.4: Use Branch Isolation to Re-Run with Corrected Inputs
Once you have identified the issue, create a branch to test corrected behavior without affecting production data:
def rerun_in_branch(agent_id: str, original_session_id: str, corrected_inputs: dict):
"""Re-run an agent task in a branch with corrected inputs."""
branch = client.branches.create(
name=f"debug-{original_session_id}",
agent_id=agent_id,
ttl_hours=4,
)
# Insert corrected reference data into the branch
if "policy_threshold" in corrected_inputs:
client.branches.write(
branch_id=branch.branch_id,
sql=f"""
UPDATE policy_config
SET manual_review_threshold = {corrected_inputs['policy_threshold']}
WHERE policy_name = 'loan_approval'
""",
)
# Query the branch to verify the corrected policy is in place
result = client.branches.query(
branch_id=branch.branch_id,
sql="SELECT * FROM policy_config WHERE policy_name = 'loan_approval'",
)
print(f"Branch {branch.branch_id}: policy threshold set to {result[0]['manual_review_threshold']}")
# The agent can now be re-invoked against this branch
# to verify it produces the correct outcome
return branch
branch = rerun_in_branch(
agent_id="compliance-agent",
original_session_id="loan-review-app-2847",
corrected_inputs={"policy_threshold": 650},
)
print(f"Re-run branch ready: {branch.branch_id}")
print("Point the agent at this branch and re-execute the task.")
Red Flags in CoT Replay
Use this table as a checklist when reviewing CoT traces for problematic behavior:
| Red Flag | What It Looks Like | Likely Cause | Remediation |
|---|---|---|---|
| Excessive tool calls | 10+ consecutive tool_call steps without observation | Agent retrying a failing external call | Add retry limits and circuit breakers in agent logic |
| Circular reasoning | Repeated reasoning steps with near-identical content | Agent stuck in a loop, re-evaluating the same data | Add loop detection (hash deduplication on reasoning content) |
| Missing observations | conclusion immediately after tool_call with no observation | Agent skipping input validation | Enforce observation logging after every tool call |
| Shallow reasoning | reasoning steps with fewer than 20 characters | Agent not actually analyzing inputs | Require minimum content length in CoT logging |
| Hash chain break | trace.chain_valid is False | Data tampering or system error | Investigate immediately; restore from audit backup |
| Abnormal session duration | Session lasting 10x longer than average | Agent blocked on external resource or in retry loop | Set session timeout policies |
| No escalation on edge cases | conclusion reached without escalation for borderline inputs | Agent policy does not enforce escalation thresholds | Update policy rules to require escalation for borderline cases |
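The circular-reasoning remediation above suggests hash deduplication on reasoning content. A minimal sketch of that heuristic follows; it is a client-side normalization-and-count check, not a built-in HatiData feature, and the normalization rule (lowercase, collapsed whitespace) is an assumption you should tune.

```python
import hashlib
from collections import Counter

def has_circular_reasoning(contents, threshold=3):
    """True if any normalized reasoning content repeats `threshold`+ times.

    Normalization (lowercase, collapsed whitespace) makes trivially
    reworded repeats hash to the same digest.
    """
    digests = Counter(
        hashlib.sha256(" ".join(c.lower().split()).encode("utf-8")).hexdigest()
        for c in contents
    )
    return any(count >= threshold for count in digests.values())

# Three reasoning steps that differ only in case and spacing:
looping = [
    "Re-evaluating credit score.",
    "re-evaluating  credit score.",
    "RE-EVALUATING CREDIT SCORE.",
]
print(has_circular_reasoning(looping))          # True
print(has_circular_reasoning(["a", "b", "c"]))  # False
```

Feeding this the content of a session's reasoning steps (e.g. from replay_session) gives a cheap loop detector you can run before an agent is allowed to log its conclusion.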
Related Concepts
- Chain-of-Thought Ledger -- Full CoT architecture
- Audit Guarantees -- Enterprise audit features
- SOC 2 Architecture -- Compliance controls
- MCP Tools Reference -- log_reasoning_step, replay_decision tools
- Build a Support Agent -- Tutorial with CoT logging