
Tracing Artifacts to Prompts

This tutorial walks through a real forensic debugging scenario: an architecture artifact that cost $0.50 across 3 attempts. We trace it back to understand why the first two attempts failed and why the third succeeded.

The Scenario

Your Architect agent produced an api_contract artifact for project proj-abc. The project's cost dashboard shows $0.50 for this single artifact — significantly more than the expected $0.05 for a single DeepSeek V3 call.

Goal: Understand why it took 3 attempts and $0.50 instead of 1 attempt and $0.05.

Step 1: Find the Artifact

Start from the artifact you're investigating:

SELECT
    id,
    memory_key,
    state,
    produced_by_agent,
    content_hash,
    created_at
FROM hd_runtime.artifact_instances
WHERE memory_key = 'proj:abc:api_contract'
ORDER BY created_at DESC
LIMIT 1;

Result:

id:         art-final-001
state:      verified
agent:      Architect
hash:       sha256:d4e5f6...
created_at: 2026-03-28T10:05:00Z

Step 2: Find All Attempts

The artifact links to a task — find all attempts for that task:

SELECT
    ta.id AS attempt_id,
    ta.status,
    ta.failure_kind,
    ta.created_at,
    ta.completed_at,
    md.primary_model,
    md.budget_mode,
    li.cost_usd,
    li.latency_ms,
    li.outcome
FROM hd_runtime.task_attempts ta
JOIN hd_runtime.model_decisions md ON md.attempt_id = ta.id
JOIN hd_runtime.llm_invocations li ON li.decision_id = md.id
WHERE ta.task_id = (
    SELECT task_id FROM hd_runtime.task_attempts
    WHERE id = 'att-final-003' -- from the artifact's attempt
)
ORDER BY ta.created_at;

Result:

Attempt 1: att-001 | terminal_failed    | InvalidOutputSchema | deepseek-v3.2-maas | $0.003 | 2340ms
Attempt 2: att-002 | terminal_failed    | InvalidOutputSchema | claude-4-sonnet    | $0.045 | 4100ms
Attempt 3: att-003 | completed_verified | (none)              | deepseek-v3.2-maas | $0.003 | 2100ms

Discovery: Attempt 1 failed schema validation, L2 escalated to Claude Sonnet (which also failed), and L1 then retried DeepSeek, which succeeded. Total: $0.051 across 3 invocations.

Wait — the dashboard showed $0.50. Let's check for hidden invocations.

Step 3: Check All Invocations

SELECT
    li.id,
    li.provider,
    li.model_id,
    li.input_tokens,
    li.output_tokens,
    li.cost_usd,
    li.outcome,
    li.prompt_hash
FROM hd_runtime.llm_invocations li
WHERE li.decision_id IN (
    SELECT id FROM hd_runtime.model_decisions
    WHERE attempt_id IN ('att-001', 'att-002', 'att-003')
)
ORDER BY li.called_at;

Result:

inv-001: DeepSeek | 4200 in, 1800 out | $0.003 | success | sha256:aaa...
inv-002: DeepSeek | 4200 in, 0 out    | $0.001 | error   | sha256:aaa... (timeout)
inv-003: Claude   | 4200 in, 3200 out | $0.045 | success | sha256:bbb...
inv-004: Claude   | 4200 in, 3400 out | $0.048 | success | sha256:bbb... (schema still wrong)
inv-005: DeepSeek | 6800 in, 1800 out | $0.003 | success | sha256:ccc... (new prompt with error context)

Getting closer: there were 5 invocations, not 3. Attempt 1 had 2 invocations (the first returned output that failed schema validation; the retry timed out). Attempt 2 had 2 invocations (Claude was called twice, and the first output also failed validation). Total: $0.003 + $0.001 + $0.045 + $0.048 + $0.003 = $0.10 (closer, but still not $0.50).
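The arithmetic is worth double-checking when you are reconciling against a dashboard. A minimal sketch, using Decimal to avoid float drift on currency values:

```python
from decimal import Decimal

# Per-invocation costs for inv-001 .. inv-005 from the query above
costs = ["0.003", "0.001", "0.045", "0.048", "0.003"]
total = sum(Decimal(c) for c in costs)
print(total)  # 0.100
```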

Step 4: Check the Prompt Hash

The prompt hash changed between attempts:

| Attempt | Prompt Hash   | Input Tokens |
|---------|---------------|--------------|
| 1       | sha256:aaa... | 4,200        |
| 2       | sha256:bbb... | 4,200        |
| 3       | sha256:ccc... | 6,800        |

Attempt 3 had 6,800 input tokens vs 4,200 — the RepairAgent injected failure context into the prompt, increasing the input size by 62%.
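The 62% figure is just the relative token growth, which you can compute directly:

```python
base_tokens, enriched_tokens = 4200, 6800  # attempt 1 vs attempt 3 input tokens
growth = (enriched_tokens - base_tokens) / base_tokens
print(f"{growth:.0%}")  # 62%
```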

Step 5: Use the ExplainBundle

Instead of manual SQL, get the full picture in one call:

bundle = client.explain(attempt_id="att-003")

# Quick summary
print(f"Attempts: {bundle.totals.attempts}")
print(f"Total cost: ${bundle.totals.cost_usd}")
print(f"Model: {bundle.model_decision.primary_model}")

# Recovery chain
for r in bundle.recovery:
    print(f"  {r.level}: {r.action} ({r.failure_kind})")
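If you only have raw invocation rows rather than an ExplainBundle, the per-attempt cost breakdown can be reproduced in a few lines. A sketch, assuming rows of `(attempt_id, cost_usd)` pairs (the row shape here is an assumption, not the client's API):

```python
from collections import defaultdict
from decimal import Decimal

def cost_by_attempt(rows):
    """Sum cost_usd per attempt from (attempt_id, cost_usd) pairs."""
    totals = defaultdict(Decimal)
    for attempt_id, cost_usd in rows:
        totals[attempt_id] += Decimal(cost_usd)
    return dict(totals)

# The five invocations from Step 3
rows = [("att-001", "0.003"), ("att-001", "0.001"),
        ("att-002", "0.045"), ("att-002", "0.048"),
        ("att-003", "0.003")]
print(cost_by_attempt(rows))
# {'att-001': Decimal('0.004'), 'att-002': Decimal('0.093'), 'att-003': Decimal('0.003')}
```

The breakdown makes the escalation visible at a glance: the Claude attempt accounts for most of the spend.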

Step 6: The Fix

Now you know:

  1. DeepSeek V3 fails schema validation for this agent's output format ~50% of the time
  2. Claude Sonnet also fails, suggesting the contract schema is too strict
  3. The successful attempt used an enriched prompt with error context

Actions:

  • Relax the api_contract schema validation (currently requires exact OpenAPI 3.1 structure)
  • Or update the Architect prompt to be more explicit about output format
  • Monitor via: SELECT model_id, success_rate FROM v_reward_signals WHERE task_class = 'AuthoritySpec' AND agent_type = 'architect'

Forensic Query Cheat Sheet

| Question | Query |
|----------|-------|
| "How much did this artifact cost?" | Sum llm_invocations.cost_usd for all attempts of the parent task |
| "Why was this model selected?" | Check model_decisions.routing_reason and budget_mode |
| "What prompt was used?" | Match llm_invocations.prompt_hash; same hash = same prompt |
| "What changed between attempts?" | Compare prompt_hash and input_tokens across attempts |
| "Who approved this?" | Check artifact_validations.verifier_type and status |
| "Was cache used?" | Check llm_invocations.cache_status; a hit saves ~90% on input cost |
