Lineage & Explainability

Every artifact in HatiData V2 has a complete provenance chain — from the prompt that created it, through the model that generated it, to the verification that approved it. No orphan artifacts are permitted.

The Lineage Chain

Task
 └── Attempt
      ├── ModelDecision (which model was selected and why)
      │    └── LlmInvocation (the actual API call: tokens, cost, latency)
      ├── ArtifactInstance (what was produced)
      │    └── ArtifactValidation (schema + contract verification)
      └── WorkflowEvent (audit trail of state transitions)

Every entity links back to its parent via foreign keys. Given any artifact, you can answer:

What prompt created it? Follow artifact → attempt → model_decision → llm_invocation → prompt_hash
Why this model? Follow attempt → model_decision → routing_policy + budget_mode
Who approved it? Follow artifact → validation → verifier_type + evidence
What did it cost? Sum llm_invocations.cost_usd for the attempt chain
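Here is a minimal in-memory sketch in Python that makes the traversal concrete. The class names follow the lineage tree above. The fields, the nesting of invocations under decisions, and the `explain` helper are illustrative assumptions, not HatiData's actual schema or client API.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the lineage tables; fields are assumptions.
@dataclass
class LlmInvocation:
    prompt_hash: str
    cost_usd: float


@dataclass
class ModelDecision:
    routing_reason: str
    budget_mode: str
    invocations: list = field(default_factory=list)


@dataclass
class ArtifactValidation:
    verifier_type: str
    status: str


@dataclass
class Attempt:
    decisions: list = field(default_factory=list)


@dataclass
class ArtifactInstance:
    attempt: Attempt
    validations: list = field(default_factory=list)


def explain(artifact: ArtifactInstance) -> dict:
    """Answer the four lineage questions by following parent links."""
    decisions = artifact.attempt.decisions
    invocations = [inv for d in decisions for inv in d.invocations]
    return {
        "prompts": [inv.prompt_hash for inv in invocations],
        "why_model": [(d.routing_reason, d.budget_mode) for d in decisions],
        "approved_by": [(v.verifier_type, v.status) for v in artifact.validations],
        "cost_usd": sum(inv.cost_usd for inv in invocations),
    }


artifact = ArtifactInstance(
    attempt=Attempt(decisions=[
        ModelDecision("policy_match", "normal",
                      [LlmInvocation("sha256:a1b2c3...", 0.0031)]),
    ]),
    validations=[ArtifactValidation("schema_validator", "passed")],
)
summary = explain(artifact)
```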

Model Decisions

When an agent runs, the LLM router makes a model decision: which model to use, from which provider, and at which capability class. The decision is recorded before the LLM call is made, not after.

Field	Description
`task_class`	What the task needs (e.g., `Codegen`, `AuthoritySpec`)
`primary_model`	Model selected by the routing policy
`fallback_chain`	Ordered list of fallback models
`capability_class`	Required capability level (e.g., `StrongGeneral`)
`budget_mode`	Budget state at decision time (`normal`, `warning`, `constrained`)
`routing_reason`	Why this model was chosen (policy match, budget downgrade, failover)
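The "record before call" ordering can be sketched as follows. This is a hypothetical illustration, not the router's implementation: `decision_log` stands in for whatever append-only store holds decisions, and `call_llm` stands in for the provider client. Because the decision is written first, it survives even when the call itself fails.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelDecision:
    task_class: str
    primary_model: str
    fallback_chain: tuple
    capability_class: str
    budget_mode: str
    routing_reason: str


def run_with_decision(decision, decision_log, call_llm):
    """Persist the decision first, then invoke the model (hypothetical helper)."""
    decision_log.append(json.dumps(asdict(decision)))  # recorded before the call
    return call_llm(decision.primary_model)


def flaky_call(model_id):
    # Hypothetical provider call that always times out.
    raise TimeoutError(f"{model_id} timed out")


log = []
decision = ModelDecision(
    task_class="AuthoritySpec",
    primary_model="deepseek-ai/deepseek-v3.2-maas",
    fallback_chain=("fallback-model-a",),  # hypothetical fallback name
    capability_class="StrongGeneral",
    budget_mode="normal",
    routing_reason="policy_match",
)
try:
    run_with_decision(decision, log, flaky_call)
except TimeoutError:
    pass  # the call failed, but the decision is already in the log
```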

Decisions Are Immutable

Model decisions are append-only. If a retry selects a different model (e.g., due to budget downgrade), a new ModelDecision is created — the original is never modified.
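In Python, append-only decisions map naturally onto frozen dataclasses plus `dataclasses.replace`, which builds a modified copy and leaves the original untouched. The model names and the `budget_downgrade` reason string below are hypothetical placeholders.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDecision:
    primary_model: str
    budget_mode: str
    routing_reason: str


decisions = []  # append-only: entries are added, never edited or removed

first = ModelDecision("model-large", "normal", "policy_match")
decisions.append(first)

# The retry is downgraded by budget, so a *new* decision is recorded.
retry = dataclasses.replace(
    first,
    primary_model="model-small",
    budget_mode="constrained",
    routing_reason="budget_downgrade",
)
decisions.append(retry)

try:
    first.primary_model = "model-small"  # frozen: in-place edits are rejected
except dataclasses.FrozenInstanceError:
    pass
```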

LLM Invocations

Each model decision leads to one or more LLM invocations — the actual HTTP calls to the provider.

Field	Description
`provider`	Which provider was called (Claude, Gemini, DeepSeek, Qwen)
`model_id`	Exact model ID (e.g., `deepseek-ai/deepseek-v3.2-maas`)
`input_tokens`	Tokens sent to the model
`output_tokens`	Tokens received from the model
`cached_tokens`	Tokens served from prompt cache (90% discount)
`cost_usd`	Computed cost for this invocation
`latency_ms`	Time from request to response
`prompt_hash`	SHA-256 hash of the full prompt (for reproducibility)
`outcome`	`success`, `fallback`, `error`
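The cost and hash fields can be sketched like this. The per-million-token prices are hypothetical parameters, because real prices vary by provider and model. The sketch also assumes that `cached_tokens` is counted inside `input_tokens` and billed at the 90% discount noted in the table. Hashing the UTF-8 prompt text is likewise an assumption about what "full prompt" covers.

```python
import hashlib


def prompt_hash(prompt: str) -> str:
    """SHA-256 of the UTF-8 prompt text, formatted like the ExplainBundle values."""
    return "sha256:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def invocation_cost(input_tokens: int, output_tokens: int, cached_tokens: int,
                    input_price_per_m: float, output_price_per_m: float,
                    cache_discount: float = 0.90) -> float:
    """Cost in USD for one invocation.

    Assumes cached_tokens is a subset of input_tokens and that cached tokens are
    billed at (1 - cache_discount) of the normal input price. Prices are per
    million tokens and are caller-supplied, not real provider rates.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    input_cost = (uncached + cached_tokens * (1 - cache_discount)) * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) / 1_000_000


# Hypothetical prices: $1.00 per million input tokens, $2.00 per million output tokens.
cost = invocation_cost(4200, 1800, cached_tokens=2000,
                       input_price_per_m=1.00, output_price_per_m=2.00)
```

With 2,000 of the 4,200 input tokens cached, the input side is billed as 2,200 + 200 = 2,400 token-equivalents. At these made-up prices, `cost` therefore comes to roughly $0.006.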

Artifact Lineage

Every artifact instance tracks its complete lineage:

SELECT
    a.memory_key,
    a.artifact_kind,
    a.state,
    a.produced_by_agent,
    a.produced_in_run_id,
    a.content_hash,
    v.verifier_type,
    v.status AS verification_status
FROM hd_runtime.artifact_instances a
LEFT JOIN hd_runtime.artifact_validations v
    ON v.artifact_id = a.id
WHERE a.project_id = :project_id
ORDER BY a.created_at;
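You can see what the `LEFT JOIN` does by running a trimmed version of the query against a throwaway SQLite database. This is a local stand-in, not HatiData's engine. Attaching an in-memory database as `hd_runtime` lets the schema-qualified names resolve, the column list is trimmed, and the second artifact is hypothetical sample data. An artifact with no validation row still appears in the result, with `NULL` verifier columns, instead of dropping out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named hd_runtime so the
# schema-qualified table names from the query resolve unchanged.
conn.execute("ATTACH DATABASE ':memory:' AS hd_runtime")
conn.executescript("""
CREATE TABLE hd_runtime.artifact_instances (
    id INTEGER PRIMARY KEY, project_id TEXT, memory_key TEXT,
    state TEXT, created_at TEXT
);
CREATE TABLE hd_runtime.artifact_validations (
    id INTEGER PRIMARY KEY, artifact_id INTEGER,
    verifier_type TEXT, status TEXT
);
INSERT INTO hd_runtime.artifact_instances VALUES
    (1, 'abc', 'proj:abc:api_contract', 'verified', '2026-03-28T10:00:05Z'),
    (2, 'abc', 'proj:abc:draft_notes', 'generated', '2026-03-28T10:00:06Z');
INSERT INTO hd_runtime.artifact_validations VALUES
    (1, 1, 'schema_validator', 'passed');
""")

rows = conn.execute(
    """
    SELECT a.memory_key, a.state, v.verifier_type, v.status AS verification_status
    FROM hd_runtime.artifact_instances a
    LEFT JOIN hd_runtime.artifact_validations v ON v.artifact_id = a.id
    WHERE a.project_id = :project_id
    ORDER BY a.created_at
    """,
    {"project_id": "abc"},
).fetchall()

for row in rows:
    print(row)
```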

Artifact States

State	Meaning	Consumable?
`declared`	Agent contract says this will be produced	No
`generated`	Agent wrote output to memory	No
`schema_valid`	Typed validator confirmed schema	Yes
`contract_valid`	All contract constraints satisfied	Yes
`verified`	Downstream verifier approved	Yes
`pinned`	Gate policy locked as authoritative version	Yes
`rejected`	Validation or verifier failed	No
`superseded`	Newer version accepted	No
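The Consumable? column can be encoded directly as a guard that downstream code runs before reading an input. The enum mirrors the table. `require_consumable` and its use of `PermissionError` are hypothetical choices for illustration. Unknown state strings raise `ValueError` from the enum lookup, so typos fail loudly.

```python
from enum import Enum


class ArtifactState(Enum):
    DECLARED = "declared"
    GENERATED = "generated"
    SCHEMA_VALID = "schema_valid"
    CONTRACT_VALID = "contract_valid"
    VERIFIED = "verified"
    PINNED = "pinned"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"


# The "Consumable? = Yes" rows of the table above.
CONSUMABLE = frozenset({
    ArtifactState.SCHEMA_VALID,
    ArtifactState.CONTRACT_VALID,
    ArtifactState.VERIFIED,
    ArtifactState.PINNED,
})


def is_consumable(state: str) -> bool:
    # ArtifactState(state) raises ValueError for an unknown state string.
    return ArtifactState(state) in CONSUMABLE


def require_consumable(memory_key: str, state: str) -> None:
    """Hypothetical guard a downstream agent runs before reading an input artifact."""
    if not is_consumable(state):
        raise PermissionError(f"{memory_key} is {state!r}, which is not consumable")


require_consumable("proj:abc:api_contract", "verified")  # passes silently
```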

The "No Orphan" Guarantee

HatiData V2 enforces referential integrity on artifacts:

Every ArtifactInstance must reference a valid TaskAttempt
Every TaskAttempt must reference a valid Task
Every Task must reference a valid Project

If any link in this chain is broken, the lineage query returns an explicit error rather than silently returning partial data.
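The following is an application-side sketch of the explicit-error behavior. It assumes each row carries its parent's id in a column named `attempt_id`, `task_id`, or `project_id`; these names are assumptions. In the database itself, the guarantee would come from foreign-key constraints. The walk below only shows why a broken link should raise an error instead of returning the partial chain.

```python
class LineageError(Exception):
    """Raised when a parent link in the lineage chain is missing (hypothetical)."""


def resolve_chain(artifact_id, artifacts, attempts, tasks, projects):
    """Walk artifact -> attempt -> task -> project; raise on the first broken link.

    Each table is a dict mapping id -> row (a dict). Column names are assumed.
    """
    row = artifacts.get(artifact_id)
    if row is None:
        raise LineageError(f"artifact {artifact_id!r} does not exist")
    chain = [("artifact", artifact_id)]
    for kind, fk_column, parents in (
        ("attempt", "attempt_id", attempts),
        ("task", "task_id", tasks),
        ("project", "project_id", projects),
    ):
        parent_id = row.get(fk_column)
        if parent_id not in parents:
            child_kind, child_id = chain[-1]
            # Fail loudly instead of returning the partial chain built so far.
            raise LineageError(
                f"{child_kind} {child_id!r} points to missing {kind} {parent_id!r}"
            )
        row = parents[parent_id]
        chain.append((kind, parent_id))
    return chain


artifacts = {"art-001": {"attempt_id": "att-001"}}
attempts = {"att-001": {"task_id": "task-001"}}
tasks = {"task-001": {"project_id": "abc"}}
projects = {"abc": {}}

chain = resolve_chain("art-001", artifacts, attempts, tasks, projects)

try:
    resolve_chain("art-001", artifacts, attempts, {}, projects)  # task row missing
except LineageError as err:
    broken = str(err)
```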

Workflow Events

Every state transition is recorded as an immutable workflow event:

SELECT event_kind, entity_type, entity_id, payload, created_at
FROM hd_runtime.workflow_events
WHERE project_id = :project_id
ORDER BY sequence_num;

Events include:

task_dispatched — Task moved to queue
attempt_started — Agent claimed the task
model_decided — LLM routing decision made
artifact_validated — Schema validation passed
gate_evaluated — Gate policy checked
recovery_initiated — Failure escalated to next recovery level
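Below is a minimal in-memory model of the event log. `EventLog` and its method names are hypothetical. The sketch shows that each event receives a strictly increasing `sequence_num`. That number gives a total order even when two events share a `created_at` timestamp, which is one plausible reason the query above sorts by `sequence_num` instead of by time.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WorkflowEvent:
    sequence_num: int
    project_id: str
    event_kind: str
    entity_type: str
    entity_id: str
    payload: dict
    created_at: datetime


class EventLog:
    """Hypothetical in-memory append-only log: no update or delete methods."""

    def __init__(self):
        self._events = []
        self._next_seq = itertools.count(1)

    def append(self, project_id, event_kind, entity_type, entity_id, payload=None):
        event = WorkflowEvent(
            sequence_num=next(self._next_seq),
            project_id=project_id,
            event_kind=event_kind,
            entity_type=entity_type,
            entity_id=entity_id,
            payload=dict(payload or {}),
            created_at=datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def for_project(self, project_id):
        """Events for one project, ordered by sequence_num like the SQL above."""
        return sorted(
            (e for e in self._events if e.project_id == project_id),
            key=lambda e: e.sequence_num,
        )


log = EventLog()
log.append("abc", "task_dispatched", "task", "task-001")
log.append("abc", "attempt_started", "attempt", "att-001")
log.append("abc", "model_decided", "model_decision", "md-001",
           {"primary_model": "deepseek-ai/deepseek-v3.2-maas"})
kinds = [e.event_kind for e in log.for_project("abc")]
```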

Workflow Events Are Append-Only

Because `UPDATE` and `DELETE` privileges are revoked on the `workflow_events` table, events cannot be modified or deleted once written. This provides a tamper-resistant audit trail for compliance requirements such as SOC 2 and GDPR Article 30.
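SQLite has no `GRANT`/`REVOKE`, so the sketch below uses a different technique to demonstrate the same append-only rule locally: `BEFORE UPDATE` and `BEFORE DELETE` triggers that abort the statement. This is not how HatiData enforces the rule. The three-column table is a trimmed, hypothetical version of `workflow_events`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workflow_events (
    sequence_num INTEGER PRIMARY KEY,
    project_id   TEXT NOT NULL,
    event_kind   TEXT NOT NULL
);
-- Stand-in for REVOKE UPDATE, DELETE: abort any statement that rewrites history.
CREATE TRIGGER workflow_events_no_update BEFORE UPDATE ON workflow_events
BEGIN
    SELECT RAISE(ABORT, 'workflow_events is append-only');
END;
CREATE TRIGGER workflow_events_no_delete BEFORE DELETE ON workflow_events
BEGIN
    SELECT RAISE(ABORT, 'workflow_events is append-only');
END;
""")

conn.execute(
    "INSERT INTO workflow_events (project_id, event_kind) VALUES (?, ?)",
    ("abc", "task_dispatched"),
)
conn.commit()

blocked = []
for statement in (
    "UPDATE workflow_events SET event_kind = 'tampered'",
    "DELETE FROM workflow_events",
):
    try:
        conn.execute(statement)
    except sqlite3.DatabaseError as exc:  # RAISE(ABORT) surfaces as a DatabaseError subclass
        blocked.append(str(exc))

remaining = conn.execute("SELECT event_kind FROM workflow_events").fetchall()
```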

The ExplainBundle

The /v2/explain/:attempt_id endpoint returns a complete ExplainBundle — a single JSON document containing the full lineage for an attempt:

{
  "attempt_id": "att-001",
  "task": { "id": "task-001", "agent_type": "architect", "task_class": "AuthoritySpec" },
  "model_decision": {
    "primary_model": "deepseek-ai/deepseek-v3.2-maas",
    "capability_class": "StrongGeneral",
    "budget_mode": "normal",
    "routing_reason": "policy_match"
  },
  "invocations": [
    {
      "provider": "Deepseek",
      "model_id": "deepseek-ai/deepseek-v3.2-maas",
      "input_tokens": 4200,
      "output_tokens": 1800,
      "cost_usd": 0.0031,
      "latency_ms": 2340,
      "prompt_hash": "sha256:a1b2c3...",
      "outcome": "success"
    }
  ],
  "artifacts": [
    {
      "memory_key": "proj:abc:api_contract",
      "state": "verified",
      "content_hash": "sha256:d4e5f6...",
      "verification": { "verifier_type": "schema_validator", "status": "passed" }
    }
  ],
  "events": [
    { "kind": "attempt_started", "at": "2026-03-28T10:00:00Z" },
    { "kind": "model_decided", "at": "2026-03-28T10:00:01Z" },
    { "kind": "artifact_validated", "at": "2026-03-28T10:00:05Z" }
  ],
  "total_cost_usd": 0.0031,
  "total_duration_ms": 5200
}
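A client can sanity-check a bundle before trusting it. The sketch below assumes the bundle shape shown above. `fetch_explain_bundle`, the base URL, and the absence of authentication are all assumptions. `check_bundle` tests only two properties that the example visibly satisfies. First, the invocation costs must sum to `total_cost_usd`. Second, the event timestamps must be in order; they are compared as strings, which works only while they share the same ISO 8601 UTC format.

```python
import json
import math
from urllib.request import urlopen


def fetch_explain_bundle(base_url: str, attempt_id: str) -> dict:
    """GET {base_url}/v2/explain/{attempt_id}. Base URL and lack of auth are assumptions."""
    with urlopen(f"{base_url}/v2/explain/{attempt_id}") as resp:
        return json.load(resp)


def check_bundle(bundle: dict) -> list:
    """Return consistency problems found in an ExplainBundle (empty list if none)."""
    problems = []
    invocation_total = sum(inv["cost_usd"] for inv in bundle.get("invocations", []))
    reported = bundle.get("total_cost_usd", 0.0)
    if not math.isclose(invocation_total, reported, rel_tol=1e-9, abs_tol=1e-12):
        problems.append(f"total_cost_usd={reported} but invocations sum to {invocation_total}")
    stamps = [event["at"] for event in bundle.get("events", [])]
    if stamps != sorted(stamps):
        problems.append("events are not in chronological order")
    return problems


sample = {
    "attempt_id": "att-001",
    "invocations": [{"cost_usd": 0.0031}],
    "events": [
        {"kind": "attempt_started", "at": "2026-03-28T10:00:00Z"},
        {"kind": "model_decided", "at": "2026-03-28T10:00:01Z"},
        {"kind": "artifact_validated", "at": "2026-03-28T10:00:05Z"},
    ],
    "total_cost_usd": 0.0031,
}
print(check_bundle(sample))  # []
```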

Next Steps

Tracing Artifacts to Prompts — Step-by-step forensic debugging tutorial
Tasks & Attempts — The lifecycle model
Branching & Isolation — How lineage works across branches
