Lineage & Explainability
Every artifact in HatiData V2 has a complete provenance chain — from the prompt that created it, through the model that generated it, to the verification that approved it. No orphan artifacts are permitted.
The Lineage Chain
Task
└── Attempt
    ├── ModelDecision (which model was selected and why)
    │   └── LlmInvocation (the actual API call: tokens, cost, latency)
    ├── ArtifactInstance (what was produced)
    │   └── ArtifactValidation (schema + contract verification)
    └── WorkflowEvent (audit trail of state transitions)
Every entity links back to its parent via foreign keys. Given any artifact, you can answer:
- What prompt created it? Follow artifact → attempt → llm_invocation → prompt_hash
- Why this model? Follow attempt → model_decision → routing_policy + budget_mode
- Who approved it? Follow artifact → validation → verifier_type + evidence
- What did it cost? Sum llm_invocations.cost_usd for the attempt chain
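
The lookups above amount to joins along foreign keys. The following is a minimal sketch using an in-memory SQLite database; the table and column names are simplified assumptions made for illustration and are not the actual hd_runtime schema.

```python
import sqlite3

# Hypothetical, simplified schema: names are illustrative, not the real hd_runtime tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attempts (id TEXT PRIMARY KEY, task_id TEXT NOT NULL);
CREATE TABLE model_decisions (
    id TEXT PRIMARY KEY,
    attempt_id TEXT NOT NULL REFERENCES attempts(id),
    routing_reason TEXT,
    budget_mode TEXT);
CREATE TABLE llm_invocations (
    id TEXT PRIMARY KEY,
    model_decision_id TEXT NOT NULL REFERENCES model_decisions(id),
    prompt_hash TEXT,
    cost_usd REAL);
CREATE TABLE artifacts (
    id TEXT PRIMARY KEY,
    attempt_id TEXT NOT NULL REFERENCES attempts(id));
INSERT INTO attempts VALUES ('att-001', 'task-001');
INSERT INTO model_decisions VALUES ('md-1', 'att-001', 'policy_match', 'normal');
INSERT INTO model_decisions VALUES ('md-2', 'att-001', 'budget_downgrade', 'constrained');
INSERT INTO llm_invocations VALUES ('inv-1', 'md-1', 'sha256:aaa', 0.0031);
INSERT INTO llm_invocations VALUES ('inv-2', 'md-2', 'sha256:bbb', 0.0009);
INSERT INTO artifacts VALUES ('art-1', 'att-001');
""")


def explain_artifact(conn, artifact_id):
    """Walk artifact -> attempt -> model_decision -> llm_invocation."""
    rows = conn.execute(
        """
        SELECT i.prompt_hash, d.routing_reason, d.budget_mode, i.cost_usd
        FROM artifacts a
        JOIN attempts t ON t.id = a.attempt_id
        JOIN model_decisions d ON d.attempt_id = t.id
        JOIN llm_invocations i ON i.model_decision_id = d.id
        WHERE a.id = ?
        ORDER BY i.id
        """,
        (artifact_id,),
    ).fetchall()
    return {
        "prompt_hashes": [r[0] for r in rows],           # what prompt created it?
        "routing": [(r[1], r[2]) for r in rows],         # why this model?
        "total_cost_usd": sum(r[3] for r in rows),       # what did it cost?
    }


summary = explain_artifact(conn, "art-1")
print(summary)
```

In this sketch the attempt has two decisions (an initial one and a budget downgrade), so the cost is summed across both invocations rather than read from a single row.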
Model Decisions
When an agent runs, the LLM router makes a model decision — which model to use, from which provider, at what capability class. This decision is recorded before the LLM call, not after.
| Field | Description |
|---|---|
| task_class | What the task needs (e.g., Codegen, AuthoritySpec) |
| primary_model | Model selected by the routing policy |
| fallback_chain | Ordered list of fallback models |
| capability_class | Required capability level (e.g., StrongGeneral) |
| budget_mode | Budget state at decision time (normal, warning, constrained) |
| routing_reason | Why this model was chosen (policy match, budget downgrade, failover) |
Model decisions are append-only. If a retry selects a different model (e.g., due to budget downgrade), a new ModelDecision is created — the original is never modified.
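
One way to picture the append-only rule is an in-memory log of immutable records. This is only a sketch: the DecisionLog class, the attempt_id key, and the fallback model name example/small-model are hypothetical, and the field names simply mirror the table above.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: a recorded decision cannot be mutated in place
class ModelDecision:
    attempt_id: str
    task_class: str
    primary_model: str
    fallback_chain: Tuple[str, ...]
    capability_class: str
    budget_mode: str
    routing_reason: str


class DecisionLog:
    """Hypothetical append-only store: it exposes record() but no update or delete."""

    def __init__(self) -> None:
        self._decisions: List[ModelDecision] = []

    def record(self, decision: ModelDecision) -> int:
        self._decisions.append(decision)
        return len(self._decisions) - 1  # position of the new entry in the log

    def all(self) -> Tuple[ModelDecision, ...]:
        return tuple(self._decisions)


log = DecisionLog()
first = ModelDecision(
    attempt_id="att-001",
    task_class="Codegen",
    primary_model="deepseek-ai/deepseek-v3.2-maas",
    fallback_chain=("example/small-model",),  # hypothetical fallback model name
    capability_class="StrongGeneral",
    budget_mode="normal",
    routing_reason="policy_match",
)
log.record(first)

# A retry under a tighter budget produces a NEW decision; `first` is left untouched.
retry = replace(
    first,
    primary_model="example/small-model",
    fallback_chain=(),
    budget_mode="constrained",
    routing_reason="budget_downgrade",
)
log.record(retry)
```

After the retry the log holds both decisions in order, so a reviewer can see the original choice and the downgrade side by side.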
LLM Invocations
Each model decision leads to one or more LLM invocations — the actual HTTP calls to the provider.
| Field | Description |
|---|---|
| provider | Which provider was called (Claude, Gemini, DeepSeek, Qwen) |
| model_id | Exact model ID (e.g., deepseek-ai/deepseek-v3.2-maas) |
| input_tokens | Tokens sent to the model |
| output_tokens | Tokens received from the model |
| cached_tokens | Tokens served from prompt cache (90% discount) |
| cost_usd | Computed cost for this invocation |
| latency_ms | Time from request to response |
| prompt_hash | SHA-256 hash of the full prompt (for reproducibility) |
| outcome | success, fallback, error |
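
To make the cost_usd and prompt_hash fields concrete, here is a rough sketch of how they could be derived. Several things are assumptions rather than documented behavior: that cached_tokens is a subset of input_tokens, that rates are quoted per million tokens, the rate values themselves, and the sha256: prefix (borrowed from the ExplainBundle example).

```python
import hashlib


def prompt_hash(prompt: str) -> str:
    """SHA-256 of the full prompt text, hex-encoded with a 'sha256:' prefix (assumed format)."""
    return "sha256:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def invocation_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int,
    input_rate_per_mtok: float,   # hypothetical USD price per 1M input tokens
    output_rate_per_mtok: float,  # hypothetical USD price per 1M output tokens
    cache_discount: float = 0.90,  # cached input tokens are billed at a 90% discount
) -> float:
    # Assumption: cached tokens are counted inside input_tokens.
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    cached_rate = input_rate_per_mtok * (1.0 - cache_discount)
    total = (
        uncached_tokens * input_rate_per_mtok
        + cached_tokens * cached_rate
        + output_tokens * output_rate_per_mtok
    )
    return total / 1_000_000


cost = invocation_cost_usd(
    input_tokens=4200,
    output_tokens=1800,
    cached_tokens=1000,
    input_rate_per_mtok=1.0,
    output_rate_per_mtok=2.0,
)
digest = prompt_hash("Design the API contract for project abc.")
```

Because the hash covers the full prompt, two invocations with the same prompt_hash were sent byte-identical prompts, which is what makes the field useful for reproducing a run.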
Artifact Lineage
Every artifact instance tracks its complete lineage:
SELECT
    a.memory_key,
    a.artifact_kind,
    a.state,
    a.produced_by_agent,
    a.produced_in_run_id,
    a.content_hash,
    v.verifier_type,
    v.status AS verification_status
FROM hd_runtime.artifact_instances a
LEFT JOIN hd_runtime.artifact_validations v
    ON v.artifact_id = a.id
WHERE a.project_id = :project_id
ORDER BY a.created_at;
Artifact States
| State | Meaning | Consumable? |
|---|---|---|
| declared | Agent contract says this will be produced | No |
| generated | Agent wrote output to memory | No |
| schema_valid | Typed validator confirmed schema | Yes |
| contract_valid | All contract constraints satisfied | Yes |
| verified | Downstream verifier approved | Yes |
| pinned | Gate policy locked as authoritative version | Yes |
| rejected | Validation or verifier failed | No |
| superseded | Newer version accepted | No |
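
The Consumable? column can be encoded as a simple lookup. The sketch below mirrors the table; the ArtifactState enum and is_consumable helper are illustrative names, not part of a documented HatiData client.

```python
from enum import Enum


class ArtifactState(Enum):
    DECLARED = "declared"
    GENERATED = "generated"
    SCHEMA_VALID = "schema_valid"
    CONTRACT_VALID = "contract_valid"
    VERIFIED = "verified"
    PINNED = "pinned"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"


# States marked "Yes" in the Consumable? column of the table above.
CONSUMABLE_STATES = frozenset(
    {
        ArtifactState.SCHEMA_VALID,
        ArtifactState.CONTRACT_VALID,
        ArtifactState.VERIFIED,
        ArtifactState.PINNED,
    }
)


def is_consumable(state: str) -> bool:
    """Return True if an artifact in this state may be read by downstream agents.

    Unknown state strings raise ValueError instead of being treated as consumable.
    """
    return ArtifactState(state) in CONSUMABLE_STATES
```

For example, is_consumable("verified") is true while is_consumable("superseded") is false, so a consumer checking this before a read would skip versions that have been replaced.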
The "No Orphan" Guarantee
HatiData V2 enforces referential integrity on artifacts:
- Every ArtifactInstance must reference a valid TaskAttempt
- Every TaskAttempt must reference a valid Task
- Every Task must reference a valid Project
If any link in this chain is broken, the lineage query returns an explicit error rather than silently returning partial data.
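
The fail-loudly behavior can be illustrated with plain dictionaries standing in for the tables. This is a sketch under stated assumptions: BrokenLineageError and resolve_lineage are hypothetical names, and the real check presumably runs inside the database or the lineage query layer.

```python
class BrokenLineageError(LookupError):
    """Raised when a lineage link points at a parent that does not exist."""


def resolve_lineage(artifact_id, artifacts, attempts, tasks, projects):
    """Walk artifact -> attempt -> task -> project, raising on the first broken link.

    artifacts, attempts and tasks map a child id to its parent id;
    projects is the set of known project ids.
    """
    chain = {"artifact": artifact_id}
    hops = [("attempt", artifacts), ("task", attempts), ("project", tasks)]
    current_kind, current_id = "artifact", artifact_id
    for parent_kind, table in hops:
        if current_id not in table:
            raise BrokenLineageError(f"{current_kind} {current_id!r} not found")
        parent_id = table[current_id]
        chain[parent_kind] = parent_id
        current_kind, current_id = parent_kind, parent_id
    if current_id not in projects:
        raise BrokenLineageError(f"project {current_id!r} not found")
    return chain


# Illustrative data: art-orphan points at an attempt that was never recorded.
artifacts = {"art-1": "att-001", "art-orphan": "att-missing"}
attempts = {"att-001": "task-001"}
tasks = {"task-001": "proj-abc"}
projects = {"proj-abc"}

full_chain = resolve_lineage("art-1", artifacts, attempts, tasks, projects)
```

Resolving art-orphan raises BrokenLineageError rather than returning a chain that stops at the artifact, which is the distinction the guarantee draws between an explicit error and partial data.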
Workflow Events
Every state transition is recorded as an immutable workflow event:
SELECT event_kind, entity_type, entity_id, payload, created_at
FROM hd_runtime.workflow_events
WHERE project_id = :project_id
ORDER BY sequence_num;
Events include:
- task_dispatched: Task moved to queue
- attempt_started: Agent claimed the task
- model_decided: LLM routing decision made
- artifact_validated: Schema validation passed
- gate_evaluated: Gate policy checked
- recovery_initiated: Failure escalated to next recovery level
UPDATE and DELETE privileges are revoked on the workflow_events table, so roles subject to that revocation cannot modify or delete events once they are written. This gives compliance reviews (SOC 2, GDPR Article 30) an append-only, tamper-resistant audit trail.
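
To experiment with append-only behavior locally, the sketch below uses SQLite triggers. Note the swap: HatiData relies on revoked privileges, but SQLite has no privilege system, so triggers that abort UPDATE and DELETE stand in for it here. The table layout is a simplified assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workflow_events (
    sequence_num INTEGER PRIMARY KEY AUTOINCREMENT,
    event_kind   TEXT NOT NULL,
    entity_id    TEXT NOT NULL
);
-- Stand-in for REVOKE UPDATE, DELETE: abort any attempt to change history.
CREATE TRIGGER workflow_events_no_update
BEFORE UPDATE ON workflow_events
BEGIN
    SELECT RAISE(ABORT, 'workflow_events is append-only');
END;
CREATE TRIGGER workflow_events_no_delete
BEFORE DELETE ON workflow_events
BEGIN
    SELECT RAISE(ABORT, 'workflow_events is append-only');
END;
""")


def is_blocked(conn, sql):
    """Return True if the statement is rejected by the database."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


# Inserts are allowed; updates and deletes are rejected by the triggers.
conn.execute(
    "INSERT INTO workflow_events (event_kind, entity_id) VALUES (?, ?)",
    ("attempt_started", "att-001"),
)
update_blocked = is_blocked(
    conn, "UPDATE workflow_events SET event_kind = 'tampered' WHERE sequence_num = 1"
)
delete_blocked = is_blocked(conn, "DELETE FROM workflow_events")
remaining = conn.execute(
    "SELECT event_kind FROM workflow_events ORDER BY sequence_num"
).fetchall()
```

Unlike a privilege revocation, a trigger can be dropped by anyone allowed to alter the schema, so this is a local illustration of the behavior and not an equivalent control.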
The ExplainBundle
The /v2/explain/:attempt_id endpoint returns a complete ExplainBundle — a single JSON document containing the full lineage for an attempt:
{
  "attempt_id": "att-001",
  "task": { "id": "task-001", "agent_type": "architect", "task_class": "AuthoritySpec" },
  "model_decision": {
    "primary_model": "deepseek-ai/deepseek-v3.2-maas",
    "capability_class": "StrongGeneral",
    "budget_mode": "normal",
    "routing_reason": "policy_match"
  },
  "invocations": [
    {
      "provider": "Deepseek",
      "model_id": "deepseek-ai/deepseek-v3.2-maas",
      "input_tokens": 4200,
      "output_tokens": 1800,
      "cost_usd": 0.0031,
      "latency_ms": 2340,
      "prompt_hash": "sha256:a1b2c3...",
      "outcome": "success"
    }
  ],
  "artifacts": [
    {
      "memory_key": "proj:abc:api_contract",
      "state": "verified",
      "content_hash": "sha256:d4e5f6...",
      "verification": { "verifier_type": "schema_validator", "status": "passed" }
    }
  ],
  "events": [
    { "kind": "attempt_started", "at": "2026-03-28T10:00:00Z" },
    { "kind": "model_decided", "at": "2026-03-28T10:00:01Z" },
    { "kind": "artifact_validated", "at": "2026-03-28T10:00:05Z" }
  ],
  "total_cost_usd": 0.0031,
  "total_duration_ms": 5200
}
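
A consumer of the bundle may want to sanity-check it before trusting the totals. The sketch below parses a trimmed copy of the example and runs three checks. The checks themselves are assumptions about what a consistent bundle looks like (total cost equals the sum of invocation costs, events are in time order, verified artifacts carry a passed verification); they are not documented API guarantees, and check_bundle is a hypothetical helper.

```python
import json
import math

# Trimmed copy of the example bundle above.
RAW_BUNDLE = """
{
  "attempt_id": "att-001",
  "invocations": [
    {"provider": "Deepseek", "cost_usd": 0.0031, "latency_ms": 2340, "outcome": "success"}
  ],
  "artifacts": [
    {"memory_key": "proj:abc:api_contract", "state": "verified",
     "verification": {"verifier_type": "schema_validator", "status": "passed"}}
  ],
  "events": [
    {"kind": "attempt_started", "at": "2026-03-28T10:00:00Z"},
    {"kind": "model_decided", "at": "2026-03-28T10:00:01Z"},
    {"kind": "artifact_validated", "at": "2026-03-28T10:00:05Z"}
  ],
  "total_cost_usd": 0.0031,
  "total_duration_ms": 5200
}
"""


def check_bundle(bundle: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []

    invocation_total = sum(inv["cost_usd"] for inv in bundle["invocations"])
    if not math.isclose(invocation_total, bundle["total_cost_usd"], abs_tol=1e-9):
        problems.append("total_cost_usd does not match the sum of invocation costs")

    # Timestamps in the same UTC 'Z' ISO 8601 format sort correctly as strings.
    timestamps = [event["at"] for event in bundle["events"]]
    if timestamps != sorted(timestamps):
        problems.append("events are not in chronological order")

    for artifact in bundle["artifacts"]:
        status = artifact.get("verification", {}).get("status")
        if artifact["state"] == "verified" and status != "passed":
            problems.append(f"{artifact['memory_key']} is verified without a passed verification")

    return problems


bundle = json.loads(RAW_BUNDLE)
problems = check_bundle(bundle)
```

For the example bundle the list comes back empty; changing total_cost_usd or reordering the events would surface a corresponding message.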
Next Steps
- Tracing Artifacts to Prompts — Step-by-step forensic debugging tutorial
- Tasks & Attempts — The lifecycle model
- Branching & Isolation — How lineage works across branches