
Lineage & Explainability

Every artifact in HatiData V2 has a complete provenance chain — from the prompt that created it, through the model that generated it, to the verification that approved it. No orphan artifacts are permitted.

The Lineage Chain

```text
Task
└── Attempt
    ├── ModelDecision (which model was selected and why)
    │   └── LlmInvocation (the actual API call: tokens, cost, latency)
    ├── ArtifactInstance (what was produced)
    │   └── ArtifactValidation (schema + contract verification)
    └── WorkflowEvent (audit trail of state transitions)
```

Every entity links back to its parent via foreign keys. Given any artifact, you can answer:

  • What prompt created it? Follow artifact → attempt → llm_invocation → prompt_hash
  • Why this model? Follow attempt → model_decision → routing_policy + budget_mode
  • Who approved it? Follow artifact → validation → verifier_type + evidence
  • What did it cost? Sum llm_invocations.cost_usd for the attempt chain
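Each of these questions reduces to a join along foreign keys. The following minimal Python sketch uses an in-memory SQLite database as a stand-in for hd_runtime. The table and column names are hypothetical simplifications of the fields described on this page, and the sketch assumes that each LLM invocation row carries its attempt_id directly.

```python
import sqlite3

# Hypothetical minimal schema. Names loosely follow the fields on this page;
# they are NOT the actual hd_runtime DDL.
SCHEMA = """
CREATE TABLE task_attempts (id TEXT PRIMARY KEY, task_id TEXT NOT NULL);
CREATE TABLE model_decisions (
    id TEXT PRIMARY KEY, attempt_id TEXT NOT NULL,
    primary_model TEXT, routing_reason TEXT, budget_mode TEXT);
CREATE TABLE llm_invocations (
    id TEXT PRIMARY KEY, attempt_id TEXT NOT NULL,
    prompt_hash TEXT, cost_usd REAL);
CREATE TABLE artifact_instances (id TEXT PRIMARY KEY, attempt_id TEXT NOT NULL);
CREATE TABLE artifact_validations (
    artifact_id TEXT NOT NULL, verifier_type TEXT, status TEXT);
"""


def explain_artifact(conn: sqlite3.Connection, artifact_id: str) -> dict:
    """Answer the four lineage questions for a single artifact."""
    row = conn.execute(
        "SELECT attempt_id FROM artifact_instances WHERE id = ?", (artifact_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown artifact {artifact_id}")
    attempt_id = row[0]

    def rows(sql: str, key: str) -> list:
        return conn.execute(sql, (key,)).fetchall()

    return {
        # What prompt created it? artifact -> attempt -> llm_invocation -> prompt_hash
        "prompt_hashes": [r[0] for r in rows(
            "SELECT prompt_hash FROM llm_invocations WHERE attempt_id = ? ORDER BY id",
            attempt_id)],
        # Why this model? attempt -> model_decision -> routing_reason + budget_mode
        "decisions": rows(
            "SELECT primary_model, routing_reason, budget_mode FROM model_decisions "
            "WHERE attempt_id = ? ORDER BY id", attempt_id),
        # Who approved it? artifact -> validation -> verifier_type (evidence omitted here)
        "approvals": rows(
            "SELECT verifier_type, status FROM artifact_validations WHERE artifact_id = ?",
            artifact_id),
        # What did it cost? Sum of cost_usd over the attempt's invocations
        "cost_usd": conn.execute(
            "SELECT COALESCE(SUM(cost_usd), 0.0) FROM llm_invocations WHERE attempt_id = ?",
            (attempt_id,)).fetchone()[0],
    }


# Usage with sample values taken from the ExplainBundle example on this page.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO task_attempts VALUES ('att-001', 'task-001')")
conn.execute("INSERT INTO model_decisions VALUES "
             "('md-1', 'att-001', 'deepseek-ai/deepseek-v3.2-maas', 'policy_match', 'normal')")
conn.execute("INSERT INTO llm_invocations VALUES ('inv-1', 'att-001', 'sha256:a1b2c3...', 0.0031)")
conn.execute("INSERT INTO artifact_instances VALUES ('art-1', 'att-001')")
conn.execute("INSERT INTO artifact_validations VALUES ('art-1', 'schema_validator', 'passed')")
report = explain_artifact(conn, "art-1")
```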

Model Decisions

When an agent runs, the LLM router makes a model decision: which model to use, from which provider, and at which capability class. This decision is recorded before the LLM call, not after.

| Field | Description |
|---|---|
| task_class | What the task needs (e.g., Codegen, AuthoritySpec) |
| primary_model | Model selected by the routing policy |
| fallback_chain | Ordered list of fallback models |
| capability_class | Required capability level (e.g., StrongGeneral) |
| budget_mode | Budget state at decision time (normal, warning, constrained) |
| routing_reason | Why this model was chosen (policy match, budget downgrade, failover) |

Decisions Are Immutable

Model decisions are append-only. If a retry selects a different model (e.g., due to budget downgrade), a new ModelDecision is created — the original is never modified.
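The two rules above (record the decision before the call, and never edit a decision) can be sketched in Python as a frozen dataclass plus an append-only log. DecisionLog, run_with_decision, the model names and the budget_downgrade reason string are all hypothetical. They illustrate the pattern and are not HatiData's router API. The sketch also assumes that the retry stays on the same attempt_id.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ModelDecision:
    # Fields follow the Model Decisions table; attempt_id and the Python types
    # are assumptions for this sketch.
    attempt_id: str
    task_class: str
    primary_model: str
    fallback_chain: tuple
    capability_class: str
    budget_mode: str
    routing_reason: str


class DecisionLog:
    """Append-only log: decisions can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._rows: list = []

    def append(self, decision: ModelDecision) -> None:
        self._rows.append(decision)

    def for_attempt(self, attempt_id: str) -> tuple:
        return tuple(d for d in self._rows if d.attempt_id == attempt_id)


def run_with_decision(log: DecisionLog, decision: ModelDecision,
                      call_llm: Callable[[str], str]) -> str:
    # Record first, call second: the decision is logged even if the call fails.
    log.append(decision)
    return call_llm(decision.primary_model)


def failing_provider(model: str) -> str:
    raise RuntimeError(f"{model}: provider error")


# Usage: the first call fails, and the retry gets a NEW decision built with
# dataclasses.replace() instead of an edited one. Model names are placeholders.
log = DecisionLog()
first = ModelDecision("att-001", "AuthoritySpec", "model-a", ("model-b",),
                      "StrongGeneral", "normal", "policy_match")
try:
    run_with_decision(log, first, failing_provider)
except RuntimeError:
    pass
retry = replace(first, primary_model="model-b", fallback_chain=(),
                budget_mode="constrained", routing_reason="budget_downgrade")
result = run_with_decision(log, retry, lambda model: f"ok from {model}")
```

Because the first decision is written before its call fails, both decisions remain in the log, in the order they were made.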

LLM Invocations

Each model decision leads to one or more LLM invocations — the actual HTTP calls to the provider.

| Field | Description |
|---|---|
| provider | Which provider was called (Claude, Gemini, DeepSeek, Qwen) |
| model_id | Exact model ID (e.g., deepseek-ai/deepseek-v3.2-maas) |
| input_tokens | Tokens sent to the model |
| output_tokens | Tokens received from the model |
| cached_tokens | Tokens served from the prompt cache (90% discount) |
| cost_usd | Computed cost of this invocation, in USD |
| latency_ms | Time from request to response, in milliseconds |
| prompt_hash | SHA-256 hash of the full prompt (for reproducibility) |
| outcome | Result of the call: success, fallback, or error |
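As a rough illustration of how cost_usd and prompt_hash could be derived from the other fields, here is a Python sketch. The per-million-token prices are placeholders, and the 90% cache discount is applied to the input rate. The sketch also assumes that cached_tokens is counted within input_tokens. Providers differ on that last point, so check how your provider reports usage.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical USD prices per million tokens; real rates differ by provider and model.
    input_per_mtok: float
    output_per_mtok: float
    cache_discount: float = 0.90  # the "90% discount" on cached prompt tokens


def invocation_cost(p: Pricing, input_tokens: int, output_tokens: int,
                    cached_tokens: int = 0) -> float:
    """cost_usd for one invocation.

    Assumes cached_tokens is counted inside input_tokens; some providers report
    cache reads as a separate counter instead.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cached_rate = p.input_per_mtok * (1.0 - p.cache_discount)
    return (uncached * p.input_per_mtok
            + cached_tokens * cached_rate
            + output_tokens * p.output_per_mtok) / 1_000_000


def prompt_hash(prompt: str) -> str:
    # SHA-256 over the UTF-8 bytes of the full prompt, with the "sha256:" prefix
    # used in the ExplainBundle example.
    return "sha256:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()
```

Hashing the exact prompt bytes means two invocations share a prompt_hash only if they were sent byte-identical prompts, which is what makes the field useful for reproducibility.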

Artifact Lineage

Every artifact instance tracks its complete lineage:

```sql
SELECT
    a.memory_key,
    a.artifact_kind,
    a.state,
    a.produced_by_agent,
    a.produced_in_run_id,
    a.content_hash,
    v.verifier_type,
    v.status AS verification_status
FROM hd_runtime.artifact_instances a
LEFT JOIN hd_runtime.artifact_validations v
    ON v.artifact_id = a.id
WHERE a.project_id = :project_id
ORDER BY a.created_at;
```

Artifact States

| State | Meaning | Consumable? |
|---|---|---|
| declared | Agent contract says this will be produced | No |
| generated | Agent wrote output to memory | No |
| schema_valid | Typed validator confirmed schema | Yes |
| contract_valid | All contract constraints satisfied | Yes |
| verified | Downstream verifier approved | Yes |
| pinned | Gate policy locked as authoritative version | Yes |
| rejected | Validation or verifier failed | No |
| superseded | Newer version accepted | No |
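The Consumable? column translates directly into a filter. The following sketch encodes the eight states as a Python enum. The consumable set comes from the table. The TRANSITIONS map is an assumption, because this page does not say which moves between states are allowed.

```python
from enum import Enum


class ArtifactState(str, Enum):
    DECLARED = "declared"
    GENERATED = "generated"
    SCHEMA_VALID = "schema_valid"
    CONTRACT_VALID = "contract_valid"
    VERIFIED = "verified"
    PINNED = "pinned"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"


S = ArtifactState

# The Consumable? column of the table above.
CONSUMABLE = frozenset({S.SCHEMA_VALID, S.CONTRACT_VALID, S.VERIFIED, S.PINNED})

# Hypothetical transition map: the page lists the states but not the allowed moves.
TRANSITIONS = {
    S.DECLARED: {S.GENERATED},
    S.GENERATED: {S.SCHEMA_VALID, S.REJECTED},
    S.SCHEMA_VALID: {S.CONTRACT_VALID, S.REJECTED, S.SUPERSEDED},
    S.CONTRACT_VALID: {S.VERIFIED, S.REJECTED, S.SUPERSEDED},
    S.VERIFIED: {S.PINNED, S.REJECTED, S.SUPERSEDED},
    S.PINNED: {S.SUPERSEDED},
    S.REJECTED: set(),    # terminal in this sketch
    S.SUPERSEDED: set(),  # terminal in this sketch
}


def advance(current: ArtifactState, target: ArtifactState) -> ArtifactState:
    """Move an artifact to a new state, refusing transitions the map does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def consumable_keys(rows):
    """Keep the memory_keys of (memory_key, state) rows a downstream agent may read."""
    return [key for key, state in rows if ArtifactState(state) in CONSUMABLE]
```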

The "No Orphan" Guarantee

HatiData V2 enforces referential integrity on artifacts:

  1. Every ArtifactInstance must reference a valid TaskAttempt
  2. Every TaskAttempt must reference a valid Task
  3. Every Task must reference a valid Project

If any link in this chain is broken, the lineage query returns an explicit error rather than silently returning partial data.
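The following minimal Python sketch shows both halves of this guarantee, using SQLite in place of the real store. Declared foreign keys reject an orphan at insert time. The lineage query raises an explicit error, using the hypothetical BrokenLineageError, when a LEFT JOIN comes back with a missing parent. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


class BrokenLineageError(Exception):
    """Raised instead of returning partial lineage data."""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE projects (id TEXT PRIMARY KEY);
        CREATE TABLE tasks (
            id TEXT PRIMARY KEY,
            project_id TEXT NOT NULL REFERENCES projects(id));
        CREATE TABLE task_attempts (
            id TEXT PRIMARY KEY,
            task_id TEXT NOT NULL REFERENCES tasks(id));
        CREATE TABLE artifact_instances (
            id TEXT PRIMARY KEY,
            attempt_id TEXT NOT NULL REFERENCES task_attempts(id));
    """)
    return conn


LINEAGE_SQL = """
SELECT a.id, t.id, k.id, p.id
FROM artifact_instances a
LEFT JOIN task_attempts t ON t.id = a.attempt_id
LEFT JOIN tasks k ON k.id = t.task_id
LEFT JOIN projects p ON p.id = k.project_id
WHERE a.id = ?
"""


def lineage(conn: sqlite3.Connection, artifact_id: str) -> dict:
    """Return artifact -> attempt -> task -> project, or raise on any broken link."""
    row = conn.execute(LINEAGE_SQL, (artifact_id,)).fetchone()
    if row is None:
        raise BrokenLineageError(f"artifact {artifact_id} does not exist")
    names = ("artifact", "attempt", "task", "project")
    for name, value in zip(names[1:], row[1:]):
        if value is None:  # LEFT JOIN produced NULL: the parent row is missing
            raise BrokenLineageError(f"artifact {artifact_id}: missing {name}")
    return dict(zip(names, row))
```

The foreign keys keep orphans out. The NULL check is defense in depth: if a constraint is ever bypassed, the caller gets an error instead of a silently truncated chain.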

Workflow Events

Every state transition is recorded as an immutable workflow event:

```sql
SELECT event_kind, entity_type, entity_id, payload, created_at
FROM hd_runtime.workflow_events
WHERE project_id = :project_id
ORDER BY sequence_num;
```

Events include:

  • task_dispatched — Task moved to queue
  • attempt_started — Agent claimed the task
  • model_decided — LLM routing decision made
  • artifact_validated — Schema validation passed
  • gate_evaluated — Gate policy checked
  • recovery_initiated — Failure escalated to next recovery level
Workflow Events Are Append-Only

UPDATE and DELETE privileges are revoked on the workflow_events table, so events cannot be modified or deleted once they are written. This provides an append-only audit trail for compliance (SOC 2, GDPR Article 30).
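SQLite has no GRANT/REVOKE, so the following Python sketch approximates the same append-only behavior with BEFORE UPDATE and BEFORE DELETE triggers that abort the statement. The columns mirror the workflow_events query above. The record and timeline helpers are hypothetical.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE workflow_events (
    sequence_num INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id   TEXT NOT NULL,
    event_kind   TEXT NOT NULL,
    entity_type  TEXT NOT NULL,
    entity_id    TEXT NOT NULL,
    payload      TEXT NOT NULL DEFAULT '{}',
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- SQLite has no GRANT/REVOKE, so these triggers stand in for REVOKE UPDATE, DELETE.
CREATE TRIGGER workflow_events_no_update BEFORE UPDATE ON workflow_events
BEGIN SELECT RAISE(ABORT, 'workflow_events is append-only'); END;
CREATE TRIGGER workflow_events_no_delete BEFORE DELETE ON workflow_events
BEGIN SELECT RAISE(ABORT, 'workflow_events is append-only'); END;
"""


def record(conn, project_id, kind, entity_type, entity_id, payload=None):
    """Append one event; sequence_num is assigned by the database."""
    conn.execute(
        "INSERT INTO workflow_events (project_id, event_kind, entity_type, entity_id, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (project_id, kind, entity_type, entity_id, json.dumps(payload or {})),
    )


def timeline(conn, project_id):
    """Event kinds for a project in sequence order."""
    return [kind for (kind,) in conn.execute(
        "SELECT event_kind FROM workflow_events WHERE project_id = ? ORDER BY sequence_num",
        (project_id,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record(conn, "proj-abc", "task_dispatched", "task", "task-001")
record(conn, "proj-abc", "attempt_started", "attempt", "att-001")
record(conn, "proj-abc", "model_decided", "attempt", "att-001", {"model": "model-a"})
```

As with privilege revocation, a trigger only blocks ordinary statements. Anyone who can drop the trigger or the table can still rewrite history, so the guarantee depends on who holds those rights.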

The ExplainBundle

The /v2/explain/:attempt_id endpoint returns a complete ExplainBundle — a single JSON document containing the full lineage for an attempt:

```json
{
  "attempt_id": "att-001",
  "task": { "id": "task-001", "agent_type": "architect", "task_class": "AuthoritySpec" },
  "model_decision": {
    "primary_model": "deepseek-ai/deepseek-v3.2-maas",
    "capability_class": "StrongGeneral",
    "budget_mode": "normal",
    "routing_reason": "policy_match"
  },
  "invocations": [
    {
      "provider": "Deepseek",
      "model_id": "deepseek-ai/deepseek-v3.2-maas",
      "input_tokens": 4200,
      "output_tokens": 1800,
      "cost_usd": 0.0031,
      "latency_ms": 2340,
      "prompt_hash": "sha256:a1b2c3...",
      "outcome": "success"
    }
  ],
  "artifacts": [
    {
      "memory_key": "proj:abc:api_contract",
      "state": "verified",
      "content_hash": "sha256:d4e5f6...",
      "verification": { "verifier_type": "schema_validator", "status": "passed" }
    }
  ],
  "events": [
    { "kind": "attempt_started", "at": "2026-03-28T10:00:00Z" },
    { "kind": "model_decided", "at": "2026-03-28T10:00:01Z" },
    { "kind": "artifact_validated", "at": "2026-03-28T10:00:05Z" }
  ],
  "total_cost_usd": 0.0031,
  "total_duration_ms": 5200
}
```
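A client can fetch a bundle and sanity-check it before relying on it. The following Python sketch assumes that the endpoint is a plain unauthenticated GET under a base URL you supply, since neither the host nor the auth scheme is documented here. check_bundle is a hypothetical helper. It compares total_cost_usd with the sum of invocation costs, flags artifacts without a verification record, and checks that events are in chronological order.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime


def fetch_explain_bundle(base_url: str, attempt_id: str) -> dict:
    """GET /v2/explain/:attempt_id. base_url and the lack of auth headers are assumptions."""
    url = f"{base_url.rstrip('/')}/v2/explain/{urllib.parse.quote(attempt_id, safe='')}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def _ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_bundle(bundle: dict, tol: float = 1e-9) -> list:
    """Return human-readable consistency problems; an empty list means none found."""
    problems = []
    invoked = sum(inv["cost_usd"] for inv in bundle.get("invocations", []))
    total = bundle.get("total_cost_usd", 0.0)
    if abs(invoked - total) > tol:
        problems.append(f"total_cost_usd {total} != sum of invocation costs {invoked}")
    for art in bundle.get("artifacts", []):
        if "verification" not in art:
            problems.append(f"{art.get('memory_key')}: no verification record")
    times = [_ts(e["at"]) for e in bundle.get("events", [])]
    if times != sorted(times):
        problems.append("events are not in chronological order")
    return problems
```

These checks only test internal consistency. They cannot tell you whether a recorded cost or verdict is correct.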
