Tasks & Attempts
HatiData V2 introduces a strict separation between intent (what an agent should do) and execution (what actually happened). This is the foundation of the Governed Runtime.
The Core Distinction
| Concept | What It Represents | Lifecycle |
|---|---|---|
| Task | The goal: "Generate architecture for project X" | Created once, status derived from attempts |
| Attempt | A single execution run of that task | Created per run (the first try and each retry), tracks state transitions via leases |
A Task may have multiple Attempts — each retry creates a new Attempt (not a new Task). This separation enables:
- Forensic debugging: compare Attempt #1 (failed) vs Attempt #2 (succeeded)
- Cost attribution: each Attempt tracks its own model decisions and token usage
- Recovery lineage: trace the chain from initial failure through repair to resolution
- Idempotent dispatch: the same logical work item always maps to the same Task
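The split can be pictured with a small in-memory model. This is only a sketch: the class and field names (`Attempt.number`, `tokens_used`, `new_attempt`) are illustrative assumptions, not the actual HatiData schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    """One execution run of a task (hypothetical fields)."""
    number: int
    status: str = "pending"   # e.g. "running", "succeeded", "failed"
    tokens_used: int = 0      # cost is attributed per attempt


@dataclass
class Task:
    """The goal; it owns every attempt made against it."""
    kind: str
    work_item_key: Optional[str] = None
    attempts: List[Attempt] = field(default_factory=list)

    def new_attempt(self) -> Attempt:
        # A retry appends an Attempt; it never creates a second Task.
        attempt = Attempt(number=len(self.attempts) + 1)
        self.attempts.append(attempt)
        return attempt

    def total_tokens(self) -> int:
        # Task-level cost is the sum over its attempts.
        return sum(a.tokens_used for a in self.attempts)


task = Task(kind="frontend_engineer")
first = task.new_attempt()
first.status, first.tokens_used = "failed", 1200
second = task.new_attempt()
second.status, second.tokens_used = "succeeded", 900
print([(a.number, a.status) for a in task.attempts], task.total_tokens())
```

Because both attempts hang off one Task, comparing the failed run with the successful one is a matter of reading two entries in the same list.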
Task Lifecycle
         +-----------+
         |  Pending  |  ← Created / retry-eligible after failed attempt
         +-----+-----+
               |
         +-----v-----+
         |  Active   |  ← Agent claimed an attempt
         +-----+-----+
               |
      +--------+--------+
      |                 |
+-----v-----+     +-----v-----+
| Completed |     |  Failed   |  ← All retries exhausted
+-----------+     +-----------+
                        |
                  +-----v------+
                  | Cancelled  |  ← L4 recovery / operator override
                  +------------+
Status transitions:
- pending → active: an agent creates and claims an attempt
- active → pending: attempt fails but retry_count < max_retries (task re-queued for next agent)
- active → completed: attempt succeeds
- active → failed: attempt fails and retries exhausted (retry_count >= max_retries)
- any non-terminal → cancelled: operator calls POST /v2/runtime/tasks/:id/cancel
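The transitions above can be expressed as one pure function. This is a minimal sketch, not the runtime's implementation: the event names (`claim`, `attempt_succeeded`, `attempt_failed`, `cancel`) are hypothetical, and it assumes retry_count is compared against max_retries at the moment the attempt fails.

```python
TERMINAL_TASK_STATES = {"completed", "failed", "cancelled"}


def next_task_status(status: str, event: str,
                     retry_count: int = 0, max_retries: int = 0) -> str:
    """Return the task status that follows `event`, or raise on an illegal move."""
    if event == "cancel":
        # Any non-terminal task may be cancelled by an operator.
        if status in TERMINAL_TASK_STATES:
            raise ValueError(f"cannot cancel a {status} task")
        return "cancelled"
    if status == "pending" and event == "claim":
        return "active"
    if status == "active" and event == "attempt_succeeded":
        return "completed"
    if status == "active" and event == "attempt_failed":
        # Re-queue while retries remain, otherwise the task is failed.
        return "pending" if retry_count < max_retries else "failed"
    raise ValueError(f"illegal transition: {status} + {event}")


print(next_task_status("active", "attempt_failed", retry_count=0, max_retries=2))
print(next_task_status("active", "attempt_failed", retry_count=2, max_retries=2))
```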
Idempotent Dispatch via work_item_key
Every Task can carry a work_item_key — a stable identity for the logical work item. This prevents duplicate dispatch when an orchestrator polls the same phase multiple times.
{
"kind": "frontend_engineer",
"project_id": "proj-abc",
"work_item_key": "run-123:frontend_engineer:default:main"
}
Enforcement: A partial unique index on (org_id, work_item_key) ensures only one non-terminal task exists per key. If you POST /v2/runtime/tasks with a key that already has a pending/active task, the existing task is returned instead of creating a duplicate.
Format convention: {phase_run_id}:{agent_type}:{scope}:{branch}
When an attempt fails and the task has retries remaining, the task goes back to pending and the next Attempt is created on the same Task rather than on a new one. This keeps completion, recovery, and lineage clean.
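The enforcement rule can be tried out locally. The sketch below uses SQLite (which also supports partial unique indexes) as a stand-in for PostgreSQL. The table layout, index name, and helper functions are assumptions for illustration, not the real hd_runtime definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (
        id            INTEGER PRIMARY KEY,
        org_id        TEXT NOT NULL,
        work_item_key TEXT,
        status        TEXT NOT NULL DEFAULT 'pending'
    );
    -- Partial unique index: uniqueness applies only to non-terminal rows.
    CREATE UNIQUE INDEX one_open_task_per_key
        ON tasks (org_id, work_item_key)
        WHERE status IN ('pending', 'active');
""")


def work_item_key(phase_run_id, agent_type, scope="default", branch="main"):
    """Build a key in the {phase_run_id}:{agent_type}:{scope}:{branch} format."""
    return f"{phase_run_id}:{agent_type}:{scope}:{branch}"


def create_task(org_id: str, key: str) -> int:
    """Return the id of the open task for `key`, creating one if none exists."""
    try:
        cur = conn.execute(
            "INSERT INTO tasks (org_id, work_item_key) VALUES (?, ?)",
            (org_id, key))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The index rejected a duplicate: hand back the existing open task.
        row = conn.execute(
            "SELECT id FROM tasks WHERE org_id = ? AND work_item_key = ? "
            "AND status IN ('pending', 'active')", (org_id, key)).fetchone()
        return row[0]


key = work_item_key("run-123", "frontend_engineer")
a = create_task("org-1", key)
b = create_task("org-1", key)   # second poll, same key: same task comes back
conn.execute("UPDATE tasks SET status = 'completed' WHERE id = ?", (a,))
c = create_task("org-1", key)   # previous task is terminal, so a new one is allowed
print(a == b, c != a)
```

Leaving terminal rows out of the index is what lets a finished or cancelled work item be dispatched again under the same key.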
Attempt State Machine
Attempts have 7 states:
Pending → Running → Succeeded
                  → Failed
                  → TimedOut (lease expired)
Blocked (waiting for dependency)
Cancelled (operator override)
| State | Terminal? | Allows New Attempt? |
|---|---|---|
| pending | No | N/A |
| running | No | N/A |
| blocked | No | N/A |
| succeeded | Yes | No |
| failed | Yes | Yes (if retries remain) |
| timed_out | Yes | Yes (if retries remain) |
| cancelled | Yes | No (requires new Task) |
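The state table maps naturally onto an enum plus two lookups. The sketch below mirrors the table. The `AttemptState` class and helper names are illustrative, not part of any published SDK.

```python
from enum import Enum


class AttemptState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    BLOCKED = "blocked"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    CANCELLED = "cancelled"


# States from which an attempt can no longer change.
TERMINAL = {
    AttemptState.SUCCEEDED,
    AttemptState.FAILED,
    AttemptState.TIMED_OUT,
    AttemptState.CANCELLED,
}
# Terminal states that may be followed by another attempt on the same task.
RETRYABLE = {AttemptState.FAILED, AttemptState.TIMED_OUT}


def is_terminal(state: AttemptState) -> bool:
    return state in TERMINAL


def allows_new_attempt(state: AttemptState, retries_remaining: int) -> bool:
    return state in RETRYABLE and retries_remaining > 0


print(allows_new_attempt(AttemptState.TIMED_OUT, retries_remaining=1))
print(allows_new_attempt(AttemptState.CANCELLED, retries_remaining=1))
```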
Leases and Heartbeats
Every Attempt has a lease — a time-bounded claim on the work. If the agent crashes, the lease expires and the Task becomes available for retry.
Agent claims attempt → lease_expires_at = now + TTL
Agent heartbeats → lease_expires_at = now + TTL (extends)
Agent disappears → lease expires → attempt = timed_out → task = pending
Heartbeat interval: call PUT /v2/runtime/attempts/:id/heartbeat every 60–120 seconds.
Lease expiry worker: a background process runs every 30 seconds, scanning for running attempts where lease_expires_at < now(). Expired attempts transition to timed_out, and if the task has retries remaining, it goes back to pending.
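The claim, heartbeat, and expiry steps can be simulated with an explicit clock. This is a sketch under stated assumptions: the 300-second TTL is made up (the TTL is not given here), and `LeasedAttempt` and `expire_leases` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

LEASE_TTL_SECONDS = 300.0  # assumed value for illustration only


@dataclass
class LeasedAttempt:
    attempt_id: str
    status: str = "pending"
    lease_expires_at: float = 0.0

    def claim(self, now: float) -> None:
        self.status = "running"
        self.lease_expires_at = now + LEASE_TTL_SECONDS

    def heartbeat(self, now: float) -> None:
        if self.status != "running":
            raise RuntimeError("only a running attempt can extend its lease")
        # Each heartbeat pushes the expiry a full TTL past the current time.
        self.lease_expires_at = now + LEASE_TTL_SECONDS


def expire_leases(attempts: List[LeasedAttempt], now: float) -> List[str]:
    """One pass of the expiry worker: time out running attempts whose lease lapsed."""
    expired = []
    for attempt in attempts:
        if attempt.status == "running" and attempt.lease_expires_at < now:
            attempt.status = "timed_out"
            expired.append(attempt.attempt_id)
    return expired


healthy, crashed = LeasedAttempt("a1"), LeasedAttempt("a2")
healthy.claim(now=0.0)
crashed.claim(now=0.0)
healthy.heartbeat(now=90.0)  # lease now runs until t=390
expired = expire_leases([healthy, crashed], now=330.0)
print(expired, healthy.status, crashed.status)
```

Only the attempt that stopped heartbeating is timed out. Re-queueing its task would then follow the retry rule described earlier.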
Agent Self-Registration
When an agent creates an attempt with agent_id, it is automatically registered in the agent fleet:
- trust_level: provisional (never auto-upgraded from V2 REST path)
- registration_source: v2_create_attempt
- Best-effort: registration failure does not fail the attempt
This makes V2-only agents (that never use the wire protocol) visible in the Agent Fleet dashboard.
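The best-effort behaviour can be sketched as a try/except around the registration step. The in-memory `fleet` dict, the function names, and the blank-id failure case are all assumptions chosen to make the example self-contained.

```python
import logging
from typing import Dict, Optional

log = logging.getLogger("runtime.registration")


def register_agent(fleet: Dict[str, dict], agent_id: str) -> None:
    """Add the agent to the fleet if it is not already known."""
    if not agent_id.strip():
        raise ValueError("agent_id must not be blank")
    # setdefault leaves an existing record untouched, so trust is never changed here.
    fleet.setdefault(agent_id, {
        "trust_level": "provisional",
        "registration_source": "v2_create_attempt",
    })


def create_attempt(fleet: Dict[str, dict], task_id: str,
                   agent_id: Optional[str] = None) -> dict:
    attempt = {"task_id": task_id, "agent_id": agent_id, "status": "pending"}
    if agent_id is not None:
        try:
            register_agent(fleet, agent_id)
        except Exception as exc:
            # Best-effort: log and continue, the attempt is still created.
            log.warning("agent registration failed: %s", exc)
    return attempt


fleet: Dict[str, dict] = {}
ok = create_attempt(fleet, "task-1", "agent-7")
bad = create_attempt(fleet, "task-2", "   ")  # registration fails, attempt survives
print(sorted(fleet), bad["status"])
```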
Recovery Levels
| Level | Trigger | Action |
|---|---|---|
| L1 Auto-retry | Attempt fails, retries remain | Task → pending, new attempt created |
| L2 Backoff | Transient failure pattern | Orchestrator adds delay before retry |
| L3 Repair | Attempt fails, repair possible | Orchestrator dispatches RepairAgent |
| L4 Escalate | Retries exhausted, no repair | POST /v2/runtime/tasks/:id/cancel + human review |
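An orchestrator might choose among these levels with a small decision function. The table does not define precedence between levels, so the ordering below (backoff or retry while retries remain, then repair, then escalate) is an assumption, as are the parameter names.

```python
def choose_recovery(retries_remaining: int, transient: bool, repairable: bool) -> str:
    """Pick a recovery level for a failed attempt (assumed precedence)."""
    if retries_remaining > 0:
        # Transient failures get a delay before the retry; others retry directly.
        return "L2 Backoff" if transient else "L1 Auto-retry"
    if repairable:
        return "L3 Repair"
    # Retries exhausted and no repair possible: cancel and hand to a human.
    return "L4 Escalate"


print(choose_recovery(retries_remaining=2, transient=False, repairable=False))
print(choose_recovery(retries_remaining=0, transient=False, repairable=False))
```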
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v2/runtime/tasks | Create task (idempotent via work_item_key) |
| GET | /v2/runtime/tasks?project_id=X&status=Y | List tasks with filters |
| GET | /v2/runtime/tasks/:id | Get task with latest attempt |
| GET | /v2/runtime/tasks/:id/detail | Aggregate: task + all attempts + decisions + invocations + artifacts + events |
| POST | /v2/runtime/tasks/:id/cancel | Cancel task and all active attempts |
| POST | /v2/runtime/tasks/:id/attempts | Create attempt (auto-registers agent) |
| PUT | /v2/runtime/attempts/:id/claim | Claim with lease token |
| PUT | /v2/runtime/attempts/:id/heartbeat | Extend lease |
| PUT | /v2/runtime/attempts/:id/complete | Complete with outcome (succeeded/failed) |
See V2 Runtime API Reference for full request/response documentation.
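As a rough illustration of how a client might address these endpoints, the sketch below only builds request objects and sends nothing. The base URL, the omission of authentication, and the `outcome` body field are assumptions. The real request shapes are defined in the API reference.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "https://hatidata.example"  # hypothetical host


def build_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) a JSON request for a runtime endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def create_task_request(kind: str, project_id: str, work_item_key: str) -> Request:
    return build_request("POST", "/v2/runtime/tasks", {
        "kind": kind,
        "project_id": project_id,
        "work_item_key": work_item_key,
    })


def heartbeat_request(attempt_id: str) -> Request:
    return build_request("PUT", f"/v2/runtime/attempts/{attempt_id}/heartbeat")


def complete_request(attempt_id: str, outcome: str) -> Request:
    if outcome not in ("succeeded", "failed"):
        raise ValueError("outcome must be 'succeeded' or 'failed'")
    # The "outcome" field name is assumed, not taken from the API reference.
    return build_request("PUT", f"/v2/runtime/attempts/{attempt_id}/complete",
                         {"outcome": outcome})


req = create_task_request("frontend_engineer", "proj-abc",
                          "run-123:frontend_engineer:default:main")
print(req.get_method(), req.full_url)
```

Passing a built request to `urllib.request.urlopen` would send it, once a real host and credentials are in place.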
Schema
All V2 runtime data lives in the hd_runtime PostgreSQL schema:
| Table | Purpose |
|---|---|
| tasks | Top-level work items with work_item_key uniqueness |
| task_attempts | Execution runs with lease, status, retry count |
| model_decisions | Which model was chosen and why |
| llm_invocations | Token counts, cost, latency per LLM call |
| artifact_instances | Agent outputs with confidence and validation |
| artifact_validations | Schema, policy, or human validation records |
| workflow_events | Append-only event stream (insert-only, no updates) |
| lineage_edges | Causal dependency DAG |
| review_requests | Human gate with evidence bundle |
| release_decisions | Immutable approval/block records |
| recovery_actions | Automated retry or escalation records |
Migrations: 0033_hd_runtime_core.sql through 0042_hd_runtime_work_item_key.sql (all additive).
Next Steps
- Lineage & Explainability — trace any artifact to its prompt
- Branching & Isolation — speculative execution with 5 visibility modes
- V2 Runtime API Reference — full endpoint documentation