Tasks & Attempts

HatiData V2 introduces a strict separation between intent (what an agent should do) and execution (what actually happened). This is the foundation of the Governed Runtime.

The Core Distinction

Concept	What It Represents	Lifecycle
Task	The goal: "Generate architecture for project X"	Created once, status derived from attempts
Attempt	A single execution run of that task	Created for the first run and for each retry; tracks state transitions via leases

A Task may have multiple Attempts — each retry creates a new Attempt (not a new Task). This separation enables:

Forensic debugging: compare Attempt #1 (failed) vs Attempt #2 (succeeded)
Cost attribution: each Attempt tracks its own model decisions and token usage
Recovery lineage: trace the chain from initial failure through repair to resolution
Idempotent dispatch: the same logical work item always maps to the same Task
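The Task/Attempt split can be pictured as a small data model. This is only an illustrative sketch: `kind`, `project_id`, and `work_item_key` are borrowed from the task payload shown on this page, while `number`, `tokens_used`, `new_attempt`, and `total_tokens` are hypothetical names and not the actual HatiData schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    """One execution run of a task."""
    number: int
    status: str = "pending"
    tokens_used: int = 0  # hypothetical per-attempt cost field


@dataclass
class Task:
    """The goal; created once and accumulates attempts."""
    kind: str
    project_id: str
    work_item_key: Optional[str] = None
    attempts: List[Attempt] = field(default_factory=list)

    def new_attempt(self) -> Attempt:
        # A retry appends an Attempt to the same Task instead of cloning the Task.
        attempt = Attempt(number=len(self.attempts) + 1)
        self.attempts.append(attempt)
        return attempt

    def total_tokens(self) -> int:
        # Cost attribution: sum usage across every attempt of this task.
        return sum(a.tokens_used for a in self.attempts)


task = Task(kind="frontend_engineer", project_id="proj-abc")
first = task.new_attempt()
first.status, first.tokens_used = "failed", 1200
second = task.new_attempt()
second.status, second.tokens_used = "succeeded", 900

print([(a.number, a.status) for a in task.attempts])
print(task.total_tokens())
```

Because both runs hang off one Task, comparing the failed and the successful attempt is a matter of reading two entries from the same list.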

Task Lifecycle

                 +-----------+
                 |  Pending  |  ← Created / retry-eligible after failed attempt
                 +-----+-----+
                       |
                 +-----v-----+
                 |  Active   |  ← Agent claimed an attempt
                 +-----+-----+
                       |
              +--------+--------+
              |                 |
        +-----v-----+     +-----v-----+
        | Completed |     |  Failed   |  ← All retries exhausted
        +-----------+     +-----------+
                                |
                          +-----v-----+
                          | Cancelled |  ← L4 recovery / operator override
                          +-----------+

Status transitions:

pending → active: an agent creates and claims an attempt
active → pending: attempt fails but retry_count < max_retries (task re-queued for next agent)
active → completed: attempt succeeds
active → failed: attempt fails and retries exhausted (retry_count >= max_retries)
any non-terminal → cancelled: operator calls POST /v2/runtime/tasks/:id/cancel
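The transition rules above can be sketched as three pure functions. The function names are hypothetical, and the sketch assumes `retry_count` is the value in effect at the moment the attempt ends; the real runtime may count retries differently.

```python
TERMINAL_TASK_STATES = {"completed", "failed", "cancelled"}


def claim(status: str) -> str:
    """pending -> active: an agent creates and claims an attempt."""
    if status != "pending":
        raise ValueError(f"cannot claim a task in state {status!r}")
    return "active"


def finish_attempt(status: str, succeeded: bool,
                   retry_count: int, max_retries: int) -> str:
    """Resolve an active task once its current attempt ends."""
    if status != "active":
        raise ValueError(f"no running attempt for a task in state {status!r}")
    if succeeded:
        return "completed"
    if retry_count < max_retries:
        return "pending"  # re-queued for the next agent
    return "failed"       # retries exhausted


def cancel(status: str) -> str:
    """Any non-terminal state -> cancelled."""
    if status in TERMINAL_TASK_STATES:
        raise ValueError(f"task is already terminal ({status!r})")
    return "cancelled"


print(finish_attempt("active", succeeded=False, retry_count=0, max_retries=2))
print(finish_attempt("active", succeeded=False, retry_count=2, max_retries=2))
```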

Idempotent Dispatch via `work_item_key`

Every Task can carry a work_item_key — a stable identity for the logical work item. This prevents duplicate dispatch when an orchestrator polls the same phase multiple times.

{
  "kind": "frontend_engineer",
  "project_id": "proj-abc",
  "work_item_key": "run-123:frontend_engineer:default:main"
}

Enforcement: A partial unique index on (org_id, work_item_key) ensures only one non-terminal task exists per key. If you POST /v2/runtime/tasks with a key that already has a pending/active task, the existing task is returned instead of creating a duplicate.

Format convention: {phase_run_id}:{agent_type}:{scope}:{branch}
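The key format and the partial unique index can be tried out locally. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL; the table layout, index name, and the helpers `build_work_item_key` and `create_task` are assumptions made for illustration, not the runtime's actual DDL or code.

```python
import sqlite3
import uuid


def build_work_item_key(phase_run_id: str, agent_type: str,
                        scope: str, branch: str) -> str:
    """Join the four parts in the documented order."""
    return f"{phase_run_id}:{agent_type}:{scope}:{branch}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id TEXT PRIMARY KEY, org_id TEXT NOT NULL,"
    " work_item_key TEXT, status TEXT NOT NULL)"
)
# Partial unique index: uniqueness applies only to non-terminal rows.
conn.execute(
    "CREATE UNIQUE INDEX tasks_open_work_item_key"
    " ON tasks (org_id, work_item_key)"
    " WHERE status IN ('pending', 'active')"
)


def create_task(org_id: str, key: str) -> str:
    """Insert a pending task, or return the open task that already owns the key."""
    task_id = str(uuid.uuid4())
    try:
        conn.execute(
            "INSERT INTO tasks (id, org_id, work_item_key, status)"
            " VALUES (?, ?, ?, 'pending')",
            (task_id, org_id, key),
        )
        return task_id
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT id FROM tasks WHERE org_id = ? AND work_item_key = ?"
            " AND status IN ('pending', 'active')",
            (org_id, key),
        ).fetchone()
        return row[0]


key = build_work_item_key("run-123", "frontend_engineer", "default", "main")
first_id = create_task("org-1", key)
second_id = create_task("org-1", key)   # duplicate dispatch
conn.execute("UPDATE tasks SET status = 'completed' WHERE id = ?", (first_id,))
third_id = create_task("org-1", key)    # key is free again
print(first_id == second_id, first_id == third_id)
```

Letting the index reject the duplicate and then reading back the existing row avoids a check-then-insert race between two orchestrator polls.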

Retries use Attempts, not new Tasks

When an attempt fails and the task has retries remaining, the task goes back to pending and a new Attempt is created on the same Task. This keeps completion, recovery, and lineage clean.

Attempt State Machine

Attempts have 7 states:

  Pending → Running → Succeeded
                   → Failed
                   → TimedOut (lease expired)
           Blocked (waiting for dependency)
           Cancelled (operator override)

State	Terminal?	Allows New Attempt?
`pending`	No	—
`running`	No	—
`blocked`	No	—
`succeeded`	Yes	No
`failed`	Yes	Yes (if retries remain)
`timed_out`	Yes	Yes (if retries remain)
`cancelled`	Yes	No (requires new Task)
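The table above maps naturally onto an enum with two lookup sets. This is a minimal sketch; the names `AttemptState`, `is_terminal`, and `allows_new_attempt` are hypothetical, and whether retries remain is passed in as a flag rather than computed.

```python
from enum import Enum


class AttemptState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    BLOCKED = "blocked"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    CANCELLED = "cancelled"


TERMINAL = frozenset({
    AttemptState.SUCCEEDED,
    AttemptState.FAILED,
    AttemptState.TIMED_OUT,
    AttemptState.CANCELLED,
})

# Only these terminal states may be followed by another attempt on the same task.
RETRYABLE = frozenset({AttemptState.FAILED, AttemptState.TIMED_OUT})


def is_terminal(state: AttemptState) -> bool:
    return state in TERMINAL


def allows_new_attempt(state: AttemptState, retries_remain: bool) -> bool:
    return state in RETRYABLE and retries_remain


print(is_terminal(AttemptState.BLOCKED))
print(allows_new_attempt(AttemptState.TIMED_OUT, retries_remain=True))
print(allows_new_attempt(AttemptState.CANCELLED, retries_remain=True))
```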

Leases and Heartbeats

Every Attempt has a lease — a time-bounded claim on the work. If the agent crashes, the lease expires and the Task becomes available for retry.

Agent claims attempt → lease_expires_at = now + TTL
Agent heartbeats    → lease_expires_at = now + TTL  (extends)
Agent disappears    → lease expires → attempt = timed_out → task = pending

Heartbeat interval: call PUT /v2/runtime/attempts/:id/heartbeat every 60–120 seconds.

Lease expiry worker: a background process runs every 30 seconds, scanning for running attempts where lease_expires_at < now(). Expired attempts transition to timed_out, and if the task has retries remaining, it goes back to pending.
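The claim, heartbeat, and expiry steps can be simulated in memory. In this sketch the 300-second TTL is an assumed value (the page does not state the real TTL), and `Attempt`, `claim`, `heartbeat`, and `sweep_expired` are hypothetical names standing in for the runtime's endpoints and background worker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

LEASE_TTL = timedelta(seconds=300)  # assumed; must exceed the heartbeat interval


@dataclass
class Attempt:
    id: str
    status: str = "pending"
    lease_expires_at: Optional[datetime] = None


def claim(attempt: Attempt, now: datetime) -> None:
    attempt.status = "running"
    attempt.lease_expires_at = now + LEASE_TTL


def heartbeat(attempt: Attempt, now: datetime) -> None:
    # Each heartbeat pushes the expiry forward by one full TTL.
    attempt.lease_expires_at = now + LEASE_TTL


def sweep_expired(attempts: List[Attempt], now: datetime) -> List[str]:
    """One pass of the expiry worker: time out running attempts with stale leases."""
    expired = []
    for attempt in attempts:
        if (attempt.status == "running"
                and attempt.lease_expires_at is not None
                and attempt.lease_expires_at < now):
            attempt.status = "timed_out"
            expired.append(attempt.id)
    return expired


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
attempt = Attempt(id="att-1")
claim(attempt, t0)
heartbeat(attempt, t0 + timedelta(seconds=90))
print(sweep_expired([attempt], t0 + timedelta(seconds=200)))  # lease still valid
print(sweep_expired([attempt], t0 + timedelta(seconds=400)))  # agent went silent
```

Re-queuing the parent task after a timeout would follow the same retry check that applies to any failed attempt.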

Agent Self-Registration

When an agent creates an attempt with agent_id, it is automatically registered in the agent fleet:

trust_level: provisional (never auto-upgraded from V2 REST path)
registration_source: v2_create_attempt
Best-effort: registration failure does not fail the attempt

This makes V2-only agents (that never use the wire protocol) visible in the Agent Fleet dashboard.
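The best-effort behaviour described above could look roughly like this. The in-memory `fleet` dictionary and the functions `register_agent` and `create_attempt` are hypothetical; only the `trust_level` and `registration_source` values come from this page.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("runtime")

fleet: Dict[str, dict] = {}  # stand-in for the agent fleet registry


def register_agent(agent_id: str) -> None:
    # setdefault leaves an existing record (and its trust level) untouched.
    fleet.setdefault(agent_id, {
        "trust_level": "provisional",
        "registration_source": "v2_create_attempt",
    })


def create_attempt(task_id: str, agent_id: Optional[str],
                   register: Callable[[str], None] = register_agent) -> dict:
    attempt = {"task_id": task_id, "agent_id": agent_id, "status": "pending"}
    if agent_id is not None:
        try:
            register(agent_id)
        except Exception:
            # Best-effort: log and carry on, the attempt is still created.
            logger.warning("agent registration failed for %s", agent_id)
    return attempt


def broken_registry(agent_id: str) -> None:
    raise RuntimeError("registry unavailable")


ok = create_attempt("task-1", "agent-7")
still_ok = create_attempt("task-2", "agent-8", register=broken_registry)
print(fleet["agent-7"]["trust_level"], still_ok["status"], "agent-8" in fleet)
```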

Recovery Levels

Level	Trigger	Action
L1 Auto-retry	Attempt fails, retries remain	Task → `pending`, new attempt created
L2 Backoff	Transient failure pattern	Orchestrator adds delay before retry
L3 Repair	Attempt fails, repair possible	Orchestrator dispatches RepairAgent
L4 Escalate	Retries exhausted, no repair	`POST /v2/runtime/tasks/:id/cancel` + human review
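One way an orchestrator might pick a level from the table is sketched below. The precedence is an assumption (backoff before plain retry while retries remain, repair before escalation once they are exhausted), and `choose_recovery_level` with its boolean inputs is a hypothetical helper, not part of the runtime API.

```python
def choose_recovery_level(retry_count: int, max_retries: int,
                          transient: bool, repairable: bool) -> str:
    """Return 'L1'..'L4' for a failed attempt under the assumed precedence."""
    retries_remain = retry_count < max_retries
    if retries_remain and transient:
        return "L2"  # delay, then retry
    if retries_remain:
        return "L1"  # task goes back to pending, new attempt
    if repairable:
        return "L3"  # dispatch a repair agent
    return "L4"      # cancel the task and request human review


print(choose_recovery_level(0, 3, transient=False, repairable=False))
print(choose_recovery_level(1, 3, transient=True, repairable=False))
print(choose_recovery_level(3, 3, transient=False, repairable=True))
print(choose_recovery_level(3, 3, transient=False, repairable=False))
```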

API Endpoints

Method	Endpoint	Description
`POST`	`/v2/runtime/tasks`	Create task (idempotent via `work_item_key`)
`GET`	`/v2/runtime/tasks?project_id=X&status=Y`	List tasks with filters
`GET`	`/v2/runtime/tasks/:id`	Get task with latest attempt
`GET`	`/v2/runtime/tasks/:id/detail`	Aggregate: task + all attempts + decisions + invocations + artifacts + events
`POST`	`/v2/runtime/tasks/:id/cancel`	Cancel task and all active attempts
`POST`	`/v2/runtime/tasks/:id/attempts`	Create attempt (auto-registers agent)
`PUT`	`/v2/runtime/attempts/:id/claim`	Claim with lease token
`PUT`	`/v2/runtime/attempts/:id/heartbeat`	Extend lease
`PUT`	`/v2/runtime/attempts/:id/complete`	Complete with outcome (succeeded/failed)

See V2 Runtime API Reference for full request/response documentation.
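As a rough illustration of how a client might address these endpoints, the sketch below builds (but does not send) requests with the standard library. The host in `BASE_URL` is a placeholder, authentication is omitted, and the helper names are hypothetical; the create-task body reuses only the fields shown in the `work_item_key` example.

```python
import json
import urllib.request

BASE_URL = "https://hatidata.example.com"  # hypothetical host


def create_task_request(kind: str, project_id: str,
                        work_item_key: str) -> urllib.request.Request:
    body = json.dumps({
        "kind": kind,
        "project_id": project_id,
        "work_item_key": work_item_key,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v2/runtime/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def heartbeat_request(attempt_id: str) -> urllib.request.Request:
    # Intended to be sent every 60-120 seconds while the attempt is running.
    return urllib.request.Request(
        f"{BASE_URL}/v2/runtime/attempts/{attempt_id}/heartbeat",
        method="PUT",
    )


def cancel_task_request(task_id: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}/v2/runtime/tasks/{task_id}/cancel",
        method="POST",
    )


req = create_task_request(
    "frontend_engineer", "proj-abc", "run-123:frontend_engineer:default:main"
)
print(req.get_method(), req.full_url)
print(heartbeat_request("att-1").get_method(), heartbeat_request("att-1").full_url)
```

Passing a built request to `urllib.request.urlopen` would perform the call; request and response bodies for the other endpoints are left to the API reference.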

Schema

All V2 runtime data lives in the hd_runtime PostgreSQL schema:

Table	Purpose
`tasks`	Top-level work items with `work_item_key` uniqueness
`task_attempts`	Execution runs with lease, status, retry count
`model_decisions`	Which model was chosen and why
`llm_invocations`	Token counts, cost, latency per LLM call
`artifact_instances`	Agent outputs with confidence and validation
`artifact_validations`	Schema, policy, or human validation records
`workflow_events`	Append-only event stream (insert-only, no updates)
`lineage_edges`	Causal dependency DAG
`review_requests`	Human gate with evidence bundle
`release_decisions`	Immutable approval/block records
`recovery_actions`	Automated retry or escalation records

Migrations: 0033_hd_runtime_core.sql through 0042_hd_runtime_work_item_key.sql (all additive).

Next Steps

Lineage & Explainability — trace any artifact to its prompt
Branching & Isolation — speculative execution with 5 visibility modes
V2 Runtime API Reference — full endpoint documentation
