Tasks & Attempts

HatiData V2 introduces a strict separation between intent (what an agent should do) and execution (what actually happened). This is the foundation of the Governed Runtime.

The Core Distinction

| Concept | What It Represents | Lifecycle |
|---|---|---|
| Task | The goal: "Generate architecture for project X" | Created once, status derived from attempts |
| Attempt | A single execution run of that task | Created per retry, tracks state transitions via leases |

A Task may have multiple Attempts — each retry creates a new Attempt (not a new Task). This separation enables:

  • Forensic debugging: compare Attempt #1 (failed) vs Attempt #2 (succeeded)
  • Cost attribution: each Attempt tracks its own model decisions and token usage
  • Recovery lineage: trace the chain from initial failure through repair to resolution
  • Idempotent dispatch: the same logical work item always maps to the same Task
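
The Task/Attempt split can be pictured with a small in-memory model. This is a minimal sketch, not HatiData's actual data model: the class and field names (`Attempt.number`, `tokens_used`, `record_attempt`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One execution run of a task (field names are hypothetical)."""
    number: int
    status: str           # e.g. "failed" or "succeeded"
    tokens_used: int = 0  # tracked per attempt, for cost attribution


@dataclass
class Task:
    """The goal itself: created once, accumulating attempts over time."""
    goal: str
    attempts: list[Attempt] = field(default_factory=list)

    def record_attempt(self, status: str, tokens_used: int) -> Attempt:
        # A retry appends a new Attempt to the same Task; no new Task is made.
        attempt = Attempt(len(self.attempts) + 1, status, tokens_used)
        self.attempts.append(attempt)
        return attempt

    def total_tokens(self) -> int:
        # Cost for the whole task is the sum over its attempts.
        return sum(a.tokens_used for a in self.attempts)


task = Task("Generate architecture for project X")
task.record_attempt("failed", tokens_used=1200)
task.record_attempt("succeeded", tokens_used=900)
```

Because both runs hang off one Task, comparing Attempt #1 with Attempt #2 or summing their token usage needs no cross-task lookup.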

Task Lifecycle

                 +-----------+
                 |  Pending  |  ← Created / retry-eligible after failed attempt
                 +-----+-----+
                       |
                 +-----v-----+
                 |  Active   |  ← Agent claimed an attempt
                 +-----+-----+
                       |
              +--------+--------+
              |                 |
        +-----v-----+     +-----v-----+
        | Completed |     |  Failed   |  ← All retries exhausted
        +-----------+     +-----+-----+
                                |
                          +-----v------+
                          | Cancelled  |  ← L4 recovery / operator override
                          +------------+

Status transitions:

  • pending → active: an agent creates and claims an attempt
  • active → pending: attempt fails but retry_count < max_retries (task re-queued for next agent)
  • active → completed: attempt succeeds
  • active → failed: attempt fails and retries exhausted (retry_count >= max_retries)
  • any non-terminal → cancelled: operator calls POST /v2/runtime/tasks/:id/cancel
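
The attempt-driven transitions above can be sketched as a pure function. The event names (`attempt_claimed`, `attempt_succeeded`, `attempt_failed`) are hypothetical labels introduced for this example, and cancellation is left out because it is triggered by an operator call rather than by an attempt outcome.

```python
def next_task_status(status: str, event: str,
                     retry_count: int = 0, max_retries: int = 0) -> str:
    """Return the task status after an attempt-level event.

    Event names are illustrative assumptions, not part of the HatiData API.
    """
    if status == "pending" and event == "attempt_claimed":
        return "active"
    if status == "active" and event == "attempt_succeeded":
        return "completed"
    if status == "active" and event == "attempt_failed":
        # Re-queue while retries remain; otherwise the task fails for good.
        return "pending" if retry_count < max_retries else "failed"
    raise ValueError(f"no transition from {status!r} on {event!r}")
```

For example, `next_task_status("active", "attempt_failed", retry_count=1, max_retries=3)` re-queues the task, while the same call with `retry_count=3` marks it failed.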

Idempotent Dispatch via work_item_key

Every Task can carry a work_item_key — a stable identity for the logical work item. This prevents duplicate dispatch when an orchestrator polls the same phase multiple times.

{
  "kind": "frontend_engineer",
  "project_id": "proj-abc",
  "work_item_key": "run-123:frontend_engineer:default:main"
}

Enforcement: A partial unique index on (org_id, work_item_key) ensures only one non-terminal task exists per key. If you POST /v2/runtime/tasks with a key that already has a pending/active task, the existing task is returned instead of creating a duplicate.
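
The effect of that index can be imitated in memory to see the behavior from a caller's point of view. This is only a sketch under assumptions: `TaskStore` and its dictionary fields are hypothetical, and the real enforcement happens in PostgreSQL, not in application code like this.

```python
import itertools

NON_TERMINAL = {"pending", "active"}


class TaskStore:
    """In-memory stand-in for the tasks table (hypothetical, for illustration)."""

    def __init__(self):
        self._tasks = []
        self._ids = itertools.count(1)

    def create(self, org_id, kind, project_id, work_item_key=None):
        if work_item_key is not None:
            for task in self._tasks:
                same_key = (task["org_id"], task["work_item_key"]) == (org_id, work_item_key)
                # Only pending/active tasks block creation, like a partial index.
                if same_key and task["status"] in NON_TERMINAL:
                    return task  # return the existing task instead of a duplicate
        task = {"id": next(self._ids), "org_id": org_id, "kind": kind,
                "project_id": project_id, "work_item_key": work_item_key,
                "status": "pending"}
        self._tasks.append(task)
        return task


store = TaskStore()
key = "run-123:frontend_engineer:default:main"
first = store.create("org-1", "frontend_engineer", "proj-abc", key)
again = store.create("org-1", "frontend_engineer", "proj-abc", key)
```

Here `again` is the same task as `first`. Once that task reaches a terminal status, the key is free again and a later create with the same key produces a new task.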

Format convention: {phase_run_id}:{agent_type}:{scope}:{branch}
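
A small helper keeps keys consistent with this convention. The function name is hypothetical, the `default` and `main` defaults are taken from the example key above rather than from any documented rule, and rejecting colons inside a component is an assumption made so the key stays unambiguous.

```python
def build_work_item_key(phase_run_id: str, agent_type: str,
                        scope: str = "default", branch: str = "main") -> str:
    """Join the four components as {phase_run_id}:{agent_type}:{scope}:{branch}."""
    parts = [phase_run_id, agent_type, scope, branch]
    for part in parts:
        # Assumption: components are non-empty and never contain the separator.
        if not part or ":" in part:
            raise ValueError(f"invalid work_item_key component: {part!r}")
    return ":".join(parts)


key = build_work_item_key("run-123", "frontend_engineer")
```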

Retries use Attempts, not new Tasks

When an attempt fails and the task has retries remaining, the task goes back to pending, and the retry creates a new Attempt on the same Task. This keeps completion, recovery, and lineage clean.

Attempt State Machine

Attempts have 7 states:

  Pending → Running → Succeeded
                    → Failed
                    → TimedOut (lease expired)
            Blocked (waiting for dependency)
            Cancelled (operator override)

| State | Terminal? | Allows New Attempt? |
|---|---|---|
| pending | No | |
| running | No | |
| blocked | No | |
| succeeded | Yes | No |
| failed | Yes | Yes (if retries remain) |
| timed_out | Yes | Yes (if retries remain) |
| cancelled | Yes | No (requires new Task) |
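
The table translates directly into an enum with two lookups. This is an illustrative sketch: the names `AttemptState`, `is_terminal`, and `allows_new_attempt` are assumptions, and only the state values come from the table.

```python
from enum import Enum


class AttemptState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    BLOCKED = "blocked"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    CANCELLED = "cancelled"


TERMINAL = frozenset({AttemptState.SUCCEEDED, AttemptState.FAILED,
                      AttemptState.TIMED_OUT, AttemptState.CANCELLED})
RETRYABLE = frozenset({AttemptState.FAILED, AttemptState.TIMED_OUT})


def is_terminal(state: AttemptState) -> bool:
    return state in TERMINAL


def allows_new_attempt(state: AttemptState, retries_remain: bool) -> bool:
    # Only failed and timed_out attempts can be followed by another attempt,
    # and only while the task still has retries left.
    return state in RETRYABLE and retries_remain
```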

Leases and Heartbeats

Every Attempt has a lease — a time-bounded claim on the work. If the agent crashes, the lease expires and the Task becomes available for retry.

Agent claims attempt → lease_expires_at = now + TTL
Agent heartbeats → lease_expires_at = now + TTL (extends)
Agent disappears → lease expires → attempt = timed_out → task = pending
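
The lease arithmetic in that flow can be modeled in a few lines. This is a sketch only: the `Lease` class is hypothetical, and the 5-minute TTL is an arbitrary value picked for the example, since the actual TTL is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Lease:
    ttl: timedelta
    expires_at: datetime

    @classmethod
    def claim(cls, now: datetime, ttl: timedelta) -> "Lease":
        # Claiming sets lease_expires_at = now + TTL.
        return cls(ttl=ttl, expires_at=now + ttl)

    def heartbeat(self, now: datetime) -> None:
        # Each heartbeat pushes the expiry out to now + TTL again.
        self.expires_at = now + self.ttl

    def expired(self, now: datetime) -> bool:
        return self.expires_at < now


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
lease = Lease.claim(t0, ttl=timedelta(minutes=5))  # TTL value is an assumption
lease.heartbeat(t0 + timedelta(seconds=90))
```

After the heartbeat at 90 seconds, the lease runs until six and a half minutes past `t0`. If the agent then goes silent, any check after that instant sees the lease as expired.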

Heartbeat interval: call PUT /v2/runtime/attempts/:id/heartbeat every 60–120 seconds.

Lease expiry worker: a background process runs every 30 seconds, scanning for running attempts where lease_expires_at < now(). Expired attempts transition to timed_out, and if the task has retries remaining, it goes back to pending.
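
One pass of such a worker might look like the function below. This is a hedged sketch over plain dictionaries, not the actual worker: the record layout is hypothetical, and marking the task failed when no retries remain is an assumption that treats a timeout like any other failed attempt.

```python
from datetime import datetime, timedelta, timezone


def sweep_expired(attempts, tasks, now):
    """Time out running attempts whose lease has lapsed; return their ids."""
    timed_out = []
    for attempt in attempts:
        if attempt["status"] == "running" and attempt["lease_expires_at"] < now:
            attempt["status"] = "timed_out"
            task = tasks[attempt["task_id"]]
            if task["retry_count"] < task["max_retries"]:
                task["status"] = "pending"  # re-queued for the next agent
            else:
                task["status"] = "failed"   # assumption: timeout counts as failure
            timed_out.append(attempt["id"])
    return timed_out
```

A scheduler would call this every 30 seconds with the current time. Keeping the pass a pure function of `now` makes it easy to test without sleeping.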

Agent Self-Registration

When an agent creates an attempt with agent_id, it is automatically registered in the agent fleet:

  • trust_level: provisional (never auto-upgraded via the V2 REST path)
  • registration_source: v2_create_attempt
  • Best-effort: registration failure does not fail the attempt

This makes V2-only agents (that never use the wire protocol) visible in the Agent Fleet dashboard.
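
The best-effort behavior can be sketched as follows. The function and the injected `register_agent` callable are hypothetical stand-ins, and the initial `pending` status is an assumption; only the `trust_level` and `registration_source` values come from the list above.

```python
import logging

logger = logging.getLogger(__name__)


def create_attempt(task_id, agent_id, register_agent):
    """Create an attempt and register its agent without letting registration fail it."""
    attempt = {"task_id": task_id, "agent_id": agent_id, "status": "pending"}
    if agent_id is not None:
        try:
            register_agent({
                "agent_id": agent_id,
                "trust_level": "provisional",
                "registration_source": "v2_create_attempt",
            })
        except Exception:
            # Best-effort: log the failure and carry on with the attempt.
            logger.warning("agent registration failed for %s", agent_id, exc_info=True)
    return attempt
```

The attempt is built before registration is tried, so an outage in the fleet registry degrades dashboard visibility rather than blocking work.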

Recovery Levels

| Level | Trigger | Action |
|---|---|---|
| L1 Auto-retry | Attempt fails, retries remain | Task → pending, new attempt created |
| L2 Backoff | Transient failure pattern | Orchestrator adds delay before retry |
| L3 Repair | Attempt fails, repair possible | Orchestrator dispatches RepairAgent |
| L4 Escalate | Retries exhausted, no repair | POST /v2/runtime/tasks/:id/cancel + human review |
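
An orchestrator could pick a level with a small decision function. This is one possible reading, not a documented algorithm: the table does not say how the levels are prioritized, so the ordering below (retry first, then repair, then escalate) and all three input flags are assumptions.

```python
def choose_recovery_level(retries_remain: bool, transient: bool,
                          repair_possible: bool) -> str:
    """Map a failed attempt to a recovery level (ordering is an assumption)."""
    if retries_remain:
        # A transient failure pattern gets a delayed retry; otherwise retry at once.
        return "L2" if transient else "L1"
    if repair_possible:
        return "L3"
    # Retries exhausted and no repair available: cancel and hand to a human.
    return "L4"
```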

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /v2/runtime/tasks | Create task (idempotent via work_item_key) |
| GET | /v2/runtime/tasks?project_id=X&status=Y | List tasks with filters |
| GET | /v2/runtime/tasks/:id | Get task with latest attempt |
| GET | /v2/runtime/tasks/:id/detail | Aggregate: task + all attempts + decisions + invocations + artifacts + events |
| POST | /v2/runtime/tasks/:id/cancel | Cancel task and all active attempts |
| POST | /v2/runtime/tasks/:id/attempts | Create attempt (auto-registers agent) |
| PUT | /v2/runtime/attempts/:id/claim | Claim with lease token |
| PUT | /v2/runtime/attempts/:id/heartbeat | Extend lease |
| PUT | /v2/runtime/attempts/:id/complete | Complete with outcome (succeeded/failed) |

See V2 Runtime API Reference for full request/response documentation.
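
As a rough illustration of how a client might address these endpoints, the sketch below builds (but does not send) requests with Python's standard library. The base URL is a placeholder, authentication is omitted, and the attempt body containing only `agent_id` is an assumption; the task body fields come from the work_item_key example earlier on this page.

```python
import json
from urllib.request import Request

BASE_URL = "https://hatidata.example"  # placeholder host, not a real endpoint


def runtime_request(method, path, body=None):
    """Build a urllib Request for a runtime endpoint without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def create_task(kind, project_id, work_item_key):
    body = {"kind": kind, "project_id": project_id, "work_item_key": work_item_key}
    return runtime_request("POST", "/v2/runtime/tasks", body)


def create_attempt(task_id, agent_id):
    # Assumption: agent_id is sent in the JSON body.
    return runtime_request("POST", f"/v2/runtime/tasks/{task_id}/attempts",
                           {"agent_id": agent_id})


def heartbeat(attempt_id):
    return runtime_request("PUT", f"/v2/runtime/attempts/{attempt_id}/heartbeat")


def cancel_task(task_id):
    return runtime_request("POST", f"/v2/runtime/tasks/{task_id}/cancel")
```

Passing one of these objects to `urllib.request.urlopen` would send it. Real calls would also need whatever credentials the API reference specifies.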

Schema

All V2 runtime data lives in the hd_runtime PostgreSQL schema:

| Table | Purpose |
|---|---|
| tasks | Top-level work items with work_item_key uniqueness |
| task_attempts | Execution runs with lease, status, retry count |
| model_decisions | Which model was chosen and why |
| llm_invocations | Token counts, cost, latency per LLM call |
| artifact_instances | Agent outputs with confidence and validation |
| artifact_validations | Schema, policy, or human validation records |
| workflow_events | Append-only event stream (insert-only, no updates) |
| lineage_edges | Causal dependency DAG |
| review_requests | Human gate with evidence bundle |
| release_decisions | Immutable approval/block records |
| recovery_actions | Automated retry or escalation records |

Migrations: 0033_hd_runtime_core.sql through 0042_hd_runtime_work_item_key.sql (all additive).
