
Cost Optimization

Recipes for reducing compute, storage, and credit consumption across cache tuning, query design, branch lifecycle management, and memory eviction.


Recipe 1: Cache Tuning

HatiData's two-tier cache (L1 in-process, L2 on-disk) intercepts repeated queries before they reach the execution pipeline. Tuning cache size and TTL has the largest impact on credit usage for read-heavy workloads.

Problem: Agents issuing the same aggregation queries repeatedly exhaust credits on redundant computation.

Solution: Size L1 and L2 appropriately for your query pattern, and set TTLs that match data freshness requirements.

# proxy config (proxy.toml or environment variables)
[cache]
l1_max_entries = 500 # In-process LRU entries (default: 200)
l1_ttl_seconds = 60 # L1 expiry — short, for hot queries
l2_max_size_mb = 2048 # On-disk cache (default: 512 MB)
l2_ttl_seconds = 3600 # L2 expiry — 1 hour for slower-changing data
l2_eviction = "lru" # lru | lfu | ttl-only
| Cache tier | Latency | Recommended size | Best for |
|---|---|---|---|
| L1 (memory) | <1 ms | 200–1,000 entries | Identical repeated queries within a session |
| L2 (disk) | 5–50 ms | 512 MB–4 GB | Cross-session hot queries, aggregations |
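The read path can be sketched as a two-tier lookup: check L1, fall back to L2 (promoting hits back into L1), and only execute the query on a full miss. This is an illustrative model, not HatiData's implementation; the dict-backed tiers and the `execute` callback are assumptions for demonstration:

```python
import time

class TwoTierCache:
    """Illustrative L1/L2 read path: fast tier, then slow tier, else compute."""

    def __init__(self, l1_ttl=60, l2_ttl=3600):
        self.l1, self.l2 = {}, {}  # key -> (value, expires_at)
        self.l1_ttl, self.l2_ttl = l1_ttl, l2_ttl

    def _fresh(self, tier, key):
        entry = tier.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        tier.pop(key, None)  # expired: drop it
        return None

    def get(self, sql, execute):
        value = self._fresh(self.l1, sql)  # L1: in-process, sub-millisecond
        if value is not None:
            return value, "l1"
        value = self._fresh(self.l2, sql)  # L2: on-disk, 5-50 ms
        if value is not None:
            # Promote the hot entry back into L1.
            self.l1[sql] = (value, time.time() + self.l1_ttl)
            return value, "l2"
        value = execute(sql)  # full miss: run the query
        now = time.time()
        self.l1[sql] = (value, now + self.l1_ttl)
        self.l2[sql] = (value, now + self.l2_ttl)
        return value, "miss"
```

The promotion step is why short L1 TTLs still work well for hot queries: any query that survives in L2 re-enters L1 on its next use.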

Environment variable equivalents:

HATIDATA_CACHE_L1_MAX_ENTRIES=500
HATIDATA_CACHE_L1_TTL_SECONDS=60
HATIDATA_CACHE_L2_MAX_SIZE_MB=2048
HATIDATA_CACHE_L2_TTL_SECONDS=3600

Recipe 2: Query Optimization

Poorly written queries are the single largest source of avoidable credit spend. Apply these patterns before tuning infrastructure.

Problem: Agents running SELECT * on wide tables scan and return more columns than needed, increasing both cost and latency.

Solution: Always project only the columns the agent will consume.

-- Before (scans all columns, expensive on wide tables)
SELECT * FROM customer_events WHERE event_date >= '2026-01-01';

-- After (scans only required columns)
SELECT customer_id, event_type, event_date
FROM customer_events
WHERE event_date >= '2026-01-01';

Problem: Agents querying without row filters scan the full table.

Solution: Always include a predicate on the partition or primary key column.

-- Before (full table scan)
SELECT customer_id, revenue FROM orders;

-- After (partition pruned, ~95% fewer rows scanned)
SELECT customer_id, revenue
FROM orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31'
AND status = 'completed';

Problem: Agents fetching unbounded result sets for downstream filtering.

Solution: Push filtering into SQL and use LIMIT to cap result size.

# Before: agent fetches 100 000 rows and filters in Python
rows = client.query("SELECT * FROM events").rows
top10 = sorted(rows, key=lambda r: r["score"], reverse=True)[:10]

# After: the database does the work
rows = client.query("""
    SELECT event_id, score, occurred_at
    FROM events
    WHERE score > 0.8
    ORDER BY score DESC
    LIMIT 10
""").rows

Recipe 3: Branch Cleanup

Abandoned branches accumulate storage and count against per-org quotas. Set aggressive TTLs and clean up after merges.

Problem: Experimental branches created by agents are never discarded, inflating storage costs over time.

Solution: Configure a short default TTL and explicitly discard branches after merge or rejection.

# Control plane config
[branches]
default_ttl_hours = 24 # Auto-discard after 24 hours (default: 72)
max_branches_per_org = 50 # Hard cap — new creates fail above this limit
gc_interval_minutes = 30 # Garbage collector run interval

import hatidata

client = hatidata.Client(api_key="hd_...")

# List branches older than 12 hours that are not merged
stale = client.branches.list(
    status="active",
    created_before_hours=12,
)

for branch in stale:
    print(f"Discarding stale branch: {branch.label} ({branch.branch_id})")
    client.branches.discard(branch_id=branch.branch_id)

Configuration reference:

| Parameter | Default | Recommendation |
|---|---|---|
| default_ttl_hours | 72 | 24 for ephemeral agents; 168 for long-running workflows |
| max_branches_per_org | 100 | Set lower in cost-sensitive orgs (e.g., 25–50) |
| gc_interval_minutes | 60 | 30 min for high-branch-volume orgs |

Recipe 4: Memory Cleanup

Long-running agents accumulate low-value memories. Regular eviction keeps storage lean and search quality high.

Problem: Memories with low importance and no recent access inflate vector index size, degrading search performance and increasing storage cost.

Solution: Run a scheduled eviction job using importance and access-recency thresholds.

# Option A: SDK cleanup helper (recommended for most use cases)
client.memory.cleanup(
    agent_id="research-agent",
    max_age_days=90,      # Evict memories not accessed in 90 days
    min_importance=0.3,   # Evict memories below this importance score
)

# Option B: Direct SQL for fine-grained control
EVICTION_SQL = """
DELETE FROM _hatidata_memory
WHERE agent_id = 'research-agent'
  AND CAST(metadata->>'importance' AS FLOAT) < 0.3
  AND CAST(metadata->>'last_accessed' AS TIMESTAMP)
      < NOW() - INTERVAL '90 days';
"""
client.query(EVICTION_SQL)

Recommended thresholds by workload:

| Workload type | min_importance | max_age_days |
|---|---|---|
| Fast-moving (news, alerts) | 0.5 | 14 |
| Operational (support, CRM) | 0.3 | 60 |
| Knowledge base (research) | 0.2 | 180 |
| Compliance archive | Do not evict | Do not evict |

Run cleanup as a background cron task, not inline with agent requests, to avoid adding latency to the hot path.
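Where a cron entry is not available, the same off-hot-path pattern can be sketched with a daemon timer in the agent process. The `client` object and daily interval are assumptions; a real deployment would typically prefer cron or a scheduler service:

```python
import threading

def start_cleanup_loop(client, agent_id, interval_seconds=86400):
    """Run memory eviction on a repeating timer, off the request hot path."""
    def run():
        # Evict stale, low-importance memories for this agent.
        client.memory.cleanup(
            agent_id=agent_id,
            max_age_days=90,
            min_importance=0.3,
        )
        # Re-arm the timer for the next cycle; daemon=True so the
        # timer never blocks process shutdown.
        timer = threading.Timer(interval_seconds, run)
        timer.daemon = True
        timer.start()
    run()  # first pass runs immediately, then repeats on the interval
```

Because each pass re-arms its own timer, a slow cleanup cannot pile up overlapping runs the way a fixed-rate scheduler can.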


Recipe 5: Monitoring Credit Usage

Track credit consumption via Prometheus metrics and direct SQL so you can catch runaway agents before they exhaust quotas.

Prometheus metrics exposed by the proxy:

hatidata_credits_used_total{org_id, agent_id}    # Cumulative credits consumed
hatidata_query_duration_seconds{agent_id, quantile}  # Summary quantiles (0.5, 0.99)
hatidata_cache_hit_ratio{tier="l1|l2"}
hatidata_rows_scanned_total{agent_id, table}

Example Prometheus alert rule:

groups:
  - name: hatidata-cost
    rules:
      - alert: AgentCreditSpike
        expr: |
          rate(hatidata_credits_used_total[5m]) * 60 > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Agent {{ $labels.agent_id }} consuming >10 credits/min"

SQL-based credit report:

-- Top 10 credit-consuming agents, last 7 days
SELECT
    agent_id,
    SUM(credits_consumed) AS total_credits,
    COUNT(*) AS query_count,
    AVG(credits_consumed) AS avg_credits_per_query,
    MAX(credits_consumed) AS max_single_query
FROM _hatidata_usage_log
WHERE executed_at >= NOW() - INTERVAL '7 days'
GROUP BY agent_id
ORDER BY total_credits DESC
LIMIT 10;

Recipe 6: Quota Management

Set per-org credit limits and alert thresholds to prevent unexpected overage.

Problem: A single misbehaving agent can exhaust the entire org's monthly credit allocation.

Solution: Configure per-org quotas and per-agent soft limits.

# Control plane config
[quotas]
monthly_credit_limit = 10000 # Hard stop at this usage level
alert_threshold_pct = 80 # Send alert when 80% of limit is reached
per_agent_soft_limit = 500 # Warn (do not block) individual agents above this

# Query current usage against quota
QUOTA_SQL = """
SELECT
    org_id,
    monthly_credit_limit,
    used_credits,
    ROUND(100.0 * used_credits / monthly_credit_limit, 1) AS pct_used,
    monthly_credit_limit - used_credits AS remaining
FROM _hatidata_resource_quotas
WHERE org_id = 'your-org-id';
"""
quota = client.query(QUOTA_SQL).rows[0]
print(f"Credits used: {quota['pct_used']}% ({quota['used_credits']} / {quota['monthly_credit_limit']})")
print(f"Remaining: {quota['remaining']} credits")
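The quota row can also gate work before it is submitted. A minimal sketch of that check, using the field names from the `_hatidata_resource_quotas` query above (the `quota_status` helper and its thresholds are assumptions, with the warn level mirroring `alert_threshold_pct`):

```python
def quota_status(quota, alert_threshold_pct=80):
    """Classify quota usage: 'ok', 'warn' past the alert threshold,
    'blocked' at or above the hard monthly limit."""
    used = quota["used_credits"]
    limit = quota["monthly_credit_limit"]
    if used >= limit:
        return "blocked"
    if 100.0 * used / limit >= alert_threshold_pct:
        return "warn"
    return "ok"
```

An agent runner could refuse new expensive queries whenever the status is `"blocked"`, and shed optional work (e.g., speculative prefetches) at `"warn"`.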

Alert webhook for quota threshold:

client.triggers.register(
    name="quota-80pct-alert",
    description="Alert when org reaches 80% of monthly credit quota",
    concept="high credit usage quota threshold budget exceeded",
    threshold=0.80,
    action={
        "type": "webhook",
        "url": "https://ops.yourorg.com/webhooks/quota-alert",
        "hmac_secret": "whsec_...",
    },
    cooldown_seconds=3600,  # Alert at most once per hour
)
