
Cost Optimization

Recipes for reducing compute, storage, and credit consumption across cache tuning, query design, branch lifecycle management, and memory eviction.


Recipe 1: Cache Tuning

HatiData's two-tier cache (L1 in-process, L2 on-disk) intercepts repeated queries before they reach the execution pipeline. Tuning cache size and TTL has the largest impact on credit usage for read-heavy workloads.

Problem: Agents issuing the same aggregation queries repeatedly exhaust credits on redundant computation.

Solution: Size L1 and L2 appropriately for your query pattern, and set TTLs that match data freshness requirements.

# proxy config (proxy.toml or environment variables)
[cache]
l1_max_entries = 500 # In-process LRU entries (default: 200)
l1_ttl_seconds = 60 # L1 expiry — short, for hot queries
l2_max_size_mb = 2048 # On-disk cache (default: 512 MB)
l2_ttl_seconds = 3600 # L2 expiry — 1 hour for slower-changing data
l2_eviction = "lru" # lru | lfu | ttl-only
| Cache tier | Latency | Recommended size | Best for |
|---|---|---|---|
| L1 (memory) | <1 ms | 200–1,000 entries | Identical repeated queries within a session |
| L2 (disk) | 5–50 ms | 512 MB–4 GB | Cross-session hot queries, aggregations |
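The read path can be sketched as a two-tier lookup: check L1, fall back to L2 (promoting hits back into L1), and only execute the query on a full miss. This is an illustrative model, not HatiData's implementation; the dict-backed tiers and the `execute` callback are assumptions for demonstration:

```python
import time

class TwoTierCache:
    """Illustrative L1/L2 read path: fast tier, then slow tier, else compute."""

    def __init__(self, l1_ttl=60, l2_ttl=3600):
        self.l1, self.l2 = {}, {}  # key -> (value, expires_at)
        self.l1_ttl, self.l2_ttl = l1_ttl, l2_ttl

    def _fresh(self, tier, key):
        entry = tier.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        tier.pop(key, None)  # expired: drop it
        return None

    def get(self, sql, execute):
        value = self._fresh(self.l1, sql)  # L1: in-process, sub-millisecond
        if value is not None:
            return value, "l1"
        value = self._fresh(self.l2, sql)  # L2: on-disk, 5-50 ms
        if value is not None:
            # Promote the hot entry back into L1.
            self.l1[sql] = (value, time.time() + self.l1_ttl)
            return value, "l2"
        value = execute(sql)  # full miss: run the query
        now = time.time()
        self.l1[sql] = (value, now + self.l1_ttl)
        self.l2[sql] = (value, now + self.l2_ttl)
        return value, "miss"
```

The promotion step is why short L1 TTLs still work well for hot queries: any query that survives in L2 re-enters L1 on its next use.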

Environment variable equivalents:

HATIDATA_CACHE_L1_MAX_ENTRIES=500
HATIDATA_CACHE_L1_TTL_SECONDS=60
HATIDATA_CACHE_L2_MAX_SIZE_MB=2048
HATIDATA_CACHE_L2_TTL_SECONDS=3600

Recipe 2: Query Optimization

Poorly written queries are the single largest source of avoidable credit spend. Apply these patterns before tuning infrastructure.

Problem: Agents running SELECT * on wide tables scan and return more columns than needed, increasing both cost and latency.

Solution: Always project only the columns the agent will consume.

-- Before (scans all columns, expensive on wide tables)
SELECT * FROM customer_events WHERE event_date >= '2026-01-01';

-- After (scans only required columns)
SELECT customer_id, event_type, event_date
FROM customer_events
WHERE event_date >= '2026-01-01';

Problem: Agents querying without row filters scan the full table.

Solution: Always include a predicate on the partition or primary key column.

-- Before (full table scan)
SELECT customer_id, revenue FROM orders;

-- After (partition pruned, ~95% fewer rows scanned)
SELECT customer_id, revenue
FROM orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31'
AND status = 'completed';

Problem: Agents fetching unbounded result sets for downstream filtering.

Solution: Push filtering into SQL and use LIMIT to cap result size.

# Before: agent fetches 100 000 rows and filters in Python
rows = client.query("SELECT * FROM events").rows
top10 = sorted(rows, key=lambda r: r["score"], reverse=True)[:10]

# After: the database does the work
rows = client.query("""
    SELECT event_id, score, occurred_at
    FROM events
    WHERE score > 0.8
    ORDER BY score DESC
    LIMIT 10
""").rows

Recipe 3: Branch Cleanup

Abandoned branches accumulate storage and count against per-org quotas. Set aggressive TTLs and clean up after merges.

Problem: Experimental branches created by agents are never discarded, inflating storage costs over time.

Solution: Configure a short default TTL and explicitly discard branches after merge or rejection.

# Control plane config
[branches]
default_ttl_hours = 24 # Auto-discard after 24 hours (default: 72)
max_branches_per_org = 50 # Hard cap — new creates fail above this limit
gc_interval_minutes = 30 # Garbage collector run interval

import hatidata

client = hatidata.Client(api_key="hd_...")

# List branches older than 12 hours that are not merged
stale = client.branches.list(
    status="active",
    created_before_hours=12,
)

for branch in stale:
    print(f"Discarding stale branch: {branch.label} ({branch.branch_id})")
    client.branches.discard(branch_id=branch.branch_id)

Configuration reference:

| Parameter | Default | Recommendation |
|---|---|---|
| default_ttl_hours | 72 | 24 for ephemeral agents; 168 for long-running workflows |
| max_branches_per_org | 100 | Set lower in cost-sensitive orgs (e.g., 25–50) |
| gc_interval_minutes | 60 | 30 min for high-branch-volume orgs |

Recipe 4: Memory Cleanup

Long-running agents accumulate low-value memories. Regular eviction keeps storage lean and search quality high.

Problem: Memories with low importance and no recent access inflate vector index size, degrading search performance and increasing storage cost.

Solution: Run a scheduled eviction job using importance and access-recency thresholds.

# Option A: SDK cleanup helper (recommended for most use cases)
client.memory.cleanup(
    agent_id="research-agent",
    max_age_days=90,      # Evict memories not accessed in 90 days
    min_importance=0.3,   # Evict memories below this importance score
)

# Option B: Direct SQL for fine-grained control
EVICTION_SQL = """
DELETE FROM _hatidata_memory
WHERE agent_id = 'research-agent'
  AND CAST(metadata->>'importance' AS FLOAT) < 0.3
  AND CAST(metadata->>'last_accessed' AS TIMESTAMP)
      < NOW() - INTERVAL '90 days';
"""
client.query(EVICTION_SQL)

Recommended thresholds by workload:

| Workload type | min_importance | max_age_days |
|---|---|---|
| Fast-moving (news, alerts) | 0.5 | 14 |
| Operational (support, CRM) | 0.3 | 60 |
| Knowledge base (research) | 0.2 | 180 |
| Compliance archive | Do not evict | Do not evict |

Run cleanup as a background cron task, not inline with agent requests, to avoid adding latency to the hot path.
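Where a cron entry is not available, the same off-hot-path pattern can be sketched with a daemon timer in the agent process. The `client` object and daily interval are assumptions; a real deployment would typically prefer cron or a scheduler service:

```python
import threading

def start_cleanup_loop(client, agent_id, interval_seconds=86400):
    """Run memory eviction on a repeating timer, off the request hot path."""
    def run():
        # Evict stale, low-importance memories for this agent.
        client.memory.cleanup(
            agent_id=agent_id,
            max_age_days=90,
            min_importance=0.3,
        )
        # Re-arm the timer for the next cycle; daemon=True so the
        # timer never blocks process shutdown.
        timer = threading.Timer(interval_seconds, run)
        timer.daemon = True
        timer.start()
    run()  # first pass runs immediately, then repeats on the interval
```

Because each pass re-arms its own timer, a slow cleanup cannot pile up overlapping runs the way a fixed-rate scheduler can.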


Recipe 5: Monitoring Credit Usage

Track credit consumption via Prometheus metrics and direct SQL so you can catch runaway agents before they exhaust quotas.

Prometheus metrics exposed by the proxy:

hatidata_credits_used_total{org_id, agent_id}    # Cumulative credits consumed
hatidata_query_duration_seconds{agent_id, quantile}  # Summary quantiles (0.5, 0.99)
hatidata_cache_hit_ratio{tier="l1|l2"}
hatidata_rows_scanned_total{agent_id, table}

Example Prometheus alert rule:

groups:
  - name: hatidata-cost
    rules:
      - alert: AgentCreditSpike
        expr: |
          rate(hatidata_credits_used_total[5m]) * 60 > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Agent {{ $labels.agent_id }} consuming >10 credits/min"

SQL-based credit report:

-- Top 10 credit-consuming agents, last 7 days
SELECT
    agent_id,
    SUM(credits_consumed) AS total_credits,
    COUNT(*) AS query_count,
    AVG(credits_consumed) AS avg_credits_per_query,
    MAX(credits_consumed) AS max_single_query
FROM _hatidata_usage_log
WHERE executed_at >= NOW() - INTERVAL '7 days'
GROUP BY agent_id
ORDER BY total_credits DESC
LIMIT 10;

Recipe 6: Quota Management

Set per-org credit limits and alert thresholds to prevent unexpected overage.

Problem: A single misbehaving agent can exhaust the entire org's monthly credit allocation.

Solution: Configure per-org quotas and per-agent soft limits.

# Control plane config
[quotas]
monthly_credit_limit = 10000 # Hard stop at this usage level
alert_threshold_pct = 80 # Send alert when 80% of limit is reached
per_agent_soft_limit = 500 # Warn (do not block) individual agents above this

# Query current usage against quota
QUOTA_SQL = """
SELECT
    org_id,
    monthly_credit_limit,
    used_credits,
    ROUND(100.0 * used_credits / monthly_credit_limit, 1) AS pct_used,
    monthly_credit_limit - used_credits AS remaining
FROM _hatidata_resource_quotas
WHERE org_id = 'your-org-id';
"""
quota = client.query(QUOTA_SQL).rows[0]
print(f"Credits used: {quota['pct_used']}% ({quota['used_credits']} / {quota['monthly_credit_limit']})")
print(f"Remaining: {quota['remaining']} credits")
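The quota row can also gate work before it is submitted. A minimal sketch of that check, using the field names from the `_hatidata_resource_quotas` query above (the `quota_status` helper and its thresholds are assumptions, with the warn level mirroring `alert_threshold_pct`):

```python
def quota_status(quota, alert_threshold_pct=80):
    """Classify quota usage: 'ok', 'warn' past the alert threshold,
    'blocked' at or above the hard monthly limit."""
    used = quota["used_credits"]
    limit = quota["monthly_credit_limit"]
    if used >= limit:
        return "blocked"
    if 100.0 * used / limit >= alert_threshold_pct:
        return "warn"
    return "ok"
```

An agent runner could refuse new expensive queries whenever the status is `"blocked"`, and shed optional work (e.g., speculative prefetches) at `"warn"`.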

Alert webhook for quota threshold:

client.triggers.register(
    name="quota-80pct-alert",
    description="Alert when org reaches 80% of monthly credit quota",
    concept="high credit usage quota threshold budget exceeded",
    threshold=0.80,
    action={
        "type": "webhook",
        "url": "https://ops.yourorg.com/webhooks/quota-alert",
        "hmac_secret": "whsec_...",
    },
    cooldown_seconds=3600,  # Alert at most once per hour
)
