Audit Guarantees
HatiData maintains two independent, append-only audit trails: a query audit log covering every query executed through the proxy, and an IAM audit log covering all administrative actions (policy changes, key rotations, user management). Both trails are cryptographically hash-chained, so any tampering or deletion can be detected by recomputing the chain.
Hash-Chain Architecture
Every audit entry contains the cryptographic hash of the previous entry in the same session or log stream:
Entry N-1 → hash(Entry N-1) → stored as Entry N's prev_hash
Entry N → hash(Entry N) → stored as Entry N+1's prev_hash
This creates a cryptographic chain: if any entry is modified or deleted, the prev_hash stored in the following entry no longer matches its predecessor, and the break cannot be hidden without rewriting every subsequent entry. The chain can be verified at any time via the audit API or by re-hashing the log entries from your S3 bucket.
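The linking rule above can be sketched in a few lines of Python. This is a minimal illustration, not HatiData's implementation: the use of SHA-256, the canonical JSON serialization, the all-zero genesis value, and the helper names (`compute_entry_hash`, `append_entry`, `chain_is_valid`) are all assumptions made for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed prev_hash for the first entry in a chain


def compute_entry_hash(entry: dict) -> str:
    """Hash every field except entry_hash itself (assumed hashing rule)."""
    payload = {k: v for k, v in entry.items() if k != "entry_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(chain: list, fields: dict) -> dict:
    """Link a new entry to the tail of the chain and seal it with its hash."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    entry = {**fields, "prev_hash": prev_hash}
    entry["entry_hash"] = compute_entry_hash(entry)
    chain.append(entry)
    return entry


def chain_is_valid(chain: list) -> bool:
    """Re-hash every entry and check each prev_hash against its predecessor."""
    expected_prev = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != expected_prev:
            return False
        if compute_entry_hash(entry) != entry["entry_hash"]:
            return False
        expected_prev = entry["entry_hash"]
    return True


chain = []
for n in range(3):
    append_entry(chain, {"query_id": f"qry_{n}", "rows_returned": n * 10})

print(chain_is_valid(chain))     # True
chain[1]["rows_returned"] = 999  # simulate tampering with a stored entry
print(chain_is_valid(chain))     # False
```

Because `prev_hash` is part of the hashed payload in this sketch, an attacker who edits one entry and recomputes its `entry_hash` would still have to rewrite every later entry to keep the chain consistent.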
Tamper Detection
Verify chain integrity for a session:
curl https://api.hatidata.com/v1/audit/sessions/{session_id}/verify \
  -H "Authorization: Bearer <jwt>"
{
  "session_id": "sess_a1b2c3",
  "entry_count": 847,
  "chain_valid": true,
  "first_entry_hash": "3a4f9c...",
  "last_entry_hash": "d82e1b...",
  "verified_at": "2026-02-25T09:14:00Z"
}
If chain_valid is false, the response includes the index and hash of the first broken link, enabling forensic investigation.
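The same kind of check can be run locally against entries downloaded from your bucket. The sketch below is illustrative only: it compares stored `prev_hash` and `entry_hash` values without recomputing any hashes, and the function name `first_broken_link` and the short placeholder hash strings are assumptions for the example.

```python
from typing import Optional


def first_broken_link(entries: list) -> Optional[int]:
    """Return the index of the first entry whose prev_hash does not match
    the entry_hash of the entry before it, or None if every link holds."""
    for i in range(1, len(entries)):
        if entries[i]["prev_hash"] != entries[i - 1]["entry_hash"]:
            return i
    return None


# Placeholder hashes, shortened for readability.
entries = [
    {"entry_hash": "aaa", "prev_hash": "000"},
    {"entry_hash": "bbb", "prev_hash": "aaa"},
    {"entry_hash": "ccc", "prev_hash": "bbb"},
    {"entry_hash": "ddd", "prev_hash": "ccc"},
]
print(first_broken_link(entries))  # None

del entries[1]                     # simulate a deleted entry
print(first_broken_link(entries))  # 1
```

A linkage-only check like this catches deletions and reordering; detecting an edited entry additionally requires re-hashing its contents.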
Query Audit Log
Every query executed through the HatiData proxy generates a query audit entry. Entries are written after execution completes, and PII redaction runs before the entry is committed to storage.
Audit Entry Schema
{
  "entry_id": "aud_q1r2s3t4",
  "prev_hash": "3a4f9c8d...",
  "entry_hash": "d82e1b5f...",
  "timestamp": "2026-02-25T09:14:32.018Z",
  "query_id": "qry_a1b2c3d4",
  "session_id": "sess_a1b2c3",
  "user_id": "usr_x9y8z7",
  "agent_id": "agent-data-analyst-v2",
  "agent_framework": "langchain",
  "org_id": "org_abc123",
  "environment": "production",
  "sql": "SELECT name, email FROM customers WHERE created_at > '2026-01-01'",
  "sql_redacted": "SELECT name, email FROM customers WHERE created_at > '2026-01-01'",
  "tables_accessed": ["customers"],
  "rows_returned": 1842,
  "columns_masked": ["email"],
  "execution_time_ms": 34,
  "cache_hit": false,
  "cost_credits": 0.12,
  "policy_verdicts": [
    { "policy": "pii-masking", "action": "mask", "columns": ["email"] }
  ],
  "rls_filters_applied": ["department = 'engineering'"],
  "source_ip": "10.0.1.50",
  "query_origin": "agent"
}
PII Redaction Before Storage
Audit entries undergo automatic PII redaction before being written to S3. The raw SQL text is scanned for the following patterns using compiled regular expressions:
| Pattern | Example | Redacted |
|---|---|---|
| Email address | alice@example.com | [EMAIL_REDACTED] |
| US Social Security Number | 123-45-6789 | [SSN_REDACTED] |
| Payment card number | 4111111111111111 | [CC_REDACTED] |
| Phone number | +1-555-0100 | [PHONE_REDACTED] |
The redacted text is stored in the sql_redacted field; the original SQL is never stored. As a result, PII in query text is not exposed even if audit storage is compromised.
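A regex-based redaction pass of this kind might look like the following sketch. The patterns are illustrative assumptions written to match the examples in the table; they are not HatiData's actual expressions, and production patterns would need to cover more formats (for example, card numbers with separators). The names `REDACTION_RULES` and `redact_sql` are hypothetical.

```python
import re

# Illustrative patterns only; each is compiled once and applied in order.
REDACTION_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL_REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
    (re.compile(r"\b\d{13,19}\b"), "[CC_REDACTED]"),
    (re.compile(r"\+\d{1,3}(?:-\d{3,4}){1,3}\b"), "[PHONE_REDACTED]"),
]


def redact_sql(sql: str) -> str:
    """Replace every match of each pattern with its redaction token."""
    for pattern, token in REDACTION_RULES:
        sql = pattern.sub(token, sql)
    return sql


raw = (
    "SELECT * FROM customers WHERE email = 'alice@example.com' "
    "OR ssn = '123-45-6789' OR phone = '+1-555-0100' "
    "OR card = '4111111111111111'"
)
print(redact_sql(raw))
```

Queries that contain no matching literals, such as a filter on a date, pass through this sketch unchanged.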
IAM Audit Log
All administrative actions are recorded in a separate IAM audit trail. This trail captures policy CRUD, key rotation, user management, and SSO configuration changes with before/after values:
{
  "entry_id": "aud_i1a2m3",
  "prev_hash": "c91f3a...",
  "entry_hash": "7b44de...",
  "timestamp": "2026-02-25T08:00:12.004Z",
  "action": "policy.update",
  "actor_id": "usr_admin01",
  "actor_role": "admin",
  "org_id": "org_abc123",
  "resource_type": "policy",
  "resource_id": "pol_pii_masking",
  "before": { "enabled": false },
  "after": { "enabled": true },
  "source_ip": "10.0.0.5"
}
IAM audit entries use the same hash-chain mechanism as query audit entries, but the chain is independent (keyed by org_id).
Querying Audit Logs via SQL
Audit logs are stored in your S3 bucket as JSONL files partitioned by date. They can be queried directly through the HatiData proxy using the _hatidata_audit system schema:
-- Count queries by agent in the last 7 days
SELECT
  agent_id,
  agent_framework,
  COUNT(*) AS query_count,
  SUM(cost_credits) AS total_credits,
  AVG(execution_time_ms) AS avg_latency_ms
FROM _hatidata_audit.queries
WHERE timestamp >= NOW() - INTERVAL '7 days'
GROUP BY agent_id, agent_framework
ORDER BY query_count DESC;
-- Find all policy changes in the last 30 days
SELECT
  timestamp,
  actor_id,
  action,
  resource_id,
  before,
  after
FROM _hatidata_audit.iam_events
WHERE timestamp >= NOW() - INTERVAL '30 days'
  AND action LIKE 'policy.%'
ORDER BY timestamp DESC;
-- Find any queries that accessed the customers table without masking the email column
SELECT
  query_id,
  user_id,
  agent_id,
  sql_redacted,
  columns_masked
FROM _hatidata_audit.queries
WHERE 'customers' = ANY(tables_accessed)
  AND NOT ('email' = ANY(columns_masked))
  AND timestamp >= NOW() - INTERVAL '90 days';
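Because the underlying files are plain JSONL, the same kind of analysis can also be done offline on downloaded files. The sketch below mirrors the per-agent aggregation from the first query; it assumes the lines have already been limited to the time window of interest, and the function name `summarize_by_agent` and the sample records are made up for illustration.

```python
import json
from collections import defaultdict


def summarize_by_agent(jsonl_lines):
    """Aggregate query audit entries per (agent_id, agent_framework)."""
    groups = defaultdict(lambda: {"count": 0, "credits": 0.0, "latency": 0})
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        entry = json.loads(line)
        group = groups[(entry["agent_id"], entry["agent_framework"])]
        group["count"] += 1
        group["credits"] += entry["cost_credits"]
        group["latency"] += entry["execution_time_ms"]

    rows = [
        {
            "agent_id": agent_id,
            "agent_framework": framework,
            "query_count": g["count"],
            "total_credits": g["credits"],
            "avg_latency_ms": g["latency"] / g["count"],
        }
        for (agent_id, framework), g in groups.items()
    ]
    # Same ordering as ORDER BY query_count DESC in the SQL version.
    return sorted(rows, key=lambda r: r["query_count"], reverse=True)


# Made-up sample records containing only the fields the aggregation needs.
sample = [
    '{"agent_id": "agent-a", "agent_framework": "langchain", "cost_credits": 0.25, "execution_time_ms": 30}',
    '{"agent_id": "agent-b", "agent_framework": "custom", "cost_credits": 0.5, "execution_time_ms": 10}',
    '{"agent_id": "agent-a", "agent_framework": "langchain", "cost_credits": 0.25, "execution_time_ms": 50}',
]
for row in summarize_by_agent(sample):
    print(row)
```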
Retention Policies
| Tier | Hot Storage | Warm Storage | Cold Archive | Total Retention |
|---|---|---|---|---|
| Cloud | 90 days (S3 Standard) | -- | -- | 90 days |
| Growth | 90 days (S3 Standard) | 1 year (S3 Glacier) | -- | 1 year |
| Enterprise | 90 days (S3 Standard) | 1 year (S3 Glacier) | 7 years (S3 Deep Archive) | 7 years |
S3 Object Lock is enabled in Governance mode for all tiers. Within the retention period, audit entries cannot be deleted or overwritten by any user, including the bucket owner, unless a principal with permission to bypass governance retention (s3:BypassGovernanceRetention) explicitly overrides the lock. That override itself generates an audited event.
Custom retention periods are configurable for Enterprise customers. A minimum of 7 years is recommended for HIPAA and SOC 2 Type II compliance.
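The retention table can be read as a set of age thresholds per tier. The sketch below encodes one interpretation of it: thresholds are treated as cumulative from the time an entry is written, and a year is approximated as 365 days. Both points, along with the names `RETENTION_STAGES` and `storage_stage`, are assumptions for illustration rather than documented behavior.

```python
# (stage, upper age bound in days) per tier, in the order entries move through them.
RETENTION_STAGES = {
    "cloud": [("hot", 90)],
    "growth": [("hot", 90), ("warm", 365)],
    "enterprise": [("hot", 90), ("warm", 365), ("cold", 7 * 365)],
}


def storage_stage(tier: str, age_days: int) -> str:
    """Return the storage stage holding an entry of the given age,
    or "expired" once the tier's total retention has passed."""
    for stage, upper_bound in RETENTION_STAGES[tier]:
        if age_days < upper_bound:
            return stage
    return "expired"


for tier in RETENTION_STAGES:
    print(tier, [storage_stage(tier, age) for age in (30, 200, 1000)])
```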
Export Formats
Audit logs stored in S3 are compatible with common SIEM and log management platforms:
| Format | Details |
|---|---|
| JSONL | Default. One JSON object per line, partitioned by year/month/day/. |
| Parquet | Available for large-scale analytics. Generated nightly via a scheduled export job. |
| CSV | Available via the control plane export API for date ranges up to 90 days. |
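When reading the JSONL files directly, a date range translates into one prefix per day. The sketch below generates those prefixes under two assumptions that should be checked against your bucket: that month and day are zero-padded, and that the files sit under a root prefix such as `audit/queries/` (a hypothetical value used only for this example).

```python
from datetime import date, timedelta


def daily_prefixes(start: date, end: date, root: str = "audit/queries/"):
    """Yield one year/month/day/ prefix per day from start to end, inclusive."""
    current = start
    while current <= end:
        yield f"{root}{current:%Y/%m/%d}/"
        current += timedelta(days=1)


for prefix in daily_prefixes(date(2026, 2, 27), date(2026, 3, 1)):
    print(prefix)
# audit/queries/2026/02/27/
# audit/queries/2026/02/28/
# audit/queries/2026/03/01/
```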
SIEM Integrations
| SIEM | Integration Method |
|---|---|
| Splunk | S3 input add-on, reads JSONL from partitioned prefix |
| Datadog | S3 log forwarding, auto-parsed JSON fields |
| Elastic / OpenSearch | S3 input via Logstash or Beats S3 input plugin |
| Sumo Logic | S3 source, automatic field extraction from JSONL |
| Microsoft Sentinel | Azure Blob / S3 connector |
Export via API
# Export query audit for a specific date range (returns JSONL)
curl "https://api.hatidata.com/v1/audit/export?start=2026-02-01&end=2026-02-25&type=queries" \
  -H "Authorization: Bearer <jwt>" \
  --output audit-feb-2026.jsonl
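The same request can be issued from a script. The sketch below uses only the Python standard library and the endpoint, query parameters, and header shown in the curl example; the helper names `build_export_request` and `download_export` are hypothetical, and error handling and retries are left out.

```python
import shutil
import urllib.parse
import urllib.request

EXPORT_URL = "https://api.hatidata.com/v1/audit/export"


def build_export_request(start: str, end: str, log_type: str, jwt: str) -> urllib.request.Request:
    """Build the GET request for an audit export without sending it."""
    query = urllib.parse.urlencode({"start": start, "end": end, "type": log_type})
    return urllib.request.Request(
        f"{EXPORT_URL}?{query}",
        headers={"Authorization": f"Bearer {jwt}"},
    )


def download_export(request: urllib.request.Request, path: str) -> None:
    """Send the request and stream the JSONL response body to a local file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


request = build_export_request("2026-02-01", "2026-02-25", "queries", "<jwt>")
print(request.full_url)
# To perform the download: download_export(request, "audit-feb-2026.jsonl")
```

Separating request construction from the download keeps the URL and header logic easy to inspect before any network call is made.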
Immutability Guarantees
| Property | Mechanism |
|---|---|
| Append-only writes | S3 Object Lock (Governance mode). No PUT-overwrite or DELETE within retention period. |
| Tamper detection | Cryptographic hash chaining. Any modification breaks the chain and is detectable via the verify API. |
| PII protection | Redaction applied before write. Raw SQL containing PII is never stored. |
| No HatiData access | Audit logs are written to the customer's S3 bucket. HatiData infrastructure has no read or write access to the bucket. |
Related Concepts
- SOC 2 Architecture — How audit controls map to Trust Service Criteria
- Data Residency — Where audit logs are stored geographically
- CMEK & Encryption — Encryption applied to audit log storage
- Security Model — PII redaction in query results and audit entries
- Control Plane API — Programmatic access to audit entries and chain verification