Audit Guarantees

HatiData maintains two independent, append-only audit trails: a query audit log for every query executed through the proxy, and an IAM audit log for all administrative actions (policy changes, key rotations, user management). Both trails are cryptographically hash-chained, so any tampering or deletion can be detected by recomputing the chain.


Hash-Chain Architecture

Every audit entry contains the cryptographic hash of the previous entry in the same session or log stream:

Entry N-1 → hash(Entry N-1) → stored as Entry N's prev_hash
Entry N → hash(Entry N) → stored as Entry N+1's prev_hash

This creates a cryptographic chain: if any entry is modified or deleted, all subsequent prev_hash values become invalid. The chain can be verified at any time via the audit API or by re-hashing the log entries from your S3 bucket.
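
The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not HatiData's implementation: the hash function (SHA-256), the canonical JSON serialization, the all-zero genesis value, and the helper names `entry_hash`, `append_entry`, and `first_broken_link` are all assumptions, since the source does not specify them.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry in a chain


def entry_hash(entry: dict) -> str:
    """Hash every field except entry_hash itself, using canonical JSON (assumed)."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(chain: list, fields: dict) -> dict:
    """Link a new entry to the tail of the chain and append it."""
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {**fields, "prev_hash": prev}
    entry["entry_hash"] = entry_hash(entry)
    chain.append(entry)
    return entry


def first_broken_link(chain: list):
    """Return the index of the first invalid entry, or None if the chain holds."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry["entry_hash"] != entry_hash(entry):
            return i
        prev = entry["entry_hash"]
    return None


chain = []
for sql in ("SELECT 1", "SELECT 2", "SELECT 3"):
    append_entry(chain, {"sql_redacted": sql})

print(first_broken_link(chain))          # None: the chain is intact
chain[1]["sql_redacted"] = "SELECT 42"   # tamper with the middle entry
print(first_broken_link(chain))          # 1: entry 1 no longer matches its hash
```

Deleting an entry is caught the same way: the entry that follows the gap carries a prev_hash that no longer matches its new predecessor.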

Tamper Detection

Verify chain integrity for a session:

curl "https://api.hatidata.com/v1/audit/sessions/<session_id>/verify" \
  -H "Authorization: Bearer <jwt>"

Example response:

{
  "session_id": "sess_a1b2c3",
  "entry_count": 847,
  "chain_valid": true,
  "first_entry_hash": "3a4f9c...",
  "last_entry_hash": "d82e1b...",
  "verified_at": "2026-02-25T09:14:00Z"
}

If chain_valid is false, the response includes the index and hash of the first broken link, enabling forensic investigation.
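
For scripted checks, the same call can be made from Python. The sketch below uses only the standard library and the endpoint, header, and response fields shown above; the helper names `build_verify_request` and `check_verify_response` are hypothetical. Because the example response does not show the field names used for the broken link, the failure case passes the whole payload through rather than guessing them.

```python
import json
import urllib.request
from urllib.parse import quote


def build_verify_request(session_id: str, jwt: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the verify endpoint."""
    url = f"https://api.hatidata.com/v1/audit/sessions/{quote(session_id, safe='')}/verify"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {jwt}"})


def check_verify_response(body: str) -> dict:
    """Parse a verify response and raise if the chain is reported broken."""
    result = json.loads(body)
    if not result["chain_valid"]:
        # Broken-link details are included as-is; their field names are not
        # shown in the example response, so none are assumed here.
        raise ValueError(f"audit chain broken for {result['session_id']}: {result}")
    return result
```

A typical use would be `check_verify_response(urllib.request.urlopen(build_verify_request(sid, token)).read().decode())`, run on a schedule so that a broken chain raises an alert.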


Query Audit Log

Every query executed through the HatiData proxy generates a query audit entry. Entries are written after execution completes, and PII redaction runs before the entry is committed to storage.

Audit Entry Schema

{
  "entry_id": "aud_q1r2s3t4",
  "prev_hash": "3a4f9c8d...",
  "entry_hash": "d82e1b5f...",
  "timestamp": "2026-02-25T09:14:32.018Z",
  "query_id": "qry_a1b2c3d4",
  "session_id": "sess_a1b2c3",
  "user_id": "usr_x9y8z7",
  "agent_id": "agent-data-analyst-v2",
  "agent_framework": "langchain",
  "org_id": "org_abc123",
  "environment": "production",
  "sql": "SELECT name, email FROM customers WHERE created_at > '2026-01-01'",
  "sql_redacted": "SELECT name, email FROM customers WHERE created_at > '2026-01-01'",
  "tables_accessed": ["customers"],
  "rows_returned": 1842,
  "columns_masked": ["email"],
  "execution_time_ms": 34,
  "cache_hit": false,
  "cost_credits": 0.12,
  "policy_verdicts": [
    { "policy": "pii-masking", "action": "mask", "columns": ["email"] }
  ],
  "rls_filters_applied": ["department = 'engineering'"],
  "source_ip": "10.0.1.50",
  "query_origin": "agent"
}

PII Redaction Before Storage

Audit entries undergo automatic PII redaction before being written to S3. The raw SQL text is scanned for the following patterns using compiled regular expressions:

| Pattern | Example | Redacted |
| --- | --- | --- |
| Email address | alice@example.com | [EMAIL_REDACTED] |
| US Social Security Number | 123-45-6789 | [SSN_REDACTED] |
| Payment card number | 4111111111111111 | [CC_REDACTED] |
| Phone number | +1-555-0100 | [PHONE_REDACTED] |

Redaction is applied to the sql_redacted field. The original SQL is never stored. This ensures that even if audit storage were compromised, PII in query text would not be exposed.
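
A regex-based redaction pass of this kind might look like the following sketch. The regular expressions here are deliberately simple illustrations written for this example, not HatiData's actual patterns, and the function name `redact_sql` is hypothetical; only the placeholder strings come from the table above.

```python
import re

# Illustrative patterns only (assumptions); compiled once and applied in order.
PATTERNS = [
    ("[EMAIL_REDACTED]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[SSN_REDACTED]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # Unseparated runs of 13 to 19 digits, the usual card-number length range.
    ("[CC_REDACTED]", re.compile(r"\b\d{13,19}\b")),
    # International-style numbers with a leading "+", such as +1-555-0100.
    ("[PHONE_REDACTED]", re.compile(r"\+\d{1,3}(?:[-. ]\d{2,4}){1,3}\b")),
]


def redact_sql(sql: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in PATTERNS:
        sql = pattern.sub(placeholder, sql)
    return sql


print(redact_sql("SELECT * FROM users WHERE email = 'alice@example.com'"))
# SELECT * FROM users WHERE email = '[EMAIL_REDACTED]'
```

Patterns this loose will miss some formats and may flag non-PII digit runs; a production redactor would need stricter validation (for example, a Luhn check on card numbers).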


IAM Audit Log

All administrative actions are recorded in a separate IAM audit trail. This trail captures policy CRUD, key rotation, user management, and SSO configuration changes with before/after values:

{
  "entry_id": "aud_i1a2m3",
  "prev_hash": "c91f3a...",
  "entry_hash": "7b44de...",
  "timestamp": "2026-02-25T08:00:12.004Z",
  "action": "policy.update",
  "actor_id": "usr_admin01",
  "actor_role": "admin",
  "org_id": "org_abc123",
  "resource_type": "policy",
  "resource_id": "pol_pii_masking",
  "before": { "enabled": false },
  "after": { "enabled": true },
  "source_ip": "10.0.0.5"
}

IAM audit entries use the same hash-chain mechanism as query audit entries, but the chain is independent (keyed by org_id).
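
Because each chain is keyed by org_id, entries from different organizations must be separated before their links are checked. The sketch below shows that grouping step using only the org_id, prev_hash, and entry_hash fields from the entry above; the helper names and the sample hash values are made up for illustration, and it checks linkage only, without recomputing any hashes.

```python
from collections import defaultdict


def split_by_org(entries):
    """Group entries into per-org streams, preserving their original order."""
    streams = defaultdict(list)
    for entry in entries:
        streams[entry["org_id"]].append(entry)
    return dict(streams)


def links_intact(stream):
    """Check that each prev_hash equals the preceding entry's entry_hash."""
    return all(cur["prev_hash"] == prev["entry_hash"]
               for prev, cur in zip(stream, stream[1:]))


# Interleaved entries from two organizations (hash values are placeholders).
entries = [
    {"org_id": "org_a", "prev_hash": "00", "entry_hash": "a1"},
    {"org_id": "org_b", "prev_hash": "00", "entry_hash": "b1"},
    {"org_id": "org_a", "prev_hash": "a1", "entry_hash": "a2"},
    {"org_id": "org_b", "prev_hash": "XX", "entry_hash": "b2"},  # broken link
]

for org_id, stream in split_by_org(entries).items():
    print(org_id, links_intact(stream))
```

A break in one organization's chain leaves the others unaffected, which is the practical meaning of the chains being independent.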


Querying Audit Logs via SQL

Audit logs are stored in your S3 bucket as JSONL files partitioned by date. They can be queried directly through the HatiData proxy using the _hatidata_audit system schema:

-- Count queries by agent in the last 7 days
SELECT
    agent_id,
    agent_framework,
    COUNT(*) AS query_count,
    SUM(cost_credits) AS total_credits,
    AVG(execution_time_ms) AS avg_latency_ms
FROM _hatidata_audit.queries
WHERE timestamp >= NOW() - INTERVAL '7 days'
GROUP BY agent_id, agent_framework
ORDER BY query_count DESC;

-- Find all policy changes in the last 30 days
SELECT
    timestamp,
    actor_id,
    action,
    resource_id,
    before,
    after
FROM _hatidata_audit.iam_events
WHERE timestamp >= NOW() - INTERVAL '30 days'
  AND action LIKE 'policy.%'
ORDER BY timestamp DESC;

-- Find any queries that accessed the customers table without masking the email column
SELECT
    query_id,
    user_id,
    agent_id,
    sql_redacted,
    columns_masked
FROM _hatidata_audit.queries
WHERE 'customers' = ANY(tables_accessed)
  AND NOT ('email' = ANY(columns_masked))
  AND timestamp >= NOW() - INTERVAL '90 days';
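
Since the underlying files are plain JSONL, the same kind of aggregation can also be run offline on downloaded files. The following sketch mirrors the per-agent summary from the first query using the agent_id, cost_credits, and execution_time_ms fields of the audit entry schema; the function name `summarize_by_agent` and the sample records are assumptions for illustration.

```python
import io
import json
from collections import defaultdict


def summarize_by_agent(lines):
    """Aggregate query count, total credits, and mean latency per agent_id."""
    totals = defaultdict(lambda: {"count": 0, "credits": 0.0, "latency": 0})
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        entry = json.loads(line)
        t = totals[entry["agent_id"]]
        t["count"] += 1
        t["credits"] += entry["cost_credits"]
        t["latency"] += entry["execution_time_ms"]
    return {
        agent: {
            "query_count": t["count"],
            "total_credits": t["credits"],
            "avg_latency_ms": t["latency"] / t["count"],
        }
        for agent, t in totals.items()
    }


# Stand-in for an open JSONL file; any iterable of lines works.
sample = io.StringIO(
    '{"agent_id": "agent-a", "cost_credits": 0.5, "execution_time_ms": 30}\n'
    '{"agent_id": "agent-a", "cost_credits": 0.25, "execution_time_ms": 50}\n'
    '{"agent_id": "agent-b", "cost_credits": 1.0, "execution_time_ms": 10}\n'
)
summary = summarize_by_agent(sample)
print(summary["agent-a"])
```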

Retention Policies

| Tier | Hot Storage | Warm Storage | Cold Archive | Total Retention |
| --- | --- | --- | --- | --- |
| Cloud | 90 days (S3 Standard) | -- | -- | 90 days |
| Growth | 90 days (S3 Standard) | 1 year (S3 Glacier) | -- | 1 year |
| Enterprise | 90 days (S3 Standard) | 1 year (S3 Glacier) | 7 years (S3 Deep Archive) | 7 years |

S3 Object Lock is enabled in Governance mode for all tiers. Within the retention period, no user, including the bucket owner, can delete or overwrite audit entries without explicitly bypassing the Object Lock retention. Bypassing requires the s3:BypassGovernanceRetention permission and itself generates an audited event.

Custom retention periods are configurable for Enterprise customers. A minimum of 7 years is recommended for HIPAA and SOC 2 Type II compliance.


Export Formats

Audit logs stored in S3 are compatible with common SIEM and log management platforms:

| Format | Details |
| --- | --- |
| JSONL | Default. One JSON object per line, partitioned by year/month/day/. |
| Parquet | Available for large-scale analytics. Generated nightly via a scheduled export job. |
| CSV | Available via the control plane export API for date ranges up to 90 days. |

SIEM Integrations

| SIEM | Integration Method |
| --- | --- |
| Splunk | S3 input add-on, reads JSONL from partitioned prefix |
| Datadog | S3 log forwarding, auto-parsed JSON fields |
| Elastic / OpenSearch | S3 input via Logstash or Beats S3 input plugin |
| Sumo Logic | S3 source, automatic field extraction from JSONL |
| Microsoft Sentinel | Azure Blob / S3 connector |

Export via API

# Export query audit for a specific date range (returns JSONL)
curl "https://api.hatidata.com/v1/audit/export?start=2026-02-01&end=2026-02-25&type=queries" \
-H "Authorization: Bearer <jwt>" \
--output audit-feb-2026.jsonl
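
When exporting longer periods, it can help to split the range into windows and issue one request per window. The sketch below generates the export URLs using the start, end, and type parameters shown above. The 90-day window size is borrowed from the CSV limit and may not apply to JSONL exports; treating start and end as inclusive dates, and the helper name `export_urls`, are likewise assumptions.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE = "https://api.hatidata.com/v1/audit/export"


def export_urls(start: date, end: date, kind: str = "queries", max_days: int = 90):
    """Yield one export URL per window of at most max_days (inclusive dates)."""
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        query = urlencode({
            "start": cursor.isoformat(),
            "end": window_end.isoformat(),
            "type": kind,
        })
        yield f"{BASE}?{query}"
        cursor = window_end + timedelta(days=1)


for url in export_urls(date(2026, 1, 1), date(2026, 6, 30)):
    print(url)
```

Each URL can then be fetched with the same Authorization header as the curl example, and the resulting JSONL files concatenated in date order.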

Immutability Guarantees

| Property | Mechanism |
| --- | --- |
| Append-only writes | S3 Object Lock (Governance mode). No PUT-overwrite or DELETE within retention period. |
| Tamper detection | Cryptographic hash chaining. Any modification breaks the chain and is detectable via the verify API. |
| PII protection | Redaction applied before write. Raw SQL containing PII is never stored. |
| No HatiData access | Audit logs are written to the customer's S3 bucket. HatiData infrastructure has no read or write access to the bucket. |
