Audit Guarantees

HatiData maintains two independent, append-only audit trails: a query audit log for every query executed through the proxy, and an IAM audit log for all administrative actions (policy changes, key rotations, user management). Both trails are cryptographically hash-chained, so any tampering or deletion can be detected by recomputing the chain.

Hash-Chain Architecture

Every audit entry contains the cryptographic hash of the previous entry in the same session or log stream:

Entry N-1 → hash(Entry N-1) → stored as Entry N's prev_hash
Entry N   → hash(Entry N)   → stored as Entry N+1's prev_hash

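This creates a cryptographic chain. Modifying an entry changes its hash, so the prev_hash stored in the next entry no longer matches. Deleting an entry leaves the next entry's prev_hash pointing at a hash that is no longer in the log. Concealing either change would require rewriting every subsequent entry. The chain can be verified at any time via the audit API or by re-hashing the log entries from your S3 bucket.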
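The re-hashing check can be sketched locally. This is a minimal illustration, not HatiData's implementation: the hash function (SHA-256), the canonical serialization (sorted-key compact JSON over every field except entry_hash), and the genesis value GENESIS_PREV_HASH are all assumptions, since this page does not specify them.

```python
import hashlib
import json
from typing import Optional

# Hypothetical genesis value: what the first entry's prev_hash points at.
GENESIS_PREV_HASH = "0" * 64


def compute_entry_hash(entry: dict) -> str:
    """SHA-256 over canonical JSON of every field except entry_hash (assumed scheme)."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> dict:
    """Append an entry whose prev_hash is the previous entry's entry_hash."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS_PREV_HASH
    entry = {**payload, "prev_hash": prev_hash}
    entry["entry_hash"] = compute_entry_hash(entry)
    chain.append(entry)
    return entry


def first_broken_link(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    expected_prev = GENESIS_PREV_HASH
    for index, entry in enumerate(chain):
        if entry["prev_hash"] != expected_prev:
            return index  # link to the previous entry is broken (edit or deletion upstream)
        if compute_entry_hash(entry) != entry["entry_hash"]:
            return index  # content no longer matches its own stored hash
        expected_prev = entry["entry_hash"]
    return None
```

Editing an entry in place is caught at that entry; if the attacker also recomputes its entry_hash, the break moves to the following entry, whose prev_hash still holds the old value. Deleting an entry is caught at the entry that slides into its position.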

Tamper Detection

Verify chain integrity for a session:

curl https://api.hatidata.com/v1/audit/sessions/{session_id}/verify \
  -H "Authorization: Bearer <jwt>"

{
  "session_id": "sess_a1b2c3",
  "entry_count": 847,
  "chain_valid": true,
  "first_entry_hash": "3a4f9c...",
  "last_entry_hash": "d82e1b...",
  "verified_at": "2026-02-25T09:14:00Z"
}

If chain_valid is false, the response includes the index and hash of the first broken link, enabling forensic investigation.
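For scheduled integrity checks, the same endpoint can be called from a script. The sketch below uses only the Python standard library and the URL and header from the curl example; the helper names are hypothetical. Because the exact field names for the broken-link index and hash are not listed here, the sketch surfaces the whole response when chain_valid is not true.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.hatidata.com/v1"


def build_verify_request(session_id: str, jwt: str) -> urllib.request.Request:
    """GET request for the session chain-verification endpoint."""
    url = f"{API_BASE}/audit/sessions/{urllib.parse.quote(session_id, safe='')}/verify"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {jwt}"})


def check_verification(result: dict) -> dict:
    """Return the result unchanged if the chain is valid; raise otherwise."""
    if result.get("chain_valid") is not True:
        # The payload carries the first broken link's index and hash; their field
        # names are not assumed here, so the whole payload goes into the error.
        raise RuntimeError("audit chain broken: " + json.dumps(result))
    return result


def verify_session(session_id: str, jwt: str) -> dict:
    """Call the verify endpoint and fail loudly if tampering is reported."""
    with urllib.request.urlopen(build_verify_request(session_id, jwt)) as resp:
        return check_verification(json.load(resp))
```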

Query Audit Log

Every query executed through the HatiData proxy generates a query audit entry. Entries are written after execution completes, ensuring PII redaction runs before the entry is committed to storage.

Audit Entry Schema

{
  "entry_id": "aud_q1r2s3t4",
  "prev_hash": "3a4f9c8d...",
  "entry_hash": "d82e1b5f...",
  "timestamp": "2026-02-25T09:14:32.018Z",
  "query_id": "qry_a1b2c3d4",
  "session_id": "sess_a1b2c3",
  "user_id": "usr_x9y8z7",
  "agent_id": "agent-data-analyst-v2",
  "agent_framework": "langchain",
  "org_id": "org_abc123",
  "environment": "production",
  "sql_redacted": "SELECT name, email FROM customers WHERE created_at > '2026-01-01'",
  "tables_accessed": ["customers"],
  "rows_returned": 1842,
  "columns_masked": ["email"],
  "execution_time_ms": 34,
  "cache_hit": false,
  "cost_credits": 0.12,
  "policy_verdicts": [
    { "policy": "pii-masking", "action": "mask", "columns": ["email"] }
  ],
  "rls_filters_applied": ["department = 'engineering'"],
  "source_ip": "10.0.1.50",
  "query_origin": "agent"
}
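When processing exported JSONL, it can help to load each line into a typed record. The sketch below models a subset of the fields in the schema above; the class and function names are hypothetical, and fields the class does not declare are dropped rather than rejected.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class QueryAuditEntry:
    """Typed view of a subset of the query audit entry fields."""
    entry_id: str
    prev_hash: str
    entry_hash: str
    timestamp: str
    query_id: str
    session_id: str
    agent_id: str
    sql_redacted: str
    tables_accessed: list = field(default_factory=list)
    columns_masked: list = field(default_factory=list)
    rows_returned: int = 0
    execution_time_ms: int = 0
    cost_credits: float = 0.0


_KNOWN_FIELDS = {f.name for f in fields(QueryAuditEntry)}


def parse_entry(line: str) -> QueryAuditEntry:
    """Parse one JSONL line; unknown keys are ignored, a missing required key raises TypeError."""
    raw = json.loads(line)
    return QueryAuditEntry(**{k: v for k, v in raw.items() if k in _KNOWN_FIELDS})
```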

PII Redaction Before Storage

Audit entries undergo automatic PII redaction before being written to S3. The raw SQL text is scanned for the following patterns using compiled regular expressions:

Pattern	Example	Replaced with
Email address	`alice@example.com`	`[EMAIL_REDACTED]`
US Social Security Number	`123-45-6789`	`[SSN_REDACTED]`
Payment card number	`4111111111111111`	`[CC_REDACTED]`
Phone number	`+1-555-0100`	`[PHONE_REDACTED]`

Only the redacted text is stored, in the sql_redacted field; the original SQL is never stored. As a result, even if audit storage were compromised, PII in query text that matches these patterns would not be exposed.
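A pattern-based redactor of this kind can be sketched as follows. The regular expressions are hypothetical approximations of the table above, not HatiData's compiled expressions. Rules run in order so that the specific SSN and card patterns fire before the broader phone pattern; a naive version like this can also match long numeric literals that are not PII.

```python
import re

# Hypothetical patterns approximating the redaction table; order matters.
REDACTION_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL_REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens
    (re.compile(r"(?<!\w)(?:\d[ -]?){12,18}\d(?!\w)"), "[CC_REDACTED]"),
    # optional + and country code, then 3 + 4 digits with optional separators
    (re.compile(r"(?<!\w)\+?\d{1,3}[-. ]?\d{3}[-. ]?\d{4}(?!\w)"), "[PHONE_REDACTED]"),
]


def redact_sql(sql: str) -> str:
    """Replace PII-looking literals in SQL text with fixed placeholders."""
    for pattern, placeholder in REDACTION_RULES:
        sql = pattern.sub(placeholder, sql)
    return sql
```

For example, a filter such as `email = 'alice@example.com'` becomes `email = '[EMAIL_REDACTED]'`, while a date literal such as `'2026-01-01'` is left unchanged.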

IAM Audit Log

All administrative actions are recorded in a separate IAM audit trail. This trail captures policy CRUD, key rotation, user management, and SSO configuration changes with before/after values:

{
  "entry_id": "aud_i1a2m3",
  "prev_hash": "c91f3a...",
  "entry_hash": "7b44de...",
  "timestamp": "2026-02-25T08:00:12.004Z",
  "action": "policy.update",
  "actor_id": "usr_admin01",
  "actor_role": "admin",
  "org_id": "org_abc123",
  "resource_type": "policy",
  "resource_id": "pol_pii_masking",
  "before": { "enabled": false },
  "after": { "enabled": true },
  "source_ip": "10.0.0.5"
}

IAM audit entries use the same hash-chain mechanism as query audit entries, but the chain is independent (keyed by org_id).
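Because each organization has its own chain, entries from different orgs can be interleaved in storage without affecting each other. The hypothetical sketch below checks only the prev_hash to entry_hash linkage per org_id (it does not recompute hashes) and assumes entries arrive in write order; the first entry of each org is not checked, since the genesis value is not documented here.

```python
from collections import defaultdict


def broken_links_by_org(entries: list) -> dict:
    """Map org_id to the entry_ids whose prev_hash does not match that org's previous entry_hash."""
    last_hash = {}                # org_id -> entry_hash of the org's latest entry seen so far
    broken = defaultdict(list)    # org_id -> list of entry_ids with a broken link
    for entry in entries:
        org = entry["org_id"]
        if org in last_hash and entry["prev_hash"] != last_hash[org]:
            broken[org].append(entry["entry_id"])
        last_hash[org] = entry["entry_hash"]
    return dict(broken)
```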

Querying Audit Logs via SQL

Audit logs are stored in your S3 bucket as JSONL files partitioned by date. They can be queried directly through the HatiData proxy using the _hatidata_audit system schema:

-- Count queries by agent in the last 7 days
SELECT
  agent_id,
  agent_framework,
  COUNT(*) AS query_count,
  SUM(cost_credits) AS total_credits,
  AVG(execution_time_ms) AS avg_latency_ms
FROM _hatidata_audit.queries
WHERE timestamp >= NOW() - INTERVAL '7 days'
GROUP BY agent_id, agent_framework
ORDER BY query_count DESC;

-- Find all policy changes in the last 30 days
SELECT
  timestamp,
  actor_id,
  action,
  resource_id,
  before,
  after
FROM _hatidata_audit.iam_events
WHERE timestamp >= NOW() - INTERVAL '30 days'
  AND action LIKE 'policy.%'
ORDER BY timestamp DESC;

-- Find queries in the last 90 days that read the customers table without masking the email column
SELECT
  query_id,
  user_id,
  agent_id,
  sql_redacted,
  columns_masked
FROM _hatidata_audit.queries
WHERE 'customers' = ANY(tables_accessed)
  AND NOT ('email' = ANY(columns_masked))
  AND timestamp >= NOW() - INTERVAL '90 days';
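If the JSONL files have been downloaded from your bucket, the first query above can also be reproduced offline. This is a minimal sketch assuming the field names from the audit entry schema and ISO 8601 timestamps ending in Z; the function names are hypothetical, and `now` must be timezone-aware.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse timestamps such as 2026-02-25T09:14:32.018Z into aware UTC datetimes."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def queries_by_agent(lines, now: datetime, days: int = 7) -> list:
    """Local equivalent of the 'count queries by agent' SQL over JSONL lines."""
    cutoff = now - timedelta(days=days)
    totals = defaultdict(lambda: [0, 0.0, 0])  # [query count, credits, total latency ms]
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        entry = json.loads(line)
        if parse_ts(entry["timestamp"]) < cutoff:
            continue  # keep only timestamp >= now - interval
        t = totals[(entry["agent_id"], entry["agent_framework"])]
        t[0] += 1
        t[1] += entry["cost_credits"]
        t[2] += entry["execution_time_ms"]
    rows = [
        {
            "agent_id": agent_id,
            "agent_framework": framework,
            "query_count": count,
            "total_credits": credits,
            "avg_latency_ms": latency / count,
        }
        for (agent_id, framework), (count, credits, latency) in totals.items()
    ]
    return sorted(rows, key=lambda r: r["query_count"], reverse=True)
```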

Retention Policies

Tier	Hot Storage	Warm Storage	Cold Archive	Total Retention
Cloud	90 days (S3 Standard)	--	--	90 days
Growth	90 days (S3 Standard)	1 year (S3 Glacier)	--	1 year
Enterprise	90 days (S3 Standard)	1 year (S3 Glacier)	7 years (S3 Glacier Deep Archive)	7 years
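One way to read the retention table is as a schedule keyed by entry age. The sketch below is a hypothetical encoding, not a published lifecycle configuration: it assumes each period is measured from write time, that "1 year" means 365 days and "7 years" means 7 × 365 days, and that an entry moves to the next tier on the day its age reaches the limit.

```python
from typing import Optional

# Hypothetical encoding of the retention table: (age limit in days, storage class).
RETENTION_SCHEDULE = {
    "cloud": [(90, "S3 Standard")],
    "growth": [(90, "S3 Standard"), (365, "S3 Glacier")],
    "enterprise": [
        (90, "S3 Standard"),
        (365, "S3 Glacier"),
        (7 * 365, "S3 Glacier Deep Archive"),
    ],
}


def storage_class_for(tier: str, age_days: int) -> Optional[str]:
    """Storage class holding an entry of the given age, or None once total retention ends."""
    for limit_days, storage_class in RETENTION_SCHEDULE[tier.lower()]:
        if age_days < limit_days:
            return storage_class
    return None
```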

S3 Object Lock is enabled in Governance mode for all tiers. Within the retention period, locked audit objects cannot be deleted or overwritten unless a user who holds the s3:BypassGovernanceRetention permission explicitly requests a governance bypass, and any such bypass itself generates an audited event.
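To confirm that an audit bucket's default retention is in Governance mode, the response of S3's GetObjectLockConfiguration call can be inspected. The sketch assumes the lock is applied through the bucket's default retention rule; if retention is set per object instead, the Rule element may be absent and the check returns False. boto3 raises a ClientError if the bucket has no Object Lock configuration at all. The bucket name you pass in is your own.

```python
def uses_governance_default(response: dict) -> bool:
    """True if a GetObjectLockConfiguration response shows Governance-mode default retention."""
    config = response.get("ObjectLockConfiguration", {})
    if config.get("ObjectLockEnabled") != "Enabled":
        return False
    default_retention = config.get("Rule", {}).get("DefaultRetention", {})
    return default_retention.get("Mode") == "GOVERNANCE"


def check_audit_bucket(bucket: str) -> bool:
    """Fetch the bucket's Object Lock configuration with boto3 and inspect it."""
    import boto3  # third-party AWS SDK, imported here so the parser above has no dependencies

    s3 = boto3.client("s3")
    return uses_governance_default(s3.get_object_lock_configuration(Bucket=bucket))
```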

Custom retention periods are configurable for Enterprise customers. A minimum of 7 years is recommended for HIPAA and SOC 2 Type II compliance.

Export Formats

Audit logs stored in S3 are compatible with common SIEM and log management platforms:

Format	Details
JSONL	Default. One JSON object per line, partitioned by `year/month/day/`.
Parquet	Available for large-scale analytics. Generated nightly via a scheduled export job.
CSV	Available via the control plane export API for date ranges up to 90 days.
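When pointing a SIEM input or a batch job at a date range of JSONL files, it is handy to enumerate the daily partitions. This sketch assumes zero-padded numeric components such as 2026/02/25/ under an optional base prefix; the actual key layout and any base prefix (for example the hypothetical audit/queries/) should be checked against your bucket.

```python
from datetime import date, timedelta


def daily_prefixes(start: date, end: date, base: str = "") -> list:
    """One year/month/day/ prefix per day from start to end inclusive (empty if end < start)."""
    days = (end - start).days
    return [f"{base}{start + timedelta(days=n):%Y/%m/%d}/" for n in range(days + 1)]
```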

SIEM Integrations

SIEM	Integration Method
Splunk	S3 input add-on, reads JSONL from partitioned prefix
Datadog	S3 log forwarding, auto-parsed JSON fields
Elastic / OpenSearch	Logstash S3 input plugin or Filebeat aws-s3 input
Sumo Logic	S3 source, automatic field extraction from JSONL
Microsoft Sentinel	Azure Blob / S3 connector

Export via API

# Export query audit for a specific date range (returns JSONL)
curl "https://api.hatidata.com/v1/audit/export?start=2026-02-01&end=2026-02-25&type=queries" \
  -H "Authorization: Bearer <jwt>" \
  --output audit-feb-2026.jsonl
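The same export can be scripted with the Python standard library. This sketch mirrors the curl call's endpoint and query parameters; the helper names are hypothetical, and only type=queries appears in the example above, so other type values are not assumed.

```python
import shutil
import urllib.parse
import urllib.request
from datetime import date

API_BASE = "https://api.hatidata.com/v1"


def build_export_request(start: date, end: date, log_type: str, jwt: str) -> urllib.request.Request:
    """GET request for the audit export endpoint with start, end, and type parameters."""
    query = urllib.parse.urlencode(
        {"start": start.isoformat(), "end": end.isoformat(), "type": log_type}
    )
    return urllib.request.Request(
        f"{API_BASE}/audit/export?{query}",
        headers={"Authorization": f"Bearer {jwt}"},
    )


def download_export(request: urllib.request.Request, path: str) -> None:
    """Stream the JSONL response body to a local file without holding it all in memory."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, `download_export(build_export_request(date(2026, 2, 1), date(2026, 2, 25), "queries", token), "audit-feb-2026.jsonl")` corresponds to the curl command above.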

Immutability Guarantees

Property	Mechanism
Append-only writes	S3 Object Lock (Governance mode). Locked objects cannot be overwritten or deleted within the retention period without an explicit governance bypass.
Tamper detection	Cryptographic hash chaining. Any modification breaks the chain and is detectable via the verify API.
PII protection	Redaction applied before write. Raw SQL containing PII is never stored.
No HatiData access	Audit logs are written to the customer's S3 bucket. HatiData infrastructure has no read or write access to the bucket.

Related Concepts

SOC 2 Architecture: How audit controls map to Trust Service Criteria
Data Residency: Where audit logs are stored geographically
CMEK & Encryption: Encryption applied to audit log storage
Security Model: PII redaction in query results and audit entries
Control Plane API: Programmatic access to audit entries and chain verification
