Skip to main content

Data Protection

HatiData protects sensitive data through four complementary mechanisms: column masking, row-level security, PII redaction in audit logs, and tenant isolation. These protections operate at different stages of the query pipeline and can be combined for defense-in-depth.

Column Masking

Column masking applies dynamic redaction to query results after execution at the proxy layer. The underlying data is never modified -- masking is applied on the wire before results reach the client.

Masking Functions

FunctionDescriptionInputOutput
FullReplace entire value with ***alice@example.com***
PartialShow last N characters4111111111111111***1111
HashSHA-256 digest of the valuealice@example.coma1b2c3d4e5f6...
NullReplace with SQL NULL555-0100NULL

Creating a Masking Policy

curl -X POST https://api.hatidata.com/v1/environments/env_abc/policies \
-H "Authorization: Bearer <jwt>" \
-H "Content-Type: application/json" \
-d '{
"name": "pii-masking",
"type": "column_masking",
"rules": [
{
"table": "customers",
"column": "email",
"function": "full",
"exempt_roles": ["owner", "admin"]
},
{
"table": "customers",
"column": "phone",
"function": "null",
"exempt_roles": ["owner"]
},
{
"table": "payments",
"column": "card_number",
"function": "partial",
"visible_chars": 4,
"exempt_roles": []
},
{
"table": "customers",
"column": "ssn",
"function": "hash",
"exempt_roles": []
}
]
}'

Role-Based Exemptions

Each masking rule specifies which roles can see the unmasked value. A query from an analyst role would see:

SELECT name, email, phone, ssn FROM customers;
nameemailphonessn
Alice***NULLa1b2c3d4...
Bob***NULLe5f6g7h8...

The same query from an owner role:

nameemailphonessn
Alicealice@example.com555-0100a1b2c3d4...
Bobbob@example.com555-0200e5f6g7h8...

Notice that ssn has no exempt roles -- it is always hashed, even for owners.

Agent-Specific Masking Rules

Masking rules can target specific agents or agent frameworks:

{
"table": "customers",
"column": "email",
"function": "hash",
"apply_to_agents": true,
"agent_framework_exempt": [],
"agent_id_exempt": ["agent-email-sender"]
}

This ensures that AI agents see hashed email addresses by default, with exceptions for agents that genuinely need the raw value (such as an email-sending agent).

How Masking Works in the Pipeline

SQL Query → Parse → Transpile → DuckDB Execute → Column Masking → Wire Protocol → Client

┌──────────┤
│ For each column in result set:
│ 1. Check if masking rule exists
│ 2. Check if user role is exempt
│ 3. Check if agent is exempt
│ 4. Apply masking function
└──────────┘

Row-Level Security (RLS)

Row-level security restricts which rows a user or agent can see by injecting WHERE clauses into queries at the SQL AST level before execution.

How RLS Works

  1. The query is parsed into an Abstract Syntax Tree (AST)
  2. RLS policies matching the target tables are loaded
  3. WHERE clauses are injected into the AST
  4. The modified AST is rendered back to SQL and executed

Because injection happens at the AST level, RLS works for all query types: SELECT, UPDATE, DELETE, and subqueries.

Placeholder Variables

RLS filters use placeholder variables that are resolved from the authenticated session context:

PlaceholderSourceDescription
{user_id}JWT / API keyAuthenticated user identity
{org_id}JWT / API keyOrganization scope
{agent_id}Request headerAI agent identifier
{agent_framework}Request headerAgent framework name
{department}User attributesCustom user attribute
{region}User attributesCustom user attribute

Creating an RLS Policy

curl -X POST https://api.hatidata.com/v1/environments/env_abc/policies \
-H "Authorization: Bearer <jwt>" \
-H "Content-Type: application/json" \
-d '{
"name": "department-isolation",
"type": "row_level_security",
"rules": [
{
"table": "sales_data",
"filter": "department = '\''{department}'\''",
"exempt_roles": ["owner", "admin"]
},
{
"table": "agent_logs",
"filter": "agent_id = '\''{agent_id}'\''",
"exempt_roles": ["owner", "admin", "auditor"]
}
]
}'

RLS in Action

When an analyst in the "engineering" department queries:

SELECT * FROM sales_data WHERE revenue > 10000;

HatiData injects the RLS filter, producing:

SELECT * FROM sales_data
WHERE revenue > 10000
AND department = 'engineering';

When an AI agent with agent_id = "data-analyst-v2" queries:

SELECT * FROM agent_logs;

The executed query becomes:

SELECT * FROM agent_logs
WHERE agent_id = 'data-analyst-v2';

Agent-Aware RLS

RLS is particularly important for multi-agent environments. Common patterns:

[
{
"table": "agent_memory",
"filter": "agent_id = '{agent_id}'",
"description": "Each agent can only access its own memory"
},
{
"table": "shared_context",
"filter": "agent_framework = '{agent_framework}' OR visibility = 'public'",
"description": "Agents see context from their framework plus public data"
},
{
"table": "customer_data",
"filter": "org_id = '{org_id}'",
"description": "Multi-tenant isolation"
}
]

PII Redaction in Audit Logs

All query audit entries undergo automatic PII redaction before being written to storage. This ensures that sensitive data does not leak into audit logs even when it appears in query text.

Detected Patterns

PatternExample MatchRedacted Output
Emailalice@example.com[EMAIL_REDACTED]
SSN123-45-6789[SSN_REDACTED]
Credit Card4111111111111111[CC_REDACTED]
Phone+1-555-0100[PHONE_REDACTED]

Detection uses compiled regular expressions for microsecond-level performance. Redaction is applied to the SQL text in the audit entry, not to query results.

Example Audit Entry

{
"query_id": "qry_a1b2c3d4",
"user_id": "usr_x9y8z7",
"sql": "SELECT * FROM customers WHERE email = '[EMAIL_REDACTED]'",
"tables_accessed": ["customers"],
"rows_returned": 1,
"columns_masked": ["phone", "ssn"],
"execution_time_ms": 12,
"cache_hit": false,
"policy_verdicts": [
{ "policy": "pii-masking", "action": "mask", "columns": ["phone", "ssn"] }
],
"source_ip": "10.0.1.50",
"timestamp": "2026-02-16T10:30:00Z"
}

Tenant Isolation

In multi-tenant deployments, HatiData enforces strict data isolation between organizations.

Automatic WHERE Injection

Every query is automatically scoped to the authenticated organization:

-- User submits:
SELECT * FROM orders;

-- HatiData executes:
SELECT * FROM orders WHERE org_id = 'org_abc123';

This injection is automatic and cannot be bypassed. It operates at the AST level, covering all query types including subqueries and CTEs.

Cross-Tenant JOIN Prevention

HatiData detects and blocks queries that attempt to join data across tenant boundaries:

-- This query is BLOCKED:
SELECT a.*, b.*
FROM org_abc.orders a
JOIN org_xyz.orders b ON a.product_id = b.product_id;
{
"error": "cross_tenant_join",
"message": "Queries cannot join tables across organization boundaries",
"code": "SECURITY_VIOLATION"
}

Parent-Child Organization Hierarchy

Organizations can form parent-child hierarchies. A parent organization can optionally grant read access to child organization data:

Parent Org (Acme Corp)
├── Child Org (Acme US)
├── Child Org (Acme EU)
└── Child Org (Acme APAC)

Parent org users with appropriate permissions can query across child organizations. Child org users are always isolated to their own data.

Per-Tenant Resource Quotas

Each tenant has configurable resource quotas:

{
"org_id": "org_abc123",
"monthly_credit_limit": 10000,
"max_concurrent_queries": 50,
"max_rows_per_query": 1000000,
"storage_limit_gb": 500
}

Quota enforcement is applied at the proxy layer before query execution.

Combining Protections

These mechanisms layer together for comprehensive protection:

Query → Tenant Isolation (WHERE org_id = ...)
→ Row-Level Security (WHERE department = ...)
→ ABAC Policy Check (time, role, agent)
→ Execute
→ Column Masking (email → ***, phone → NULL)
→ PII Redaction in Audit Log
→ Return Results

Next Steps

Stay in the loop

Product updates, engineering deep-dives, and agent-native insights. No spam.