Data Protection
HatiData protects sensitive data through four complementary mechanisms: column masking, row-level security, PII redaction in audit logs, and tenant isolation. These protections operate at different stages of the query pipeline and can be combined for defense-in-depth.
Column Masking
Column masking applies dynamic redaction to query results after execution at the proxy layer. The underlying data is never modified -- masking is applied on the wire before results reach the client.
Masking Functions
| Function | Description | Input | Output |
|---|---|---|---|
| Full | Replace entire value with *** | alice@example.com | *** |
| Partial | Show last N characters | 4111111111111111 | ***1111 |
| Hash | SHA-256 digest of the value | alice@example.com | a1b2c3d4e5f6... |
| Null | Replace with SQL NULL | 555-0100 | NULL |
Creating a Masking Policy
curl -X POST https://api.hatidata.com/v1/environments/env_abc/policies \
-H "Authorization: Bearer <jwt>" \
-H "Content-Type: application/json" \
-d '{
"name": "pii-masking",
"type": "column_masking",
"rules": [
{
"table": "customers",
"column": "email",
"function": "full",
"exempt_roles": ["owner", "admin"]
},
{
"table": "customers",
"column": "phone",
"function": "null",
"exempt_roles": ["owner"]
},
{
"table": "payments",
"column": "card_number",
"function": "partial",
"visible_chars": 4,
"exempt_roles": []
},
{
"table": "customers",
"column": "ssn",
"function": "hash",
"exempt_roles": []
}
]
}'
Role-Based Exemptions
Each masking rule specifies which roles can see the unmasked value. A query from an analyst role would see:
SELECT name, email, phone, ssn FROM customers;
| name | phone | ssn | |
|---|---|---|---|
| Alice | *** | NULL | a1b2c3d4... |
| Bob | *** | NULL | e5f6g7h8... |
The same query from an owner role:
| name | phone | ssn | |
|---|---|---|---|
| Alice | alice@example.com | 555-0100 | a1b2c3d4... |
| Bob | bob@example.com | 555-0200 | e5f6g7h8... |
Notice that ssn has no exempt roles -- it is always hashed, even for owners.
Agent-Specific Masking Rules
Masking rules can target specific agents or agent frameworks:
{
"table": "customers",
"column": "email",
"function": "hash",
"apply_to_agents": true,
"agent_framework_exempt": [],
"agent_id_exempt": ["agent-email-sender"]
}
This ensures that AI agents see hashed email addresses by default, with exceptions for agents that genuinely need the raw value (such as an email-sending agent).
How Masking Works in the Pipeline
SQL Query → Parse → Transpile → DuckDB Execute → Column Masking → Wire Protocol → Client
│
┌──────────┤
│ For each column in result set:
│ 1. Check if masking rule exists
│ 2. Check if user role is exempt
│ 3. Check if agent is exempt
│ 4. Apply masking function
└──────────┘
Row-Level Security (RLS)
Row-level security restricts which rows a user or agent can see by injecting WHERE clauses into queries at the SQL AST level before execution.
How RLS Works
- The query is parsed into an Abstract Syntax Tree (AST)
- RLS policies matching the target tables are loaded
- WHERE clauses are injected into the AST
- The modified AST is rendered back to SQL and executed
Because injection happens at the AST level, RLS works for all query types: SELECT, UPDATE, DELETE, and subqueries.
Placeholder Variables
RLS filters use placeholder variables that are resolved from the authenticated session context:
| Placeholder | Source | Description |
|---|---|---|
{user_id} | JWT / API key | Authenticated user identity |
{org_id} | JWT / API key | Organization scope |
{agent_id} | Request header | AI agent identifier |
{agent_framework} | Request header | Agent framework name |
{department} | User attributes | Custom user attribute |
{region} | User attributes | Custom user attribute |
Creating an RLS Policy
curl -X POST https://api.hatidata.com/v1/environments/env_abc/policies \
-H "Authorization: Bearer <jwt>" \
-H "Content-Type: application/json" \
-d '{
"name": "department-isolation",
"type": "row_level_security",
"rules": [
{
"table": "sales_data",
"filter": "department = '\''{department}'\''",
"exempt_roles": ["owner", "admin"]
},
{
"table": "agent_logs",
"filter": "agent_id = '\''{agent_id}'\''",
"exempt_roles": ["owner", "admin", "auditor"]
}
]
}'
RLS in Action
When an analyst in the "engineering" department queries:
SELECT * FROM sales_data WHERE revenue > 10000;
HatiData injects the RLS filter, producing:
SELECT * FROM sales_data
WHERE revenue > 10000
AND department = 'engineering';
When an AI agent with agent_id = "data-analyst-v2" queries:
SELECT * FROM agent_logs;
The executed query becomes:
SELECT * FROM agent_logs
WHERE agent_id = 'data-analyst-v2';
Agent-Aware RLS
RLS is particularly important for multi-agent environments. Common patterns:
[
{
"table": "agent_memory",
"filter": "agent_id = '{agent_id}'",
"description": "Each agent can only access its own memory"
},
{
"table": "shared_context",
"filter": "agent_framework = '{agent_framework}' OR visibility = 'public'",
"description": "Agents see context from their framework plus public data"
},
{
"table": "customer_data",
"filter": "org_id = '{org_id}'",
"description": "Multi-tenant isolation"
}
]
PII Redaction in Audit Logs
All query audit entries undergo automatic PII redaction before being written to storage. This ensures that sensitive data does not leak into audit logs even when it appears in query text.
Detected Patterns
| Pattern | Example Match | Redacted Output |
|---|---|---|
alice@example.com | [EMAIL_REDACTED] | |
| SSN | 123-45-6789 | [SSN_REDACTED] |
| Credit Card | 4111111111111111 | [CC_REDACTED] |
| Phone | +1-555-0100 | [PHONE_REDACTED] |
Detection uses compiled regular expressions for microsecond-level performance. Redaction is applied to the SQL text in the audit entry, not to query results.
Example Audit Entry
{
"query_id": "qry_a1b2c3d4",
"user_id": "usr_x9y8z7",
"sql": "SELECT * FROM customers WHERE email = '[EMAIL_REDACTED]'",
"tables_accessed": ["customers"],
"rows_returned": 1,
"columns_masked": ["phone", "ssn"],
"execution_time_ms": 12,
"cache_hit": false,
"policy_verdicts": [
{ "policy": "pii-masking", "action": "mask", "columns": ["phone", "ssn"] }
],
"source_ip": "10.0.1.50",
"timestamp": "2026-02-16T10:30:00Z"
}
Tenant Isolation
In multi-tenant deployments, HatiData enforces strict data isolation between organizations.
Automatic WHERE Injection
Every query is automatically scoped to the authenticated organization:
-- User submits:
SELECT * FROM orders;
-- HatiData executes:
SELECT * FROM orders WHERE org_id = 'org_abc123';
This injection is automatic and cannot be bypassed. It operates at the AST level, covering all query types including subqueries and CTEs.
Cross-Tenant JOIN Prevention
HatiData detects and blocks queries that attempt to join data across tenant boundaries:
-- This query is BLOCKED:
SELECT a.*, b.*
FROM org_abc.orders a
JOIN org_xyz.orders b ON a.product_id = b.product_id;
{
"error": "cross_tenant_join",
"message": "Queries cannot join tables across organization boundaries",
"code": "SECURITY_VIOLATION"
}
Parent-Child Organization Hierarchy
Organizations can form parent-child hierarchies. A parent organization can optionally grant read access to child organization data:
Parent Org (Acme Corp)
├── Child Org (Acme US)
├── Child Org (Acme EU)
└── Child Org (Acme APAC)
Parent org users with appropriate permissions can query across child organizations. Child org users are always isolated to their own data.
Per-Tenant Resource Quotas
Each tenant has configurable resource quotas:
{
"org_id": "org_abc123",
"monthly_credit_limit": 10000,
"max_concurrent_queries": 50,
"max_rows_per_query": 1000000,
"storage_limit_gb": 500
}
Quota enforcement is applied at the proxy layer before query execution.
Combining Protections
These mechanisms layer together for comprehensive protection:
Query → Tenant Isolation (WHERE org_id = ...)
→ Row-Level Security (WHERE department = ...)
→ ABAC Policy Check (time, role, agent)
→ Execute
→ Column Masking (email → ***, phone → NULL)
→ PII Redaction in Audit Log
→ Return Results
Next Steps
- Security Overview -- Architecture and encryption
- Authorization -- RBAC and ABAC policies
- Audit API -- Query and IAM audit endpoints