Data Protection

HatiData protects sensitive data through four complementary mechanisms: column masking, row-level security, PII redaction in audit logs, and tenant isolation. These protections operate at different stages of the query pipeline and can be combined for defense-in-depth.

Column Masking

Column masking applies dynamic redaction to query results after execution at the proxy layer. The underlying data is never modified -- masking is applied on the wire before results reach the client.

Masking Functions

Function	Description	Input	Output
Full	Replace entire value with `***`	`alice@example.com`	`***`
Partial	Show last N characters	`4111111111111111`	`***1111`
Hash	SHA-256 digest of the value	`alice@example.com`	`a1b2c3d4e5f6...`
Null	Replace with SQL NULL	`555-0100`	`NULL`

Creating a Masking Policy

curl -X POST https://api.hatidata.com/v1/environments/env_abc/policies \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "pii-masking",
    "type": "column_masking",
    "rules": [
      {
        "table": "customers",
        "column": "email",
        "function": "full",
        "exempt_roles": ["owner", "admin"]
      },
      {
        "table": "customers",
        "column": "phone",
        "function": "null",
        "exempt_roles": ["owner"]
      },
      {
        "table": "payments",
        "column": "card_number",
        "function": "partial",
        "visible_chars": 4,
        "exempt_roles": []
      },
      {
        "table": "customers",
        "column": "ssn",
        "function": "hash",
        "exempt_roles": []
      }
    ]
  }'

Role-Based Exemptions

Each masking rule specifies which roles can see the unmasked value. A query from an analyst role would see:

SELECT name, email, phone, ssn FROM customers;

name	email	phone	ssn
Alice	`***`	`NULL`	`a1b2c3d4...`
Bob	`***`	`NULL`	`e5f6g7h8...`

The same query from an owner role:

name	email	phone	ssn
Alice	alice@example.com	555-0100	`a1b2c3d4...`
Bob	bob@example.com	555-0200	`e5f6g7h8...`

Notice that ssn has no exempt roles -- it is always hashed, even for owners.

Agent-Specific Masking Rules

Masking rules can target specific agents or agent frameworks:

{
  "table": "customers",
  "column": "email",
  "function": "hash",
  "apply_to_agents": true,
  "agent_framework_exempt": [],
  "agent_id_exempt": ["agent-email-sender"]
}

This ensures that AI agents see hashed email addresses by default, with exceptions for agents that genuinely need the raw value (such as an email-sending agent).

How Masking Works in the Pipeline

SQL Query → Parse → Transpile → DuckDB Execute → Column Masking → Wire Protocol → Client
                                                       │
                                            ┌──────────┤
                                            │ For each column in result set:
                                            │   1. Check if masking rule exists
                                            │   2. Check if user role is exempt
                                            │   3. Check if agent is exempt
                                            │   4. Apply masking function
                                            └──────────┘

Row-Level Security (RLS)

Row-level security restricts which rows a user or agent can see by injecting WHERE clauses into queries at the SQL AST level before execution.

How RLS Works

The query is parsed into an Abstract Syntax Tree (AST)
RLS policies matching the target tables are loaded
WHERE clauses are injected into the AST
The modified AST is rendered back to SQL and executed

Because injection happens at the AST level, RLS works for all query types: SELECT, UPDATE, DELETE, and subqueries.

Placeholder Variables

RLS filters use placeholder variables that are resolved from the authenticated session context:

Placeholder	Source	Description
`{user_id}`	JWT / API key	Authenticated user identity
`{org_id}`	JWT / API key	Organization scope
`{agent_id}`	Request header	AI agent identifier
`{agent_framework}`	Request header	Agent framework name
`{department}`	User attributes	Custom user attribute
`{region}`	User attributes	Custom user attribute

Creating an RLS Policy

curl -X POST https://api.hatidata.com/v1/environments/env_abc/policies \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "department-isolation",
    "type": "row_level_security",
    "rules": [
      {
        "table": "sales_data",
        "filter": "department = '\''{department}'\''",
        "exempt_roles": ["owner", "admin"]
      },
      {
        "table": "agent_logs",
        "filter": "agent_id = '\''{agent_id}'\''",
        "exempt_roles": ["owner", "admin", "auditor"]
      }
    ]
  }'

RLS in Action

When an analyst in the "engineering" department queries:

SELECT * FROM sales_data WHERE revenue > 10000;

HatiData injects the RLS filter, producing:

SELECT * FROM sales_data
WHERE revenue > 10000
  AND department = 'engineering';

When an AI agent with agent_id = "data-analyst-v2" queries:

SELECT * FROM agent_logs;

The executed query becomes:

SELECT * FROM agent_logs
WHERE agent_id = 'data-analyst-v2';

Agent-Aware RLS

RLS is particularly important for multi-agent environments. Common patterns:

[
  {
    "table": "agent_memory",
    "filter": "agent_id = '{agent_id}'",
    "description": "Each agent can only access its own memory"
  },
  {
    "table": "shared_context",
    "filter": "agent_framework = '{agent_framework}' OR visibility = 'public'",
    "description": "Agents see context from their framework plus public data"
  },
  {
    "table": "customer_data",
    "filter": "org_id = '{org_id}'",
    "description": "Multi-tenant isolation"
  }
]

PII Redaction in Audit Logs

All query audit entries undergo automatic PII redaction before being written to storage. This ensures that sensitive data does not leak into audit logs even when it appears in query text.

Detected Patterns

Pattern	Example Match	Redacted Output
Email	`alice@example.com`	`[EMAIL_REDACTED]`
SSN	`123-45-6789`	`[SSN_REDACTED]`
Credit Card	`4111111111111111`	`[CC_REDACTED]`
Phone	`+1-555-0100`	`[PHONE_REDACTED]`

Detection uses compiled regular expressions for microsecond-level performance. Redaction is applied to the SQL text in the audit entry, not to query results.

Example Audit Entry

{
  "query_id": "qry_a1b2c3d4",
  "user_id": "usr_x9y8z7",
  "sql": "SELECT * FROM customers WHERE email = '[EMAIL_REDACTED]'",
  "tables_accessed": ["customers"],
  "rows_returned": 1,
  "columns_masked": ["phone", "ssn"],
  "execution_time_ms": 12,
  "cache_hit": false,
  "policy_verdicts": [
    { "policy": "pii-masking", "action": "mask", "columns": ["phone", "ssn"] }
  ],
  "source_ip": "10.0.1.50",
  "timestamp": "2026-02-16T10:30:00Z"
}

Tenant Isolation

In multi-tenant deployments, HatiData enforces strict data isolation between organizations.

Automatic WHERE Injection

Every query is automatically scoped to the authenticated organization:

-- User submits:
SELECT * FROM orders;

-- HatiData executes:
SELECT * FROM orders WHERE org_id = 'org_abc123';

This injection is automatic and cannot be bypassed. It operates at the AST level, covering all query types including subqueries and CTEs.

Cross-Tenant JOIN Prevention

HatiData detects and blocks queries that attempt to join data across tenant boundaries:

-- This query is BLOCKED:
SELECT a.*, b.*
FROM org_abc.orders a
JOIN org_xyz.orders b ON a.product_id = b.product_id;

{
  "error": "cross_tenant_join",
  "message": "Queries cannot join tables across organization boundaries",
  "code": "SECURITY_VIOLATION"
}

Parent-Child Organization Hierarchy

Organizations can form parent-child hierarchies. A parent organization can optionally grant read access to child organization data:

Parent Org (Acme Corp)
├── Child Org (Acme US)
├── Child Org (Acme EU)
└── Child Org (Acme APAC)

Parent org users with appropriate permissions can query across child organizations. Child org users are always isolated to their own data.

Per-Tenant Resource Quotas

Each tenant has configurable resource quotas:

{
  "org_id": "org_abc123",
  "monthly_credit_limit": 10000,
  "max_concurrent_queries": 50,
  "max_rows_per_query": 1000000,
  "storage_limit_gb": 500
}

Quota enforcement is applied at the proxy layer before query execution.

Combining Protections

These mechanisms layer together for comprehensive protection:

Query → Tenant Isolation (WHERE org_id = ...)
      → Row-Level Security (WHERE department = ...)
      → ABAC Policy Check (time, role, agent)
      → Execute
      → Column Masking (email → ***, phone → NULL)
      → PII Redaction in Audit Log
      → Return Results

Next Steps

Security Overview -- Architecture and encryption
Authorization -- RBAC and ABAC policies
Audit API -- Query and IAM audit endpoints

Column Masking​

Masking Functions​

Creating a Masking Policy​

Role-Based Exemptions​

Agent-Specific Masking Rules​

How Masking Works in the Pipeline​

Row-Level Security (RLS)​

How RLS Works​

Placeholder Variables​

Creating an RLS Policy​

RLS in Action​

Agent-Aware RLS​

PII Redaction in Audit Logs​

Detected Patterns​

Example Audit Entry​

Tenant Isolation​

Automatic WHERE Injection​

Cross-Tenant JOIN Prevention​

Parent-Child Organization Hierarchy​

Per-Tenant Resource Quotas​

Combining Protections​

Next Steps​

Stay in the loop