
SCIM Directory Sync

HatiData supports SCIM (System for Cross-domain Identity Management) directory sync through Clerk. When configured, user accounts are automatically created, updated, and deactivated in HatiData based on changes in your identity provider's directory (Okta, Azure AD, Google Workspace, etc.).

How Directory Sync Works

Identity Provider (Okta, Azure AD)
│
├── User created in IdP ──▶ Clerk webhook ──▶ HatiData: create user
├── User updated in IdP ──▶ Clerk webhook ──▶ HatiData: update user
├── User deactivated ─────▶ Clerk webhook ──▶ HatiData: deactivate user
└── Group membership ─────▶ Clerk webhook ──▶ HatiData: update roles

Clerk normalizes SCIM events from all supported identity providers into a consistent webhook format. HatiData processes these webhooks to keep its user directory in sync.


Prerequisites

  • Clerk account with Directory Sync enabled
  • HatiData Enterprise tier
  • SSO already configured (see Clerk SSO Setup)
  • Admin access to your identity provider

Step 1: Enable Directory Sync in Clerk

Create a Directory Connection

  1. In the Clerk dashboard, navigate to Directory Sync
  2. Select the organization
  3. Choose the identity provider type (Okta SCIM, Azure AD SCIM, Google Workspace)
  4. Follow the provider-specific setup instructions

Configure the SCIM Endpoint (Okta Example)

In Okta:

  1. Open your SAML application
  2. Navigate to Provisioning > Configure API Integration
  3. Enter the SCIM Connector Base URL provided by Clerk
  4. Enter the API Token provided by Clerk
  5. Enable the provisioning features:
    • Create Users
    • Update User Attributes
    • Deactivate Users
    • Sync Groups (Push Groups)

Step 2: Configure Webhook Delivery

Clerk delivers directory events to the HatiData control plane as webhooks. Give the control plane the webhook signing secret from Clerk:

HATIDATA_CLERK_WEBHOOK_SECRET=whsec_your_clerk_webhook_secret

No extra route configuration is needed. The control plane listens for Clerk webhooks at:

POST https://api.hatidata.com/v1/webhooks/clerk
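A receiver should only act on a delivery after confirming that it came from Clerk. Clerk delivers webhooks through Svix, which signs each request as follows:

- The signature is an HMAC-SHA256 over `svix-id.svix-timestamp.body`.
- The key is the `whsec_` secret.
- The result is sent in the `svix-signature` header.

The sketch below is a minimal, stdlib-only version of that check. It is not HatiData's implementation, and the function name `verify_clerk_webhook` is hypothetical. It also assumes header names have already been lowercased. The official `svix` Python package offers an equivalent `Webhook.verify`.

```python
import base64
import hashlib
import hmac
import time
from typing import Mapping, Optional


def verify_clerk_webhook(
    secret: str,
    headers: Mapping[str, str],
    body: bytes,
    tolerance_seconds: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Check the Svix-style signature that Clerk attaches to webhook deliveries."""
    msg_id = headers["svix-id"]
    timestamp = headers["svix-timestamp"]

    # Reject deliveries whose timestamp is too far from the current time (replay protection).
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_seconds:
        return False

    # The HMAC key is the base64-decoded part of the secret after "whsec_".
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()

    # The header may list several space-separated "v1,<signature>" entries,
    # for example while a secret is being rotated.
    for entry in headers["svix-signature"].split():
        version, _, signature = entry.partition(",")
        if version == "v1" and hmac.compare_digest(signature, expected):
            return True
    return False
```

Pass the raw request body exactly as received, before any JSON parsing. Re-serialized JSON will not match the signed bytes.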

Webhook Event Types

| Event | Action in HatiData |
| --- | --- |
| dsync.user.created | Create user account with the default role |
| dsync.user.updated | Update user name, email, and attributes |
| dsync.user.deleted | Deactivate user (see Step 5: Deactivation Flows) |
| dsync.group.created | Create role mapping |
| dsync.group.updated | Update role mapping |
| dsync.group.deleted | Remove role mapping |
| dsync.group.user_added | Assign role to user |
| dsync.group.user_removed | Remove role from user |
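The event table maps naturally onto a dispatch table keyed on the event type. The sketch below routes events by their `type` field and makes the following assumptions:

- Events arrive in a Clerk-style envelope of the form `{"type": ..., "data": ...}`.
- The handler functions are hypothetical placeholders that only describe the action.
- The payload fields (`email`, `group`) are likewise hypothetical.

To keep it short, only four of the eight event types are wired up.

```python
from typing import Callable, Dict


# Hypothetical handlers: each receives the event's "data" payload and returns a
# description of the action. Real handlers would update HatiData's user directory.
def create_user(data: dict) -> str:
    return f"create {data['email']} with default role"


def deactivate_user(data: dict) -> str:
    return f"deactivate {data['email']}"


def assign_role(data: dict) -> str:
    return f"assign role for group {data['group']} to {data['email']}"


def remove_role(data: dict) -> str:
    return f"remove role for group {data['group']} from {data['email']}"


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "dsync.user.created": create_user,
    "dsync.user.deleted": deactivate_user,
    "dsync.group.user_added": assign_role,
    "dsync.group.user_removed": remove_role,
}


def dispatch(event: dict) -> str:
    """Route a webhook event to its handler based on the "type" field."""
    handler = HANDLERS.get(event["type"])
    if handler is None:
        # Acknowledge unknown event types rather than failing, so the sender
        # does not keep retrying a delivery that will never be handled.
        return f"ignored {event['type']}"
    return handler(event["data"])
```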

Step 3: Configure User Provisioning

Map directory attributes to HatiData user fields:

from hatidata import HatiDataClient

# Admin client, reused for the configuration calls in the following steps
admin = HatiDataClient(
    host="localhost",
    port=5439,
    api_key="hd_live_admin_key",
)

admin.organizations.configure_directory_sync(
    org_id="org_acme",
    clerk_directory_id="directory_01HXYZ...",
    # HatiData user field -> SCIM attribute path in the directory record
    attribute_mapping={
        "email": "emails[0].value",
        "first_name": "name.givenName",
        "last_name": "name.familyName",
        "department": "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User:department",
    },
    default_role="viewer",
    auto_activate=True,
)
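To make the mapping paths concrete, the sketch below resolves them against a SCIM user resource. It handles three path forms:

- Dotted paths such as `name.givenName`.
- Indexed paths such as `emails[0].value`.
- Enterprise-extension paths. These are written as `<schema URN>:<attribute>`, and in SCIM JSON they sit under a top-level key equal to the schema URN.

`resolve_scim_path` and `apply_mapping` are hypothetical helpers, not HatiData functions, and only simple (non-nested) extension attributes are handled. Note that `emails[0]` selects the first email entry, which is not necessarily the one marked `primary`.

```python
import re
from typing import Any, Dict, Optional

# One path segment: an attribute name with an optional [index] suffix.
_SEGMENT = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<index>\d+)\])?$")


def resolve_scim_path(resource: dict, path: str) -> Optional[Any]:
    """Resolve a mapping path such as "emails[0].value" against a SCIM user."""
    if path.startswith("urn:"):
        # Split "urn:...:User:department" into the schema URN and the attribute.
        urn, _, attribute = path.rpartition(":")
        return resource.get(urn, {}).get(attribute)

    current: Any = resource
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None or not isinstance(current, dict):
            return None
        current = current.get(match.group("name"))
        index = match.group("index")
        if index is not None:
            if not isinstance(current, list) or int(index) >= len(current):
                return None
            current = current[int(index)]
    return current


def apply_mapping(resource: dict, mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build user fields from a SCIM resource; missing attributes become None."""
    return {field: resolve_scim_path(resource, path) for field, path in mapping.items()}
```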

Provisioning Options

| Option | Default | Description |
| --- | --- | --- |
| default_role | viewer | Role assigned to new users |
| auto_activate | True | Activate users immediately on creation |
| send_welcome_email | True | Send an onboarding email to new users |
| sync_interval | real_time | real_time (webhook) or hourly (polling) |
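For reference, the options table can be mirrored as a small typed config object that enforces the two documented `sync_interval` values. This is a hypothetical client-side sketch, not part of the HatiData SDK:

- `ProvisioningOptions` and `new_user_record` are illustrative names.
- The `"pending"` status for users created with `auto_activate=False` is an assumption.

```python
from dataclasses import dataclass

VALID_SYNC_INTERVALS = {"real_time", "hourly"}


@dataclass(frozen=True)
class ProvisioningOptions:
    """Hypothetical mirror of the provisioning options and their documented defaults."""
    default_role: str = "viewer"
    auto_activate: bool = True
    send_welcome_email: bool = True
    sync_interval: str = "real_time"

    def __post_init__(self) -> None:
        if self.sync_interval not in VALID_SYNC_INTERVALS:
            raise ValueError(f"sync_interval must be one of {sorted(VALID_SYNC_INTERVALS)}")


def new_user_record(email: str, options: ProvisioningOptions) -> dict:
    """Sketch of how the options could shape a newly provisioned user."""
    return {
        "email": email,
        "role": options.default_role,
        # Assumption: without auto_activate, the account waits for an admin.
        "status": "active" if options.auto_activate else "pending",
        "welcome_email_queued": options.send_welcome_email,
    }
```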

Step 4: Configure Group-to-Role Mapping

Map IdP groups to HatiData roles for automatic RBAC:

admin.organizations.configure_group_mapping(
    org_id="org_acme",
    mappings=[
        {
            "idp_group_name": "HatiData Admins",
            "hatidata_role": "admin",
        },
        {
            "idp_group_name": "HatiData Editors",
            "hatidata_role": "editor",
        },
        {
            "idp_group_name": "HatiData Viewers",
            "hatidata_role": "viewer",
        },
        {
            "idp_group_name": "Data Scientists",
            "hatidata_role": "editor",
            "additional_permissions": ["branch_create", "memory_write"],
        },
    ],
)

When a user is added to or removed from an IdP group, the corresponding HatiData role is updated automatically via the SCIM webhook.
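A user can belong to several mapped groups at once, so some rule has to combine them. This page does not state how overlapping groups are resolved. The sketch below therefore assumes a simple rule:

- The user gets the highest-privilege role among their groups, ranked admin > editor > viewer.
- The user gets the union of all `additional_permissions` from those groups.
- Group names match exactly, which is why a mismatched name leaves a role unassigned.

`effective_access` and `ROLE_RANK` are hypothetical names.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Assumed privilege order; HatiData's actual precedence rule is not documented here.
ROLE_RANK = {"viewer": 0, "editor": 1, "admin": 2}

MAPPINGS: List[dict] = [
    {"idp_group_name": "HatiData Admins", "hatidata_role": "admin"},
    {"idp_group_name": "HatiData Editors", "hatidata_role": "editor"},
    {"idp_group_name": "HatiData Viewers", "hatidata_role": "viewer"},
    {"idp_group_name": "Data Scientists", "hatidata_role": "editor",
     "additional_permissions": ["branch_create", "memory_write"]},
]


def effective_access(groups: Iterable[str], mappings: List[dict]) -> Tuple[Optional[str], Set[str]]:
    """Return the highest mapped role and the union of extra permissions."""
    by_name = {m["idp_group_name"]: m for m in mappings}
    role: Optional[str] = None
    permissions: Set[str] = set()
    for group in groups:
        mapping = by_name.get(group)  # exact, case-sensitive name match
        if mapping is None:
            continue  # unmapped groups grant nothing
        candidate = mapping["hatidata_role"]
        if role is None or ROLE_RANK[candidate] > ROLE_RANK[role]:
            role = candidate
        permissions.update(mapping.get("additional_permissions", []))
    return role, permissions
```

Recomputing access from the user's full current group list on every `user_added` or `user_removed` event is simpler than patching roles incrementally. It also cannot drift if an event is missed and later replayed.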


Step 5: Deactivation Flows

When a user is removed from the directory (e.g., offboarded), HatiData handles deactivation in a controlled manner:

Deactivation Steps

  1. Immediate: Dashboard access is revoked. Active sessions are invalidated.
  2. 24-hour grace: Agent API keys created by the user continue to work for 24 hours to prevent pipeline disruptions.
  3. After 24 hours: Agent API keys are rotated. The key names are preserved but secrets are regenerated.
  4. Data retention: The user's audit trail and CoT logs are preserved indefinitely for compliance.

# Configure deactivation behavior
admin.organizations.configure_deactivation(
    org_id="org_acme",
    api_key_grace_period_hours=24,  # Grace period for agent API keys
    preserve_audit_data=True,  # Keep audit logs after deactivation
    notify_admin_on_deactivation=True,
    transfer_agent_keys_to="admin@acme.com",  # Reassign orphaned keys
)
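The deactivation timeline reduces to comparing the current time against `deactivated_at` plus the grace period. The sketch below models the documented steps with a hypothetical `deactivation_state` helper. It only computes what the deactivation steps describe and does not call any HatiData API.

```python
from datetime import datetime, timedelta, timezone


def deactivation_state(
    deactivated_at: datetime,
    now: datetime,
    grace_period_hours: int = 24,
    preserve_audit_data: bool = True,
) -> dict:
    """Model the documented deactivation steps at a given point in time."""
    rotate_at = deactivated_at + timedelta(hours=grace_period_hours)
    return {
        # Step 1: dashboard access and active sessions end immediately.
        "dashboard_access": False,
        # Steps 2 and 3: agent API key secrets keep working until the grace
        # period ends, and are regenerated after that.
        "agent_keys_valid": now < rotate_at,
        "keys_rotate_at": rotate_at,
        # Step 4: audit trail and CoT logs are retained.
        "audit_data_retained": preserve_audit_data,
    }


if __name__ == "__main__":
    t0 = datetime(2025, 12, 15, 10, 30, tzinfo=timezone.utc)
    print(deactivation_state(t0, t0 + timedelta(hours=12))["agent_keys_valid"])  # True
```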

Monitoring Deactivations

-- Recently deactivated users
SELECT
    user_id,
    email,
    deactivated_at,
    deactivation_source,
    api_keys_affected
FROM _hatidata_user_events
WHERE event_type = 'user_deactivated'
  AND deactivated_at > NOW() - INTERVAL '30 days'
ORDER BY deactivated_at DESC;

Step 6: Verify Directory Sync

Check the sync status and recent events:

# Check directory sync status
curl https://api.hatidata.com/v1/organizations/org_acme/directory-sync/status \
  -H "Authorization: Bearer <admin_jwt>"

Example response:

{
  "directory_sync_enabled": true,
  "provider": "okta",
  "connection_state": "active",
  "last_sync": "2025-12-15T10:30:00Z",
  "total_synced_users": 142,
  "total_synced_groups": 8,
  "pending_events": 0
}
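For scripted monitoring, the same status call can be made from Python and turned into a list of problems. The sketch makes these assumptions:

- `fetch_sync_status` and `sync_problems` are hypothetical helpers.
- "Healthy" means sync is enabled, `connection_state` is `"active"` (the only state shown above), and no events are pending. These thresholds are assumptions rather than documented limits.
- `"error"` in the usage check is a made-up non-active state.

```python
import json
import urllib.request
from typing import List

STATUS_URL = "https://api.hatidata.com/v1/organizations/{org_id}/directory-sync/status"


def fetch_sync_status(org_id: str, admin_jwt: str, timeout: float = 10.0) -> dict:
    """GET the directory sync status endpoint used in the curl example."""
    request = urllib.request.Request(
        STATUS_URL.format(org_id=org_id),
        headers={"Authorization": f"Bearer {admin_jwt}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def sync_problems(status: dict, max_pending: int = 0) -> List[str]:
    """Return human-readable problems; an empty list means the sync looks healthy."""
    problems = []
    if not status.get("directory_sync_enabled"):
        problems.append("directory sync is disabled")
    if status.get("connection_state") != "active":
        problems.append(f"connection_state is {status.get('connection_state')!r}")
    if status.get("pending_events", 0) > max_pending:
        problems.append(f"{status['pending_events']} events pending")
    return problems
```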

Audit Directory Sync Events

SELECT
    event_type,
    user_email,
    group_name,
    action_taken,
    created_at
FROM _hatidata_directory_sync_events
WHERE org_id = 'org_acme'
ORDER BY created_at DESC
LIMIT 20;

Troubleshooting

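| Symptom | Cause | Fix |
| --- | --- | --- |
| Users not syncing | Webhook deliveries not reaching the endpoint, or failing signature verification | Confirm Clerk can reach https://api.hatidata.com/v1/webhooks/clerk and that HATIDATA_CLERK_WEBHOOK_SECRET matches Clerk's signing secret |
| Group membership not updating | Group push not enabled in the IdP | Enable "Push Groups" in Okta/Azure AD |
| Deactivated user's agent API keys still work | The 24-hour API key grace period is active | Wait for the grace period to end, or revoke the keys manually |
| Role not assigned | Group name mismatch | Check that the idp_group_name values in configure_group_mapping match the IdP group names exactly |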
