CRDT State
Ahdapa replicates cluster state using Conflict-free Replicated Data Types (CRDTs). All replicated state lives in IdpCrdt (src/crdt/mod.rs). The design goal is that any node can merge state from any other node in any order and the result is the same — no coordination, no leader, no quorum.
IdpCrdt fields

    pub struct IdpCrdt {
        pub signing_keys: OrMap<String, SigningKeyEntry>,
        pub active_kid: LwwRegister<String>,
        pub wrapping_key_id: LwwRegister<String>,
        pub cluster_nodes: OrMap<String, NodeEntry>,
        pub clients: OrMap<String, ClientEntry>,
        pub refresh_families: LwwMap<String, RefreshFamilyState>,
        pub revoked_sessions: LwwMap<String, i64>,
        pub scope_definitions: LwwMap<String, ScopeDefinition>,
        pub hbac_rules: hbac_crdt::RuleSet,
        pub ipa_idp_overrides: LwwMap<String, IpaIdpOverride>,
    }

| Field | CRDT type | Semantics |
|---|---|---|
| signing_keys | OR-Map | Signing key entries indexed by kid. Each entry carries public_key_der, algorithm (e.g. "ES256", "EdDSA", "ML-DSA-44"), and not_after. The private_key_der field is #[serde(skip_serializing, default)]: it is stored node-locally in node_keys and is never gossiped. Keys may be revoked (tombstoned) via DELETE /api/admin/keys/{kid}; the JWKS endpoint serves only live (non-tombstoned) keys via live_values(). Tombstones are GC-purged after tombstone_ttl_secs by the hourly housekeeping task. |
| active_kid | LWW-Register | The kid most recently set as active by the local node’s key rotation. Not used for signing lookups: each node signs with its own local key regardless of this value. |
| wrapping_key_id | LWW-Register | UUID string identifying the cluster AEAD wrapping key. The actual 32-byte key is stored node-locally in node_keys.wrapping_key_cms_der (CMS-sealed to the node’s own KEM key) and is never gossiped. Latest timestamp wins. |
| cluster_nodes | OR-Map | Registered cluster nodes (node_id → certificate + public key). Soft deletes via tombstones. |
| clients | OR-Map | OAuth2 client registrations. Soft deletes via tombstones. |
| refresh_families | LWW-Map | Per-family max_index for refresh token rotation chain detection. |
| revoked_sessions | LWW-Map | Per-subject session revocation timestamps (sub → revoked_at unix seconds). Populated on logout and back-channel logout when distributed_mode >= eventual. Any cluster node rejects session cookies whose iat is older than the stored revoked_at for that subject. Latest timestamp wins (concurrent revocations for the same subject converge to the most recent one). Entries older than session_ttl are purged periodically via purge_old_revocations. |
| scope_definitions | LWW-Map | Scope-to-claim mappings (scope_name → ScopeDefinition). Each ScopeDefinition carries name, description, claims: Vec<String>, and is_system: bool. Seven built-in scopes (openid, offline_access, profile, email, phone, address, groups) are seeded on first startup with is_system = true and cannot be deleted. Custom scopes are created and deleted via the admin API; deletion sets is_tombstone = true in the LWW entry. The UserInfo endpoint resolves claim names against FullUserEntry first-class fields, then falls through to raw_attrs LDAP attributes for any unrecognised name. The discovery scopes_supported and claims_supported fields are rebuilt from this map on every request. |
| hbac_rules | hbac_crdt::RuleSet | Identity HBAC policy rules. The RuleSet is an op-based CRDT from the crates/hbac-crdt/ crate. Rule existence is an RW-Set of RuleIds; rule content is stored per-RuleId as an HBACRule whose axes (users, clients, scopes, networks, device groups, MFA bypass, required ACR) each use a security-conservative CRDT primitive (RW-Set or DW-Register). This field is the gossip mirror; the authoritative mutable state lives in AppState.hbac_log (OpLog). On mutation, the op-log’s materialised RuleSet is copied here and persisted to crdt_hbac_rules. On inbound gossip merge, the received hbac_rules are merged into this field and then mirrored back into hbac_log.state. At startup, after merging persisted state into hbac_log.state, OpLog::restore_clock_from_state must be called to advance the local Lamport clock past all tags already in the persisted state; without this call the first new operation receives a timestamp that collides with existing tags, causing dedup_push to silently drop add-tags and leaving prior remove-tags permanently in effect. |
| ipa_idp_overrides | LWW-Map | Per-IPA-IdP ACR/AMR overrides (ipa-<slug> → IpaIdpOverride). Each IpaIdpOverride carries default_acr: Option<String> and default_amr: Vec<String>. Stores only the two writable fields; all LDAP-sourced attributes (issuer URI, client ID, scopes, callback path) remain read-only and are never stored in the CRDT. Set via PUT /api/admin/federation/ipa-idps/{id}; applied at find_upstream() time by patching the in-memory UpstreamIdpConfig cloned from AppState.ipa_upstream_idps. Persisted in crdt_ipa_idp_overrides and gossiped to all nodes so overrides survive restarts and reach every cluster member. |
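
The revoked_sessions semantics in the table can be sketched as a simple lookup plus a periodic purge. This is a minimal illustration, not the project's code: the `RevokedSessions` alias (a plain `HashMap` standing in for the LWW-Map), the `is_session_revoked` helper, and the free-function form of `purge_old_revocations` are all hypothetical. It also assumes that a cookie issued in the same second as the revocation is still accepted, following the "older than" wording.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `revoked_sessions` LWW-Map:
/// subject -> revoked_at (unix seconds).
type RevokedSessions = HashMap<String, i64>;

/// Returns true when a session cookie for `sub` issued at `iat` must be rejected.
/// Assumption: only cookies strictly older than `revoked_at` are rejected.
fn is_session_revoked(revoked: &RevokedSessions, sub: &str, iat: i64) -> bool {
    match revoked.get(sub) {
        Some(&revoked_at) => iat < revoked_at,
        None => false,
    }
}

/// Drops revocations older than the session TTL. Once every cookie issued
/// before a revocation has expired anyway, the entry no longer does any work.
fn purge_old_revocations(revoked: &mut RevokedSessions, now: i64, session_ttl_secs: i64) {
    let cutoff = now - session_ttl_secs;
    revoked.retain(|_, revoked_at| *revoked_at >= cutoff);
}
```

Because the map stores one timestamp per subject, a single revocation invalidates every earlier session for that subject without tracking individual session IDs.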
CRDT_GENERATION counter
A process-global AtomicU64 (CRDT_GENERATION in src/crdt/mod.rs) is incremented on
every mutation that actually changes CRDT state: new entries inserted via insert or
merge, tombstones applied, LWW values set when the incoming timestamp wins. When a
gossip round produces no net change (all merges are no-ops), the counter does not
advance.
The gossip loop tracks two per-peer generation maps:
- peer_last_gen[peer]: the local CRDT_GENERATION after the last successful sync with this peer. Used for two purposes: (1) before each outbound push, if CRDT_GENERATION.load() equals peer_last_gen[peer], the CRDT has not changed and the push is skipped entirely; (2) when a push is needed, delta_since(peer_last_gen[peer]) produces a sparse delta that contains only new entries, reducing payload size.
- peer_response_gen[peer]: the peer’s CRDT_GENERATION reported in their last response envelope (GossipEnvelope.my_gen). This value is sent back to the peer in the next push as request_delta_since so the peer can construct a delta response rather than replying with its full state.
On any error (connection failure, non-2xx response, wrapping-key pull failure), both entries for that peer are cleared so the next round performs a full-state exchange and allows the lagging peer to catch up.
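
The push decision described above can be sketched as follows. This is an illustrative model only: the `PeerGens` struct, the `PushPlan` enum, and the method names are hypothetical, and the real gossip loop also builds and sends the payload. The sketch shows the three outcomes (skip, delta, full state) and the reset on error.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-global generation counter, bumped only when CRDT state really changes.
static CRDT_GENERATION: AtomicU64 = AtomicU64::new(0);

/// What the gossip loop should send to one peer this round (hypothetical type).
#[derive(Debug, PartialEq)]
enum PushPlan {
    Skip,                 // nothing changed since the last successful sync
    Delta { since: u64 }, // send only entries newer than `since`
    Full,                 // no bookkeeping for this peer: send everything
}

/// Hypothetical per-peer bookkeeping; map names follow the text above.
#[derive(Default)]
struct PeerGens {
    peer_last_gen: HashMap<String, u64>,
    peer_response_gen: HashMap<String, u64>,
}

impl PeerGens {
    fn plan_push(&self, peer: &str) -> PushPlan {
        let current = CRDT_GENERATION.load(Ordering::SeqCst);
        match self.peer_last_gen.get(peer) {
            Some(&last) if last == current => PushPlan::Skip,
            Some(&last) => PushPlan::Delta { since: last },
            None => PushPlan::Full,
        }
    }

    /// Records a successful sync: our generation now, and the peer's reported one.
    fn record_success(&mut self, peer: &str, peer_gen: u64) {
        let current = CRDT_GENERATION.load(Ordering::SeqCst);
        self.peer_last_gen.insert(peer.to_string(), current);
        self.peer_response_gen.insert(peer.to_string(), peer_gen);
    }

    /// Clears both entries so the next round falls back to a full-state exchange.
    fn record_error(&mut self, peer: &str) {
        self.peer_last_gen.remove(peer);
        self.peer_response_gen.remove(peer);
    }
}
```

Clearing both maps on error trades a larger payload for safety: a full exchange cannot miss entries that a delta computed from stale bookkeeping might skip.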
CRDT primitives
LwwRegister
Last-Write-Wins Register. The value with the higher timestamp wins. On equal timestamps, the node with the lexicographically greater node_id wins (deterministic tie-breaking).

    impl<T> LwwRegister<T> {
        pub fn set(&mut self, value: T, timestamp: i64, node_id: &str);
        pub fn get(&self) -> Option<&T>;
        pub fn merge(&mut self, other: LwwRegister<T>);
    }

Used for active_kid and wrapping_key_id. Setting a value with an older timestamp is a no-op, so stale or replayed writes never overwrite newer state.
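
A minimal sketch of these rules, assuming a register that stores the winning (timestamp, node_id) pair next to the value. The field layout and the `bool` return value of `set` (signalling whether state changed) are assumptions for illustration, not the layout in src/crdt/mod.rs.

```rust
/// Minimal last-write-wins register (illustrative field layout).
#[derive(Clone, Debug, PartialEq)]
struct LwwRegister<T> {
    value: Option<T>,
    timestamp: i64,
    node_id: String,
}

impl<T> LwwRegister<T> {
    fn new() -> Self {
        LwwRegister { value: None, timestamp: i64::MIN, node_id: String::new() }
    }

    /// Applies the write only if (timestamp, node_id) is strictly greater than
    /// the stored pair; returns whether the register changed.
    fn set(&mut self, value: T, timestamp: i64, node_id: &str) -> bool {
        let wins = self.value.is_none()
            || (timestamp, node_id) > (self.timestamp, self.node_id.as_str());
        if wins {
            self.value = Some(value);
            self.timestamp = timestamp;
            self.node_id = node_id.to_string();
        }
        wins
    }

    fn get(&self) -> Option<&T> {
        self.value.as_ref()
    }

    /// Merging is just replaying the other side's winning write.
    fn merge(&mut self, other: LwwRegister<T>) {
        if let Some(v) = other.value {
            self.set(v, other.timestamp, &other.node_id);
        }
    }
}
```

Comparing the (timestamp, node_id) tuple gives a total order over writes, which is what makes the merge result independent of the order in which peers exchange state.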
OrMap
Observed-Remove Map. Supports soft deletes via tombstones. Merge semantics: the union of live entries, where any tombstone suppresses its entry on both sides.

    impl<K, V> OrMap<K, V> {
        pub fn insert(&mut self, key: K, value: V, timestamp: i64);
        pub fn remove(&mut self, key: &K, timestamp: i64);            // sets tombstone
        pub fn upsert(&mut self, key: K, value: V, timestamp: i64);   // for updates
        pub fn get(&self, key: &K) -> Option<&V>;                     // None for tombstoned
        pub fn live_values(&self) -> impl Iterator<Item = (&K, &V)>;
        pub fn merge(&mut self, other: OrMap<K, V>);
        pub fn purge_old_tombstones(&mut self, cutoff: i64);          // drops tombstones older than cutoff
    }

Used for signing_keys, cluster_nodes, and clients. A tombstone wins over a live entry on merge, so
deleting a client on any node will eventually suppress it everywhere.
remove records a tombstone even when the key is absent from the local map. This is
necessary to prevent entry resurrection on out-of-order gossip delivery: if a remove
arrives at a node before the corresponding insert (because gossip rounds fire in
different orders), the pre-emptive tombstone suppresses the subsequent insert when it
eventually arrives.
purge_old_tombstones(cutoff) permanently removes tombstoned entries whose
tombstone_at timestamp is older than cutoff. Called approximately once per hour with
cutoff = now - tombstone_ttl_secs. Entries that are still live (not tombstoned) are
never removed by this call.
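
The tombstone behaviour described above can be sketched with a simplified map. This is an illustration under stated assumptions, not the real OrMap: keys are plain strings, inserts carry no timestamp, and concurrent inserts of different values for the same key are not resolved (the first value seen is kept). The `Entry` struct is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical entry layout: a value and/or a tombstone timestamp.
#[derive(Clone, Debug, PartialEq)]
struct Entry<V> {
    value: Option<V>,          // None when only a pre-emptive tombstone exists
    tombstone_at: Option<i64>, // Some(ts) once the key has been removed
}

#[derive(Clone, Debug, PartialEq)]
struct OrMap<V> {
    entries: HashMap<String, Entry<V>>,
}

impl<V: Clone> OrMap<V> {
    fn new() -> Self {
        OrMap { entries: HashMap::new() }
    }

    /// Stores the value but never clears an existing tombstone.
    fn insert(&mut self, key: &str, value: V) {
        let e = self.entries.entry(key.to_string())
            .or_insert(Entry { value: None, tombstone_at: None });
        if e.value.is_none() {
            e.value = Some(value);
        }
    }

    /// Records a tombstone even when the key has never been seen locally.
    fn remove(&mut self, key: &str, timestamp: i64) {
        let e = self.entries.entry(key.to_string())
            .or_insert(Entry { value: None, tombstone_at: None });
        e.tombstone_at = Some(e.tombstone_at.map_or(timestamp, |t| t.max(timestamp)));
    }

    /// Tombstoned keys read as absent.
    fn get(&self, key: &str) -> Option<&V> {
        self.entries.get(key)
            .filter(|e| e.tombstone_at.is_none())
            .and_then(|e| e.value.as_ref())
    }

    /// Union of entries; any tombstone on either side suppresses the key.
    fn merge(&mut self, other: &OrMap<V>) {
        for (k, theirs) in &other.entries {
            if let Some(v) = &theirs.value {
                self.insert(k, v.clone());
            }
            if let Some(ts) = theirs.tombstone_at {
                self.remove(k, ts);
            }
        }
    }

    /// Drops tombstoned entries older than `cutoff`; live entries are kept.
    fn purge_old_tombstones(&mut self, cutoff: i64) {
        self.entries.retain(|_, e| e.tombstone_at.map_or(true, |ts| ts >= cutoff));
    }
}
```

In this model, a remove that arrives before its insert leaves a value-less tombstone behind, so the late insert is stored but stays invisible to `get`.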
LwwMap
A map where each key has an independent LWW-Register value.

    impl<K, V> LwwMap<K, V> {
        pub fn set(&mut self, key: K, value: V, timestamp: i64, node_id: &str);
        pub fn get(&self, key: &K) -> Option<&V>;
        pub fn merge(&mut self, other: LwwMap<K, V>);
        pub fn retain<F: Fn(&V) -> bool>(&mut self, f: F); // remove entries where f returns false
    }

Used for refresh_families, revoked_sessions, scope_definitions, and ipa_idp_overrides. For
refresh_families, each family_id key has its own LWW value (RefreshFamilyState
containing max_index and expires_at). The highest max_index seen propagates on
merge; setting max_index = u64::MAX is the revocation signal.
retain(f) removes entries where f(value) returns false. Used for expired-family
purge: retain(|s| s.expires_at > now) drops all expired families before each outbound
push and after each inbound merge, keeping the gossip payload bounded over time.
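
As a rough sketch of how per-key LWW and retain fit together, the following model stores one (value, timestamp, node_id) slot per key. The `Slot` struct and the string-keyed map are assumptions made for brevity; only the method names and the `retain(|s| s.expires_at > now)` call come from the text above.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct RefreshFamilyState {
    max_index: u64,
    expires_at: i64,
}

/// Hypothetical per-key slot: the value plus the write that set it.
#[derive(Clone, Debug)]
struct Slot<V> {
    value: V,
    timestamp: i64,
    node_id: String,
}

struct LwwMap<V> {
    slots: HashMap<String, Slot<V>>,
}

impl<V: Clone> LwwMap<V> {
    fn new() -> Self {
        LwwMap { slots: HashMap::new() }
    }

    /// Per-key last-write-wins with node_id as the tie-breaker.
    fn set(&mut self, key: &str, value: V, timestamp: i64, node_id: &str) {
        let wins = match self.slots.get(key) {
            Some(cur) => (timestamp, node_id) > (cur.timestamp, cur.node_id.as_str()),
            None => true,
        };
        if wins {
            let slot = Slot { value, timestamp, node_id: node_id.to_string() };
            self.slots.insert(key.to_string(), slot);
        }
    }

    fn get(&self, key: &str) -> Option<&V> {
        self.slots.get(key).map(|s| &s.value)
    }

    /// Replays every slot from the other side through `set`.
    fn merge(&mut self, other: &LwwMap<V>) {
        for (k, s) in &other.slots {
            self.set(k, s.value.clone(), s.timestamp, &s.node_id);
        }
    }

    /// Keeps only entries for which `f` returns true.
    fn retain<F: Fn(&V) -> bool>(&mut self, f: F) {
        self.slots.retain(|_, s| f(&s.value));
    }
}
```

Calling `retain(|s| s.expires_at > now)` on such a map drops expired families while leaving live ones, including revoked ones, in place until they expire.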
Persistence
IdpCrdt is persisted to the local database on every mutation and after every inbound gossip merge. The schema mirrors the CRDT structure exactly:
| Table | CRDT field |
|---|---|
| crdt_signing_keys | signing_keys (OR-Map rows; tombstone + tombstone_at columns added in migration 0017_signing_key_tombstone.sql) |
| crdt_active_kid | active_kid (single row keyed by id=1; INSERT OR REPLACE) |
| crdt_wrapping_key | wrapping_key_id (single row keyed by id=1; stores UUID only; INSERT OR REPLACE) |
| crdt_cluster_nodes | cluster_nodes (OR-Map rows with tombstone columns) |
| crdt_clients | clients (OR-Map rows with tombstone columns) |
| crdt_refresh_families | refresh_families (LWW-Map rows) |
| crdt_revoked_sessions | revoked_sessions (LWW-Map rows: local_sub, revoked_at, set_by_node) |
| crdt_scopes | scope_definitions (LWW-Map rows: name, description, claims JSON, is_system, set_at, set_by_node, is_deleted, deleted_at) |
| crdt_hbac_rules | hbac_rules (single JSON blob row holding the full serialised RuleSet) |
| crdt_ipa_idp_overrides | ipa_idp_overrides (LWW-Map rows: id, default_acr, default_amr JSON, set_at, set_by_node, is_deleted, deleted_at; migration 0021_crdt_ipa_idp_overrides.sql) |
Three additional nullable columns were added to crdt_clients in migration 0022_client_kerberos.sql to support the kerberos_client_auth token endpoint authentication method:
| Column | Type | Purpose |
|---|---|---|
| kerberos_principal | TEXT (nullable) | Exact Kerberos service principal for single-machine clients (e.g. host/node1.example.com@REALM). |
| kerberos_principal_pattern | TEXT (nullable) | Glob pattern for template clients (e.g. host/*@REALM). * matches any characters except @. |
| kerberos_hbac_service | TEXT (nullable) | FreeIPA HBAC service name that gates access via the replicated HBAC rule set. |
Exactly one of kerberos_principal or kerberos_principal_pattern is set per Kerberos client; all three columns are NULL for non-Kerberos clients.
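
The pattern rule for kerberos_principal_pattern can be sketched as a small matcher. The function names are hypothetical, and the sketch assumes that `*` is the only metacharacter and that it matches zero or more characters; the source only states that `*` never matches `@`.

```rust
/// Matches `principal` against `pattern`, where `*` matches zero or more
/// characters other than `@` (assumed semantics, see the note above).
fn principal_matches(pattern: &str, principal: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = principal.chars().collect();
    matches_from(&p, &s)
}

fn matches_from(p: &[char], s: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: match only if the input is exhausted too.
        None => s.is_empty(),
        Some((&'*', rest)) => {
            // Try every prefix length the star may consume, never crossing '@'.
            for take in 0..=s.len() {
                if matches_from(rest, &s[take..]) {
                    return true;
                }
                if take < s.len() && s[take] == '@' {
                    break;
                }
            }
            false
        }
        // Literal character: must match exactly.
        Some((&c, rest)) => match s.split_first() {
            Some((&d, s_rest)) if c == d => matches_from(rest, s_rest),
            _ => false,
        },
    }
}
```

Excluding `@` from the wildcard keeps a pattern such as host/*@REALM from matching a principal that smuggles a second `@` into the host part.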
At startup, IdpCrdt::load_from_db reconstructs all ten fields from the database. Revoked session entries older than revocation_cutoff (derived from session_ttl) are filtered at load time so a restarted node does not carry stale revocations. Built-in scope definitions are seeded into crdt_scopes on first startup if not already present.
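
One startup detail from the hbac_rules description, restoring the Lamport clock after loading persisted state, is easy to get wrong, so a toy model may help. Everything here is hypothetical: the `Tag` shape, the `OpLog` fields, and the `next_tag` method are invented for illustration and say nothing about the real hbac-crdt internals. The model only shows why a clock that restarts at zero produces tags that collide with persisted ones.

```rust
/// Hypothetical tag: (Lamport counter, node id).
type Tag = (u64, String);

/// Toy op-log: a local clock plus the tags already present in the state.
struct OpLog {
    clock: u64,
    node_id: String,
    tags: Vec<Tag>,
}

impl OpLog {
    /// Advances the local clock to at least the largest counter already in the state.
    fn restore_clock_from_state(&mut self) {
        let max_seen = self.tags.iter().map(|(c, _)| *c).max().unwrap_or(0);
        self.clock = self.clock.max(max_seen);
    }

    /// Issues the next tag. A tag that already exists is dropped as a duplicate,
    /// mimicking the silent dedup described in the text.
    fn next_tag(&mut self) -> Option<Tag> {
        self.clock += 1;
        let tag = (self.clock, self.node_id.clone());
        if self.tags.contains(&tag) {
            None
        } else {
            self.tags.push(tag.clone());
            Some(tag)
        }
    }
}
```

In this model, a log loaded with tag (1, "n1") and a clock still at 0 issues (1, "n1") again and loses it to deduplication, while the same log after `restore_clock_from_state` issues (2, "n1").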
Bootstrap
On a brand-new node with an empty database:
1. load_from_db returns an all-default (empty) IdpCrdt.
2. bootstrap_node_kem_key() generates an ML-KEM-768 key pair and an ECDSA P-256 gossip signing key pair; stores all four DER values in node_keys.
3. bootstrap_signing_key() generates a JWT signing key pair using the algorithm from [server] jwt_signing_algorithm (default: ES256), stores the private key in node_keys.jwt_signing_priv_der (never in CRDT), computes kid = base64url(SHA256(spki_der)[..8]), inserts a SigningKeyEntry (public key + algorithm only) into signing_keys, and sets active_kid. If an existing key in node_keys uses a different algorithm than the configured one, a new key is generated automatically (algorithm upgrade path).
4. bootstrap_wrapping_key() checks node_keys.wrapping_key_cms_der:
   - If present: decrypts with open_raw() to recover the 32-byte key; restores wrapping_key_id to the CRDT if not already set.
   - If absent: generates 32 random bytes, seals them with seal_raw() to the node’s own KEM public key, stores the CMS blob in node_keys, generates a UUID, and publishes the UUID to the CRDT as wrapping_key_id with timestamp=1.
5. persist_to_db flushes to the database.
When gossip is enabled, the node receives the cluster’s existing CRDT state on the first gossip round and merges it. If the peer’s wrapping_key_id differs from the local UUID, the node pulls the actual key via GET /api/gossip/wrapping-key.
Key rotation
Rotating the signing key is an admin operation (POST /api/admin/keys/rotate):
1. Generate a new key pair using the algorithm from [server] jwt_signing_algorithm (default: ES256).
2. Compute kid = base64url(SHA256(spki_der)[..8]).
3. Store the private key in node_keys.jwt_signing_priv_der (replaces previous active key). The private key is never written to crdt_signing_keys.
4. Insert a SigningKeyEntry with public key + algorithm into crdt_signing_keys (OR-Map insert) and update crdt_active_kid (LWW INSERT OR REPLACE).
5. Write to the in-memory CRDT.
6. Gossip propagates the new public key entry to all peers within ~2 gossip intervals.
Old keys remain in signing_keys and continue to validate tokens signed before the rotation until their not_after timestamp passes. The JWKS endpoint (/jwks) serves all live (non-tombstoned) keys. Any node that issued a token with a given kid holds the corresponding private key; other nodes can still validate those tokens using the gossiped public key.
A signing key can be explicitly revoked before its not_after deadline via DELETE /api/admin/keys/{kid}. This tombstones the OR-Map entry so that live_values() skips it, and the key is no longer served from /jwks. If the revoked key is the active kid, a warning is logged; a key rotation (POST /api/admin/keys/rotate) should follow immediately. Tombstones for revoked signing keys are GC-purged by the same hourly task that processes client and node tombstones.
Refresh token family lifecycle
RefreshFamilyState carries an expires_at unix timestamp (set when the family is
created from the max_refresh_token_age configuration). Two purge paths keep the CRDT
bounded:
- In-memory purge (every gossip round): IdpCrdt::purge_expired_families(now) calls refresh_families.retain(|s| s.expires_at > now) before each outbound push and after each inbound merge. Expired families are not included in the next gossip message and do not accumulate across peers.
- Database purge (approximately hourly): cleanup_expired_families deletes rows from crdt_refresh_families WHERE expires_at < now. On startup, load_refresh_families also filters out already-expired rows, so a crashed or restarted node does not re-inflate its CRDT from stale DB rows.
Refresh token revocation
Revoking a refresh token family (DELETE /api/admin/refresh-families/{family_id}) sets max_index = u64::MAX in the CRDT. Any node that receives this value via gossip will reject all future refresh tokens in that family, because every valid token_index is less than u64::MAX.
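
The effect of the u64::MAX sentinel can be sketched with a hypothetical acceptance check. The `check_refresh` function, the `RefreshDecision` enum, and the rule "accept only if token_index is not below the family's max_index" are assumptions used to illustrate rotation chain detection; the source only states that every valid token_index is less than u64::MAX.

```rust
/// Outcome of presenting a refresh token (hypothetical type).
#[derive(Debug, PartialEq)]
enum RefreshDecision {
    Accept { next_index: u64 },
    Reject,
}

/// Assumed rule: a token is accepted only if its index has not been superseded.
/// With max_index = u64::MAX, no valid index can satisfy this, so the whole
/// family is rejected without a separate "revoked" flag.
fn check_refresh(token_index: u64, family_max_index: u64) -> RefreshDecision {
    // u64::MAX is reserved as the revocation sentinel and is never a valid index.
    if token_index == u64::MAX || token_index < family_max_index {
        RefreshDecision::Reject
    } else {
        RefreshDecision::Accept { next_index: token_index + 1 }
    }
}
```

Reusing the index comparison for revocation means a revoked family needs no extra field in RefreshFamilyState and no extra branch on the validation path.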
Partition behaviour
During a network partition, each node operates on its local CRDT snapshot. After the partition heals, the first gossip exchange merges the diverged states. For each CRDT type:
- OR-Map (signing keys): union of live entries; tombstones propagate, so a key revoked on one side of the partition will suppress the corresponding entry on reconnect.
- LWW-Register (active_kid, wrapping_key_id): the highest timestamp wins; the losing side’s write is silently dropped. For wrapping_key_id, the winning UUID triggers an on-demand pull of the actual key from the peer that set it.
- OR-Map (clients, cluster_nodes): union of live entries; tombstones propagate on merge.
- LWW-Map (refresh_families): per-key, highest timestamp wins. A max_index set to u64::MAX (revocation) on one side of the partition propagates and invalidates any lower indexes issued during the partition.
- LWW-Map (revoked_sessions): per-subject, highest timestamp wins. A revocation recorded on one side of the partition propagates and invalidates sessions whose iat is before the winning revoked_at. Only populated when distributed_mode >= eventual; in off mode the field exists but stays empty, and revocation is only checked node-locally.
- LWW-Map (scope_definitions): per-scope-name, highest timestamp wins. A scope created or deleted on one side of the partition propagates on merge. Deletions (tombstones) set is_deleted = true in the LWW value; the winning entry suppresses the scope from discovery and UserInfo claim resolution on all nodes.
- hbac_crdt::RuleSet (hbac_rules): each axis within a rule uses a security-conservative CRDT: RW-Set (remove-wins) for member sets and DW-Register (disable-wins) for category flags and mfa_bypass. Rule existence is an RW-Set of RuleIds; deleting a rule on one side of the partition propagates and suppresses it on the other side after merge. A concurrent stale re-enable or re-add on the other side of the partition cannot widen access because disable-wins and remove-wins semantics apply.