Performance

This chapter covers issuance throughput and latency characteristics of Akāmu under load, with guidance on key type selection and capacity planning.

All numbers were collected on a single host using the acme-bench tool shipped in the repository:

cargo bench --bench acme_bench -- [OPTIONS]

The benchmark runs full ACME flows (account → new-order → challenge validate → finalize → certificate download) against a real in-process server over the loopback interface using an in-memory SQLite database. Reported latency is end-to-end wall time from the start of new-order through certificate download; account creation is excluded because it is amortised across all orders from a given client.

Note — database layer (sqlx). Akāmu uses sqlx 0.8 for SQLite access. Both in-memory (:memory:) and file-backed databases use a single-connection pool (max_connections = 1). In-memory databases require this because every SQLite in-memory connection opens its own private, empty database. File-backed databases use it to avoid SQLITE_BUSY_SNAPSHOT (error 517), a WAL-mode contention error that bypasses the busy handler and cannot be retried: sqlx attempts to reuse connection read snapshots across pool round-trips, and when a concurrent writer commits between those round-trips the stale snapshot triggers the error.

Approximately 24 SQL round-trips are needed per issuance (reduced from ~55 by moving anti-replay nonces to an in-memory store and using JOIN queries to collapse read pairs into single round-trips). Throughput plateaus at ≈ 1350–1500 iss/s at 25 concurrent clients, determined by how fast the single sqlx connection can process queries rather than by crypto, network, or storage speed. See the Database scalability section for guidance on exceeding this ceiling.


Concurrency scaling

With EC P-256 certificates and an EC P-256 CA, throughput scales up to ~10 concurrent clients and then plateaus as the single-connection pool becomes the bottleneck:

| Concurrent clients | Throughput (iss/s) | Mean latency (ms) | p95 (ms) |
|---|---|---|---|
| 1 | 231 | 4.3 | 5.3 |
| 5 | 1086 | 4.6 | 5.6 |
| 10 | 1445 | 6.8 | 7.7 |
| 25 | 1501 | 15.2 | 17.7 |
| 50 | 1356 | 30.8 | 34.7 |

Throughput peaks around 10–25 concurrent clients at ~1500 iss/s and remains stable at 25–50 clients (≈ 1350–1500 iss/s); latency grows roughly linearly with client count, consistent with a single serialised resource. The practical bottleneck is the in-memory SQLite single connection; crypto and network are not limiting factors at these rates.
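The roughly linear latency growth is what Little's law predicts for a single serialised resource: mean latency ≈ concurrent clients ÷ throughput. A quick sanity check against the table above (a standalone sketch using this chapter's numbers, not part of acme-bench):

```rust
// Little's law: with one serialised bottleneck, mean latency (ms)
// is approximately clients / throughput (iss/s) * 1000.
fn predicted_latency_ms(clients: f64, throughput_iss_per_s: f64) -> f64 {
    clients / throughput_iss_per_s * 1000.0
}

fn main() {
    // (clients, measured throughput, measured mean latency) from the table above.
    for (clients, tput, measured) in [(25.0, 1501.0, 15.2), (50.0, 1356.0, 30.8)] {
        let predicted = predicted_latency_ms(clients, tput);
        println!("{clients} clients: predicted {predicted:.1} ms, measured {measured} ms");
    }
}
```

The predictions (≈16.7 ms and ≈36.9 ms) sit slightly above the measured means (15.2 ms and 30.8 ms), consistent with each client spending a small fraction of its cycle outside the serialised database path.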


Key type comparison

The table below compares issuance performance for different CSR key types at 25 concurrent clients with an EC P-256 CA.

| CSR key type | Throughput (iss/s) | Mean latency (ms) | p95 (ms) | Finalize phase (ms) |
|---|---|---|---|---|
| ec:P-256 | 1419 | 15.0 | 17.0 | 3.4 |
| ed25519 | 1265 | 15.9 | 17.2 | 3.7 |
| ec:P-384 | 1311 | 16.3 | 19.4 | 5.2 |
| ml-dsa-44 | 1332 | 16.2 | 18.8 | 4.7 |
| ml-dsa-65 | 1149 | 18.2 | 21.2 | 5.7 |
| ml-dsa-87 | 1216 | 17.7 | 20.9 | 5.8 |
| rsa:2048 | 142 | 137.6 | 246.3 | 104.0 |
| rsa:4096 | 14 | 993 | – | 915 |

The EC, Ed25519, and ML-DSA key types cluster around 1150–1420 iss/s because their throughput is bounded by the single-connection database pool, not by crypto. Finalize-phase latency (CSR verification + certificate issuance) still reflects relative signing cost: EC and Ed25519 are fastest, ML-DSA adds ~1–2 ms, and RSA adds tens to hundreds of milliseconds.

RSA is the outlier: RSA 2048 adds ~100 ms to finalize, and RSA 4096 adds ~900 ms.

RSA 4096 saturation

| Clients | Throughput (iss/s) | Finalize mean (ms) | p99 (ms) |
|---|---|---|---|
| 1 | 3 | 353 | 974 |
| 10 | 12 | 484 | 1188 |
| 25 | 14 | 915 | 2021 |
| 50 | 9 | 1118 | 3271 |

Throughput is limited by RSA 4096 key generation time. At 50 clients the additional queuing raises both finalize latency and overall contention, reducing aggregate throughput below the 25-client figure. Avoid RSA 4096 in any configuration where more than a handful of concurrent ACME clients are expected.


Post-quantum cryptography

Akāmu supports ML-DSA (FIPS 204 / RFC 9881) for both CA keys and certificate keys. Three security levels are available. The table uses a full post-quantum chain (ML-DSA CA + ML-DSA leaf, with --verify-cert) at 25 concurrent clients:

| Parameter set | NIST category | Throughput (iss/s) | Alloc pressure (MiB/iss) |
|---|---|---|---|
| ML-DSA-44 | 2 | 1366 | 0.54 |
| ML-DSA-65 | 3 | 1093 | 0.64 |
| ML-DSA-87 | 5 | 1067 | 0.77 |
| EC P-256 | – | 1365 | 0.33 |

ML-DSA allocation pressure is 60–130% higher than EC P-256 per issuance, reflecting the larger key and signature structures. Throughput difference between ML-DSA and EC P-256 varies by parameter set: ML-DSA-44 matches EC P-256 closely (1366 vs 1365 iss/s) because the database single-connection bottleneck dominates over crypto cost at 25 clients. ML-DSA-65 and ML-DSA-87 trail by ~20–25% due to their larger certificate structures consuming more of the single connection’s capacity during signing and serialisation.

ML-DSA requires OpenSSL 3.5 or later. Akāmu will report a startup error if the requested key type is unavailable on the installed OpenSSL version.

CA key type impact

| CA key | Throughput (iss/s) | Mean latency (ms) | Finalize (ms) |
|---|---|---|---|
| ec:P-256 | 1255 | 16.3 | 3.7 |
| ec:P-384 | 1333 | 14.7 | 3.3 |
| rsa:2048 | 1293 | 15.5 | 4.0 |
| rsa:4096 | 832 | 23.8 | 11.4 |

EC and RSA 2048 CA keys deliver equivalent throughput in the optimised server (all are database-bottlenecked at 25 clients). RSA 4096 as the CA key reduces throughput by ~35% vs EC P-256 due to slower signing raising finalize latency above the per-query DB round-trip time; avoid it for performance-sensitive deployments.


Challenge type comparison

| Challenge type | Throughput (iss/s) | Challenge phase (ms) | Alloc pressure (MiB/iss) |
|---|---|---|---|
| http-01 | 1456 | 5.5 | 0.33 |
| dns-persist-01 | 1252 | 6.8 | 0.37 |

http-01 delivers ~15% higher throughput than dns-persist-01 on loopback. Both challenge phases reflect the adaptive poll backoff (starts at 1 ms, caps at --poll-ms) rather than network latency; the 5–7 ms figure is dominated by polling overhead and background validation round-trips.


Key type recommendations

| Scenario | Recommended key type |
|---|---|
| General purpose, broad client compatibility | ec:P-256 |
| Smallest footprint, fastest validation | ed25519 |
| Higher security margin, still classical | ec:P-384 |
| Post-quantum resistant, FIPS 204 category 2 | ml-dsa-44 |
| Post-quantum resistant, FIPS 204 category 3 | ml-dsa-65 |
| Post-quantum resistant, FIPS 204 category 5 | ml-dsa-87 |
| Interoperability with RSA-only clients | rsa:2048 (avoid RSA 4096 under load) |

Database scalability

Both in-memory (:memory:) and file-backed databases use a single-connection pool, so the throughput ceiling of ≈ 1350–1500 iss/s applies to both. The ceiling is set by how fast the sqlx SQLite worker thread can process one query at a time — each query requires a channel round-trip to the background thread, and ~24 such round-trips are needed per issuance (reduced from ~55 by moving anti-replay nonces to an in-memory store and using JOIN queries to collapse read pairs).
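Those two figures pin down the per-query budget. A back-of-envelope sketch (standalone arithmetic using the numbers from this section, not acme-bench code):

```rust
// Throughput ceiling -> per-query channel round-trip budget.
fn round_trip_us(iss_per_s: f64, round_trips_per_issuance: f64) -> f64 {
    1_000_000.0 / (iss_per_s * round_trips_per_issuance)
}

fn main() {
    // At the observed 1350-1500 iss/s ceiling with ~24 round-trips per
    // issuance, the single connection serves 32k-36k queries/s, i.e.
    // roughly 28-31 microseconds per channel round-trip.
    for ceiling in [1350.0, 1500.0] {
        println!(
            "{ceiling} iss/s x 24 round-trips => {:.1} us per round-trip",
            round_trip_us(ceiling, 24.0)
        );
    }
    // At the same per-query cost, the earlier ~55-round-trip design
    // would have capped out around 1500 * 24 / 55, roughly 650 iss/s.
}
```

This is why the round-trip reduction (55 → 24) translated almost directly into throughput: per-issuance cost is round-trip count times a fixed channel latency.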

Backend comparison (tmpfs vs in-memory)

The table below shows that file-backed SQLite on a RAM-backed filesystem (tmpfs / /dev/shm) produces equivalent throughput to an in-memory database. WAL journal mode adds a small amount of write bookkeeping overhead; the difference is within run-to-run noise.

| Concurrent clients | In-memory (iss/s) | tmpfs WAL (iss/s) |
|---|---|---|
| 1 | 231 | 220 |
| 5 | 1086 | 1108 |
| 10 | 1445 | 1380 |
| 25 | 1501 | 1321 |
| 50 | 1356 | 1250 |

Both backends plateau around 1350–1500 iss/s at 10–25 concurrent clients. The bottleneck is the sqlx connection round-trip per query, not storage speed; switching from in-memory to a tmpfs-backed file provides durability without a throughput penalty.

For sustained high-throughput targets consider:

  • In-memory database for lab, CI, or ephemeral CA use cases. Fastest startup; data is lost on restart.
  • File-backed WAL database on a fast SSD or RAM-backed filesystem. Throughput matches in-memory while providing crash durability.
  • Sharding — multiple Akāmu instances behind a load balancer, each with its own database — for production-scale deployments requiring higher aggregate issuance rates above the ≈ 1500 iss/s per-instance ceiling.

Connection pool size and BEGIN IMMEDIATE

SQLITE_BUSY_SNAPSHOT (error 517) occurs in WAL mode when a deferred transaction (BEGIN) captures a read snapshot that becomes stale after another connection commits — even when the two transactions write to completely different rows. Unlike SQLITE_BUSY (error 5), error 517 bypasses the busy handler entirely, so busy_timeout has no effect on it.

Akāmu resolves this by using BEGIN IMMEDIATE for every write transaction (db::begin_write). BEGIN IMMEDIATE acquires the write lock at transaction start, so the snapshot is always current. Any resulting SQLITE_BUSY contention is handled transparently by the busy_timeout = 5 s already configured on the pool.

The table below shows that after this fix, pool > 1 produces zero errors at every concurrency level. Write throughput is unchanged because BEGIN IMMEDIATE still serialises writers — only one connection can hold the write lock at a time — but errors are eliminated.

Throughput (iss/s) and error count (out of 200 requests) on tmpfs WAL with BEGIN IMMEDIATE:

| Concurrent clients | Pool = 1 | Pool = 2 | Pool = 4 | Pool = 8 |
|---|---|---|---|---|
| 1 | 208 / 0 err | 206 / 0 err | 202 / 0 err | 220 / 0 err |
| 5 | 1092 / 0 err | 959 / 0 err | 707 / 0 err | 617 / 0 err |
| 10 | 1389 / 0 err | 1372 / 0 err | 1247 / 0 err | 821 / 0 err |
| 25 | 1307 / 0 err | 1164 / 0 err | 1096 / 0 err | 998 / 0 err |
| 50 | 1197 / 0 err | 1119 / 0 err | 1041 / 0 err | 996 / 0 err |

All pool sizes produce zero errors: BEGIN IMMEDIATE eliminates SQLITE_BUSY_SNAPSHOT regardless of how many connections are in the pool. Pool = 1 delivers the highest throughput at every concurrency level from 5 clients up (the single-client figures are within noise) because all requests share a single serialised connection channel with no lock-acquisition contention. Pool = 2 and above pay increasingly for BEGIN IMMEDIATE wait time as multiple connections compete for the WAL write lock; the gap is widest at medium concurrency (5–10 clients), where lock contention is highest relative to available parallelism.

For the single-connection production default (open) this has no observable effect: with one connection there is never a concurrent writer, so BEGIN IMMEDIATE and BEGIN DEFERRED behave identically.

The --pool-connections benchmark option can be used to measure pool behaviour:

# Pool comparison on tmpfs with BEGIN IMMEDIATE (zero errors expected)
for p in 1 2 4 8; do
  DB=$(mktemp /dev/shm/bench_pool_XXXXXX.db)
  cargo bench --bench acme_bench -- \
    --db "$DB" --pool-connections "$p" \
    --clients 25 --requests 200 --warmup 20 --poll-ms 5
  rm -f "$DB" "${DB}-wal" "${DB}-shm"
done

Running the benchmark

The acme-bench binary is built as a Cargo bench target:

cargo bench --bench acme_bench -- --help

Common invocations:

# Baseline: 25 concurrent clients, 200 issuances, EC P-256, 5 ms poll cap
cargo bench --bench acme_bench -- --clients 25 --requests 200 --warmup 20 --poll-ms 5

# Compare RSA 2048 vs EC P-256
cargo bench --bench acme_bench -- --key-type rsa:2048 --clients 25 --requests 100
cargo bench --bench acme_bench -- --key-type ec:P-256  --clients 25 --requests 100

# Full post-quantum chain (ML-DSA-65 CA + ML-DSA-65 leaf)
cargo bench --bench acme_bench -- \
  --ca-key-type ml-dsa-65 --key-type ml-dsa-65 \
  --clients 25 --requests 100 --verify-cert

# Scalability sweep
for n in 1 5 10 25 50; do
  cargo bench --bench acme_bench -- --clients $n --requests 300 --warmup 20 --poll-ms 5
done

# dns-persist-01 challenge type
cargo bench --bench acme_bench -- --challenge dns-persist-01 --clients 25 --requests 200

# JSON output for scripting
cargo bench --bench acme_bench -- --output json --clients 25 --requests 200 --poll-ms 5 | jq .summary

Available options

| Option | Default | Description |
|---|---|---|
| --clients N | 10 | Concurrent worker tasks |
| --requests N | 100 | Issuances to measure (warmup not counted) |
| --warmup N | 10 | Warmup issuances discarded before measurement |
| --poll-ms N | 50 | Poll interval cap in milliseconds; adaptive backoff starts at 1 ms |
| --challenge TYPE | http-01 | http-01 or dns-persist-01 |
| --key-type TYPE | ec:P-256 | CSR key type (see table above) |
| --ca-key-type TYPE | ec:P-256 | CA key type (same syntax) |
| --db PATH | :memory: | SQLite path: :memory: or a file path |
| --pool-connections N | 1 | SQLite pool size; ignored (clamped to 1) when --db :memory:; see Connection pool size |
| --wildcard | off | Issue *.bench-N.acme-bench.test (dns-persist-01 only) |
| --output FORMAT | text | text or json |
| --verify-cert | off | Parse and verify the SAN of every issued certificate |

The poll loop uses adaptive exponential backoff: it starts at 1 ms, doubles each miss, and caps at --poll-ms. This mirrors how production ACME clients behave and reveals the true validation latency without a fixed artificial floor.
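A minimal sketch of that backoff schedule (illustrative; the type and field names below are not taken from the acme-bench source):

```rust
use std::time::Duration;

/// Adaptive poll backoff: start at 1 ms, double on every miss,
/// cap at the --poll-ms value.
struct PollBackoff {
    current_ms: u64,
    cap_ms: u64,
}

impl PollBackoff {
    fn new(cap_ms: u64) -> Self {
        Self { current_ms: 1, cap_ms }
    }

    /// Delay to sleep before the next poll; doubles until the cap is hit.
    fn next_delay(&mut self) -> Duration {
        let delay = Duration::from_millis(self.current_ms);
        self.current_ms = (self.current_ms * 2).min(self.cap_ms);
        delay
    }
}

fn main() {
    // With --poll-ms 5 the schedule is 1, 2, 4, 5, 5, ... ms.
    let mut backoff = PollBackoff::new(5);
    let delays: Vec<u64> = (0..5)
        .map(|_| backoff.next_delay().as_millis() as u64)
        .collect();
    println!("{delays:?}"); // [1, 2, 4, 5, 5]
}
```

With the default cap of 50 ms, a validation that lands within a few milliseconds is observed after one or two short sleeps instead of a full fixed-interval tick, which is what keeps the measured challenge phase close to the true validation latency.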


Memory consumption

The benchmark instruments heap allocation using a custom GlobalAlloc wrapper that records four AtomicU64 counters. This reports in-process heap usage without any external tooling or /proc parsing.
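A counting allocator along these lines can be built from std alone. The sketch below is a simplified stand-in for the benchmark's wrapper (names are illustrative); it tracks the same four kinds of counters: allocation count, cumulative bytes requested, live bytes, and peak live bytes.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

/// Wraps the system allocator and records heap statistics in atomics.
struct CountingAlloc {
    alloc_calls: AtomicU64, // total number of alloc() calls
    total_bytes: AtomicU64, // cumulative bytes requested ("alloc pressure")
    live_bytes: AtomicU64,  // bytes currently allocated (footprint)
    peak_bytes: AtomicU64,  // high-water mark of live_bytes
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size() as u64;
        self.alloc_calls.fetch_add(1, Relaxed);
        self.total_bytes.fetch_add(size, Relaxed);
        let live = self.live_bytes.fetch_add(size, Relaxed) + size;
        self.peak_bytes.fetch_max(live, Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.live_bytes.fetch_sub(layout.size() as u64, Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: CountingAlloc = CountingAlloc {
    alloc_calls: AtomicU64::new(0),
    total_bytes: AtomicU64::new(0),
    live_bytes: AtomicU64::new(0),
    peak_bytes: AtomicU64::new(0),
};

fn main() {
    let before = ALLOC.total_bytes.load(Relaxed);
    let buf = vec![1u8; 1 << 20]; // a 1 MiB allocation
    assert!(ALLOC.total_bytes.load(Relaxed) - before >= 1 << 20);
    drop(buf);
    println!(
        "allocs: {}, pressure: {} B, peak live: {} B",
        ALLOC.alloc_calls.load(Relaxed),
        ALLOC.total_bytes.load(Relaxed),
        ALLOC.peak_bytes.load(Relaxed)
    );
}
```

The counters use relaxed atomics because they are statistics, not synchronisation; the only subtlety is that the allocator's own code must not allocate, which plain atomic operations satisfy.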

Three snapshots are taken:

| Milestone | When |
|---|---|
| process start | Before the server is initialised |
| server ready | After the server has bound its port and is accepting connections |
| after bench | After all issuances (warmup + measured) have completed |

The peak counter is reset at server ready so the high-water mark reflects only the issuance window, not server startup allocations.

Text output

  Heap (allocator counters):
    process start:        0.1 MiB  live
    server ready:         0.2 MiB  live   (server overhead: +0.1 MiB)
    after  220 iss.:      0.6 MiB  live   (issuance growth: +0.4 MiB, 1.9 KiB/iss.)
    peak live:            1.5 MiB         (high-water mark during issuances)
    alloc pressure:      83.5 MiB  total  (0.379 MiB/iss. requested, incl. freed)

live — bytes currently held on the heap (footprint). alloc pressure — cumulative bytes requested from the system allocator since server ready, including memory that was allocated and subsequently freed. A high pressure-to-footprint ratio indicates short-lived allocations (normal for per-request work like signature buffers and JSON serialisation).
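For the sample output above, that ratio works out as follows (standalone arithmetic using the figures printed in the sample):

```rust
fn main() {
    // Figures from the sample text output above.
    let alloc_pressure_mib = 83.5_f64; // cumulative bytes requested during issuances
    let peak_live_mib = 1.5_f64;       // high-water mark of the live heap
    let ratio = alloc_pressure_mib / peak_live_mib;
    println!("pressure-to-footprint ratio: {ratio:.0}:1");
    // A ratio in the ~56:1 range means nearly all per-request allocations
    // (signature buffers, JSON serialisation) are freed before the request
    // completes, so the footprint stays flat while pressure accumulates.
}
```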

JSON output

The "memory" key is present in JSON output when --output json is used:

{
  "memory": {
    "start_live_bytes":           102400,
    "server_ready_live_bytes":    204800,
    "after_bench_live_bytes":     614400,
    "peak_live_bytes":           1572864,
    "server_overhead_bytes":      102400,
    "issuance_growth_bytes":      409600,
    "per_issuance_growth_bytes":    1900,
    "issuance_alloc_bytes":     87523328,
    "per_issuance_alloc_bytes":   397833,
    "total_alloc_count":         700000
  }
}
| Field | Meaning |
|---|---|
| *_live_bytes | Heap footprint at each milestone |
| peak_live_bytes | Highest live bytes seen during the issuance window |
| server_overhead_bytes | Live growth from start to server-ready |
| issuance_growth_bytes | Live growth from server-ready to end of bench |
| per_issuance_growth_bytes | Per-issuance share of issuance growth |
| issuance_alloc_bytes | Total bytes requested during the issuance window |
| per_issuance_alloc_bytes | Per-issuance allocation pressure |
| total_alloc_count | Total number of alloc calls in the whole process |

Typical figures

At 25 concurrent clients with 200 measured issuances (EC P-256, :memory: DB, 5 ms poll cap):

  • Server overhead: ~0.3 MiB live (router tables, DB connection pool, CA state, HTTP client)
  • Per-issuance heap growth: ~1 KiB (request-scoped state retained by tokio workers)
  • Peak during issuances: ~2.7 MiB (25 in-flight requests simultaneously)
  • Allocation pressure: ~335 KiB per issuance (JWS buffers, JSON serialisation, cert DER/PEM)

For ML-DSA key types allocation pressure rises to ~550–790 KiB per issuance due to larger key and certificate structures (lower end with EC P-256 CA, higher end with a matching ML-DSA CA).

These figures confirm that Akāmu has a stable heap footprint at steady state. Per-issuance live growth is small and bounded by the number of concurrent workers, not the total number of issuances.