Performance
This chapter covers issuance throughput and latency characteristics of Akāmu under load, with guidance on key type selection, connection pool tuning, and capacity planning.
All numbers were collected on a single host — Intel Core i7-12800H (14 cores /
20 threads, 63 GB RAM, Fedora Linux 6.15, OpenSSL 3.5.6) — using the
acme-bench tool in two modes:
- Process mode (--spawn process): the server runs as a separate OS process with its own Tokio runtime, memory allocator, and SQLite :memory: database. This matches how a real deployment behaves. Heap allocation numbers reflect the client side only.
- Inprocess mode (default): server and clients share a single process. This mode enables SQLite backend, connection pool, and read-only pool split benchmarks that require shared-process access to the database layer. Heap allocation numbers include both client and server.
Audit events are written to a JSONL file (/tmp/akamu-bench-audit.jsonl) in
both modes.
The benchmark runs full ACME workflows (new-order → authz → challenge validate
→ finalize → certificate download). Latency is end-to-end wall time from
new-order through certificate download; account creation is amortised and
excluded. Default configuration uses ec:P-256 client keys, ec:P-256 CA key,
and http-01 challenge.
The full benchmark suite can be run with
contrib/performance/run_benchmarks.sh, which writes newline-delimited JSON
results to a file for post-processing. Set SPAWN_MODE="--spawn process" to
run the suite in process mode.
Concurrency
With ec:P-256 certificates, http-01 validation, and SQLite :memory:,
throughput climbs steeply up to 5–10 concurrent clients, peaks at 25 clients
in both modes, and falls off at 50 as queue depth grows. Mean latency is
lowest at 5 clients.
Process mode
| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | new_order | authz | challenge | finalize | download |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 10.2 | 13.7 | 1.3 | 1.0 | 4.3 | 3.4 | 0.4 |
| 5 | 975 | 5.0 | 6.3 | 0.6 | 0.4 | 2.6 | 1.2 | 0.2 |
| 10 | 1,098 | 8.7 | 14.7 | 1.6 | 0.6 | 4.0 | 2.2 | 0.3 |
| 25 | 1,208 | 19.0 | 24.3 | 4.6 | 0.5 | 8.2 | 5.4 | 0.3 |
| 50 | 1,015 | 34.5 | 73.7 | 9.2 | 0.5 | 17.7 | 6.9 | 0.2 |
Inprocess mode
| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | new_order | authz | challenge | finalize | download |
|---|---|---|---|---|---|---|---|---|
| 1 | 120 | 8.3 | 13.1 | 1.1 | 0.8 | 3.7 | 2.4 | 0.3 |
| 5 | 818 | 6.1 | 8.0 | 0.6 | 0.5 | 2.8 | 1.8 | 0.3 |
| 10 | 854 | 11.5 | 13.9 | 1.2 | 1.1 | 4.3 | 4.1 | 0.8 |
| 25 | 889 | 27.5 | 36.3 | 3.2 | 3.2 | 8.1 | 11.1 | 1.9 |
| 50 | 681 | 67.0 | 80.5 | 7.5 | 7.0 | 20.1 | 26.6 | 5.7 |
Phase columns show mean milliseconds per ACME step.
Process mode reaches 975–1,098 iss/s at c=5–10 with sub-9 ms mean
latency and peaks at 1,208 iss/s at c=25 (19.0 ms mean), driven by
read-only pool separation, crypto caching, and spawn_blocking for
certificate signing. Inprocess mode stays within 818–889 iss/s across
c=5–25. At c≥25, process mode shows lower download times (0.2–0.3 ms vs
1.9–5.7 ms) because certificate delivery bypasses the shared-process HTTP stack.
Inprocess mode shows higher authz and download overhead at high concurrency
due to Tokio task contention within the single runtime.
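As a rough consistency check on the process-mode table, throughput in a closed-loop benchmark can be estimated as clients divided by mean latency (Little's law). The sketch below assumes each acme-bench worker starts its next issuance as soon as the previous one completes, which is an assumption and not something stated in this chapter; the function name is illustrative.

```python
# Process-mode rows from the table above: clients -> (measured iss/s, mean latency in ms).
PROCESS_MODE = {
    1: (100, 10.2),
    5: (975, 5.0),
    10: (1098, 8.7),
    25: (1208, 19.0),
    50: (1015, 34.5),
}


def implied_throughput(clients: int, mean_ms: float) -> float:
    """Estimate issuances/s if every client issues back-to-back (Little's law)."""
    return clients * 1000.0 / mean_ms


for clients, (measured, mean_ms) in PROCESS_MODE.items():
    estimate = implied_throughput(clients, mean_ms)
    print(f"c={clients:>2}: estimate {estimate:7.0f} iss/s, measured {measured:5d} iss/s "
          f"({measured / estimate:.0%} of estimate)")
```

At c=5 the estimate is 1,000 iss/s against a measured 975 iss/s. The measured figure stays above 90% of the estimate up to c=25 and drops to about 70% at c=50, which lines up with the point where throughput starts to fall.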
Client key type
The client key type is the largest single determinant of per-issuance latency. All runs use ec:P-256 CA; process mode uses 25 concurrent clients, inprocess mode uses 50.
Process mode
| CSR key type | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Alloc/iss |
|---|---|---|---|---|---|
| ed25519 | 561 | 33.0 | 69.1 | 14.2 | 166 KB |
| ec:P-256 | 556 | 33.6 | 57.6 | 15.5 | 164 KB |
| ML-DSA-44 | 523 | 37.9 | 66.8 | 17.2 | 243 KB |
| ML-DSA-65 | 511 | 41.1 | 52.2 | 22.4 | 269 KB |
| ML-DSA-87 | 418 | 45.5 | 88.5 | 23.0 | 313 KB |
| ec:P-384 | 377 | 53.2 | 71.7 | 28.2 | 175 KB |
| rsa:2048 | 153 | 124.0 | 266.8 | 88.9 | 166 KB |
| rsa:4096 | 13 | 1156.6 | 2345.5 | 779.8 | 223 KB |
Inprocess mode
| CSR key type | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Alloc/iss |
|---|---|---|---|---|---|
| ec:P-256 | 770 | 54.6 | 70.0 | 20.7 | 434 KB |
| ed25519 | 742 | 54.1 | 72.8 | 20.2 | 432 KB |
| ML-DSA-44 | 685 | 60.4 | 75.0 | 24.0 | 575 KB |
| ML-DSA-65 | 589 | 69.9 | 98.8 | 30.0 | 624 KB |
| ML-DSA-87 | 540 | 77.3 | 99.9 | 34.7 | 696 KB |
| ec:P-384 | 487 | 85.7 | 97.3 | 46.3 | 438 KB |
| rsa:2048 | 157 | 279.7 | 554.2 | 165.1 | 454 KB |
| rsa:4096 | 15 | 2506.9 | 4099.1 | 1496.2 | 531 KB |
In process mode ed25519 and ec:P-256 are effectively tied (~33 ms, 556–561 iss/s). ML-DSA variants perform well: ML-DSA-44 at 523 iss/s is only 6% slower than ec:P-256. EC P-384 is consistently slower than ML-DSA-87 in both modes due to its heavier finalize cost.
RSA 2048 delivers 3.6–4.9× lower throughput than ec:P-256 (153 vs 556 iss/s in process mode, 157 vs 770 iss/s inprocess); RSA 4096, at ~1,160 ms mean in process mode, is dominated entirely by key generation.
RSA 4096 is strongly discouraged for ACME clients in multi-client deployments.
RSA 4096 saturation
RSA 4096 key generation is CPU-bound. Beyond 10 clients, adding concurrency barely improves throughput while latency grows roughly linearly.
Process mode
| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| 1 | 3 | 375 | 1,215 | 370 |
| 10 | 13 | 691 | 2,292 | 666 |
| 25 | 15 | 1,334 | 3,463 | 1,068 |
| 50 | 15 | 2,417 | 4,831 | 1,283 |
Inprocess mode
| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| 1 | 3 | 357 | 922 | 353 |
| 10 | 14 | 651 | 2,817 | 647 |
| 25 | 14 | 1,340 | 4,501 | 1,155 |
| 50 | 13 | 2,804 | 4,441 | 1,749 |
Throughput saturates at ~13–15 iss/s regardless of concurrency or mode. At c=50, p99 reaches 4.4–4.8 seconds. This is entirely client-side key generation; the server is idle waiting for CSRs.
CA key type
CA signing is server-side. The CA key type directly affects the finalize phase; other phases are unaffected. All runs use ec:P-256 client keys; process mode uses 25 concurrent clients, inprocess mode uses 50.
Process mode
| CA key | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| ec:P-256 | 624 | 31.0 | 48.5 | 14.9 |
| rsa:2048 | 466 | 42.7 | 61.4 | 25.5 |
| ec:P-384 | 307 | 66.3 | 95.9 | 38.5 |
| rsa:3072 | 266 | 76.0 | 94.1 | 49.2 |
| rsa:4096 | 183 | 116.5 | 165.8 | 89.9 |
Inprocess mode
| CA key | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| ec:P-256 | 729 | 58.3 | 68.4 | 22.8 |
| ec:P-384 | 663 | 63.1 | 75.0 | 25.7 |
| rsa:2048 | 643 | 64.5 | 82.7 | 26.9 |
| rsa:3072 | 598 | 69.3 | 81.5 | 31.7 |
| rsa:4096 | 518 | 80.4 | 99.6 | 38.5 |
EC P-256 is the fastest CA key type and the recommended default. In process mode, RSA 2048 CA (466 iss/s) outperforms EC P-384 CA (307 iss/s) because OpenSSL’s RSA 2048 signing is faster than ECDSA P-384; in inprocess mode RSA 2048 and EC P-384 are close (643 vs 663 iss/s). RSA 4096 as CA reduces throughput to 183–518 iss/s.
Post-quantum chain
Akāmu supports ML-DSA (FIPS 204 / RFC 9881) CA keys at three NIST security
levels. The table measures a full post-quantum chain (matching ML-DSA CA +
ML-DSA client keys, with --verify-cert) and compares to an ec:P-256
baseline. Process mode uses 25 concurrent clients, inprocess mode uses 50.
Process mode
| CA + client | NIST cat. | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Mean vs P-256 | Alloc/iss |
|---|---|---|---|---|---|---|---|
| ec:P-256 | — | 526 | 36.7 | 70.0 | 16.0 | — | 170 KB |
| ML-DSA-44 | 2 | 362 | 55.7 | 75.1 | 34.6 | +52% | 257 KB |
| ML-DSA-65 | 3 | 298 | 68.6 | 86.0 | 46.0 | +87% | 312 KB |
| ML-DSA-87 | 5 | 250 | 80.5 | 105.8 | 56.6 | +119% | 385 KB |
Inprocess mode
| CA + client | NIST cat. | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Mean vs P-256 | Alloc/iss |
|---|---|---|---|---|---|---|---|
| ec:P-256 | — | 730 | 56.0 | 68.2 | 21.9 | — | 438 KB |
| ML-DSA-44 | 2 | 637 | 64.1 | 84.3 | 28.4 | +14% | 671 KB |
| ML-DSA-65 | 3 | 551 | 75.6 | 89.5 | 32.3 | +35% | 770 KB |
| ML-DSA-87 | 5 | 508 | 83.8 | 96.9 | 40.3 | +50% | 914 KB |
ML-DSA-44 shows a larger relative overhead in process mode (+52% mean latency vs +14% inprocess). Most of the gap is in the finalize phase, which rises from 16.0 ms to 34.6 ms in process mode but only from 21.9 ms to 28.4 ms inprocess. Allocation pressure in inprocess mode (671–914 KB) reflects both client and server heap usage; process mode (257–385 KB) reflects client-side only.
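The vs P-256 figures in both tables match the relative increase in mean latency over the ec:P-256 baseline of the same mode. A short sketch of that arithmetic, using the mean latencies from the tables (the helper name is illustrative):

```python
# Mean latency in ms, copied from the two tables above.
MEAN_MS = {
    "process": {"ec:P-256": 36.7, "ML-DSA-44": 55.7, "ML-DSA-65": 68.6, "ML-DSA-87": 80.5},
    "inprocess": {"ec:P-256": 56.0, "ML-DSA-44": 64.1, "ML-DSA-65": 75.6, "ML-DSA-87": 83.8},
}


def overhead_vs_baseline(mode: str, key_type: str, baseline: str = "ec:P-256") -> int:
    """Percentage increase in mean latency relative to the baseline, rounded."""
    rows = MEAN_MS[mode]
    return round((rows[key_type] / rows[baseline] - 1.0) * 100.0)


for mode, rows in MEAN_MS.items():
    for key_type in rows:
        if key_type != "ec:P-256":
            print(f"{mode:9s} {key_type}: +{overhead_vs_baseline(mode, key_type)}%")
```

Because the baseline differs between modes (36.7 ms vs 56.0 ms), the percentages are comparable only within a mode.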
ML-DSA requires OpenSSL 3.5 or later. Akāmu will report a startup error if the requested key type is unavailable on the installed OpenSSL version.
Challenge type
All runs use ec:P-256 keys and SQLite :memory:; process mode uses 25
concurrent clients, inprocess mode uses 50.
Process mode
| Challenge | Throughput (iss/s) | Mean (ms) | p99 (ms) | Challenge phase (ms) |
|---|---|---|---|---|
| http-01 | 615 | 31.0 | 71.1 | 9.6 |
| dns-persist-01 | 544 | 37.7 | 62.9 | 14.7 |
Inprocess mode
| Challenge | Throughput (iss/s) | Mean (ms) | p99 (ms) | Challenge phase (ms) |
|---|---|---|---|---|
| http-01 | 746 | 57.3 | 65.3 | 17.7 |
| dns-persist-01 | 623 | 67.8 | 83.7 | 26.7 |
dns-persist-01 adds 5–9 ms to the challenge phase and reduces throughput by
12–16% across the two modes. Both challenge types completed with zero errors
in all runs.
Backend comparison
SQLite :memory: versus a tmpfs-backed WAL file (/dev/shm), sweeping
concurrency with ec:P-256 keys and http-01. Inprocess mode only — process
mode always uses :memory:. The tmpfs backend uses a write coalescer that
batches concurrent writes through a single connection, eliminating BEGIN IMMEDIATE contention.
| Clients | :memory: (iss/s) | :memory: mean (ms) | tmpfs (iss/s) | tmpfs mean (ms) | Delta |
|---|---|---|---|---|---|
| 1 | 114 | 8.7 | 110 | 9.1 | −4% |
| 5 | 746 | 6.7 | 845 | 5.9 | +13% |
| 10 | 836 | 11.8 | 1,112 | 8.9 | +33% |
| 25 | 872 | 28.1 | 1,030 | 23.6 | +18% |
| 50 | 747 | 60.6 | 910 | 48.7 | +22% |
| 75 | 681 | 96.8 | 932 | 68.6 | +37% |
With the write coalescer, tmpfs WAL outperforms :memory: at c≥5: the
coalescer serialises writes on a dedicated connection, avoiding contention
that :memory: still experiences through the pool. Peak tmpfs throughput is
1,112 iss/s at c=10 versus 872 iss/s for :memory: at c=25. Tmpfs WAL is
the recommended backend for deployments that need crash-recoverable state
without the complexity of PostgreSQL.
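The write coalescer is described only at a high level here. As an illustration of the general technique, and not of Akāmu's own implementation (which this chapter does not show), the following Python sketch funnels writes from many callers through one dedicated SQLite connection and commits whatever has queued up as a single BEGIN IMMEDIATE transaction. The class name, batch limit, and table are hypothetical.

```python
import os
import queue
import sqlite3
import tempfile
import threading
from concurrent.futures import Future


class WriteCoalescer:
    """Serialise writes on one connection, batching everything currently queued."""

    def __init__(self, path: str, max_batch: int = 64):
        self._path = path
        self._max_batch = max_batch
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, sql: str, params: tuple = ()) -> Future:
        """Queue one write; the Future resolves to the statement's rowcount."""
        fut = Future()
        self._queue.put((sql, params, fut))
        return fut

    def close(self) -> None:
        self._queue.put(None)  # sentinel: drain and stop
        self._thread.join()

    def _run(self) -> None:
        # isolation_level=None disables implicit transactions; BEGIN/COMMIT are explicit.
        conn = sqlite3.connect(self._path, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        stop = False
        while not stop:
            item = self._queue.get()
            if item is None:
                break
            batch = [item]
            # Coalesce: take whatever else is already waiting, up to max_batch.
            while len(batch) < self._max_batch:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            try:
                conn.execute("BEGIN IMMEDIATE")
                counts = [conn.execute(sql, params).rowcount for sql, params, _ in batch]
                conn.execute("COMMIT")
            except Exception as exc:
                if conn.in_transaction:
                    conn.execute("ROLLBACK")
                for _, _, fut in batch:
                    fut.set_exception(exc)  # the whole batch fails together
            else:
                for (_, _, fut), count in zip(batch, counts):
                    fut.set_result(count)
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writer = WriteCoalescer(os.path.join(tmp, "bench.db"))
        writer.submit("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)").result()
        pending = [writer.submit("INSERT INTO orders (status) VALUES (?)", ("pending",))
                   for _ in range(100)]
        print(sum(f.result() for f in pending), "rows written")
        writer.close()
```

Only the writer thread ever takes the write lock, so callers never collide on BEGIN IMMEDIATE. The trade-off in this sketch is that one failing statement rolls back the whole batch.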
Connection pool
Connection pool sizing affects throughput when multiple concurrent clients contend for database reads. The write coalescer handles all writes through a dedicated connection, so the pool primarily serves read operations. Inprocess mode with tmpfs WAL backend — process mode ignores pool settings.
| Pool | c=1 (iss/s) | c=5 (iss/s) | c=10 (iss/s) | c=25 (iss/s) | c=50 (iss/s) |
|---|---|---|---|---|---|
| 1 | 129 | 917 | 1,074 | 1,168 | 934 |
| 2 | 134 | 954 | 1,407 | 1,175 | 1,141 |
| 4 | 117 | 970 | 1,075 | 1,588 | 1,295 |
| 8 | 114 | 942 | 1,106 | 1,531 | 1,127 |
At c=1 pool size is irrelevant. At c=25, pool=4 delivers the best throughput (1,588 iss/s) — a 36% improvement over pool=1 (1,168 iss/s). Pool=4 is the recommended choice: it delivers the highest peak throughput while maintaining reasonable p99 latency (22.8 ms at c=25).
Pool sizes above 4 show diminishing returns; pool=8 at c=25 reaches 1,531 iss/s (−4% vs pool=4) with slightly higher p99 variance.
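To read the pool table programmatically, a small sketch can pick the highest-throughput pool size at each concurrency level and compute the gain over pool=1. The numbers are copied from the table above; the function names are illustrative.

```python
# Throughput in iss/s from the table above: pool size -> {clients: iss/s}.
POOL_SWEEP = {
    1: {1: 129, 5: 917, 10: 1074, 25: 1168, 50: 934},
    2: {1: 134, 5: 954, 10: 1407, 25: 1175, 50: 1141},
    4: {1: 117, 5: 970, 10: 1075, 25: 1588, 50: 1295},
    8: {1: 114, 5: 942, 10: 1106, 25: 1531, 50: 1127},
}


def best_pool(clients: int) -> int:
    """Pool size with the highest measured throughput at this concurrency."""
    return max(POOL_SWEEP, key=lambda pool: POOL_SWEEP[pool][clients])


def gain_over_single(pool: int, clients: int) -> int:
    """Percentage throughput gain of `pool` over pool=1, rounded."""
    return round((POOL_SWEEP[pool][clients] / POOL_SWEEP[1][clients] - 1.0) * 100.0)


for clients in (1, 5, 10, 25, 50):
    pool = best_pool(clients)
    print(f"c={clients:>2}: best pool={pool} ({gain_over_single(pool, clients):+d}% vs pool=1)")
```

In this data pool=4 is highest at c=5, c=25, and c=50, while pool=2 is highest at c=1 and c=10.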
Read-only pool split
Splitting read-only handlers (get_order, get_authz, download_cert, star_cert,
renewal_info, ocsp) onto a separate ?mode=ro connection pool frees the
write pool for write-path handlers. Inprocess mode with tmpfs WAL — process
mode ignores pool settings.
| Clients | No split (iss/s) | Split ro=4 (iss/s) | Improvement |
|---|---|---|---|
| 1 | 113 | 110 | −3% |
| 5 | 894 | 990 | +11% |
| 10 | 1,253 | 1,154 | −8% |
| 25 | 1,197 | 1,548 | +29% |
| 50 | 1,009 | 1,430 | +42% |
The split delivers significant gains at c≥25 where read contention competes with the write coalescer. Peak improvement is +42% at c=50 (1,430 vs 1,009 iss/s). At lower concurrency the overhead of managing a separate pool can slightly reduce throughput.
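The ?mode=ro parameter is a standard SQLite URI filename option. As a minimal illustration in Python's sqlite3 module (not Akāmu's pool code), the sketch below opens the same database file once read-write and once read-only, and shows that the read-only handle can serve reads but rejects writes. The table and file names are hypothetical.

```python
import os
import pathlib
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file through a URI filename with mode=ro."""
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.db")

        # Write path: an ordinary read-write connection creates and fills the table.
        rw = sqlite3.connect(path)
        rw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
        rw.execute("INSERT INTO orders (status) VALUES ('valid')")
        rw.commit()

        # Read path: a separate read-only connection to the same file.
        ro = open_read_only(path)
        rows = ro.execute("SELECT status FROM orders").fetchall()
        try:
            ro.execute("INSERT INTO orders (status) VALUES ('pending')")
            write_rejected = False
        except sqlite3.OperationalError:
            write_rejected = True

        ro.close()
        rw.close()
        return rows, write_rejected


if __name__ == "__main__":
    print(demo())
```

Keeping read handlers on connections that cannot write means they never compete for the write lock, which is the property the split relies on.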
RO connection sweep at c=10
| ro-connections | Throughput (iss/s) | Mean (ms) | p99 (ms) |
|---|---|---|---|
| 1 | 1,201 | 8.2 | 12.8 |
| 2 | 1,207 | 8.1 | 13.2 |
| 4 | 1,149 | 8.6 | 14.8 |
| 8 | 1,121 | 8.8 | 15.6 |
| 16 | 1,223 | 8.1 | 14.3 |
At c=10, all RO connection counts perform similarly (1,121–1,223 iss/s). ro=1 or ro=2 is the recommended setting for typical deployments; higher counts add connection overhead without meaningful throughput gain.
Key type recommendations
| Scenario | Recommended type |
|---|---|
| General purpose, broad client compatibility | ec:P-256 |
| Smallest footprint, fastest validation | ed25519 |
| Higher security margin, still classical | ec:P-384 |
| Post-quantum resistant, FIPS 204 category 2 | ml-dsa-44 |
| Post-quantum resistant, FIPS 204 category 3 | ml-dsa-65 |
| Post-quantum resistant, FIPS 204 category 5 | ml-dsa-87 |
| Interoperability with RSA-only clients | rsa:2048 (avoid RSA 4096 under load) |
Capacity planning
Single-node throughput for ec:P-256 keys, http-01:
| Target throughput | Configuration | Expected mean latency | Notes |
|---|---|---|---|
| ≤100 iss/s | 1 client, pool=1 | ~9 ms | Minimal deployment |
| ≤1,000 iss/s | 5–10 clients | 6–12 ms | Sweet spot: low latency, high throughput |
| ≤1,200 iss/s | 25 clients | ~20 ms | Near :memory: ceiling |
| ≤1,600 iss/s | 25 clients, pool=4, tmpfs WAL | ~14 ms | Coalescer + pool tuning |
Figures assume ec:P-256 keys and http-01 challenge. RSA or ML-DSA keys lower throughput proportionally.
For the database backend: SQLite :memory: suits nodes with no persistent
state requirement (accounts, orders, and certificates are lost on restart).
Tmpfs WAL (/dev/shm) with the write coalescer outperforms :memory: under
concurrency (up to 1,588 iss/s vs ~889 iss/s) and provides crash-recoverable
state. For persistent deployments, PostgreSQL is recommended; use a
connection pool of 20–25 ([database] pool_connections = 25).
Memory
The benchmark instruments heap allocation using a custom GlobalAlloc wrapper.
Per-issuance allocation pressure — bytes requested from the system allocator
per certificate, including memory subsequently freed — varies by configuration
and mode.
In process mode, allocation reflects the client side only (server runs in a separate process); in inprocess mode it includes both client and server.
Process mode (client-side allocation)
| Configuration | Per-issuance alloc |
|---|---|
| ec:P-256 CA + ec:P-256 client, c=1 | 134 KB |
| ec:P-256 CA + ec:P-256 client, c=5 | 134 KB |
| ec:P-256 CA + ec:P-256 client, c=10 | 137 KB |
| ec:P-256 CA + ec:P-256 client, c=50 | 190 KB |
| ec:P-256 CA + rsa:4096 client, c=25 | 223 KB |
| ML-DSA-44 CA + ML-DSA-44 client, c=25 | 257 KB |
| ML-DSA-65 CA + ML-DSA-65 client, c=25 | 312 KB |
| ML-DSA-87 CA + ML-DSA-87 client, c=25 | 385 KB |
Inprocess mode (client + server allocation)
| Configuration | Per-issuance alloc |
|---|---|
| ec:P-256 CA + ec:P-256 client, c=1 | 416 KB |
| ec:P-256 CA + ec:P-256 client, c=5 | 416 KB |
| ec:P-256 CA + ec:P-256 client, c=10 | 417 KB |
| ec:P-256 CA + ec:P-256 client, c=50 | 426 KB |
| ec:P-256 CA + rsa:4096 client, c=50 | 531 KB |
| ML-DSA-44 CA + ML-DSA-44 client, c=50 | 671 KB |
| ML-DSA-65 CA + ML-DSA-65 client, c=50 | 770 KB |
| ML-DSA-87 CA + ML-DSA-87 client, c=50 | 914 KB |
The difference between modes (e.g. 416 KB − 134 KB = 282 KB for ec:P-256) represents the server-side allocation per issuance: certificate construction, DER encoding, audit logging, and database writes.
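The same subtraction can be applied to every ec:P-256 row that appears in both tables at the same concurrency. This sketch assumes the client allocates the same amount in both modes, which the benchmark does not verify; the helper name is illustrative.

```python
# Per-issuance allocation in KB for ec:P-256 CA + ec:P-256 client, from the tables above.
PROCESS_KB = {1: 134, 5: 134, 10: 137, 50: 190}    # client side only
INPROCESS_KB = {1: 416, 5: 416, 10: 417, 50: 426}  # client + server


def server_side_kb(clients: int) -> int:
    """Estimated server-side allocation per issuance at this concurrency."""
    return INPROCESS_KB[clients] - PROCESS_KB[clients]


for clients in sorted(PROCESS_KB):
    print(f"c={clients:>2}: ~{server_side_kb(clients)} KB server-side")
```

The estimate is 280–282 KB up to c=10. At c=50 it falls to 236 KB because the client-only figure rises to 190 KB while the combined figure rises only to 426 KB, so the subtraction is most trustworthy at low concurrency.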
JSON output
The "memory" key is present when --output json is used:
{
  "memory": {
    "start_live_bytes": 102400,
    "server_ready_live_bytes": 204800,
    "after_bench_live_bytes": 614400,
    "peak_live_bytes": 1572864,
    "server_overhead_bytes": 512000,
    "issuance_growth_bytes": 409600,
    "per_issuance_growth_bytes": 1900,
    "issuance_alloc_bytes": 87523328,
    "per_issuance_alloc_bytes": 150120,
    "total_alloc_count": 319099
  }
}
| Field | Meaning |
|---|---|
| *_live_bytes | Heap footprint at each milestone |
| peak_live_bytes | Highest live bytes seen during the issuance window |
| server_overhead_bytes | Live growth from start to server-ready |
| issuance_growth_bytes | Live growth from server-ready to end of bench |
| per_issuance_growth_bytes | Per-issuance share of issuance growth |
| issuance_alloc_bytes | Total bytes requested during the issuance window |
| per_issuance_alloc_bytes | Per-issuance allocation pressure |
| total_alloc_count | Total number of alloc calls in the whole process |
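A minimal sketch of consuming the "memory" object in Python, using the sample values shown above. It checks that issuance_growth_bytes equals the difference between the two live-byte milestones, and compares per-issuance live growth with per-issuance allocation pressure. Reading that ratio as the share of allocated bytes still live at the end of the run is an interpretation, not a documented field.

```python
import json

SAMPLE = """
{
  "memory": {
    "start_live_bytes": 102400,
    "server_ready_live_bytes": 204800,
    "after_bench_live_bytes": 614400,
    "peak_live_bytes": 1572864,
    "server_overhead_bytes": 512000,
    "issuance_growth_bytes": 409600,
    "per_issuance_growth_bytes": 1900,
    "issuance_alloc_bytes": 87523328,
    "per_issuance_alloc_bytes": 150120,
    "total_alloc_count": 319099
  }
}
"""


def summarize_memory(doc: dict) -> dict:
    mem = doc["memory"]
    return {
        # Live growth from server-ready to end of bench, recomputed from the milestones.
        "growth_matches_milestones": mem["issuance_growth_bytes"]
        == mem["after_bench_live_bytes"] - mem["server_ready_live_bytes"],
        # Share of each issuance's requested bytes that is still live afterwards.
        "retained_ratio": mem["per_issuance_growth_bytes"] / mem["per_issuance_alloc_bytes"],
    }


summary = summarize_memory(json.loads(SAMPLE))
print(summary["growth_matches_milestones"], f"{summary['retained_ratio']:.1%}")
```

A low ratio means most of the memory requested per issuance is freed again, which is why allocation pressure and live growth are reported separately.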
Running the benchmark
Full suite
The benchmark suite script runs all configurations and writes newline-delimited JSON results:
cargo build --release
# Inprocess mode (default)
contrib/performance/run_benchmarks.sh [OUTPUT_FILE]
# Process mode
SPAWN_MODE="--spawn process" contrib/performance/run_benchmarks.sh [OUTPUT_FILE]
Post-processing examples:
# Print throughput for all runs
jq -r '.label + ": " + (.summary.throughput_per_sec|round|tostring) + " iss/s"' results.ndjson
# Extract concurrency scaling table
jq 'select(.label | startswith("concurrency_"))
    | [.label, .summary.throughput_per_sec,
       .summary.total_latency_ms.mean, .summary.total_latency_ms.p95]' results.ndjson
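For scripts that need more than jq, the same fields can be read in Python. This sketch relies only on the field names used in the jq examples above (label, summary.throughput_per_sec, summary.total_latency_ms.mean); the two sample records, including the second label, are made-up placeholders and not real benchmark output.

```python
import io
import json

# Placeholder records in the same shape as the jq examples assume.
SAMPLE = (
    '{"label": "concurrency_5", "summary": {"throughput_per_sec": 900.0, '
    '"total_latency_ms": {"mean": 5.5, "p95": 7.0}}}\n'
    '{"label": "keytype_example", "summary": {"throughput_per_sec": 150.0, '
    '"total_latency_ms": {"mean": 120.0, "p95": 250.0}}}\n'
)


def concurrency_rows(lines):
    """Yield (label, throughput, mean latency) for runs labelled concurrency_*."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the NDJSON file
        run = json.loads(line)
        if run["label"].startswith("concurrency_"):
            summary = run["summary"]
            yield (run["label"],
                   summary["throughput_per_sec"],
                   summary["total_latency_ms"]["mean"])


for label, throughput, mean_ms in concurrency_rows(io.StringIO(SAMPLE)):
    print(f"{label}: {round(throughput)} iss/s, mean {mean_ms} ms")
```

To process a real results file, pass an open file handle instead of the StringIO object, for example `concurrency_rows(open("results.ndjson"))`.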
Individual runs
cargo build --release
# Concurrency sweep (process mode)
for c in 1 5 10 25 50; do
  cargo bench --bench acme_bench -- --spawn process --clients "$c" --requests 300 --warmup 20
done
# Key type comparison at c=25
for kt in ec:P-256 ec:P-384 ed25519 rsa:2048 ml-dsa-44; do
  cargo bench --bench acme_bench -- --spawn process --clients 25 --key-type "$kt" --requests 100
done
# CA key type comparison
for cakt in ec:P-256 ec:P-384 rsa:2048 rsa:4096; do
  cargo bench --bench acme_bench -- --spawn process --clients 25 --ca-key-type "$cakt" --requests 100
done
# Post-quantum full chain with verification
cargo bench --bench acme_bench -- \
  --spawn process --clients 25 --ca-key-type ml-dsa-44 --key-type ml-dsa-44 --verify-cert
# Challenge type comparison
cargo bench --bench acme_bench -- --spawn process --clients 25 --challenge dns-persist-01
# Backend comparison (inprocess mode, tmpfs WAL)
cargo bench --bench acme_bench -- --clients 10 --db "sqlite:///dev/shm/bench.db" --requests 300
# RO pool split (inprocess mode)
cargo bench --bench acme_bench -- \
  --clients 10 --db "sqlite:///dev/shm/bench.db" --ro-connections 4 --requests 300
# JSON output for scripting
cargo bench --bench acme_bench -- --spawn process --clients 25 --requests 100 --output json | jq .summary
Available options
| Option | Default | Description |
|---|---|---|
| --spawn MODE | inprocess | inprocess or process; process starts separate OS processes |
| --nodes N | 1 | Number of akamu nodes in the cluster |
| --clients N | 10 | Concurrent worker tasks |
| --requests N | 100 | Issuances to measure (warmup not counted) |
| --warmup N | 10 | Warmup issuances discarded before measurement |
| --challenge TYPE | http-01 | http-01 or dns-persist-01 |
| --key-type TYPE | ec:P-256 | CSR key type (see table above) |
| --ca-key-type TYPE | ec:P-256 | CA key type (same syntax) |
| --topology MODE | direct | direct (round-robin) or proxy (single-node proxy) |
| --no-gossip | off | Disable gossip in multi-node runs |
| --db PATH | :memory: | SQLite URL or PostgreSQL connection string |
| --pool-connections N | 1 | Write connection pool size |
| --ro-connections N | 0 | Read-only connection pool size (0 = no split) |
| --wildcard | off | Issue *.bench-N.acme-bench.test (dns-persist-01 only) |
| --output FORMAT | text | text or json |
| --verify-cert | off | Parse and verify the SAN of every issued certificate |
| --poll-ms N | 100 | Challenge poll interval in milliseconds |