
Performance

This chapter covers issuance throughput and latency characteristics of Akāmu under load, with guidance on key type selection, connection pool tuning, and capacity planning.

All numbers were collected on a single host (Intel Core i7-12800H, 14 cores / 20 threads, 63 GB RAM, Fedora Linux with kernel 6.15, OpenSSL 3.5.6) using the acme-bench tool in two modes:

  • Process mode (--spawn process): the server runs as a separate OS process with its own Tokio runtime, memory allocator, and SQLite :memory: database. This matches how a real deployment behaves. Heap allocation numbers reflect the client side only.
  • Inprocess mode (default): server and clients share a single process. This mode is required for the SQLite backend, connection pool, and read-only pool split benchmarks, which need shared-process access to the database layer. Heap allocation numbers include both client and server.

Audit events are written to a JSONL file (/tmp/akamu-bench-audit.jsonl) in both modes.

The benchmark runs full ACME workflows (new-order → authz → challenge validate → finalize → certificate download). Latency is end-to-end wall time from new-order through certificate download; account creation is amortised across each client's issuances and excluded from the timings. The default configuration uses ec:P-256 client keys, an ec:P-256 CA key, and the http-01 challenge.
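The Mean and p99 columns in the tables below summarise these per-issuance wall times. This page does not say which percentile method acme-bench uses, so the following is a minimal sketch assuming the nearest-rank method; `LatencySummary` and `summarise` are hypothetical names, not acme-bench's API.

```rust
use std::time::Duration;

/// Hypothetical summary of end-to-end issuance latencies
/// (new-order through certificate download).
#[derive(Debug, PartialEq)]
pub struct LatencySummary {
    pub mean_ms: f64,
    pub p99_ms: f64,
}

/// Nearest-rank percentile on a sorted slice: the smallest sample such that
/// at least `pct` percent of samples are less than or equal to it.
fn nearest_rank(sorted_ms: &[f64], pct: usize) -> f64 {
    let n = sorted_ms.len();
    let rank = (pct * n + 99) / 100; // ceil(pct * n / 100), 1-based
    sorted_ms[rank.clamp(1, n) - 1]
}

/// Summarise per-issuance wall times; returns None for an empty run.
pub fn summarise(samples: &[Duration]) -> Option<LatencySummary> {
    if samples.is_empty() {
        return None;
    }
    // Microsecond resolution keeps whole-millisecond inputs exact as f64.
    let mut ms: Vec<f64> = samples
        .iter()
        .map(|d| d.as_micros() as f64 / 1000.0)
        .collect();
    ms.sort_by(|a, b| a.total_cmp(b));
    let mean_ms = ms.iter().sum::<f64>() / ms.len() as f64;
    Some(LatencySummary { mean_ms, p99_ms: nearest_rank(&ms, 99) })
}
```

With 100 samples of 1 ms through 100 ms, this yields a mean of 50.5 ms and a p99 of 99 ms. Other percentile definitions (for example, linear interpolation) can differ slightly on small samples.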

The full benchmark suite can be run with contrib/performance/run_benchmarks.sh, which writes newline-delimited JSON results to a file for post-processing. Set SPAWN_MODE="--spawn process" to run the suite in process mode.


Concurrency

With ec:P-256 certificates, http-01 validation, and SQLite :memory:, throughput reaches roughly 1,000 iss/s in process mode and 820–850 iss/s in inprocess mode by 5–10 concurrent clients, tops out at c=25, and falls at c=50 as queue depth grows.

Process mode

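| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | new_order | authz | challenge | finalize | download |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 10.2 | 13.7 | 1.3 | 1.0 | 4.3 | 3.4 | 0.4 |
| 5 | 975 | 5.0 | 6.3 | 0.6 | 0.4 | 2.6 | 1.2 | 0.2 |
| 10 | 1,098 | 8.7 | 14.7 | 1.6 | 0.6 | 4.0 | 2.2 | 0.3 |
| 25 | 1,208 | 19.0 | 24.3 | 4.6 | 0.5 | 8.2 | 5.4 | 0.3 |
| 50 | 1,015 | 34.5 | 73.7 | 9.2 | 0.5 | 17.7 | 6.9 | 0.2 |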

Inprocess mode

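| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | new_order | authz | challenge | finalize | download |
|---|---|---|---|---|---|---|---|---|
| 1 | 120 | 8.3 | 13.1 | 1.1 | 0.8 | 3.7 | 2.4 | 0.3 |
| 5 | 818 | 6.1 | 8.0 | 0.6 | 0.5 | 2.8 | 1.8 | 0.3 |
| 10 | 854 | 11.5 | 13.9 | 1.2 | 1.1 | 4.3 | 4.1 | 0.8 |
| 25 | 889 | 27.5 | 36.3 | 3.2 | 3.2 | 8.1 | 11.1 | 1.9 |
| 50 | 681 | 67.0 | 80.5 | 7.5 | 7.0 | 20.1 | 26.6 | 5.7 |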

Phase columns show mean milliseconds per ACME step.

Process mode reaches its highest throughput at c=25 (1,208 iss/s, 19.0 ms mean), but c=5–10 gives the best trade-off: 975–1,098 iss/s with sub-9 ms mean latency, driven by read-only pool separation, crypto caching, and spawn_blocking for certificate signing. Inprocess mode stays within 818–889 iss/s from c=5 to c=25. Process mode keeps download times at 0.2–0.4 ms at every concurrency level, while inprocess download rises from 0.3 ms at c=1 to 5.7 ms at c=50, because in process mode certificate delivery bypasses the shared-process HTTP stack. Inprocess mode also shows higher authz overhead at high concurrency (7.0 ms vs 0.5 ms at c=50), due to Tokio task contention within the single runtime.
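The throughput and mean-latency columns are linked by Little's law. Assuming each client is a closed-loop worker that starts its next issuance as soon as the previous certificate arrives (an assumption; the worker loop is not described here), throughput ≈ clients / mean latency. The hypothetical helpers below compute that prediction and how closely a measured row agrees with it.

```rust
/// Little's law for a closed-loop benchmark: `clients` workers, each starting
/// a new issuance immediately after the previous one, complete roughly
/// clients / mean_latency issuances per second. Assumes no think time.
pub fn predicted_throughput(clients: u32, mean_latency_ms: f64) -> f64 {
    clients as f64 / (mean_latency_ms / 1000.0)
}

/// Measured throughput divided by the Little's-law prediction
/// (1.0 means the measured row matches the model exactly).
pub fn agreement(clients: u32, mean_latency_ms: f64, measured_iss_per_s: f64) -> f64 {
    measured_iss_per_s / predicted_throughput(clients, mean_latency_ms)
}
```

Applied to the process-mode rows, the prediction agrees within about 9% up to c=25 (for example, c=10 predicts about 1,149 iss/s against 1,098 measured). At c=50 the ratio drops to about 0.70, which suggests that at that concurrency some per-worker time falls outside the measured new-order-to-download window.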


Client key type

The client key type is the largest single determinant of per-issuance latency. All runs use ec:P-256 CA; process mode uses 25 concurrent clients, inprocess mode uses 50.

Process mode

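| CSR key type | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Alloc/iss |
|---|---|---|---|---|---|
| ed25519 | 561 | 33.0 | 69.1 | 14.2 | 166 KB |
| ec:P-256 | 556 | 33.6 | 57.6 | 15.5 | 164 KB |
| ML-DSA-44 | 523 | 37.9 | 66.8 | 17.2 | 243 KB |
| ML-DSA-65 | 511 | 41.1 | 52.2 | 22.4 | 269 KB |
| ML-DSA-87 | 418 | 45.5 | 88.5 | 23.0 | 313 KB |
| ec:P-384 | 377 | 53.2 | 71.7 | 28.2 | 175 KB |
| rsa:2048 | 153 | 124.0 | 266.8 | 88.9 | 166 KB |
| rsa:4096 | 13 | 1156.6 | 2345.5 | 779.8 | 223 KB |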

Inprocess mode

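| CSR key type | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Alloc/iss |
|---|---|---|---|---|---|
| ec:P-256 | 770 | 54.6 | 70.0 | 20.7 | 434 KB |
| ed25519 | 742 | 54.1 | 72.8 | 20.2 | 432 KB |
| ML-DSA-44 | 685 | 60.4 | 75.0 | 24.0 | 575 KB |
| ML-DSA-65 | 589 | 69.9 | 98.8 | 30.0 | 624 KB |
| ML-DSA-87 | 540 | 77.3 | 99.9 | 34.7 | 696 KB |
| ec:P-384 | 487 | 85.7 | 97.3 | 46.3 | 438 KB |
| rsa:2048 | 157 | 279.7 | 554.2 | 165.1 | 454 KB |
| rsa:4096 | 15 | 2506.9 | 4099.1 | 1496.2 | 531 KB |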

In process mode, ed25519 and ec:P-256 are effectively tied (~33 ms mean, 556–561 iss/s). ML-DSA variants perform well: ML-DSA-44 at 523 iss/s is only 6% slower than ec:P-256. EC P-384 is slower than ML-DSA-87 in both modes because of its heavier finalize cost (28.2 vs 23.0 ms in process mode, 46.3 vs 34.7 ms inprocess).

RSA 2048 delivers 3.6–4.9× less throughput than ec:P-256 (153 vs 556 iss/s in process mode, 157 vs 770 iss/s inprocess). RSA 4096, at ~1,160 ms mean in process mode and ~2,510 ms inprocess, is dominated by client-side key generation.

RSA 4096 is strongly discouraged for ACME clients in multi-client deployments.


RSA 4096 saturation

RSA 4096 key generation is CPU-bound. Adding concurrency barely improves throughput, while mean latency grows roughly linearly with the number of clients.

Process mode

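| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| 1 | 3 | 375 | 1,215 | 370 |
| 10 | 13 | 691 | 2,292 | 666 |
| 25 | 15 | 1,334 | 3,463 | 1,068 |
| 50 | 15 | 2,417 | 4,831 | 1,283 |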

Inprocess mode

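| Clients | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| 1 | 3 | 357 | 922 | 353 |
| 10 | 14 | 651 | 2,817 | 647 |
| 25 | 14 | 1,340 | 4,501 | 1,155 |
| 50 | 13 | 2,804 | 4,441 | 1,749 |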

From c=10 upward, throughput saturates at ~13–15 iss/s in both modes; additional clients only lengthen the queue. At c=50, p99 reaches 4.4–4.8 seconds. The bottleneck is client-side key generation; the server is idle waiting for CSRs.


CA key type

CA signing happens on the server, so the CA key type directly affects the finalize phase; the other phases are not directly affected. All runs use ec:P-256 client keys; process mode uses 25 concurrent clients, inprocess mode uses 50.

Process mode

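| CA key | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| ec:P-256 | 624 | 31.0 | 48.5 | 14.9 |
| rsa:2048 | 466 | 42.7 | 61.4 | 25.5 |
| ec:P-384 | 307 | 66.3 | 95.9 | 38.5 |
| rsa:3072 | 266 | 76.0 | 94.1 | 49.2 |
| rsa:4096 | 183 | 116.5 | 165.8 | 89.9 |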

Inprocess mode

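| CA key | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) |
|---|---|---|---|---|
| ec:P-256 | 729 | 58.3 | 68.4 | 22.8 |
| ec:P-384 | 663 | 63.1 | 75.0 | 25.7 |
| rsa:2048 | 643 | 64.5 | 82.7 | 26.9 |
| rsa:3072 | 598 | 69.3 | 81.5 | 31.7 |
| rsa:4096 | 518 | 80.4 | 99.6 | 38.5 |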

EC P-256 is the fastest CA key type and the recommended default. In process mode, RSA 2048 CA (466 iss/s) outperforms EC P-384 CA (307 iss/s) because OpenSSL’s RSA 2048 signing is faster than ECDSA P-384; in inprocess mode RSA 2048 and EC P-384 are close (643 vs 663 iss/s). RSA 4096 as CA reduces throughput to 183–518 iss/s.


Post-quantum chain

Akāmu supports ML-DSA (FIPS 204 / RFC 9881) CA keys at three NIST security levels. The tables below measure a full post-quantum chain (a matching ML-DSA CA and ML-DSA client keys, with --verify-cert) against an ec:P-256 baseline. Process mode uses 25 concurrent clients, inprocess mode uses 50.

Process mode

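| CA + client | NIST cat. | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Mean vs P-256 | Alloc/iss |
|---|---|---|---|---|---|---|---|
| ec:P-256 | n/a | 526 | 36.7 | 70.0 | 16.0 | baseline | 170 KB |
| ML-DSA-44 | 2 | 362 | 55.7 | 75.1 | 34.6 | +52% | 257 KB |
| ML-DSA-65 | 3 | 298 | 68.6 | 86.0 | 46.0 | +87% | 312 KB |
| ML-DSA-87 | 5 | 250 | 80.5 | 105.8 | 56.6 | +119% | 385 KB |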

Inprocess mode

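| CA + client | NIST cat. | Throughput (iss/s) | Mean (ms) | p99 (ms) | Finalize (ms) | Mean vs P-256 | Alloc/iss |
|---|---|---|---|---|---|---|---|
| ec:P-256 | n/a | 730 | 56.0 | 68.2 | 21.9 | baseline | 438 KB |
| ML-DSA-44 | 2 | 637 | 64.1 | 84.3 | 28.4 | +14% | 671 KB |
| ML-DSA-65 | 3 | 551 | 75.6 | 89.5 | 32.3 | +35% | 770 KB |
| ML-DSA-87 | 5 | 508 | 83.8 | 96.9 | 40.3 | +50% | 914 KB |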

ML-DSA-44 shows a larger relative overhead in process mode (+52%) than inprocess (+14%). Part of the gap is relative: the process-mode ec:P-256 baseline is faster (36.7 ms vs 56.0 ms mean), so added signing and verification cost is a larger fraction of it. The absolute increase is also larger in process mode (+19.0 ms vs +8.1 ms). Allocation pressure in inprocess mode (671–914 KB) reflects both client and server heap usage; process mode (257–385 KB) reflects the client side only.
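The "vs P-256" column compares each chain's mean end-to-end latency with the ec:P-256 baseline measured in the same mode. A small sketch of that arithmetic (`overhead_pct` and `format_overhead` are hypothetical helper names) reproduces the table's figures:

```rust
/// Mean-latency overhead versus a baseline, rounded to a whole percent.
pub fn overhead_pct(mean_ms: f64, baseline_mean_ms: f64) -> i64 {
    ((mean_ms / baseline_mean_ms - 1.0) * 100.0).round() as i64
}

/// Format the overhead with an explicit sign, as the tables print it ("+52%").
pub fn format_overhead(mean_ms: f64, baseline_mean_ms: f64) -> String {
    format!("{:+}%", overhead_pct(mean_ms, baseline_mean_ms))
}
```

For example, process-mode ML-DSA-44 (55.7 ms) against ec:P-256 (36.7 ms) gives +52%, and inprocess ML-DSA-44 (64.1 ms) against 56.0 ms gives +14%.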

ML-DSA requires OpenSSL 3.5 or later. Akāmu will report a startup error if the requested key type is unavailable on the installed OpenSSL version.


Challenge type

All runs use ec:P-256 keys and SQLite :memory:; process mode uses 25 concurrent clients, inprocess mode uses 50.

Process mode

| Challenge | Throughput (iss/s) | Mean (ms) | p99 (ms) | Challenge phase (ms) |
|---|---|---|---|---|
| http-01 | 615 | 31.0 | 71.1 | 9.6 |
| dns-persist-01 | 544 | 37.7 | 62.9 | 14.7 |

Inprocess mode

| Challenge | Throughput (iss/s) | Mean (ms) | p99 (ms) | Challenge phase (ms) |
|---|---|---|---|---|
| http-01 | 746 | 57.3 | 65.3 | 17.7 |
| dns-persist-01 | 623 | 67.8 | 83.7 | 26.7 |

dns-persist-01 adds 5–9 ms to the challenge phase, reducing throughput by 12–16% in both modes. Both challenge types deliver zero errors across all runs.


Backend comparison

SQLite :memory: versus a tmpfs-backed WAL file (/dev/shm), sweeping concurrency with ec:P-256 keys and http-01. Inprocess mode only — process mode always uses :memory:. The tmpfs backend uses a write coalescer that batches concurrent writes through a single connection, eliminating BEGIN IMMEDIATE contention.

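| Clients | :memory: (iss/s) | :memory: mean (ms) | tmpfs (iss/s) | tmpfs mean (ms) | Delta (throughput) |
|---|---|---|---|---|---|
| 1 | 114 | 8.7 | 110 | 9.1 | −4% |
| 5 | 746 | 6.7 | 845 | 5.9 | +13% |
| 10 | 836 | 11.8 | 1,112 | 8.9 | +33% |
| 25 | 872 | 28.1 | 1,030 | 23.6 | +18% |
| 50 | 747 | 60.6 | 910 | 48.7 | +22% |
| 75 | 681 | 96.8 | 932 | 68.6 | +37% |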

With the write coalescer, tmpfs WAL outperforms :memory: at c≥5: the coalescer serialises writes on a dedicated connection, avoiding the contention that :memory: still experiences through the pool. Peak tmpfs throughput is 1,112 iss/s at c=10, versus 872 iss/s for :memory: at c=25. Tmpfs WAL is the recommended backend for deployments that need state to survive a process crash without the complexity of PostgreSQL; because /dev/shm is RAM-backed, the database does not survive a host reboot.
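The write coalescer is only described at a high level here. The following std-only sketch shows the general idea under stated assumptions: request handlers send write jobs over a channel, and one dedicated writer thread takes the first queued job, drains everything else already queued, and applies the whole batch as one transaction. `FakeConn`, `WriteJob`, and `Coalescer` are hypothetical stand-ins; Akāmu itself runs on Tokio against SQLite, and its implementation may differ.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Stand-in for the single dedicated write connection (hypothetical).
#[derive(Default)]
pub struct FakeConn {
    pub rows: Vec<String>,
    pub transactions: usize, // number of batches committed
}

/// A write job runs against the connection inside the current batch.
pub type WriteJob = Box<dyn FnOnce(&mut FakeConn) + Send>;

pub struct Coalescer {
    tx: Option<Sender<WriteJob>>,
    worker: Option<JoinHandle<FakeConn>>,
}

impl Coalescer {
    pub fn start() -> Self {
        let (tx, rx): (Sender<WriteJob>, Receiver<WriteJob>) = mpsc::channel();
        let worker = thread::spawn(move || {
            let mut conn = FakeConn::default();
            // Block for the first job, then drain whatever else is queued so that
            // concurrent writers share one transaction instead of each contending
            // for BEGIN IMMEDIATE on their own connection.
            while let Ok(first) = rx.recv() {
                conn.transactions += 1; // BEGIN IMMEDIATE would go here
                first(&mut conn);
                while let Ok(job) = rx.try_recv() {
                    job(&mut conn);
                }
                // COMMIT would go here
            }
            conn // every sender dropped: no more writes can arrive
        });
        Coalescer { tx: Some(tx), worker: Some(worker) }
    }

    /// Handle that request handlers clone to submit writes.
    pub fn sender(&self) -> Sender<WriteJob> {
        self.tx.as_ref().expect("coalescer is running").clone()
    }

    /// Drop our sender and wait for the writer to finish once all clones are gone.
    pub fn shutdown(mut self) -> FakeConn {
        drop(self.tx.take());
        self.worker
            .take()
            .expect("writer already joined")
            .join()
            .expect("writer thread panicked")
    }
}
```

A production coalescer also has to report each job's result back to its caller and decide what happens to the rest of a batch when one write fails; this sketch omits both.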


Connection pool

Connection pool sizing affects throughput when multiple concurrent clients contend for database reads. The write coalescer handles all writes through a dedicated connection, so the pool primarily serves read operations. Runs use inprocess mode with the tmpfs WAL backend; process mode ignores pool settings.

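| Pool size | c=1 (iss/s) | c=5 (iss/s) | c=10 (iss/s) | c=25 (iss/s) | c=50 (iss/s) |
|---|---|---|---|---|---|
| 1 | 129 | 917 | 1,074 | 1,168 | 934 |
| 2 | 134 | 954 | 1,407 | 1,175 | 1,141 |
| 4 | 117 | 970 | 1,075 | 1,588 | 1,295 |
| 8 | 114 | 942 | 1,106 | 1,531 | 1,127 |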

At c=1, pool size is irrelevant. At c=25, pool=4 delivers the best throughput (1,588 iss/s), a 36% improvement over pool=1 (1,168 iss/s); at c=10, pool=2 leads (1,407 iss/s). Pool=4 is the recommended choice: it delivers the highest peak throughput while keeping p99 latency reasonable (22.8 ms at c=25).

Pool sizes above 4 show diminishing returns; pool=8 at c=25 reaches 1,531 iss/s (−4% vs pool=4) with slightly higher p99 variance.
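Pool size matters because it caps how many readers can hold a connection at once. A minimal blocking pool built on `Mutex` and `Condvar` illustrates the mechanism: with `size` connections, the next concurrent reader waits until a lease is returned. `Pool` and `Lease` are hypothetical names; Akāmu's actual pool is async and is not shown here.

```rust
use std::ops::Deref;
use std::sync::{Condvar, Mutex};

/// Fixed-size blocking pool of connections of type `C` (hypothetical sketch).
pub struct Pool<C> {
    idle: Mutex<Vec<C>>,
    available: Condvar,
}

/// RAII guard that returns its connection to the pool when dropped.
pub struct Lease<'a, C> {
    pool: &'a Pool<C>,
    conn: Option<C>,
}

impl<C> Pool<C> {
    pub fn new(conns: Vec<C>) -> Self {
        Pool { idle: Mutex::new(conns), available: Condvar::new() }
    }

    /// Block until a connection is idle, then lease it.
    pub fn get(&self) -> Lease<'_, C> {
        let mut idle = self.idle.lock().unwrap();
        loop {
            if let Some(conn) = idle.pop() {
                return Lease { pool: self, conn: Some(conn) };
            }
            // Releases the lock while waiting; the loop handles spurious wake-ups.
            idle = self.available.wait(idle).unwrap();
        }
    }

    pub fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

impl<C> Deref for Lease<'_, C> {
    type Target = C;
    fn deref(&self) -> &C {
        self.conn.as_ref().expect("connection present until drop")
    }
}

impl<C> Drop for Lease<'_, C> {
    fn drop(&mut self) {
        if let Some(conn) = self.conn.take() {
            self.pool.idle.lock().unwrap().push(conn);
            self.pool.available.notify_one(); // wake one waiting reader
        }
    }
}
```

Larger pools shorten these waits, but each extra connection adds its own overhead, which is consistent with the flattening seen above pool=4.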


Read-only pool split

Splitting read-only handlers (get_order, get_authz, download_cert, star_cert, renewal_info, ocsp) onto a separate ?mode=ro connection pool frees the write pool for write-path handlers. Runs use inprocess mode with tmpfs WAL; process mode ignores pool settings.

| Clients | No split (iss/s) | Split, ro=4 (iss/s) | Improvement |
|---|---|---|---|
| 1 | 113 | 110 | −3% |
| 5 | 894 | 990 | +11% |
| 10 | 1,253 | 1,154 | −8% |
| 25 | 1,197 | 1,548 | +29% |
| 50 | 1,009 | 1,430 | +42% |

The split delivers its largest gains at c≥25, where read traffic competes with the write coalescer: +29% at c=25 and +42% at c=50 (1,430 vs 1,009 iss/s). At low concurrency the results are mixed (−3% at c=1, +11% at c=5, −8% at c=10): the overhead of managing a separate pool can outweigh the benefit.
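Conceptually, the split is a routing decision per handler. The sketch below assumes handlers are identified by the names listed earlier in this section and that the read-only pool opens the database file through SQLite's URI filename syntax with `mode=ro`; `PoolKind`, `route`, and `read_only_uri` are hypothetical, not Akāmu's code.

```rust
/// Which connection pool a handler should use (hypothetical).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PoolKind {
    ReadOnly,
    ReadWrite,
}

/// Handlers that only read from the database (the list given in this section).
const READ_ONLY_HANDLERS: [&str; 6] = [
    "get_order", "get_authz", "download_cert", "star_cert", "renewal_info", "ocsp",
];

/// Route a handler to a pool. With `ro_connections == 0` there is no split and
/// every handler uses the read-write pool, mirroring `--ro-connections 0`.
pub fn route(handler: &str, ro_connections: usize) -> PoolKind {
    if ro_connections > 0 && READ_ONLY_HANDLERS.contains(&handler) {
        PoolKind::ReadOnly
    } else {
        PoolKind::ReadWrite
    }
}

/// Build a read-only SQLite URI filename for a database path,
/// e.g. "/dev/shm/bench.db" -> "file:/dev/shm/bench.db?mode=ro".
pub fn read_only_uri(path: &str) -> String {
    format!("file:{path}?mode=ro")
}
```

Opening the read-only pool with `mode=ro` lets SQLite itself reject accidental writes from handlers routed there, rather than relying on the routing table alone.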

RO connection sweep at c=10

| ro-connections | Throughput (iss/s) | Mean (ms) | p99 (ms) |
|---|---|---|---|
| 1 | 1,201 | 8.2 | 12.8 |
| 2 | 1,207 | 8.1 | 13.2 |
| 4 | 1,149 | 8.6 | 14.8 |
| 8 | 1,121 | 8.8 | 15.6 |
| 16 | 1,223 | 8.1 | 14.3 |

At c=10, all RO connection counts perform similarly (1,121–1,223 iss/s). ro=1 or ro=2 is the recommended setting for typical deployments; higher counts add connection overhead without meaningful throughput gain.


Key type recommendations

| Scenario | Recommended type |
|---|---|
| General purpose, broad client compatibility | ec:P-256 |
| Smallest footprint, fastest validation | ed25519 |
| Higher security margin, still classical | ec:P-384 |
| Post-quantum resistant, FIPS 204 category 2 | ml-dsa-44 |
| Post-quantum resistant, FIPS 204 category 3 | ml-dsa-65 |
| Post-quantum resistant, FIPS 204 category 5 | ml-dsa-87 |
| Interoperability with RSA-only clients | rsa:2048 (avoid RSA 4096 under load) |

Capacity planning

Single-node throughput for ec:P-256 keys, http-01:

| Target throughput | Configuration | Expected mean latency | Notes |
|---|---|---|---|
| ≤100 iss/s | 1 client, pool=1 | ~9 ms | Minimal deployment |
| ≤1,000 iss/s | 5–10 clients | 6–12 ms | Sweet spot: low latency, high throughput |
| ≤1,200 iss/s | 25 clients | ~20 ms | Near :memory: ceiling |
| ≤1,600 iss/s | 25 clients, pool=4, tmpfs WAL | ~14 ms | Coalescer + pool tuning |

Figures assume ec:P-256 keys and the http-01 challenge. RSA and ML-DSA keys reduce throughput; see the client key type tables above for per-type figures.
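The capacity table reads as a lookup from a target rate to the first configuration whose measured ceiling covers it. A hypothetical helper encoding those rows (valid only for ec:P-256 keys, http-01, a single node, and this benchmark host):

```rust
/// One row of the capacity-planning table (hypothetical encoding).
#[derive(Debug, PartialEq)]
pub struct Plan {
    pub max_iss_per_s: u32,
    pub configuration: &'static str,
    pub mean_latency: &'static str,
}

/// Rows in ascending order of capacity, copied from the table above.
static PLANS: [Plan; 4] = [
    Plan { max_iss_per_s: 100, configuration: "1 client, pool=1", mean_latency: "~9 ms" },
    Plan { max_iss_per_s: 1_000, configuration: "5-10 clients", mean_latency: "6-12 ms" },
    Plan { max_iss_per_s: 1_200, configuration: "25 clients", mean_latency: "~20 ms" },
    Plan {
        max_iss_per_s: 1_600,
        configuration: "25 clients, pool=4, tmpfs WAL",
        mean_latency: "~14 ms",
    },
];

/// First row whose ceiling covers `target_iss_per_s`, or None when the target
/// exceeds every measured single-node configuration.
pub fn plan_for(target_iss_per_s: u32) -> Option<&'static Plan> {
    PLANS.iter().find(|p| target_iss_per_s <= p.max_iss_per_s)
}
```

For other key types, the ceilings should be scaled down using the per-key-type throughput figures rather than reused as-is.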

For the database backend: SQLite :memory: suits nodes with no persistent-state requirement (accounts, orders, and certificates are lost on restart). Tmpfs WAL (/dev/shm) with the write coalescer outperforms :memory: under concurrency (up to 1,588 iss/s vs ~889 iss/s) and keeps state across process crashes, though not across host reboots. For persistent deployments, PostgreSQL is recommended; use a connection pool of 20–25 ([database] pool_connections = 25).


Memory

The benchmark instruments heap allocation using a custom GlobalAlloc wrapper. Per-issuance allocation pressure — bytes requested from the system allocator per certificate, including memory subsequently freed — varies by configuration and mode.

In process mode, allocation reflects the client side only (server runs in a separate process); in inprocess mode it includes both client and server.
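acme-bench's own wrapper is not shown on this page. The general technique looks like the following minimal sketch: wrap `std::alloc::System` in a type implementing `GlobalAlloc`, add each requested size to a cumulative "allocation pressure" counter that is never decremented, and track live bytes separately. `CountingAlloc` and the counter names are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counting wrapper around the system allocator (hypothetical name).
pub struct CountingAlloc;

/// Cumulative bytes requested, including memory later freed ("pressure").
static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);
/// Bytes currently live (allocated minus freed).
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);
/// Number of successful alloc calls.
static ALLOC_COUNT: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
            ALLOC_COUNT.fetch_add(1, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // realloc and alloc_zeroed use the trait's default implementations, which
    // route through alloc/dealloc above, so every reallocation is counted in full.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

pub fn alloc_bytes() -> usize {
    ALLOC_BYTES.load(Ordering::Relaxed)
}

pub fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

pub fn alloc_count() -> usize {
    ALLOC_COUNT.load(Ordering::Relaxed)
}

/// Per-issuance allocation pressure for a measurement window.
pub fn per_issuance(window_alloc_bytes: usize, issuances: usize) -> usize {
    if issuances == 0 { 0 } else { window_alloc_bytes / issuances }
}
```

Reading `alloc_bytes()` at the start and end of the issuance window and dividing the difference by the number of issuances gives the per-issuance figure. In inprocess mode the counters see both client and server allocations because they share one global allocator.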

Process mode (client-side allocation)

| Configuration | Per-issuance alloc |
|---|---|
| ec:P-256 CA + ec:P-256 client, c=1 | 134 KB |
| ec:P-256 CA + ec:P-256 client, c=5 | 134 KB |
| ec:P-256 CA + ec:P-256 client, c=10 | 137 KB |
| ec:P-256 CA + ec:P-256 client, c=50 | 190 KB |
| ec:P-256 CA + rsa:4096 client, c=25 | 223 KB |
| ML-DSA-44 CA + ML-DSA-44 client, c=25 | 257 KB |
| ML-DSA-65 CA + ML-DSA-65 client, c=25 | 312 KB |
| ML-DSA-87 CA + ML-DSA-87 client, c=25 | 385 KB |

Inprocess mode (client + server allocation)

| Configuration | Per-issuance alloc |
|---|---|
| ec:P-256 CA + ec:P-256 client, c=1 | 416 KB |
| ec:P-256 CA + ec:P-256 client, c=5 | 416 KB |
| ec:P-256 CA + ec:P-256 client, c=10 | 417 KB |
| ec:P-256 CA + ec:P-256 client, c=50 | 426 KB |
| ec:P-256 CA + rsa:4096 client, c=50 | 531 KB |
| ML-DSA-44 CA + ML-DSA-44 client, c=50 | 671 KB |
| ML-DSA-65 CA + ML-DSA-65 client, c=50 | 770 KB |
| ML-DSA-87 CA + ML-DSA-87 client, c=50 | 914 KB |

The difference between modes (e.g. 416 KB − 134 KB = 282 KB for ec:P-256 at c=1) approximates the server-side allocation per issuance: certificate construction, DER encoding, audit logging, and database writes.

JSON output

The "memory" key is present when --output json is used:

{
  "memory": {
    "start_live_bytes":           102400,
    "server_ready_live_bytes":    204800,
    "after_bench_live_bytes":     614400,
    "peak_live_bytes":           1572864,
    "server_overhead_bytes":      102400,
    "issuance_growth_bytes":      409600,
    "per_issuance_growth_bytes":    1900,
    "issuance_alloc_bytes":     87523328,
    "per_issuance_alloc_bytes":   150120,
    "total_alloc_count":         319099
  }
}
| Field | Meaning |
|---|---|
| *_live_bytes | Heap footprint at each milestone |
| peak_live_bytes | Highest live bytes seen during the issuance window |
| server_overhead_bytes | Live growth from start to server-ready |
| issuance_growth_bytes | Live growth from server-ready to end of bench |
| per_issuance_growth_bytes | Per-issuance share of issuance growth |
| issuance_alloc_bytes | Total bytes requested during the issuance window |
| per_issuance_alloc_bytes | Per-issuance allocation pressure |
| total_alloc_count | Total number of alloc calls in the whole process |
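The derived fields follow from the milestones and the number of measured issuances. The sketch below spells out those relationships as the field table defines them; the struct names, the `issuances` parameter, and the use of integer division are assumptions, not acme-bench's code.

```rust
/// Live-heap milestones and window totals, mirroring the JSON "memory" keys.
pub struct MemoryMilestones {
    pub start_live_bytes: u64,
    pub server_ready_live_bytes: u64,
    pub after_bench_live_bytes: u64,
    pub issuance_alloc_bytes: u64,
}

/// Derived fields as defined in the field table (hypothetical struct).
#[derive(Debug, PartialEq)]
pub struct Derived {
    pub server_overhead_bytes: i64,
    pub issuance_growth_bytes: i64,
    pub per_issuance_growth_bytes: i64,
    pub per_issuance_alloc_bytes: u64,
}

pub fn derive(m: &MemoryMilestones, issuances: u64) -> Derived {
    // Live bytes can shrink between milestones, so growth is signed.
    let server_overhead = m.server_ready_live_bytes as i64 - m.start_live_bytes as i64;
    let issuance_growth = m.after_bench_live_bytes as i64 - m.server_ready_live_bytes as i64;
    let n = issuances.max(1); // guard against an empty run
    Derived {
        server_overhead_bytes: server_overhead,
        issuance_growth_bytes: issuance_growth,
        per_issuance_growth_bytes: issuance_growth / n as i64,
        per_issuance_alloc_bytes: m.issuance_alloc_bytes / n,
    }
}
```

Note the distinction the fields encode: growth fields measure memory still held at a milestone, while the alloc fields measure total bytes requested, including memory freed again within the window.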

Running the benchmark

Full suite

The benchmark suite script runs all configurations and writes newline-delimited JSON results:

cargo build --release

# Inprocess mode (default)
contrib/performance/run_benchmarks.sh [OUTPUT_FILE]

# Process mode
SPAWN_MODE="--spawn process" contrib/performance/run_benchmarks.sh [OUTPUT_FILE]

Post-processing examples:

# Print throughput for all runs
jq -r '.label + ": " + (.summary.throughput_per_sec|round|tostring) + " iss/s"' results.ndjson

# Extract concurrency scaling table
jq 'select(.label | startswith("concurrency_"))
    | [.label, .summary.throughput_per_sec,
       .summary.total_latency_ms.mean, .summary.total_latency_ms.p95]' results.ndjson

Individual runs

cargo build --release

# Concurrency sweep (process mode)
for c in 1 5 10 25 50; do
  cargo bench --bench acme_bench -- --spawn process --clients $c --requests 300 --warmup 20
done

# Key type comparison at c=25
for kt in ec:P-256 ec:P-384 ed25519 rsa:2048 ml-dsa-44; do
  cargo bench --bench acme_bench -- --spawn process --clients 25 --key-type $kt --requests 100
done

# CA key type comparison
for cakt in ec:P-256 ec:P-384 rsa:2048 rsa:4096; do
  cargo bench --bench acme_bench -- --spawn process --clients 25 --ca-key-type $cakt --requests 100
done

# Post-quantum full chain with verification
cargo bench --bench acme_bench -- \
  --spawn process --clients 25 --ca-key-type ml-dsa-44 --key-type ml-dsa-44 --verify-cert

# Challenge type comparison
cargo bench --bench acme_bench -- --spawn process --clients 25 --challenge dns-persist-01

# Backend comparison (inprocess mode, tmpfs WAL)
cargo bench --bench acme_bench -- --clients 10 --db "sqlite:///dev/shm/bench.db" --requests 300

# RO pool split (inprocess mode)
cargo bench --bench acme_bench -- \
  --clients 10 --db "sqlite:///dev/shm/bench.db" --ro-connections 4 --requests 300

# JSON output for scripting
cargo bench --bench acme_bench -- --spawn process --clients 25 --requests 100 --output json | jq .summary

Available options

| Option | Default | Description |
|---|---|---|
| --spawn MODE | inprocess | inprocess or process; process starts separate OS processes |
| --nodes N | 1 | Number of akamu nodes in the cluster |
| --clients N | 10 | Concurrent worker tasks |
| --requests N | 100 | Issuances to measure (warmup not counted) |
| --warmup N | 10 | Warmup issuances discarded before measurement |
| --challenge TYPE | http-01 | http-01 or dns-persist-01 |
| --key-type TYPE | ec:P-256 | CSR key type (see the key type tables above) |
| --ca-key-type TYPE | ec:P-256 | CA key type (same syntax) |
| --topology MODE | direct | direct (round-robin) or proxy (single-node proxy) |
| --no-gossip | off | Disable gossip in multi-node runs |
| --db PATH | :memory: | SQLite URL or PostgreSQL connection string |
| --pool-connections N | 1 | Write connection pool size |
| --ro-connections N | 0 | Read-only connection pool size (0 = no split) |
| --wildcard | off | Issue *.bench-N.acme-bench.test (dns-persist-01 only) |
| --output FORMAT | text | text or json |
| --verify-cert | off | Parse and verify the SAN of every issued certificate |
| --poll-ms N | 100 | Challenge poll interval in milliseconds |