ML-KEM vs X25519: A Comparative Performance Analysis
If you are evaluating hybrid post-quantum TLS deployment for production infrastructure, the questions you actually need answered are not about whether to do it. They are about what it costs. This article provides the numbers: CPU cycles, key sizes, bandwidth overhead per TLS handshake, real-world latency data from Cloudflare and Chrome deployments at scale, and the QUIC-specific constraints that practitioners routinely underestimate.
The most counterintuitive finding upfront: ML-KEM key operations are not slower than X25519 on modern hardware. On AVX2-capable x86-64 CPUs, they are faster. The operational concern is bandwidth, not CPU. ML-KEM-768 hybrid (X-Wing) adds approximately 2.27 KB to every TLS handshake. Whether that matters in your environment depends on your connection volume and latency profile.
X25519: what it provides and why it is quantum-vulnerable
X25519 is the dominant key exchange algorithm in TLS 1.3. Specified by Bernstein (PKC 2006) and formally standardised in IETF RFC 7748 (2016), it operates on a 255-bit prime field using Curve25519. Public keys are 32 bytes. The shared secret output is 32 bytes. A full X25519 key agreement covers two operations: key generation (generating the local ephemeral key pair) and the scalar multiplication on Curve25519 against the remote party's public key to derive the shared secret. Together these run in approximately 100 to 200 microseconds on a modern x86-64 server-class CPU at 2 to 3 GHz, without dedicated elliptic curve hardware acceleration. In TLS 1.3, both the client and server perform this operation per session, so latency-sensitive comparisons with ML-KEM should account for both sides of the exchange.
X25519 provides approximately 128 bits of classical security. That figure is not at issue. What is at issue is the algorithm's relationship to Shor's algorithm: the elliptic curve discrete logarithm problem that X25519 relies on is efficiently solvable on a cryptographically relevant quantum computer (CRQC). X25519 provides zero security against a CRQC. This is a categorical vulnerability, not a gradual degradation. NIST IR 8547 (November 2024) lists ECDH-based key exchange for deprecation precisely on this basis.
The threat is forward-looking. Data encrypted under X25519 today and held until a CRQC exists is retrospectively decryptable. The harvest now, decrypt later (HNDL) model means the clock started when the data was encrypted, not when a CRQC is built.
ML-KEM: mechanism, security model, and parameter sets
ML-KEM (NIST FIPS 203, August 2024) is a key encapsulation mechanism based on the Module Learning With Errors (MLWE) lattice problem. The operational mechanism differs from Diffie-Hellman in one important respect: the two parties do not perform the same operation. One party generates a key pair and publishes the public (encapsulation) key. The other party encapsulates a shared secret using that public key and transmits a ciphertext. The first party decapsulates to recover the shared secret. In TLS 1.3, the client's ephemeral ML-KEM public key appears in the ClientHello key_share extension; the server encapsulates against it and returns the ciphertext in the ServerHello key_share.
MLWE is believed to be hard for both classical and quantum computers. No efficient quantum algorithm, including Shor's algorithm, applies to lattice problems of this type. ML-KEM provides IND-CCA2 security under the MLWE hardness assumption (Bos et al., IEEE EuroS&P 2018; Regev, STOC 2005).
FIPS 203 defines three parameter sets with precise sizes per Table 1 of the standard:
- ML-KEM-512: public key 800 bytes, ciphertext 768 bytes, shared secret 32 bytes. NIST security level 1 (AES-128 equivalent).
- ML-KEM-768: public key 1,184 bytes, ciphertext 1,088 bytes, shared secret 32 bytes. NIST security level 3 (AES-192 equivalent).
- ML-KEM-1024: public key 1,568 bytes, ciphertext 1,568 bytes, shared secret 32 bytes. NIST security level 5 (AES-256 equivalent).
X25519 shared secret output is also 32 bytes. The output size of the key exchange is identical regardless of which parameter set you use. The size difference is in the key material transmitted during the handshake.
CPU performance: ML-KEM is faster than you probably expect
On x86-64 hardware with AVX2 vector instructions, the CRYSTALS-Kyber reference implementation (the algorithm that became ML-KEM) delivers the following benchmarks from pq-crystals.org:
| Algorithm | Operation | CPU time (approx., AVX2) | Public key size | Ciphertext size |
|---|---|---|---|---|
| X25519 | Full key agreement | ~100-200 µs | 32 bytes | n/a (peer replies with a 32-byte public key) |
| ML-KEM-512 | KeyGen | ~25 µs | 800 bytes | - |
| ML-KEM-512 | Encaps | ~28 µs | - | 768 bytes |
| ML-KEM-512 | Decaps | ~29 µs | - | - |
| ML-KEM-768 | KeyGen | ~42 µs | 1,184 bytes | - |
| ML-KEM-768 | Encaps | ~45 µs | - | 1,088 bytes |
| ML-KEM-768 | Decaps | ~47 µs | - | - |
| ML-KEM-1024 | KeyGen | ~58 µs | 1,568 bytes | - |
| ML-KEM-1024 | Encaps | ~63 µs | - | 1,568 bytes |
| ML-KEM-1024 | Decaps | ~65 µs | - | - |
Figures from pq-crystals.org reference benchmarks. Hardware-dependent; actual figures vary by CPU model, compiler, and optimisation level. X25519 comparison from SUPERCOP benchmarks. Cite the reference sources rather than treating these as guaranteed production figures.
ML-KEM individual operations are faster than a complete X25519 key agreement on AVX2 hardware. This surprises engineers who assume lattice operations must be expensive. ML-KEM's polynomial multiplication is built on the number-theoretic transform over small 12-bit coefficients, which vectorises efficiently with AVX2 integer SIMD instructions. Without AVX2, ML-KEM timing is approximately double these figures, which puts it on par with X25519 in the worst case.
To put CPU cost in operational context: in a TLS 1.3 handshake the server performs one ML-KEM encapsulation per connection, about 45 microseconds for ML-KEM-768 by the table above. A TLS server handling 10,000 new connections per second therefore spends around 450 milliseconds of CPU time per second on ML-KEM-768, roughly 45% of a single 3 GHz core or about 3% of a 16-core machine. By the same table, the X25519 work it already does at that rate costs 1 to 2 seconds of CPU time per second. Adding ML-KEM-768 to form the hybrid increases key exchange CPU cost by less than half, and it does not create a CPU bottleneck for any realistic TLS termination deployment.
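The CPU budget can be checked with simple arithmetic. The sketch below is illustrative only: it uses the approximate per-operation timings from the table above, and the 16-core server is an assumption rather than a figure from any measured deployment.

```python
def cpu_share(ops_per_second: float, micros_per_op: float, cores: int = 1) -> float:
    """Fraction of the available CPU time consumed by one operation type."""
    busy_seconds_per_second = ops_per_second * micros_per_op / 1_000_000
    return busy_seconds_per_second / cores


if __name__ == "__main__":
    rate = 10_000  # new TLS connections per second
    timings_us = {
        "ML-KEM-768 Encaps (table, approx.)": 45,
        "X25519 key agreement, low estimate": 100,
        "X25519 key agreement, high estimate": 200,
    }
    for name, micros in timings_us.items():
        one_core = cpu_share(rate, micros)
        sixteen = cpu_share(rate, micros, cores=16)  # assumed core count
        print(f"{name}: {one_core:.0%} of one core, {sixteen:.1%} of 16 cores")
```

Substituting measured timings from your own hardware for the table values gives a more reliable estimate than any published benchmark.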
On ARM Cortex-M4 class hardware, the picture is even more favourable for ML-KEM. PQM4 benchmarks (Kannwischer et al., IACR ePrint 2019/844) show ML-KEM-768 key generation at approximately 1.9 ms and encapsulation at approximately 2.0 ms on Cortex-M4 running at the typical 168 MHz clock speed used in STM32F4-series reference boards; performance on Cortex-M4 devices clocked differently will scale proportionally. X25519 on the same class of hardware runs at approximately 4 to 8 ms. On constrained embedded hardware, ML-KEM is actually faster than X25519. The deployment concern is bandwidth. It always was.
Bandwidth and handshake size: the real operational cost
In TLS 1.3, key exchange material is transmitted in the ClientHello key_share extension and the ServerHello key_share response. The bandwidth cost per connection is what matters at scale.
- X25519 standalone: ClientHello adds 32 bytes; ServerHello adds 32 bytes. Total key exchange data: 64 bytes.
- ML-KEM-768 standalone: ClientHello adds 1,184 bytes; ServerHello adds 1,088 bytes. Total: 2,272 bytes.
- X25519+ML-KEM-768 hybrid (X-Wing, specified in the IETF Internet-Draft draft-connolly-cfrg-xwing-kem): ClientHello adds 1,216 bytes (32+1,184); ServerHello adds 1,120 bytes (32+1,088). Total: 2,336 bytes. Net increase versus a pure X25519 handshake: 2,272 bytes, approximately 2.27 KB.
Whether 2.27 KB of additional handshake data causes problems depends on one thing: whether it triggers an additional TCP round trip. Modern Linux kernels set the TCP initial congestion window (initcwnd) at approximately 10 segments of 1,460 bytes each, totalling 14,600 bytes. The entire X-Wing hybrid handshake, including the certificate chain, will typically fit within this window. No additional round trip is needed. In most configurations, the additional 2.27 KB is absorbed without a latency penalty.
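The byte counts and the congestion-window check above can be reproduced in a few lines. This is a rough sketch: the 4,000-byte figure for the certificate chain and other server handshake messages is a hypothetical placeholder, and real chains vary widely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyShare:
    client_bytes: int  # key_share payload sent in ClientHello
    server_bytes: int  # key_share payload sent in ServerHello

    @property
    def total(self) -> int:
        return self.client_bytes + self.server_bytes


X25519 = KeyShare(32, 32)
MLKEM768 = KeyShare(1184, 1088)
HYBRID = KeyShare(32 + 1184, 32 + 1088)

INITCWND_BYTES = 10 * 1460  # 10 segments of 1,460 bytes, as in the text

# Hypothetical size of the certificate chain plus other server handshake data.
ASSUMED_OTHER_SERVER_BYTES = 4000


def fits_initial_window(server_flight_bytes: int) -> bool:
    """True if the server's first flight fits in the initial congestion window."""
    return server_flight_bytes <= INITCWND_BYTES


if __name__ == "__main__":
    print("hybrid total bytes:", HYBRID.total)
    print("increase over X25519:", HYBRID.total - X25519.total)
    flight = HYBRID.server_bytes + ASSUMED_OTHER_SERVER_BYTES
    print("server flight fits initcwnd:", fits_initial_window(flight))
```

Replacing the placeholder with the measured size of your own server flight shows how much headroom remains before a second round trip is needed.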
QUIC (RFC 9000, with its TLS integration in RFC 9001, and the transport for HTTP/3) introduces a specific constraint. Under RFC 9000, datagrams carrying client Initial packets must be padded to at least 1,200 bytes, and implementations commonly cap datagrams near the 1,280-byte IPv6 minimum MTU. An ML-KEM-1024 public key (1,568 bytes) cannot fit in a single datagram of that size, so the ClientHello must be split across multiple Initial packets. An ML-KEM-768 public key (1,184 bytes) fits within the limit on its own but is close to it, so once the X25519 share and the rest of the ClientHello are added the message can still spill into a second packet. Operators deploying hybrid post-quantum TLS over QUIC should test multi-packet ClientHello behaviour with their specific QUIC implementation: Chromium QUIC, quic-go, and quinn handle it differently. For HTTP/3, ML-KEM-768 is the correct default precisely because of this constraint.
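A quick way to reason about the QUIC constraint is to estimate how many Initial packets a ClientHello of a given size needs. The sketch below is a back-of-envelope model, not a description of any QUIC stack: the 1,150 bytes of usable CRYPTO payload per packet and the 300 bytes of non-key-share ClientHello content are both hypothetical values chosen for illustration.

```python
import math

# Hypothetical values: measure your own implementation before relying on them.
ASSUMED_CRYPTO_PAYLOAD_PER_PACKET = 1150  # usable bytes per Initial packet
ASSUMED_OTHER_CLIENT_HELLO_BYTES = 300    # extensions, SNI, ALPN, and so on


def initial_packets_needed(client_hello_bytes: int,
                           payload_per_packet: int = ASSUMED_CRYPTO_PAYLOAD_PER_PACKET) -> int:
    """Number of Initial packets needed to carry a ClientHello of this size."""
    return math.ceil(client_hello_bytes / payload_per_packet)


if __name__ == "__main__":
    key_shares = {
        "X25519": 32,
        "X25519 + ML-KEM-768": 32 + 1184,
        "X25519 + ML-KEM-1024": 32 + 1568,
    }
    for name, share_bytes in key_shares.items():
        hello = ASSUMED_OTHER_CLIENT_HELLO_BYTES + share_bytes
        print(f"{name}: {hello} byte ClientHello, "
              f"{initial_packets_needed(hello)} Initial packet(s)")
```

The model ignores padding, coalescing, and retransmission behaviour, which is exactly where implementations differ, so it complements testing rather than replacing it.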
Real-world data: Cloudflare and Chrome at production scale
Benchmarks are useful. Production deployments at internet scale are what actually resolve the question.
Cloudflare deployed hybrid post-quantum TLS (initially X25519+Kyber768, subsequently updated to align with FIPS 203 ML-KEM-768 post-standardisation) on its global network in 2023. Cloudflare's published analysis of production traffic showed the additional handshake data had no measurable impact on TLS handshake latency for the vast majority of connections. The median handshake completion time increased by approximately 1 to 2 ms for connections where both endpoints supported the hybrid cipher suite. A typical TLS handshake baseline runs 50 to 200 ms depending on network round-trip time. A 1 to 2 ms increase is operationally negligible for nearly all applications.
Google Chrome's hybrid deployment using BoringSSL reached Google services in 2023 and has since expanded. Chrome's deployment confirmed that hybrid key exchange is compatible with the diversity of TLS servers, load balancers, CDN proxies, and web application firewalls across the public internet.
A note on the data: Chrome's initial 2023 deployment used a pre-standardisation Kyber768 implementation in BoringSSL, transitioning to FIPS 203-conformant ML-KEM-768 following the August 2024 standard publication. Performance figures attributed to the Chrome deployment should therefore be read as covering the transition period. The exact date of the BoringSSL transition is not established here; check BoringSSL's change history for the commit confirming FIPS 203 conformance before treating the deployment data as solely reflecting the final standard.
Non-PQC endpoints fall back gracefully because the hybrid design, specified in IETF draft-ietf-tls-hybrid-design, ensures backward compatibility: an endpoint without ML-KEM support simply negotiates a classical key exchange group.
A point of caution for latency-critical environments. The additional 2.27 KB increases the probability that a handshake message spans an extra TCP segment or, on constrained paths, triggers IP fragmentation. In most configurations this is absorbed without incident; in latency-sensitive deployments such as financial trading systems or high-frequency API polling, it can manifest as a tail-latency spike rather than a uniform increase. The appropriate response is to instrument current TLS handshake P99 and P999 latencies as a baseline, deploy hybrid in a staged rollout with monitoring, and consider TCP Fast Open (IETF RFC 7413) if additional round-trip time is observed on specific network paths.
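Establishing the baseline is straightforward once handshake durations are being collected. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and how the samples are gathered (load balancer logs, tracing, synthetic probes) is left to your environment.

```python
import statistics


def tail_latencies(samples_ms):
    """Return median, P99 and P999 handshake latency from a list of samples."""
    # quantiles(n=1000) yields 999 cut points; cut point i is the (i+1)/1000 quantile.
    cuts = statistics.quantiles(samples_ms, n=1000, method="inclusive")
    return {"p50": cuts[499], "p99": cuts[989], "p999": cuts[998]}


def tail_regression(baseline_ms, candidate_ms):
    """Per-percentile difference (candidate minus baseline), in milliseconds."""
    before = tail_latencies(baseline_ms)
    after = tail_latencies(candidate_ms)
    return {name: after[name] - before[name] for name in before}


if __name__ == "__main__":
    # Synthetic data for demonstration only; not measurements.
    baseline = [float(v) for v in range(1, 1002)]
    candidate = [v + 2.0 for v in baseline]
    print(tail_latencies(baseline))
    print(tail_regression(baseline, candidate))
```

Comparing percentiles rather than means is the point: a tail-latency spike on a small share of paths can leave the average almost unchanged.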
The hybrid combination: why X25519+ML-KEM-768 is the deployment standard
X-Wing, the hybrid KEM combining X25519 and ML-KEM-768, is specified in the IETF Internet-Draft draft-connolly-cfrg-xwing-kem. (RFC 9496 defines the ristretto255 and decaf448 groups and is unrelated.) X-Wing uses SHA3-256 to hash the ML-KEM-768 shared secret together with the X25519 shared secret, ciphertext, and public key into a single 32-byte output. The security property of this construction is straightforward: the X-Wing combined shared secret is secure as long as either X25519 or ML-KEM-768 holds. A classical adversary who cannot break X25519 cannot break X-Wing. A quantum adversary who cannot break ML-KEM-768 cannot break X-Wing. Both components must be broken for the key exchange to be compromised.
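The idea behind a hybrid combiner can be shown in a few lines. The sketch below is illustrative and is not the X-Wing construction itself: the choice of SHA3-256, the input order, and the `context` parameter are assumptions made for the example, and the X-Wing specification fixes its own exact inputs and label. Production code should use a vetted library implementation.

```python
import hashlib


def combine_shared_secrets(ss_mlkem: bytes, ss_x25519: bytes, context: bytes = b"") -> bytes:
    """Derive one 32-byte secret from two component secrets (illustrative only)."""
    if len(ss_mlkem) != 32 or len(ss_x25519) != 32:
        raise ValueError("both component shared secrets must be 32 bytes")
    # Both secrets feed a single hash, so the output stays unpredictable
    # to an attacker who has recovered only one of them.
    return hashlib.sha3_256(ss_mlkem + ss_x25519 + context).digest()


if __name__ == "__main__":
    # Fixed dummy inputs for demonstration; real secrets come from the two key exchanges.
    secret = combine_shared_secrets(b"\x01" * 32, b"\x02" * 32, b"example-context")
    print(len(secret), secret.hex())
```

Fixed-length inputs matter here: because both secrets are exactly 32 bytes, the concatenation is unambiguous without length prefixes.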
This fail-secure property is the reason hybrid deployment is preferred over ML-KEM standalone during the transition period. If unexpected cryptanalysis weakened ML-KEM, the X25519 component provides classical protection. If a CRQC arrived before migration is complete, the ML-KEM-768 component provides quantum protection. You are not betting on either algorithm alone.
Organisations in scope for NSA CNSA 2.0 (National Security Systems) have a different requirement. CNSA 2.0 specifies the higher security parameter set, ML-KEM-1024 (security level 5), for key establishment; where a classical component is paired with it during the transition, that component is ECDH over the NIST P-384 curve rather than Curve25519. X-Wing does not satisfy CNSA 2.0 requirements. Commercial organisations not subject to CNSA 2.0 can use X-Wing as their hybrid standard.
One architectural point worth stating clearly: X25519 and ML-KEM are both ephemeral in TLS 1.3. A new key pair is generated per session. The ML-KEM private key is discarded immediately after decapsulation and is never stored. There is no long-lived ML-KEM private key to manage in a TLS deployment context.
Migration path and the urgency argument for acting now
For IPsec and VPN contexts, the hybrid deployment approach is specified in IETF RFC 9370 (Multiple Key Exchanges in IKEv2, 2023). See Post-Quantum VPNs: ML-KEM Tunnel Providers for the current VPN deployment landscape. For detailed parameter set selection guidance, see ML-KEM Key Sizes for Enterprise Architects.
The three-step migration path for TLS operators:
Step 1: Deploy the X25519+ML-KEM-768 hybrid. In TLS 1.3 stacks this is typically exposed as the X25519MLKEM768 named group. This provides HNDL protection immediately and is backward-compatible with non-PQC clients. Modern TLS stacks including OpenSSL 3.5 and BoringSSL support this configuration. No certificate changes are required at this stage. Most deployments can complete this step within weeks of decision.
Step 2: Monitor connection statistics. Track the proportion of connections negotiating the hybrid key exchange versus falling back to classical key exchange. When hybrid adoption across your client base reaches your organisation's defined threshold, plan for Step 3.
Step 3: Retire X25519-only key exchange. Remove the fallback to classical key exchange when your client base has sufficient ML-KEM support. Browser updates reach most end users quickly; enterprise endpoint agents and embedded clients are slower. The timeline for Step 3 is constrained by the upgrade cycle of client-side TLS libraries, not by anything in your own infrastructure.
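The monitoring in Step 2 reduces to a ratio over negotiated key exchange groups. A minimal sketch follows, assuming your TLS terminator logs a group label per handshake; the label strings and the 99% threshold are hypothetical and should be replaced with whatever your own logs and policy use.

```python
from collections import Counter

# Assumed log labels; check what your TLS terminator actually emits.
HYBRID_GROUPS = {"X25519MLKEM768"}


def hybrid_share(negotiated_groups) -> float:
    """Fraction of handshakes that negotiated a hybrid key exchange group."""
    counts = Counter(negotiated_groups)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[group] for group in HYBRID_GROUPS) / total


def ready_for_step_3(negotiated_groups, threshold: float = 0.99) -> bool:
    """True once hybrid adoption meets the organisation's chosen threshold."""
    return hybrid_share(negotiated_groups) >= threshold


if __name__ == "__main__":
    sample = ["X25519MLKEM768"] * 3 + ["X25519"]
    print(f"hybrid share: {hybrid_share(sample):.0%}")
    print("ready for Step 3:", ready_for_step_3(sample))
```

Breaking the same ratio down by client type or user agent shows which populations are holding back Step 3.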
The urgency argument is straightforward. Every TLS connection established under X25519 today is vulnerable to retrospective decryption if the protected data has a confidentiality requirement extending to 2033 or beyond. At the Global Risk Institute's 2024 median Q-Day estimate of 2033 to 2035, data encrypted under X25519 in 2026 has roughly seven to nine years of protection. Data encrypted under X25519+ML-KEM-768 hybrid is protected for as long as ML-KEM-768 holds. No known quantum or classical attack threatens it.
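The exposure arithmetic is worth making explicit for your own data classes. The sketch below uses the Q-Day range quoted above; the ten-year confidentiality requirement is a hypothetical example, not a recommendation.

```python
def protected_years(encrypted_year: int, q_day_year: int) -> int:
    """Years between encryption and the assumed arrival of a CRQC."""
    return max(0, q_day_year - encrypted_year)


def exposed_years(encrypted_year: int, confidentiality_years: int, q_day_year: int) -> int:
    """Years of required confidentiality that fall after the assumed Q-Day."""
    must_stay_secret_until = encrypted_year + confidentiality_years
    return max(0, must_stay_secret_until - q_day_year)


if __name__ == "__main__":
    encrypted = 2026
    lifetime = 10  # hypothetical confidentiality requirement, in years
    for q_day in (2033, 2035):
        print(f"Q-Day {q_day}: protected for {protected_years(encrypted, q_day)} years, "
              f"exposed for {exposed_years(encrypted, lifetime, q_day)} years")
```

Any non-zero exposure for traffic captured today is the case that harvest now, decrypt later targets.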
Library support as of August 2025: OpenSSL 3.5 includes ML-KEM support in the mainline codebase. BoringSSL has provided the post-quantum hybrid deployed in Chrome since 2023, initially with Kyber768 and subsequently with ML-KEM-768. AWS s2n-tls integrates post-quantum hybrid for AWS service endpoints. The Open Quantum Safe project (liboqs) provides reference implementations across multiple platforms. For Rust, the RustCrypto ml-kem crate provides a pure-Rust FIPS 203 implementation. For FIPS 140-3 regulated environments, verify the NIST CMVP database directly for modules with ML-KEM within the validated scope before selecting a library.
Applications that hardcode cipher suite names in application code rather than delegating to a configurable cryptographic policy layer will require configuration changes to enable hybrid TLS. Applications using libraries with proper abstraction can enable hybrid by changing one or two configuration parameters. This is the crypto agility principle in practice: algorithm negotiation belongs at the cryptographic service layer, not in application code. NIST CSWP 04282021 covers the implementation approach.
For the full TLS migration context beyond key exchange, see Post-Quantum TLS: What Changes and What Stays the Same.
Sources verified 2026-05-18
Sources:
- NIST FIPS 203 (ML-KEM), August 2024, https://doi.org/10.6028/NIST.FIPS.203
- NIST IR 8547, November 2024, https://doi.org/10.6028/NIST.IR.8547
- IETF RFC 7748 (X25519), 2016, https://www.rfc-editor.org/rfc/rfc7748
- IETF RFC 8446 (TLS 1.3), 2018, https://www.rfc-editor.org/rfc/rfc8446
- X-Wing, IETF Internet-Draft draft-connolly-cfrg-xwing-kem, https://datatracker.ietf.org/doc/draft-connolly-cfrg-xwing-kem/
- IETF draft-ietf-tls-hybrid-design, https://datatracker.ietf.org/doc/draft-ietf-tls-hybrid-design/
- Bernstein, D.J., PKC 2006, https://link.springer.com/chapter/10.1007/11745853_14
- Shor, P.W., FOCS 1994, https://doi.org/10.1109/SFCS.1994.365700
- Bos et al., IEEE EuroS&P 2018, https://doi.org/10.1109/EuroSP.2018.00032
- Regev, O., STOC 2005, https://doi.org/10.1145/1060590.1060603
- CRYSTALS-Kyber reference benchmarks, https://pq-crystals.org/kyber/index.shtml
- Kannwischer et al., IACR ePrint 2019/844, https://eprint.iacr.org/2019/844
- IETF RFC 9000, 2021, https://www.rfc-editor.org/rfc/rfc9000
- IETF RFC 9001, 2021, https://www.rfc-editor.org/rfc/rfc9001
- IETF RFC 7413 (TCP Fast Open), https://www.rfc-editor.org/rfc/rfc7413
- NSA CNSA 2.0, September 2022
- Global Risk Institute 2024 Quantum Threat Timeline Report
- Mosca, IEEE Security & Privacy 2018, https://doi.org/10.1109/MSP.2018.3761723
- NIST NCCoE SP 1800-38B
- NIST CSWP 04282021
- Google Security Blog, August 2023
- Cloudflare blog, 2023