Multi-Backend Architecture with Layered Post-Quantum Encryption

Executive summary

End-to-end encryption and horizontal scale are usually in tension. Per-user encryption protects against the operator; horizontal scale typically requires the operator to coordinate state across machines. LumaVista resolves this by moving only ciphertext between backends and never storing encryption keys on the server in unwrapped form. The result is a multi-backend deployment where the operator handles bytes they cannot read, a property that holds whether the operator is honest, compromised, or later replaced by a different operator entirely.

The architectural shape is single-writer Badger per user, exclusive lease on a Redis-coordinated owner backend, commit-point-driven snapshots to a shared RustFS bucket, and an application-layer redirect that steers multi-device sessions to the user’s current owner. Drain, deploy, and failure cases reduce to “another backend reads the latest snapshot.” No shared writable state. No coordinator. No central placement service.

The cryptographic posture is six-layer defence-in-depth. Three of those layers (TLS handshake, snapshot envelope, wrap_masterKEK) carry an asymmetric component. As of 2026-05-18 — the PQ-Cutover date — all three operate in hybrid X25519 + ML-KEM-768 mode for material captured from that date forward. A break of either primitive alone does not yield plaintext. The remaining three layers (WS payload, per-user Badger at rest, per-research DEK) are symmetric only and inherit post-quantum safety from XChaCha20-Poly1305 at 256 bits, where Grover’s algorithm leaves ~128-bit effective security — the NIST symmetric PQ floor.

The honest summary in three bullets:

From 2026-05-18 forward, all sessions opened by PQ-capable clients use hybrid TLS; all live wrap_masterKEK envelopes are hybrid; all new snapshot envelope keys are hybrid. Live key material is PQ-safe by construction.
Backup material captured before 2026-05-18 contains X25519-only envelopes and is exposed to a future cryptographically relevant quantum computer (CRQC). LumaVista commits to purging all such LumaVista-controlled backup material by 2026-08-16 (PQ-Cutover + 90 days). After that date no LumaVista-controlled copy of pre-rotation wrapping material exists.
The certificate chain is classical until public CAs ship PQ certs (CA/B Forum roadmap: 2027–2028). Cert forgery under a future CRQC is prospective MitM, not retroactive decryption. The WS payload’s end-to-end DEK layer means a successful future MitM gains metadata access, not user-content plaintext.

The rest of this whitepaper is what those three bullets actually mean and what they do not. §5 lays out the threat model the design is calibrated against. §6 walks the encryption layer cake. §7 documents the hybrid envelope at byte level. §10 says, in flat prose, what the whitepaper does not claim. A claim absent from this document is not a claim LumaVista makes elsewhere.

[Figure: System architecture]

§2 System overview

LumaVista is a research platform: users run multi-hour LLM-driven investigations, accumulate knowledge graphs and embeddings, upload audio recordings that get transcribed and indexed, and produce long-form reports. The persistence model that makes this work is per-user Badger — every user has their own embedded key-value store containing their research projects, knowledge graph, memory, documents, settings, and recording metadata. Badger is a Go embedded KV store with full ACID semantics and a single-writer constraint.

User content does not live in a shared database. PostgreSQL holds system tables only (tenant configuration, billing ledger, signed-up account metadata, wrapped session keys). Redis holds ephemeral coordination state (locks, queues, kill-switches). RustFS — a self-hosted S3-compatible object store — holds large at-rest blobs: crawled-content chunks, media assets, recording audio bytes, and per-user snapshots. This three-tier separation is the project’s tier-1 storage topology rule and is non-negotiable.

The encryption story is per-research: each research project has its own 256-bit data encryption key (DEK); the DEK is wrapped under a per-user masterKEK; the masterKEK is wrapped to each enrolled device’s hybrid public key. The wrapped forms are all the server ever stores; a backend holds unwrapped keys only in memory, and only during an active session (see §5(d)). Unwrapping happens client-side, in the user’s browser, using a private key that lives in the browser’s IndexedDB (lvdr-crypto) and is generated locally at registration. Outside an active session, the server cannot read user content. Neither can the operator. A future operator who inherits the database cannot read user content.

The single-backend version of this system was the starting point — and was the right call. Embedded KV stores are fast, simple, and avoid an entire class of distributed-systems failure modes. They have one operational drawback: when the backend goes down (deploy, host swap, process crash), the users on that backend lose access until that specific instance is back. For a small system this is fine; for a production research platform with thousands of concurrent users it becomes the load-bearing operational constraint.

The multi-backend redesign keeps embedded Badger as the per-user write path — single-writer, fast, simple — but adds a lease discipline so the writer can move between machines. A user belongs to one backend at a time; ownership is coordinated via Redis; deltas are pushed to RustFS at logical commit points (project complete, document indexed, recording promoted) so that any peer can take over with bounded recovery-point objective. The migration plumbing moves ciphertext only. The encryption posture survives.

The remainder of this whitepaper documents that design and the cryptographic guarantees it preserves.

§3 Architecture: multi-backend without shared state

The architectural primitive is the per-user exclusive lease. Each user’s data is owned by exactly one backend instance at any moment. Ownership is recorded in Redis as a key lease:user:<userID> with a 60-second TTL. The owning backend refreshes the TTL every 20 seconds; three missed heartbeats and the backend assumes it has lost ownership and force-closes the user’s Badger handles and WebSocket sessions. Fail-stop. Better to refuse service than to risk split-brain.
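The fail-stop rule above can be sketched as a small counter. This is a minimal illustration, not the production heartbeat loop: the type and function names are hypothetical, and reading "three missed heartbeats" as three consecutive misses is an assumption of the sketch.

```go
package main

import "fmt"

// Values taken from the text above.
const (
	leaseTTLSeconds     = 60
	heartbeatSeconds    = 20
	maxMissedHeartbeats = 3
)

// heartbeatTracker counts consecutive failed lease renewals.
// Treating "missed" as "consecutive" is an assumption of this sketch.
type heartbeatTracker struct {
	missed int
}

// observe records one renewal attempt and reports whether the backend
// should fail-stop (close Badger handles and WebSocket sessions).
func (h *heartbeatTracker) observe(renewed bool) bool {
	if renewed {
		h.missed = 0
		return false
	}
	h.missed++
	return h.missed >= maxMissedHeartbeats
}

func main() {
	// Three missed heartbeats span exactly one lease TTL.
	fmt.Println(maxMissedHeartbeats*heartbeatSeconds == leaseTTLSeconds)

	h := &heartbeatTracker{}
	for tick, renewed := range []bool{true, false, false, true, false, false, false} {
		if h.observe(renewed) {
			fmt.Printf("tick %d: fail-stop\n", tick)
			return
		}
	}
}
```

The arithmetic is the point: by the time the third heartbeat is missed, the 60-second TTL has elapsed and a peer may already hold the lease, so the only safe local action is to stop writing.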

The lease state machine is deliberately small:

[Figure: Lease state machine]

A backend acquires a lease with an atomic Redis set-if-not-exists carrying a 60-second TTL (SETNX semantics; in Redis the single-command form is SET with the NX and EX options, since plain SETNX takes no expiry). On success it is the owner; it pulls the latest snapshot from RustFS if its local copy is stale, opens the user’s Badger databases, and serves the user’s WebSocket connections. On failure, it reads the existing lease, identifies the actual owner, and replies to the connecting client with a session_moved frame that steers them to that owner. The proxy layer (Traefik) remains user-agnostic: it does not need to know who owns whom.

Why single-writer. Badger cannot be opened concurrently by two processes. A multi-backend design that allowed two backends to write to the same user’s Badger would corrupt the database. The lease is how we enforce single-writer at the system level: lose the lease, lose the right to write, close the database. There is no “primary/replica” of a Badger DB in this architecture; replication is temporal, not spatial. The latest snapshot in RustFS is the warm-standby copy any peer can resume from.

Commit-point RPO. Snapshots are not on a wall-clock schedule. They are triggered by named application events: a research project completes, a document gets indexed, a recording is promoted to a permanent document, settings are saved. The RPO of a hard backend crash is therefore “time since the last logical commit point,” not “time since the last fixed-interval snapshot.” For most user workflows this is seconds to minutes; for idle users it can be longer, but in those cases there is nothing to lose.

The commit-point fan-in dedupes per-user with a 5-second debounce, so a burst of completions in a single user session collapses into one RustFS PUT:

[Figure: Commit-point fan-out]

The snapshot writer uses Badger’s db.Backup() stream to extract a consistent, point-in-time-stable export of the database, applies the envelope encryption described in §7, and writes objects to RustFS under users/<userID>/snapshot/<rev>/. The manifest is written last: its presence at users/<userID>/snapshot/<rev>/manifest.json is the consistency anchor. A reader on a peer backend only considers a snapshot valid if the manifest exists. Without a manifest, the snapshot is invisible to recovery. RustFS lifecycle policy garbage collects orphan objects after seven days.
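The manifest-last rule can be illustrated with an in-memory stand-in for the bucket. This is a sketch under assumptions: revisions are modelled as integers, the chunk object names are invented for the example, and the manifest body is a placeholder rather than the real format.

```go
package main

import "fmt"

// objectStore is an in-memory stand-in for the RustFS bucket.
type objectStore map[string][]byte

func snapshotPrefix(userID string, rev int) string {
	return fmt.Sprintf("users/%s/snapshot/%d/", userID, rev)
}

// writeSnapshot puts the data objects first and the manifest last.
// crashBeforeManifest simulates a writer that dies mid-snapshot.
func writeSnapshot(s objectStore, userID string, rev int, chunks [][]byte, crashBeforeManifest bool) {
	p := snapshotPrefix(userID, rev)
	for i, c := range chunks {
		s[fmt.Sprintf("%schunk-%04d", p, i)] = c // hypothetical object name
	}
	if crashBeforeManifest {
		return // orphan chunks remain; lifecycle policy collects them later
	}
	s[p+"manifest.json"] = []byte(fmt.Sprintf(`{"rev":%d,"chunks":%d}`, rev, len(chunks)))
}

// latestValidRev returns the highest revision whose manifest exists.
// Revisions without a manifest are invisible to recovery.
func latestValidRev(s objectStore, userID string, maxRev int) (int, bool) {
	for rev := maxRev; rev >= 1; rev-- {
		if _, ok := s[snapshotPrefix(userID, rev)+"manifest.json"]; ok {
			return rev, true
		}
	}
	return 0, false
}

func main() {
	s := objectStore{}
	writeSnapshot(s, "u1", 1, [][]byte{[]byte("a"), []byte("b")}, false)
	writeSnapshot(s, "u1", 2, [][]byte{[]byte("c")}, true) // crash before manifest
	fmt.Println(latestValidRev(s, "u1", 2))                // 1 true
}
```

Because the manifest is a single object written after everything it describes, a reader never needs to reason about partially written snapshots: either the anchor exists or the revision does not.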

Drain protocol. Operator-initiated drain (deploy, host swap, maintenance) flips a Redis flag backend:<instanceUUID>:drain = true. The drained backend immediately refuses new lease acquires. For each of its currently-owned users it triggers a synchronous snapshot, waits for the manifest PUT to succeed, and only then becomes willing to release the lease. New connections from those users to the load-balancer land on a different backend, which discovers the draining peer’s release-readiness and acquires the lease itself.

The target drain SLA is 60 seconds under realistic load (100 concurrent users per backend, ~20% with active projects). Most of that budget is the synchronous snapshot pass; the lease handoff itself is sub-second.

[Figure: Drain sequence]

If a backend hard-crashes instead of draining gracefully, its leases expire after the 60-second TTL and peer backends can take over by SETNX-claiming the user’s lease. The RPO in this case is whatever got written since the last commit point — typically seconds, in the worst case a few minutes. Synchronous commit points (used for irreversible operations like “recording promoted” and “research complete”) raise the floor: the user is only told “done” after the manifest PUT has succeeded, so anything they were told completed is durable across crashes.

Numerical SLA breakdown. A 60-second drain budget breaks down roughly as: per-user snapshot trigger latency (debounce + queue) ≤ 5s; Badger backup stream + envelope encrypt ≤ 30s (dominated by IO for large user databases); RustFS PUT + manifest atomicity ≤ 10s; lease release + peer takeover ≤ 5s; FE WS reconnect via redirect ≤ 5s. The phase budgets sum to 55s, leaving 5s of headroom. In testing with synthetic 100-user load the median drain completed in 38s; the p99 was 54s. Specifications below ~30s are not realistically deliverable on this storage topology; specifications above ~90s indicate something has gone wrong.

§4 Routing and session affinity

The proxy stays user-agnostic. There are two reasons for this choice worth spelling out.

First, userID-aware proxy routing requires the proxy to know who the users are. Traefik would need a plugin that looks at the WebSocket handshake, extracts a user identifier (from a cookie, a header, or a JWT), and queries Redis for the lease owner. This works, but it puts user-identity awareness into the proxy layer, which has operational consequences: the proxy needs Redis credentials, the proxy needs to be in the security perimeter for user IDs, and a misconfigured proxy can route users to the wrong backend without any backend ever having had a chance to notice.

Second, application-layer redirect is debuggable. When a user’s session bounces between backends, the bounce is visible in the backend logs as a session_moved event. The reasoning is in code we own, with the context we need, on top of the Redis lease state we care about. The proxy contributes only “we sent the connection somewhere, here’s where.”

The flow is straightforward:

[Figure: Migration flow]

A first-time connection (cold cache, no affinity cookie, no ?b= query parameter) hits Traefik and gets routed round-robin to a backend. That backend either:

Acquires the lease itself (the user is unowned or this backend is already the owner) and proceeds.
Reads lease:user:<id> from Redis, finds a peer is the current owner, and replies with a session_moved frame. The frontend reconnects via ?b=<peerHost> query parameter, which Traefik routes deterministically.

Reconnects skip the redirect round-trip because the frontend caches the owner URL in localStorage (lvdr-affinity-<userID>) and a short-lived cookie (lvdr-affinity). The first reconnect after a network change or device wake pays one redirect; everything after is direct.

Multi-device case. A user with three open clients — laptop, phone, second-laptop — has all three sessions converging on the same owning backend. The lease is per-user, not per-session, so the second and third clients hit the same redirect flow and end up on the owner. The owner serves all three from the same Badger handles in memory.

Capacity overflow. If the owning backend has hit its capacity cap, its accept handler returns 503 with Retry-After. The frontend backs off and retries. Capacity is a per-machine resource (memory, file descriptors, goroutine count); it is not per-user.

Affinity cookie as belt-and-braces. The localStorage cache and the cookie carry the same value. If localStorage is cleared (private window, manual clear), the cookie survives. If both are cleared (privacy mode, new browser), the redirect dance happens once and re-seeds both. The cookie is not authoritative — Traefik routes by ?b=<id> in the URL, not by cookie value — but the frontend mirrors the cookie into the URL on next connect so the rendezvous is fast.

§5 Threat model

This whitepaper makes claims about defence-in-depth. Those claims are only useful if they are calibrated against specific adversaries with specific capabilities. The classes below are enumerated by what the adversary can do, not by who they are politically. A nation-state and a criminal ransomware crew that both record TLS traffic at a transit ISP land in the same class for the purposes of this model.

For each class we state: what is defended, what is residual, and why. Residuals are not failures of the design — they are statements of what is left over after the design has done its job, and where the user (or their organisation) needs to make their own choices.

(a) Network-passive — records TLS-encrypted traffic now, decrypts later

The “harvest-now, decrypt-later” adversary. Captures ciphertext on the wire today and stores it indefinitely against future cryptanalytic progress.

Defended. Sessions opened from 2026-05-18 forward use the hybrid TLS 1.3 handshake X25519MLKEM768 negotiated by Go 1.24’s crypto/tls against clients that support it (Chrome 131+, Edge 131+, Firefox 132+, Safari 26+; Chromium releases from 124 to 130 offered only the earlier, incompatible X25519Kyber768Draft00 draft group). A session key derived from both X25519 and ML-KEM-768 requires both primitives to fall before the recorded traffic decrypts. Symmetric content within the session uses XChaCha20-Poly1305 with 256-bit keys; Grover’s algorithm reduces the effective brute-force search to ~128 bits, the de facto post-quantum symmetric floor.

Residual. Three concrete carve-outs.

First, sessions recorded before 2026-05-18 were established under the classical-only handshake. A cryptographically relevant quantum computer (CRQC) that arrives at some future date breaks the X25519 contribution and decrypts those past sessions. There is no cryptographic remedy for traffic already on tape; the only mitigation is that the WS payload within each session carries an additional end-to-end encryption layer (see §6 layer cake), so what the recorder ultimately sees on decryption is the WS ciphertext, not the user’s plaintext content.

Second, clients that do not negotiate the hybrid handshake (older browsers, some mobile WebViews) fall back to classical-only. Traffic from those clients is recorded under the same exposure as pre-2026-05-18 traffic. Client capability is reported via tls.ConnectionState.CurveID and exposed on a per-session basis in the operator audit log; users on non-PQ-capable clients see a warning banner in the FE.

Third, traffic metadata — packet sizes, timings, the SNI field of the TLS handshake itself — is not encrypted at the transport layer and is not within scope of any layer of this design. Padding and timing countermeasures are an application-layer concern that this design does not address.

Why. Hybrid TLS is a transport-layer property; it protects sessions opened during its lifetime and cannot reach back. The WS-payload E2E layer exists precisely to bound what a successful break of the transport layer exposes; it does not eliminate the exposure.

(b) Network-active — MitM, downgrade, certificate forgery

The adversary who can interpose between client and server, attempt to downgrade the handshake, or present a forged certificate.

Defended. The classical certificate chain remains in force: Web PKI ECDSA-P-256 with HSTS preload and DNS CAA pinning to Let’s Encrypt. A downgrade attempt that strips ML-KEM-768 from the handshake leaves X25519-only, which is still cryptographically sound against today’s adversaries. The WS payload’s end-to-end DEK envelope means a successful TLS-terminator MitM (Cloudflare-class infrastructure compromise, for example) sees only ciphertext, not the user’s plaintext content; DEKs are unwrapped only inside the browser, never at the TLS terminator.

Residual. Two carve-outs, of different magnitudes.

First, the certificate chain is classical. Public certificate authorities do not yet issue ML-DSA or SLH-DSA certificates; the CA/B Forum’s PQ-cert roadmap targets the 2027–2028 window. A future CRQC could forge a valid certificate chain and conduct a prospective MitM — that is, intercept future sessions, not retroactively decrypt past ones. Because the WS payload is end-to-end encrypted to the user’s device key, a prospective MitM gains the ability to see request metadata (URLs, sizes, timing) but not user-content plaintext.

Second, downgrade attempts that fully strip TLS (HTTPS-stripping at a hostile transit) are caught by HSTS preload for the lumavista.ai domain and would surface to the user as a browser security warning. We do not prevent the attack; we ensure it is noisy.

Why. Cert-chain PQ readiness is a property of the public-CA ecosystem, not of any individual vendor. Calling out the gap honestly is the only defensible position; pretending otherwise overclaims against a published roadmap any competent reviewer can check.

(c) Compromised-storage — exfiltrates RustFS, PostgreSQL, or Badger files

The adversary who obtains a complete copy of every byte LumaVista has written to disk, on any tier, without obtaining any in-process memory or any device-side key material.

Defended. Every storage tier is encrypted at rest with a separate key class, none of which can be combined into plaintext without a device-held private key.

Per-user Badger holds ciphertext encrypted by per-research data encryption keys (DEKs); the DEKs are wrapped under a per-user masterKEK; the masterKEK is wrapped to each enrolled device’s hybrid X25519 + ML-KEM-768 public key (post-2026-05-18, per universal re-wrap). The device private key lives in browser IndexedDB (lvdr-crypto) and never touches any server. RustFS snapshot objects are additionally wrapped under the server-held tenant key (rotated quarterly), giving the outer envelope a defence-in-depth layer that does not replace E2E.

To unwrap any single byte of user content, an adversary needs either (i) the device private key (not present on any server) or (ii) the ability to break the hybrid envelope, which requires both X25519 and ML-KEM-768 to fall. Possession of every server-held key, every PostgreSQL row, and every RustFS object is, by construction, insufficient to decrypt user content.

Residual. Two carve-outs.

First, backup tapes and PostgreSQL dumps predating 2026-05-18 may contain X25519-only wrap_masterKEK envelopes and X25519-only snapshot envelope keys. A future CRQC that breaks X25519 decrypts the wrapped session key and unwraps the masterKEK without needing the device private key. The mitigation is operational: LumaVista commits to purging all backup material captured before 2026-05-18 from LumaVista-controlled storage by 2026-08-16 (see §11). After that destruction window, no LumaVista-controlled copy of pre-rotation key wrapping material exists. Backup copies in customer-controlled storage are outside the scope of this commitment.

Second, metadata leaks. Badger key names, object sizes, and creation timestamps are not encrypted. An adversary with a complete storage dump can enumerate which users exist, how many research projects each has, and roughly when each was created. Per-research-key encryption protects content, not the existence of records.

Why. Storage compromise is the most-likely-to-actually-happen scenario in the threat model — historical breaches show storage exfiltration is common, in-memory exfiltration is rare. The whole layered envelope design exists for this adversary; the residuals call out exactly what is left over after the layers do their job.

(d) Compromised-backend — RCE on one backend instance

The adversary who achieves remote code execution on a single backend Go process, with the process’s in-memory state and outbound network access.

Defended. The lease model (see §3) constrains the attack to the intersection of (users currently owned by the compromised backend) and (users actively unlocked on that backend). A user whose session is locked has only ciphertext on that backend; the masterKEK is only in memory during an active WS session and is wiped on disconnect. Compromise of backend N does not expose users owned by backends 1..N-1, N+1..M.

Residual. Users actively connected to the compromised backend during the RCE window have their masterKEK and per-research DEKs in process memory. The attacker can read those users’ content for the duration of the compromise, exfiltrate their wrap_masterKEK (under the attacker’s device key, if they enrol a malicious device), or initiate writes that appear authentic. The hybrid envelope does not defend against in-process compromise; the relevant defence is the lease-bounded blast radius plus operational detection time.

Why. This is the adversary against which the multi-backend design’s architecture matters more than the cryptography. Reducing the blast radius from “all users” to “users currently active on one backend” is a structural property of the lease model. PQ posture is orthogonal here.
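The blast-radius bound is a set intersection, which the following sketch makes explicit. The function name and the use of plain string sets are illustrative; in practice the two sets would come from the lease table and the backend's live session registry.

```go
package main

import (
	"fmt"
	"sort"
)

// blastRadius returns the users exposed by an RCE on one backend:
// those it currently owns AND whose keys are unlocked in its memory.
func blastRadius(owned, unlocked map[string]bool) []string {
	var exposed []string
	for u := range owned {
		if unlocked[u] {
			exposed = append(exposed, u)
		}
	}
	sort.Strings(exposed)
	return exposed
}

func main() {
	owned := map[string]bool{"alice": true, "bob": true, "carol": true}
	unlocked := map[string]bool{"bob": true, "carol": true, "dave": true}
	fmt.Println(blastRadius(owned, unlocked)) // [bob carol]
}
```

In the example, alice is owned but locked (ciphertext only on this backend) and dave is unlocked on some other backend, so neither is exposed by the compromise.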

(e) Future-CRQC-capable — any of (a)–(d), at some future date, with a CRQC

The composite adversary the whole PQ programme is calibrated against: combine any of the above capabilities with a cryptographically relevant quantum computer at an unknown future date.

Defended. Three load-bearing properties.

First, hybrid TLS for sessions from 2026-05-18 forward means a CRQC that breaks X25519 alone does not recover those sessions; ML-KEM-768 still holds and the session key is derived from both.

Second, hybrid wrap_masterKEK for all enrolled devices from 2026-05-18 forward means a CRQC that breaks X25519 alone does not unwrap the masterKEK from a captured wrap_masterKEK row; ML-KEM-768 still holds and unwrap requires both. Universal re-wrap completed at 2026-05-18 ensures no live wrap_masterKEK envelope is classical-only.

Third, symmetric crypto everywhere is XChaCha20-Poly1305 at 256-bit. Grover gives a quadratic speedup; effective post-quantum security is ~128 bits, the NIST PQ floor for symmetric primitives. No live symmetric key in the system needs to be re-keyed for PQ.

Residual. Three carve-outs that the public discourse routinely conflates.

First, the harvest-now caveat. Anything recorded before 2026-05-18 is exposed to (a)+CRQC composite. The retention destruction commitment (§11) limits LumaVista-controlled copies of pre-rotation wrapping material; copies an adversary has already exfiltrated and is holding off-platform are outside any vendor’s control.

Second, prospective certificate forgery. A CRQC breaks classical signature algorithms, including the ECDSA chain on the TLS cert. This enables future MitM (see (b)) but not retroactive decryption. The WS-payload E2E layer bounds the exposure to metadata-only.

Third, breaking both X25519 and ML-KEM-768. The hybrid construction defends against breakage of either. If a future cryptanalytic break hits both — a Module-LWE break in addition to a Shor break — then the entire envelope falls. We rate this as substantially less likely than a break of either alone (the whole point of using mathematically distinct primitives), but we do not rate it impossible. There is no known mitigation today; the lattice and elliptic-curve assumptions are the strongest primitives we have. Diversifying to a third class of KEM (code-based: HQC, Classic McEliece) is a future hedge, not a current control.

Why. This is the marquee adversary class. The defended properties are precisely the three things the whitepaper can claim; the residuals are precisely the things it cannot claim. The mapping between §5(e) defended/residual and §10 cannot-claim is one-to-one.

(f) Compromised-device — malware on the user’s browser

The adversary with code execution inside the user’s browser context.

Defended. Nothing in this design. The device private key lives in browser IndexedDB lvdr-crypto; any process with browser-context access reads it. The full key chain — device key → unwrap masterKEK → unwrap DEKs → unwrap content — is available to a sufficiently privileged in-browser attacker.

Residual. Full plaintext access to all of the user’s data; ability to enrol the attacker’s device as a legitimate companion; ability to exfiltrate the masterKEK off-platform.

Why. Post-quantum cryptography does not defend against endpoint compromise. The hybrid envelope does not defend against endpoint compromise. We say this loudly so a CISO does not misread “layered PQ” as “device-malware-resistant.” The mitigations are out of scope of this whitepaper: browser sandbox, OS hardening, organisation-managed device posture, hardware-backed key storage (Web Authentication, TPM, secure enclave) on platforms that support it. LumaVista’s roadmap for hardware-backed device keys is a separate workstream.

(g) Insider — privileged operator with full server access

The LumaVista or hosting-provider operator with credentials for every server-side system: PostgreSQL, RustFS, Redis, the backend Go process, the secret store holding the tenant key.

Defended. An insider with all server-held material sees the same thing as adversary (c) plus the tenant key. The tenant key unwraps the outer (server-side) snapshot envelope, recovering the inner Badger backup stream, which is itself encrypted by per-research DEKs that are themselves wrapped under a per-user masterKEK that requires a device private key to unwrap. The insider cannot decrypt user content without compromising a user’s browser.

For users not currently connected, this is a hard boundary: insider access yields ciphertext only. The lease model means that “currently connected” is well-defined and bounded.

Residual. An insider with privileged access to a backend serving user U at time T (live gdb, /proc/$pid/mem, kernel module) can read U’s unwrapped masterKEK and DEKs from process memory for the duration of U’s active session. The exposure is identical to the live-RCE case of adversary (d).

Why. Privileged-operator access is structurally equivalent to compromised-backend in the cryptographic model, but is qualitatively different in the audit trail. Operator access is logged (see §11); backend RCE is detected by intrusion-detection tooling. Distinguishing the two in operational posture matters for incident response even if the cryptographic boundary is the same.

The full adversary-by-layer coverage map appears as Appendix B; a glanceable version is rendered below.

[Figure: Threat model coverage matrix]

§6 Cryptographic layers

LumaVista’s data path runs through six distinct cryptographic layers. Each layer has its own algorithm, its own key material with its own lifetime, and its own definition of who can decrypt. We render the stack as a layer cake because that is what reviewers ask for when they walk through it the first time:

[Figure: Encryption layer cake]

The same information as a reference table:

Layer	Where	Algorithm	Key lifetime	Who can decrypt	PQ status
L1 TLS transport	Browser ↔ Traefik	TLS 1.3 with X25519 + ML-KEM-768 (hybrid)	Per-session	TLS endpoints holding the session keys (terminates at Traefik)	Hybrid PQ from 2026-05-18
L2 WS payload	Browser ↔ Backend	XChaCha20-Poly1305 + per-research DEK	Per-research	Browser holding device private key; backend during active session	Symmetric — PQ-safe
L3 Per-user Badger	Backend local disk	XChaCha20-Poly1305 + per-research DEK	Per-research	Backend holding unwrapped DEK; never client by itself	Symmetric — PQ-safe
L4 Snapshot envelope	RustFS at rest	XChaCha20-Poly1305 + per-snapshot key, wrapped under tenant key (hybrid X25519 + ML-KEM-768)	Tenant key: quarterly rotation	Server holding tenant key + per-snapshot key	Hybrid PQ from 2026-05-18
L5 wrap_masterKEK	PostgreSQL / per-device	NaCl box via hybrid X25519 + ML-KEM-768 to device public key	Per device enrolment	Device holding matching private key	Hybrid PQ from 2026-05-18
L6 Per-research DEK	Wrapped, in Badger	XChaCha20-Poly1305 + masterKEK	Per research project	Whoever can unwrap the masterKEK	Inherited from L5

A walk through the layers, from outside in.

Layer 1 — TLS transport. Every connection from the user’s browser to the LumaVista edge terminates at Traefik with TLS 1.3. Go 1.24’s crypto/tls package supports the IETF hybrid key exchange group X25519MLKEM768 (the successor to the earlier draft group X25519Kyber768Draft00, which used a different codepoint and is not interoperable). Clients that advertise the group negotiate it; the session secret is derived from the concatenation of the ML-KEM-768 and X25519 shared secrets, as specified in the IETF hybrid key exchange drafts for TLS 1.3. Breaking the session key requires breaking both primitives. Symmetric session ciphers within TLS 1.3 are AES-128-GCM, AES-256-GCM, or ChaCha20-Poly1305; LumaVista’s Traefik configuration pins AES-256-GCM and ChaCha20-Poly1305, whose 256-bit keys retain ~128-bit effective strength against Grover.

The cert chain itself remains classical (Web PKI, Let’s Encrypt ECDSA-P-256). Cert forgery under a future CRQC is prospective MitM, not retroactive decryption — see §5(b) and §10 item 3.

Layer 2 — WebSocket payload. The WS connection rides inside TLS, but it carries an additional encryption envelope at the message level. Every research’s content — chat messages, document text, knowledge nodes, recording transcripts — is encrypted with that research’s DEK before transmission. The browser holds the unwrapped DEK; the backend holds the unwrapped DEK during the active session only. A TLS-terminator MitM (Cloudflare-class compromise) sees L1 plaintext but L2 is still encrypted under a key that never leaves endpoint memory.

Layer 3 — Per-user Badger at rest. When the backend writes user content to its local Badger database, the bytes are still encrypted under the per-research DEK. Badger’s own at-rest encryption is not relied upon — even if Badger’s encryption were disabled, the stored content would remain encrypted under the DEK. Layer 3 ciphertext is what migration moves between backends.

Layer 4 — Snapshot envelope. When Badger snapshots get pushed to RustFS for migration safety, each snapshot object is wrapped in an outer envelope. The outer envelope uses XChaCha20-Poly1305 with a fresh per-snapshot session key; the session key itself is wrapped under a tenant-level master key (tenant-key-<rotation_id>, rotated quarterly, held in a sealed-secret store). From 2026-05-18, the wrapping is hybrid X25519 + ML-KEM-768. This layer exists for defence-in-depth, not as a replacement for E2E. An attacker who breaks layer 4 only recovers layer-3 ciphertext, which is still encrypted under DEKs they do not have.

Layer 5 — wrap_masterKEK. Per-user masterKEK is wrapped to each enrolled device’s public key. From 2026-05-18, both an X25519 encapsulation and an ML-KEM-768 encapsulation of the masterKEK are stored in PostgreSQL alongside the device’s enrollment record. The device unwraps both encapsulations using its private key (held in browser IndexedDB lvdr-crypto); both must decrypt to the same masterKEK byte string, or the row is rejected as tampered. The NIST SP 800-56C combiner pattern KDF derives the final unwrapping key from the two encapsulations.
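One reading of the combiner step is a one-step KDF in the style of NIST SP 800-56C: hash a counter, the two shared secrets, and a context string. The sketch below follows that reading and is not the documented wire format. The order of the secrets, the SHA-256 choice, the fixedInfo label, and the function name are all assumptions, and random bytes stand in for the X25519 and ML-KEM-768 shared secrets.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// combine derives a 32-byte unwrapping key as
// SHA-256(counter || zClassical || zPQ || fixedInfo) with counter = 1.
// Both secrets are expected to be fixed-length (32 bytes each here), so
// the concatenation is unambiguous.
func combine(zClassical, zPQ, fixedInfo []byte) [32]byte {
	h := sha256.New()
	var counter [4]byte
	binary.BigEndian.PutUint32(counter[:], 1)
	h.Write(counter[:])
	h.Write(zClassical)
	h.Write(zPQ)
	h.Write(fixedInfo)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	zX := make([]byte, 32)  // stand-in for the X25519 shared secret
	zPQ := make([]byte, 32) // stand-in for the ML-KEM-768 shared secret
	if _, err := rand.Read(zX); err != nil {
		panic(err)
	}
	if _, err := rand.Read(zPQ); err != nil {
		panic(err)
	}
	key := combine(zX, zPQ, []byte("wrap_masterKEK/v1")) // hypothetical context label
	fmt.Printf("derived %d-byte unwrapping key\n", len(key))
}
```

The property that matters is that the derived key depends on both inputs: an attacker who recovers only one shared secret learns nothing useful about the output, which is what makes the envelope hybrid rather than merely redundant.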

Layer 6 — Per-research DEK. Generated client-side at research-creation time. Wrapped under the masterKEK and stored as a single Badger row keyed by research ID. Server-side this row is just bytes; the masterKEK is needed to unwrap, which means the device key is needed to unwrap the masterKEK. Layer 6 is the innermost; everything above it inherits the user-side-only unwrappability.


§7 The hybrid post-quantum envelope

Layer 5 — wrap_masterKEK — is the high-value target. If an attacker can unwrap the masterKEK from a stored row, they recover every DEK the user has, and from there every byte of user content. Layer 1 is ephemeral and protects sessions; layer 4 is defence-in-depth; but layer 5 is what an adversary who exfiltrates wrap_masterKEK rows spends their CRQC time on.

That makes layer 5 the place where hybridisation matters most.

Why hybrid, not all-PQ

A common, plausible-sounding argument runs: “If ML-KEM-768 is PQ-safe, why not just use it alone?”

The answer is that ML-KEM is young. It was standardised in 2024, and its underlying design (CRYSTALS-Kyber) has been under public cryptanalysis only since its 2017 submission to NIST. The elliptic-curve discrete-log problem underlying X25519 has been studied since the mid-1980s. The cautionary case is SIKE, an isogeny-based KEM that advanced to round 4 of the NIST process as an alternate candidate and was then broken in 2022, in about an hour on a single classical core, after roughly a decade of public scrutiny of its underlying problem. A clever new technique attacked an algorithm that had not previously been considered weak; the distance from “studied for years” to “broken” was a single paper. Lattices are well studied, but they have not yet accumulated the same cryptanalytic track record as the elliptic-curve discrete log.

Hybrid is therefore not “PQ plus a useless legacy component.” It is “two independent hardness assumptions, requiring both to fall.” If ML-KEM-768 turns out to have a structural weakness nobody saw coming, X25519 still holds against today’s adversaries; the hybrid envelope still protects the masterKEK against everyone who is not running both a CRQC and the lattice break. We get the PQ property when CRQCs arrive without surrendering the classical property in the meantime.

Why ML-KEM-768 specifically

ML-KEM is NIST FIPS 203 (final August 2024). The 768-parameter set corresponds to NIST security category 3, which is the equivalent of AES-192 against quantum attack — the same level Apple chose for iMessage PQ3, the same level Signal chose for its PQ-Ratchet extension, and the same level the IETF picked for the X25519MLKEM768 TLS hybrid group. We follow the consensus parameter; we do not optimise for smaller wire size at 512 or greater margin at 1024 without a specific reason. 768 is the ecosystem norm for general-purpose hybrid KEM, and matching the ecosystem is itself a security property — it gives us the largest pool of common-case bug fixes and side-channel research.

Implementation: audited libraries, never roll-your-own

The implementations are:

Browser side: @noble/post-quantum (Paul Miller’s audited collection). Version-pinned to the release audited by Cure53 and NCC. Bundle weight ~30 KB minified — manageable for the LumaVista web client. Implementation language is portable JavaScript with carefully-audited timing characteristics.
Server side: crypto/mlkem in Go 1.24’s standard library. Treated as trusted; the security-audit posture is whatever the Go team’s posture is, which is conservative and process-driven.

Rolling our own ML-KEM-768 implementation would be incompetent. We deploy NIST-standardised algorithms via audited, ecosystem-trusted libraries, period.

The envelope binary format

The on-the-wire wrap_masterKEK envelope is a single byte string laid out in a fixed format. The format is versioned so that future algorithm additions (a third KEM family, a parameter-set change) do not break readers that haven’t been updated.

Hybrid envelope byte layout

The fields in order:

version              : u8        // 1 for this format
kem_alg              : u8        // 2 = x25519 + ML-KEM-768 hybrid
wrap_alg             : u8        // 1 = XChaCha20-Poly1305
key_id               : [16]u8    // identifier for the tenant key or device pubkey
nonce                : [24]u8    // XChaCha20 nonce
wrapped_key_x25519_len  : u16
wrapped_key_x25519      : []u8   // X25519 encapsulation of the session key
wrapped_key_mlkem768_len: u16
wrapped_key_mlkem768    : []u8   // ML-KEM-768 encapsulation of the session key
ciphertext           : []u8      // XChaCha20-Poly1305 ciphertext (the masterKEK)
tag                  : [16]u8    // Poly1305 authentication tag

The session key (the input to XChaCha20-Poly1305 that ultimately encrypts the masterKEK) is the output of an SP 800-56C combiner over the two encapsulated shared secrets. Both encapsulations must decapsulate successfully on the unwrap path, and both shared secrets feed the derivation, so neither one alone determines the session key. This is the load-bearing defence-in-depth property: an adversary who somehow swaps the X25519 encapsulation while leaving the ML-KEM-768 encapsulation intact does not derive a session key that decrypts the ciphertext, because the Poly1305 tag will fail and the row is rejected as tampered.
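The combiner property can be illustrated with a minimal Go sketch built on the standard library of Go 1.24 (crypto/ecdh, crypto/mlkem, crypto/hkdf). The type and function names (Recipient, PublicKeys, Encapsulate, Decapsulate) are hypothetical, not LumaVista's API. The sketch stores only the 32-byte ephemeral X25519 public key and omits the 16-byte authenticator described in Appendix A.1; the salt and info inputs follow the derivation given there.

```go
package main

import (
	"crypto/ecdh"
	"crypto/hkdf"
	"crypto/mlkem"
	"crypto/rand"
	"crypto/sha256"
)

// Recipient holds a device's two private keys (hypothetical type).
type Recipient struct {
	X25519 *ecdh.PrivateKey
	MLKEM  *mlkem.DecapsulationKey768
}

// PublicKeys is what a sender needs in order to wrap to the device.
type PublicKeys struct {
	X25519 *ecdh.PublicKey
	MLKEM  *mlkem.EncapsulationKey768
}

// Encapsulated carries the two stored encapsulations plus the session key
// the sender derived locally (the key itself is never stored).
type Encapsulated struct {
	X25519Ephemeral []byte // 32-byte ephemeral X25519 public key
	MLKEMCiphertext []byte // 1088-byte ML-KEM-768 ciphertext
	SessionKey      []byte // 32-byte combined key
}

func NewRecipient() (*Recipient, error) {
	x, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	m, err := mlkem.GenerateKey768()
	if err != nil {
		return nil, err
	}
	return &Recipient{X25519: x, MLKEM: m}, nil
}

func (r *Recipient) Public() PublicKeys {
	return PublicKeys{X25519: r.X25519.PublicKey(), MLKEM: r.MLKEM.EncapsulationKey()}
}

// combine feeds both shared secrets into one HKDF-SHA256 call, so the
// output depends on each of them.
func combine(ssX, ssM, keyID []byte) ([]byte, error) {
	ikm := append(append([]byte{}, ssX...), ssM...)
	return hkdf.Key(sha256.New, ikm, []byte("lvdr/hybrid-envelope/v1"), string(keyID), 32)
}

// Encapsulate runs both KEMs against the recipient's public keys.
func Encapsulate(pub PublicKeys, keyID []byte) (*Encapsulated, error) {
	eph, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	ssX, err := eph.ECDH(pub.X25519)
	if err != nil {
		return nil, err
	}
	ssM, ct := pub.MLKEM.Encapsulate()
	key, err := combine(ssX, ssM, keyID)
	if err != nil {
		return nil, err
	}
	return &Encapsulated{X25519Ephemeral: eph.PublicKey().Bytes(), MLKEMCiphertext: ct, SessionKey: key}, nil
}

// Decapsulate recovers the session key from the two stored encapsulations.
func Decapsulate(r *Recipient, ephPub, ct, keyID []byte) ([]byte, error) {
	pub, err := ecdh.X25519().NewPublicKey(ephPub)
	if err != nil {
		return nil, err
	}
	ssX, err := r.X25519.ECDH(pub)
	if err != nil {
		return nil, err
	}
	ssM, err := r.MLKEM.Decapsulate(ct)
	if err != nil {
		return nil, err
	}
	return combine(ssX, ssM, keyID)
}
```

ML-KEM decapsulation of a modified ciphertext of the correct length does not return an error; it returns an unrelated pseudo-random secret. In this sketch the tampering therefore surfaces one step later, when the derived session key fails to authenticate the AEAD ciphertext.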

The rotation flow

An existing user with a pre-2026-05-18 enrolment had a classical-only wrap_masterKEK row in PostgreSQL: a single X25519 encapsulation. Rotation re-wraps the same masterKEK under a hybrid envelope (both X25519 and ML-KEM-768 encapsulations of a new session key) on each of the user’s currently-enrolled devices. The masterKEK byte string itself does not change — that would invalidate every wrapped DEK in the user’s Badger and require re-encrypting their entire research corpus. Only the wrapping changes.

The user-side flow at rotation time:

On next WS connection from a pre-rotation device, the backend detects the classical-only row and signals the client.
Client unwraps the existing X25519 envelope with the device private key, recovering the masterKEK.
Client generates a fresh hybrid public key (X25519 keypair plus ML-KEM-768 keypair); the private parts go into IndexedDB lvdr-crypto.
Client wraps the masterKEK to the new hybrid public key, produces a new wrap_masterKEK row in the format above, and submits it to the server.
Server atomically replaces the old row with the new one.
Server invalidates the old enrolment record.
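The server-side part of this flow (detecting a classical-only row in step 1 and replacing it atomically in step 5) might look roughly like the following Go sketch. It is an in-memory stand-in for the PostgreSQL table. The WrapStore type, the error names, and the value kem_alg = 1 for classical-only rows are assumptions made for illustration; the envelope format above defines only kem_alg = 2.

```go
package main

import (
	"errors"
	"sync"
)

const (
	kemClassical byte = 1 // assumption: X25519-only rows carry kem_alg = 1
	kemHybrid    byte = 2 // from the envelope format: X25519 + ML-KEM-768
)

var (
	ErrNoRow         = errors.New("no wrap_masterKEK row for device")
	ErrAlreadyHybrid = errors.New("row is already hybrid")
	ErrNotHybrid     = errors.New("replacement envelope is not hybrid")
)

// WrapStore maps a device ID to its wrap_masterKEK envelope bytes.
type WrapStore struct {
	mu   sync.Mutex
	rows map[string][]byte
}

func NewWrapStore() *WrapStore { return &WrapStore{rows: map[string][]byte{}} }

func (s *WrapStore) Put(deviceID string, envelope []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[deviceID] = envelope
}

// NeedsRotation inspects the kem_alg byte (offset 1) of an envelope.
func NeedsRotation(envelope []byte) bool {
	return len(envelope) > 1 && envelope[1] == kemClassical
}

// Rotate replaces a classical-only row with the client-supplied hybrid
// envelope. The check and the swap happen under one lock, so a second
// submission for the same device is rejected instead of overwriting.
func (s *WrapStore) Rotate(deviceID string, hybrid []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	old, ok := s.rows[deviceID]
	if !ok {
		return ErrNoRow
	}
	if !NeedsRotation(old) {
		return ErrAlreadyHybrid
	}
	if len(hybrid) < 2 || hybrid[1] != kemHybrid {
		return ErrNotHybrid
	}
	s.rows[deviceID] = hybrid
	return nil
}
```

The server never sees the masterKEK in this exchange: it only compares algorithm bytes and swaps opaque envelopes.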

On a device that does not return after rotation (lost phone, fired employee), the masterKEK never gets rewrapped on that device’s behalf. The user can revoke that device’s enrolment from another device they still have, which deletes the device’s old wrap_masterKEK row entirely. Users with no remaining devices (every device lost) have no recovery path — the masterKEK is unrecoverable. This is intentional; account-recovery via server-side key escrow is a security anti-pattern. The user-facing UX warns about this at enrolment and encourages multi-device enrolment as a recovery path.

As of 2026-05-18, the universal rotation operation is complete. All currently-live wrap_masterKEK records are hybrid. Rotation logs are retained for audit; the count of rows rewritten and the count of devices that did not return for rotation are documented in the operational cover page of this whitepaper.

§8 Migration and consistency

The migration plumbing — moving a user’s per-user Badger between backends — is structurally simple because all of the hard problems have been pushed elsewhere. The lease (§3) handles ownership. The encryption envelopes (§§6–7) handle confidentiality. RustFS handles durability. What remains is the protocol for getting bytes from backend A to backend B, and the consistency story for what happens when something goes wrong mid-flight.

The trigger surface is the orchestrator at internal/usersync/orchestrator.go. It watches for the named application events — project complete, document indexed, recording promoted, settings saved — and routes them through a per-user coalescer with a 5-second debounce window. Bursts of completions within the window collapse into a single snapshot pass. This caps RustFS PUT churn at one per debounce window per user, which is important: a busy user otherwise generates dozens of snapshots per session, most of which would never be read.

Two trigger modes coexist. Async coalesce is the default: most events feed the debounce and the snapshot happens shortly after. Sync trigger is used for irreversible operations: “recording promoted to permanent document,” “research run complete.” Sync mode holds the user-visible “done” response until the manifest PUT has returned successfully. This brings the RPO for those operations to zero, and they are the operations the user is most likely to remember: anything they were told completed is durable.

Manifest-last is the consistency anchor. A snapshot at revision <rev> is materially “valid” if and only if manifest.json exists at users/<userID>/snapshot/<rev>/. The orchestrator writes the individual .bak.enc objects first; only after they are all acknowledged does it PUT the manifest. A crash mid-snapshot leaves partial objects in RustFS without a manifest — invisible to recovery, garbage-collected after 7 days by the RustFS lifecycle policy. The previous <rev-1> snapshot remains authoritative.
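The manifest-last rule can be made concrete with a small Go sketch against an in-memory object store. The ObjectStore type, the part-NNNN.bak.enc object names, the manifest body, and the crashAfter parameter (which simulates a crash) are all hypothetical; only the key prefix and the ordering rule come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectStore is an in-memory stand-in for the RustFS bucket.
type ObjectStore map[string][]byte

var ErrCrashed = errors.New("simulated crash mid-snapshot")

func prefix(userID string, rev int) string {
	return fmt.Sprintf("users/%s/snapshot/%d/", userID, rev)
}

// WriteSnapshot writes every data object first and the manifest last.
// crashAfter is the number of parts written before a simulated crash;
// pass -1 for a clean run.
func WriteSnapshot(s ObjectStore, userID string, rev int, parts [][]byte, crashAfter int) error {
	p := prefix(userID, rev)
	for i, part := range parts {
		if i == crashAfter {
			return ErrCrashed
		}
		s[fmt.Sprintf("%spart-%04d.bak.enc", p, i)] = part
	}
	if crashAfter == len(parts) {
		return ErrCrashed // all parts stored, manifest never written
	}
	s[p+"manifest.json"] = []byte(fmt.Sprintf(`{"rev":%d,"parts":%d}`, rev, len(parts)))
	return nil
}

// IsValid reports whether a revision is visible to recovery.
func IsValid(s ObjectStore, userID string, rev int) bool {
	_, ok := s[prefix(userID, rev)+"manifest.json"]
	return ok
}

// LatestValid walks down from rev and returns the newest revision that
// has a manifest, or 0 if there is none.
func LatestValid(s ObjectStore, userID string, rev int) int {
	for r := rev; r > 0; r-- {
		if IsValid(s, userID, r) {
			return r
		}
	}
	return 0
}
```

A run that crashes after two of three parts leaves those two objects in the store, but IsValid stays false for that revision, so recovery keeps treating the previous one as authoritative.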

Restore semantics. When a new owner backend acquires a lease, it reads lease:user:<id>:rev from Redis to learn the authoritative revision. It checks its local Badger data directory for a .rev marker:

If the marker matches the Redis revision, the local data is fresh — use it as a warm cache.
If the marker is older (or missing), the local data is stale — wipe the local directory and restore from users/<userID>/snapshot/<rev>/.
If the manifest is missing at the expected revision, abort the acquire and let the user retry; this is an operational bug and must page on alert.
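The three cases above reduce to a small decision function. The Go sketch below is illustrative only: the Action names and the DecideRestore signature are hypothetical, and it assumes that any mismatch between the local marker and the Redis revision is treated as stale, whichever side is higher.

```go
package main

// Action is what a backend does with its local Badger directory after
// acquiring a lease.
type Action int

const (
	UseLocal       Action = iota // marker matches: warm cache
	WipeAndRestore               // marker stale or missing: restore from snapshot
	AbortAcquire                 // manifest missing at the expected revision: page
)

// DecideRestore compares the local .rev marker with the authoritative
// revision from Redis. hasMarker is false when no .rev file exists.
// manifestExists reports whether manifest.json is present for a revision.
func DecideRestore(hasMarker bool, localRev, redisRev int64, manifestExists func(rev int64) bool) Action {
	if hasMarker && localRev == redisRev {
		return UseLocal
	}
	if !manifestExists(redisRev) {
		return AbortAcquire
	}
	return WipeAndRestore
}
```

Keeping the decision pure (no I/O beyond the manifestExists callback) makes the three branches easy to cover in a table-driven test.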

Atomicity under partial failure. The lease + manifest design is deliberately a two-step commit. The lease records the intended revision; the manifest records the actually-stored revision. The intended revision is incremented atomically on lease transition (INCR lease:user:<id>:rev in Redis); the actually-stored revision is materialised in RustFS via the manifest PUT. A backend that owns the lease at revision rev=N but fails before writing the manifest leaves rev=N referenced in Redis but no <N>/manifest.json in RustFS — the next owner detects the gap, falls back to <N-1>, and the orchestrator re-attempts <N> on the next commit point.

This is the same manifest-last discipline found in other well-known multi-step storage protocols, such as S3’s multipart upload and Bigtable’s SSTable cutover. It is unambitious and well-understood, which is the point.

Concurrent-write protection on RustFS. A botched failover could in principle put two backends racing to write the same <rev>/manifest.json. RustFS conditional-write semantics (If-None-Match: *) make the manifest PUT atomic-create — only one of the two writers succeeds; the loser’s snapshot objects are orphaned and garbage-collected. RustFS configuration validates conditional-write support at startup as a deployment precondition.
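The atomic-create behaviour can be modelled in Go with an in-memory bucket. This sketch does not call RustFS; Bucket, PutIfAbsent, and RaceManifest are hypothetical names, and PutIfAbsent stands in for a PUT carrying If-None-Match: *.

```go
package main

import (
	"errors"
	"sync"
)

var ErrPreconditionFailed = errors.New("object already exists")

// Bucket is an in-memory model of an object store that supports
// conditional create.
type Bucket struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func NewBucket() *Bucket { return &Bucket{objects: map[string][]byte{}} }

// PutIfAbsent models PUT with If-None-Match: *. It stores the object only
// if the key does not exist yet.
func (b *Bucket) PutIfAbsent(key string, body []byte) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, exists := b.objects[key]; exists {
		return ErrPreconditionFailed
	}
	b.objects[key] = body
	return nil
}

// RaceManifest has several writers try to publish the same manifest key
// concurrently and returns how many of them succeeded.
func RaceManifest(b *Bucket, key string, writers int) int {
	var wg sync.WaitGroup
	results := make(chan error, writers)
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			results <- b.PutIfAbsent(key, []byte{byte(id)})
		}(i)
	}
	wg.Wait()
	close(results)
	winners := 0
	for err := range results {
		if err == nil {
			winners++
		}
	}
	return winners
}
```

However many backends race, RaceManifest reports exactly one winner; every loser sees ErrPreconditionFailed and should treat its own snapshot objects as orphaned.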

§9 Edge cases and failure modes

The full design enumerates a long list of operational edge cases. We surface here the ones a reader of this whitepaper most often asks about.

Split-brain on lease expiry vs slow drain. Backend A’s heartbeat is delayed by GC pause or transient Redis blip. B sees the lease as expired, claims it. A wakes up with Badger still open. Resolution: every write path through the backend wraps a cached lease-state check; on ErrLeaseLost, A closes its open Badger handles immediately and force-closes the user’s WS. Same fail-stop discipline the recordings subsystem already uses for the session DEK presence check. Result: the user sees a brief “reconnecting” UX; no data corruption; no split-brain.
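The fail-stop wrapper can be sketched as follows in Go. ErrLeaseLost is the name used above; the UserSession type, its methods, and the onClose callback (standing in for closing Badger handles and force-closing the WS) are hypothetical.

```go
package main

import (
	"errors"
	"sync/atomic"
)

var ErrLeaseLost = errors.New("lease lost")

// UserSession guards every write for one user with a cached lease flag.
type UserSession struct {
	leaseHeld atomic.Bool // maintained by the heartbeat loop
	closed    atomic.Bool
	onClose   func() // stand-in for closing Badger handles and the WS
}

func NewUserSession(onClose func()) *UserSession {
	s := &UserSession{onClose: onClose}
	s.leaseHeld.Store(true)
	return s
}

// MarkLeaseLost is what the heartbeat loop would call once it notices
// that the lease has expired or been claimed by a peer.
func (s *UserSession) MarkLeaseLost() { s.leaseHeld.Store(false) }

// Write runs fn only while the cached lease state says "owned". On a lost
// lease it fail-stops the session and returns ErrLeaseLost without
// running fn.
func (s *UserSession) Write(fn func() error) error {
	if !s.leaseHeld.Load() {
		s.failStop()
		return ErrLeaseLost
	}
	return fn()
}

// failStop closes the session's resources exactly once.
func (s *UserSession) failStop() {
	if s.closed.CompareAndSwap(false, true) {
		s.onClose()
	}
}
```

A cached check narrows the window between losing the lease and noticing it but cannot remove it entirely. The conditional manifest PUT described in §8 is what prevents two backends from publishing the same revision.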

Hard crash mid-write. Backend crashes between accepting a write and the next commit-point trigger. Resolution: the write is lost (RPO = time since last commit point). For most writes this is fine; for irreversible operations (Recording.Promote, Research.Complete), the sync-trigger pathway delays the user-side “done” until the manifest PUT has acknowledged, so the user never believes a non-durable operation completed.

Concurrent device on different backends. User opens a laptop while their phone is already connected. The laptop’s WS lands on a different backend (round-robin first connection). Backend reads the lease, sees the phone’s backend is the owner, sends session_moved, laptop reconnects to the phone’s backend. Both end up on the same Badger handles. No parallel writes to the same database from different processes — Badger’s single-writer constraint stays honoured.
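The redirect decision itself is small. In the Go sketch below, the session_moved frame type comes from the text, while the JSON field names, the SessionMoved struct, and the Route function are assumptions made for illustration.

```go
package main

import "encoding/json"

// SessionMoved is a hypothetical wire shape for the session_moved frame.
type SessionMoved struct {
	Type    string `json:"type"`
	Backend string `json:"backend"`
}

// Route decides what a backend does with a fresh WS connection. It
// returns nil when this backend should serve the user (it already owns
// the lease, or nobody does and it may try to acquire it) and a
// session_moved frame when another backend is the current owner.
func Route(self, leaseOwner string) *SessionMoved {
	if leaseOwner == "" || leaseOwner == self {
		return nil
	}
	return &SessionMoved{Type: "session_moved", Backend: leaseOwner}
}

// JSON encodes the frame for sending over the WebSocket.
func (m *SessionMoved) JSON() []byte {
	b, _ := json.Marshal(m) // a struct of two strings cannot fail to encode
	return b
}
```

In the laptop-and-phone case above, the laptop's first backend would call Route with its own ID and the phone's backend as owner, send the resulting frame, and close the connection.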

Stale local data on returning backend. User went A → B → A. A’s local data/<userID>/ is older than RustFS. Resolution: A compares its local .rev marker to lease:user:<id>:rev in Redis, mismatches, wipes the local directory, restores from RustFS. The local data is warm cache; the snapshot is authoritative.

Encrypted but unlocked vs locked at handoff time. User is unlocked on A (DEK in session memory). A goes down or A drains. B takes over. Resolution: DEK is per-WS-session and never persisted server-side. On B, the user must re-complete the crypto handshake (same flow as a fresh login). UX is a brief “unlock” screen if they had been auto-unlocked. The masterKEK and DEKs are not migrated because they are not stored; only ciphertext migrates.

Recordings audio blobs are huge. 4-hour recording ≈ hundreds of MB ciphertext. Resolution: recordings ride as separate RustFS objects, not packed into Badger backups. Content-addressed (recordings/<recordingID>.enc), immutable, write-once with PUT-If-None-Match. Migration of recordings metadata is fast; audio blobs lazy-fetch on first play/transcribe on the new backend.

RustFS catastrophic loss. RustFS bucket lost entirely (which implies hosting-provider compound failure). Resolution: there is an admin tool tools/usersync-rebuild that iterates active backends, force-snapshots their owned users to a fresh RustFS bucket, and resets revision markers in Redis. One-time emergency operation; not automatic. RPO is whatever active backends still have on local disk, which is generally fresher than the last RustFS snapshot.

Redis HA. Redis lease store is a single point of failure without HA. Resolution: production deployment requires Redis Sentinel (failover with auto-promotion). Lease workload is low-write, low-cardinality (few thousand keys, few writes per user per minute), so Sentinel is sufficient — Redis Cluster (sharded) is overkill. Sentinel deployment is a deployment precondition, not a future enhancement.

Drain deadline overruns. Drain exceeds the 60-second SLA. Resolution: the drained backend continues to refuse new lease acquires; the drain just takes longer. The peer takeover path (TTL-driven) is the fallback if the drain exceeds the TTL+drainTimeout window (130s combined). Alerting fires at 70s (“drain delayed”); ops investigates. This is annoying, not unsafe.

§10 What we explicitly do not claim

This whitepaper is being read by people whose job is to compare vendor claims. The most common failure mode of vendor PQ material is overclaiming. We close the loop on §5 by stating, in flat prose, what we are not asserting. A claim absent from this whitepaper is not a claim LumaVista makes elsewhere.

1. “The system is post-quantum safe.” Not asserted. The system has post-quantum properties at specific layers (TLS handshake, masterKEK envelope, snapshot envelope) for material handled from 2026-05-18 forward. It does not have post-quantum properties for traffic recorded before that date, for sessions established by clients that do not negotiate the hybrid handshake, or against the case where both X25519 and ML-KEM-768 fall.

2. “All historical data is post-quantum safe.” Not asserted. Backup tapes, PostgreSQL dumps, and snapshot copies captured before 2026-05-18 contain X25519-only wrapping material. LumaVista commits to purging LumaVista-controlled pre-rotation backup material by 2026-08-16; see §11 for the operational schedule. Material that has already been exfiltrated by an adversary and is held off platform is outside this commitment.

3. “Our certificate chain is post-quantum.” Not asserted. Public certificate authorities do not yet issue ML-DSA (FIPS 204) or SLH-DSA (FIPS 205) certificates. The Web PKI cert chain on lumavista.ai remains classical (ECDSA-P-256 via Let’s Encrypt) until the CA/B Forum and public CAs ship PQ certs, currently roadmapped 2027–2028. Certificate compromise is prospective forgery (future MitM), not retroactive decryption — a smaller exposure than KEM weakness, but not zero, and we say so explicitly.

4. “We defend against device-side compromise.” Not asserted. Post-quantum cryptography does not defend against malware with browser-context access. The device private key lives in browser IndexedDB; any in-browser attacker reads it and decrypts the user’s data with the same authority the user has. Endpoint security is the responsibility of the user’s organisation. Hardware-backed device keys (WebAuthn / TPM / secure enclave) are on LumaVista’s roadmap as a separate workstream.

5. “We defend against backend RCE on the backend serving an active session.” Not asserted. An attacker who achieves RCE on a backend while a user is unlocked on that backend has access to the user’s in-memory masterKEK and DEKs for the duration of the compromise. The multi-backend lease model bounds the blast radius to that one backend’s currently-active users; cryptography does not extend the bound further.

6. “Quantum-proof.” This word does not appear in any LumaVista material. The word is technically meaningless (“proof” in cryptography is reserved for results conditional on stated hardness assumptions) and rhetorically misleading. We use “post-quantum,” “PQ-hybrid,” or “PQ-safe at layer L from date D forward” — never “quantum-proof,” “quantum-resistant” as a blanket claim, or “quantum-immune.”

7. “Our hybrid envelope is a novel cryptographic construction.” Not asserted. The X25519 + ML-KEM-768 hybrid KEM construction is the NIST SP 800-56C combiner pattern applied to standardised primitives. Implementations use @noble/post-quantum (Cure53- and NCC-audited) in the browser and Go 1.24’s stdlib crypto/mlkem on the server. We are deploying well-understood building blocks, not inventing cryptography. Buyers should treat any vendor that claims novel PQ constructions with substantially more scepticism than one that deploys NIST-standardised primitives via audited libraries.

The discipline of saying these things out loud is itself a marketing position. A CISO comparing two vendors who both claim “quantum-safe” cannot distinguish them. A CISO comparing a vendor with this cannot-claim list against one without can.

§11 Operational posture

The architecture and cryptography are necessary but not sufficient. The whitepaper would be incomplete without saying what we actually operate, monitor, and commit to.

PQ-Cutover and retention destruction commitment

PQ-Cutover date: 2026-05-18. From this date forward:

All new sessions opened by PQ-capable clients use hybrid TLS.
All new and rotated wrap_masterKEK records are hybrid.
All new snapshot envelope keys are hybrid.
The universal re-wrap operation has completed; no live wrap_masterKEK envelope is classical-only.

Retention destruction commitment: 2026-08-16 (PQ-Cutover + 90 days). By this date:

All LumaVista-controlled backup tapes and PostgreSQL dumps captured before 2026-05-18 will be purged.
The operational logbook will record the destruction operation, the date, and the operator who performed it. The log entry is audit-retained for seven years.
After 2026-08-16, no LumaVista-controlled copy of pre-rotation key wrapping material exists.

Customer-controlled backup copies (private off-platform backups that customers themselves have made) are explicitly outside this commitment. Customers operating their own backup pipelines should review their retention policies against the same harvest-now-decrypt-later threat model.

Monitoring and alerting

Prometheus metrics emit at the prefixes:

lvdr_usersync_snapshot_* — snapshot duration, size, mode (full/incremental), success rate.
lvdr_usersync_lease_* — lease acquire latency, heartbeat failure rate, takeover count.
lvdr_usersync_drain_* — drain duration histogram, p99 drain time, drain SLA-overrun count.
lvdr_crypto_* — hybrid envelope wrap/unwrap latency, fallback to classical-only count (target: zero post-2026-05-18).
lvdr_tls_* — handshake negotiated curve distribution (X25519MLKEM768 vs X25519 vs others).

Alert rules fire on high-load deviations and on any non-zero count of post-cutover classical-only fallback. PagerDuty integration is documented in the production runbook (link omitted from public whitepaper).

Drain procedure

Production deploys use a structured drain:

Operator marks the target backend as drained via kubectl exec or equivalent: sets backend:<instanceUUID>:drain = true in Redis.
Backend refuses new lease acquires.
Backend triggers a synchronous snapshot for each currently-owned user; manifest PUT must succeed before the lease is released.
Frontend session_moved frames steer users to peer backends.
Backend exits when all leases are released or 60s elapses (whichever comes first).
Operator confirms drain complete via metric lvdr_usersync_drain_completed_total increment.
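The backend side of this procedure (steps 2, 3, and 5) can be sketched as a single loop. The Go code below is a simplified, single-goroutine illustration: the Backend type, its callbacks, and the error names are hypothetical, and the Redis flag, the session_moved frames, and the metrics are left out.

```go
package main

import (
	"errors"
	"time"
)

const DrainSLA = 60 * time.Second

var (
	ErrDraining       = errors.New("backend is draining")
	ErrSnapshotFailed = errors.New("manifest PUT failed")
)

// Backend models the drain-relevant state of one instance.
type Backend struct {
	Draining bool
	Owned    []string                  // users whose lease this backend holds
	Snapshot func(userID string) error // returns after the manifest PUT
	Release  func(userID string)       // releases the user's lease
}

// Acquire refuses new leases once the drain flag is set.
func (b *Backend) Acquire(userID string) error {
	if b.Draining {
		return ErrDraining
	}
	b.Owned = append(b.Owned, userID)
	return nil
}

// Drain snapshots each owned user and releases the lease only after the
// snapshot succeeded. Users not released before the deadline are returned
// and left to the TTL-driven peer takeover.
func (b *Backend) Drain(deadline time.Duration) []string {
	b.Draining = true
	stopAt := time.Now().Add(deadline)
	var remaining []string
	for _, u := range b.Owned {
		if time.Now().After(stopAt) || b.Snapshot(u) != nil {
			remaining = append(remaining, u)
			continue
		}
		b.Release(u)
	}
	b.Owned = remaining
	return remaining
}
```

Releasing only after a successful snapshot is what gives the graceful-drain path its zero RPO; a user whose snapshot fails keeps the lease until the TTL expires instead of handing stale state to a peer.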

RPO / RTO commitments

RPO under graceful drain: zero (synchronous snapshot is manifest-acknowledged before lease release).
RPO under hard crash: typically seconds, bounded by time since last commit point — most users’ RPO is below 30s. Worst-case bounded by user activity rate, not by wall clock.
RTO under graceful drain: 60s SLA, currently p99 at 54s under 100-user-per-backend load.
RTO under hard crash: 60s + lease TTL (~120s total) for the affected users; unaffected users are not impacted.

Key rotation cadence

Tenant key: quarterly rotation. Old keys retained for the retention window of any snapshot still using them, plus 30 days.
masterKEK: never rotated proactively (rotation requires re-encrypting every DEK and is treated as a recovery operation, not a routine one). Re-wrapping the masterKEK to add a new device — i.e., rotating the envelope but not the key — is the routine operation.
Device key: rotated on device re-enrolment. Users are encouraged to maintain at least two enrolled devices for recovery.
TLS cert: Let’s Encrypt 90-day cycle, automated.

Audit logging

The operator audit log records:

Every backend instance lifecycle event (start, drain, graceful-exit, crash-restart).
Every lease state transition with reason (acquire/release/expire/steal/fail-stop).
Every administrative operation against the tenant key store (rotation, audit, retrieval).
Every snapshot operation (rev, user, size, duration, success).
Every retention-destruction operation against pre-cutover backups.

Logs are immutable (append-only with cryptographic chaining of sequential entries) and retained for seven years per the standard data retention policy.
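Cryptographic chaining of sequential entries can be illustrated with a minimal hash chain in Go. The sketch assumes SHA-256 and a simple prev-hash || sequence || event encoding; the actual hash function, entry format, and storage of the operator audit log are not specified here, and the AuditLog and Entry types are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// Entry is one audit record linked to its predecessor by hash.
type Entry struct {
	Seq      uint64
	Event    string
	PrevHash [32]byte
	Hash     [32]byte
}

// AuditLog is an in-memory append-only chain.
type AuditLog struct {
	entries []Entry
}

// entryHash binds the previous hash, the sequence number, and the event.
func entryHash(seq uint64, event string, prev [32]byte) [32]byte {
	buf := make([]byte, 0, 40+len(event))
	buf = append(buf, prev[:]...)
	buf = binary.BigEndian.AppendUint64(buf, seq)
	buf = append(buf, event...)
	return sha256.Sum256(buf)
}

// Append adds an entry whose hash covers the previous entry's hash.
func (l *AuditLog) Append(event string) Entry {
	var prev [32]byte // all zeros for the first entry
	if n := len(l.entries); n > 0 {
		prev = l.entries[n-1].Hash
	}
	seq := uint64(len(l.entries))
	e := Entry{Seq: seq, Event: event, PrevHash: prev, Hash: entryHash(seq, event, prev)}
	l.entries = append(l.entries, e)
	return e
}

// Verify recomputes the chain and returns the index of the first entry
// that does not check out, or -1 if the whole log is intact.
func (l *AuditLog) Verify() int {
	var prev [32]byte
	for i, e := range l.entries {
		if e.Seq != uint64(i) || e.PrevHash != prev || e.Hash != entryHash(e.Seq, e.Event, e.PrevHash) {
			return i
		}
		prev = e.Hash
	}
	return -1
}
```

Editing or deleting any entry changes the hash its successor committed to, so Verify pinpoints the first altered position. A chain on its own does not stop someone who can rewrite every later entry; anchoring the latest hash somewhere the operator cannot modify would be needed for that.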

§12 Conclusion and roadmap

The state of post-quantum readiness in the wider industry is messy. TLS hybrid is shipping in browsers and CDNs. Apple, Signal, and Cloudflare have published their integrations. NIST has standardised the algorithms. But the next layer down — the per-application envelope, the at-rest wrapping, the device keys — is mostly still classical, and the cert chain hasn’t moved. “Post-quantum ready” in vendor marketing rarely means more than “we negotiate X25519MLKEM768 on the edge.”

LumaVista’s posture, as of 2026-05-18, is the obvious next step: extend the hybrid construction to every layer where an asymmetric key wraps a session key, commit operationally to the destruction of pre-rotation backup material, and say plainly what is not yet PQ-safe (the cert chain) and what cannot be made PQ-safe by cryptography alone (device compromise). The architecture this sits inside — single-writer per-user Badger, lease-coordinated multi-backend, ciphertext-only migration — is independently load-bearing for E2E across horizontal scale, which is the harder property to retrofit later.

The roadmap from here:

CA-issued PQ certs (2027–2028 expected). As soon as Let’s Encrypt and the CA/B Forum ship ML-DSA or SLH-DSA certs, the Web PKI cert chain on lumavista.ai joins the hybrid posture. This closes the residual on §5(b) prospective forgery.
Hardware-backed device keys. WebAuthn / TPM / secure-enclave storage for the device private key, closing the residual on §5(f) compromised-device against malware that does not have hardware-key extraction.
Code-based KEM diversification. If ML-KEM-768 cryptanalysis trends in a concerning direction over the next few years, adding a third KEM family (HQC, Classic McEliece) to the hybrid combiner is the obvious hedge.
Forward-secrecy in WS session keys. Today the WS session is encrypted under the per-research DEK, which has a long lifetime. Ephemeral session-key derivation on top of the DEK is a future improvement that would limit the exposure window of any single session key compromise.

This document is the source-of-truth for what LumaVista’s PQ posture is, on the date it ships, with the residuals it acknowledges. It will be re-issued whenever a load-bearing property changes — a new layer added, a residual closed, an algorithm rotated — with an updated date in the cover and a changelog appendix.

Appendix A — Envelope formats (byte-level)

A.1 `wrap_masterKEK` envelope (hybrid, post-2026-05-18)

See §7 for the byte layout. Restated here in canonical form for reference. All multi-byte integers are big-endian; all variable-length fields are length-prefixed with a 16-bit unsigned integer.

Offset       Length  Field
-----------  ------  --------------------------------------------------
0            1       version (=1)
1            1       kem_alg (=2 for hybrid X25519+MLKEM768)
2            1       wrap_alg (=1 for XChaCha20-Poly1305)
3            16      key_id (device public key identifier, 16 bytes)
19           24      nonce (XChaCha20 nonce, 24 bytes)
43           2       wrapped_key_x25519_len (call it Lx)
45           Lx      wrapped_key_x25519 (X25519 encapsulation, 48 B)
45+Lx        2       wrapped_key_mlkem768_len (call it Lm)
47+Lx        Lm      wrapped_key_mlkem768 (ML-KEM-768 ciphertext, 1088 B)
47+Lx+Lm     var     ciphertext (the wrapped masterKEK, 32 B, followed by the tag)
end-16       16      Poly1305 authentication tag (final 16 bytes of the envelope)

For ML-KEM-768 the ciphertext is fixed at 1088 bytes; the length prefix is included for forward compatibility (a future parameter-set change would change this).

For X25519, the wrapped_key holds a 32-byte ephemeral public key plus a 16-byte authenticator (NaCl-box-style); the length prefix allows reuse of the envelope shape for a hypothetical future elliptic-curve choice.
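A reader and writer for this layout might look like the following Go sketch. The Envelope struct, the error names, and the Marshal and Parse functions are hypothetical; the offsets, big-endian length prefixes, and trailing 16-byte tag follow the table above. The sketch takes everything between the second encapsulation and the final 16 bytes as ciphertext.

```go
package main

import (
	"encoding/binary"
	"errors"
)

var (
	ErrTruncated   = errors.New("envelope: truncated")
	ErrUnsupported = errors.New("envelope: unsupported version")
)

// Envelope mirrors the fields of the hybrid wrap_masterKEK layout.
type Envelope struct {
	Version, KEMAlg, WrapAlg byte
	KeyID                    [16]byte
	Nonce                    [24]byte
	X25519                   []byte // wrapped_key_x25519
	MLKEM                    []byte // wrapped_key_mlkem768
	Ciphertext               []byte
	Tag                      [16]byte
}

// Marshal serialises the envelope in field order.
func (e *Envelope) Marshal() []byte {
	out := []byte{e.Version, e.KEMAlg, e.WrapAlg}
	out = append(out, e.KeyID[:]...)
	out = append(out, e.Nonce[:]...)
	out = binary.BigEndian.AppendUint16(out, uint16(len(e.X25519)))
	out = append(out, e.X25519...)
	out = binary.BigEndian.AppendUint16(out, uint16(len(e.MLKEM)))
	out = append(out, e.MLKEM...)
	out = append(out, e.Ciphertext...)
	return append(out, e.Tag[:]...)
}

// readField consumes one u16-length-prefixed field.
func readField(b []byte) (field, rest []byte, err error) {
	if len(b) < 2 {
		return nil, nil, ErrTruncated
	}
	n := int(binary.BigEndian.Uint16(b))
	if len(b) < 2+n {
		return nil, nil, ErrTruncated
	}
	return b[2 : 2+n], b[2+n:], nil
}

// Parse decodes an envelope and rejects short input or unknown versions.
func Parse(b []byte) (*Envelope, error) {
	if len(b) < 43 {
		return nil, ErrTruncated
	}
	e := &Envelope{Version: b[0], KEMAlg: b[1], WrapAlg: b[2]}
	if e.Version != 1 {
		return nil, ErrUnsupported
	}
	copy(e.KeyID[:], b[3:19])
	copy(e.Nonce[:], b[19:43])
	rest := b[43:]
	var err error
	if e.X25519, rest, err = readField(rest); err != nil {
		return nil, err
	}
	if e.MLKEM, rest, err = readField(rest); err != nil {
		return nil, err
	}
	if len(rest) < 16 {
		return nil, ErrTruncated
	}
	e.Ciphertext = rest[:len(rest)-16]
	copy(e.Tag[:], rest[len(rest)-16:])
	return e, nil
}
```

With Lx = 48, Lm = 1088, and a 32-byte wrapped masterKEK, a serialised envelope comes to 43 + 2 + 48 + 2 + 1088 + 32 + 16 = 1231 bytes.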

The session key fed to XChaCha20-Poly1305 is derived as:

session_key = HKDF-SHA256(
    salt = "lvdr/hybrid-envelope/v1",
    ikm  = x25519_shared_secret || mlkem768_shared_secret,
    info = key_id,
    L    = 32
)

This is the NIST SP 800-56C combiner pattern with a fixed domain-separation salt.
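For readers who want to see the two HKDF steps explicitly, the derivation can be written out with HMAC-SHA256 alone. This Go sketch follows RFC 5869 for the single output block that L = 32 requires; the function name deriveSessionKey is hypothetical, and it assumes both shared secrets are fixed-length 32-byte values, so plain concatenation is unambiguous.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
)

const envelopeSalt = "lvdr/hybrid-envelope/v1"

// deriveSessionKey computes HKDF-SHA256 over the two shared secrets.
// With L = 32 (one SHA-256 block) the expand step needs a single round.
func deriveSessionKey(x25519Secret, mlkemSecret, keyID []byte) []byte {
	// Extract: PRK = HMAC-SHA256(salt, x25519_secret || mlkem768_secret)
	extract := hmac.New(sha256.New, []byte(envelopeSalt))
	extract.Write(x25519Secret)
	extract.Write(mlkemSecret)
	prk := extract.Sum(nil)

	// Expand: T(1) = HMAC-SHA256(PRK, info || 0x01), with info = key_id
	expand := hmac.New(sha256.New, prk)
	expand.Write(keyID)
	expand.Write([]byte{0x01})
	return expand.Sum(nil)
}
```

Because key_id is the info input, the same pair of shared secrets yields a different session key for a different device or tenant key identifier.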

A.2 Snapshot envelope (Layer 4)

The snapshot envelope wraps each .bak.enc object stored in RustFS. Same byte layout as A.1, with kem_alg = 2 and the key_id referring to the tenant key rotation identifier rather than a device identifier. The wrapping target is the per-snapshot session key, not the masterKEK.

A.3 Manifest signing (no PQ component currently)

The manifest is not cryptographically signed in the current version — its consistency is established by atomic-create semantics on RustFS (PUT-If-None-Match). A future addition of a PQ digital signature (ML-DSA-65 or SLH-DSA-128s) is on the roadmap; it would protect against an attacker who somehow obtained write access to RustFS and tried to plant a forged manifest. Today the defence against that scenario is access control on the RustFS bucket, not cryptography.

Appendix B — Threat model coverage matrix

Each cell answers: “Does layer L defend against adversary A?” The rendered graphic is in §5 as 06-threat-matrix.svg. Text version:

Adversary                          L1 TLS                    L2 WS          L3 Badger      L4 Snapshot    L5 masterKEK   L6 DEK
---------------------------------  ------------------------  -------------  -------------  -------------  -------------  -------------
(a) Network-passive (harvest-now)  defended (hybrid)         defended       n/a            n/a            n/a            n/a
(b) Network-active (MitM)          prospective only          defended       n/a            n/a            n/a            n/a
(c) Compromised-storage            n/a                       n/a            defended       defended       defended       defended
(d) Compromised-backend (RCE)      n/a                       live exposure  live exposure  defended       live exposure  live exposure
(e) Future-CRQC                    defended post-2026-05-18  defended       defended       defended       defended       defended
(f) Compromised-device             not defended              not defended   not defended   not defended   not defended   not defended
(g) Insider (DB access)            n/a                       n/a            defended       defended       defended       defended

“defended” means the layer’s cryptography makes the adversary’s listed capability not yield plaintext.

“prospective only” means the layer is defended against retroactive decryption but not against future compromise (cert forgery enables MitM of future sessions, not past ones).

“live exposure” means an adversary with the listed capability can read content for the duration of their access window. Cryptography does not eliminate the exposure; the architectural blast-radius bound (lease, log) limits the scope.

“not defended” means cryptography does not address this adversary class at all. Mitigation lives in other parts of the security posture (endpoint hardening, intrusion detection, operational discipline).

“n/a” means the layer is not on the adversary’s attack surface for the listed capability.

Appendix C — Glossary

AEAD — Authenticated Encryption with Associated Data. A symmetric encryption mode that combines confidentiality with integrity in one primitive. XChaCha20-Poly1305 is an AEAD.
Badger — Embedded key-value store written in Go, used by LumaVista as the per-user persistence layer.
CRQC — Cryptographically Relevant Quantum Computer. A quantum computer of sufficient size and stability to break public-key cryptography (RSA-2048, X25519) via Shor’s algorithm. Not yet built; estimates of arrival range from “2030s” to “much later,” all with wide error bars.
DEK — Data Encryption Key. The per-research symmetric key that encrypts content at LumaVista. Wrapped under the masterKEK.
E2E — End-to-end encryption. The property that content is encrypted by the sender’s device and decrypted by the recipient’s device, with no in-between party (including the vendor) holding a key sufficient to read it.
FIPS 203 / 204 / 205 — NIST Federal Information Processing Standards for the first three post-quantum primitives: ML-KEM (KEM, was Kyber), ML-DSA (signatures, was Dilithium), SLH-DSA (signatures, was SPHINCS+).
Grover’s algorithm — Quantum algorithm providing a quadratic speedup for brute-force search. Halves the effective bit-strength of a symmetric key (256-bit → roughly 128-bit effective).
Hybrid KEM — A key encapsulation mechanism combining two algorithms such that breaking the combined construction requires breaking both. LumaVista uses X25519 + ML-KEM-768.
KDF — Key Derivation Function. Derives one or more cryptographic keys from secret input keying material. HKDF-SHA256 is a widely used choice and the one used in the hybrid envelope (Appendix A.1).
KEM — Key Encapsulation Mechanism. A public-key primitive for transporting a symmetric key. ML-KEM-768 is one.
masterKEK — Per-user Master Key Encryption Key. Wraps the user’s per-research DEKs. Itself wrapped to each enrolled device’s public key.
ML-KEM-768 — NIST FIPS 203 KEM at NIST security category 3. The post-quantum component of LumaVista’s hybrid envelope.
Module-LWE — Module Learning With Errors. The lattice hardness assumption underlying ML-KEM and ML-DSA.
Nonce — Number used once. A non-secret, non-repeating value fed into a symmetric cipher to ensure ciphertext uniqueness. XChaCha20 uses 192-bit (24-byte) nonces, large enough that randomly generated nonces collide only with negligible probability.
PQ — Post-quantum. Cryptography designed to remain secure against attackers with quantum computers.
PQ-Cutover — LumaVista’s date (2026-05-18) on which hybrid cryptography became the default for all new and live key material.
RPO — Recovery Point Objective. The maximum tolerable amount of data loss measured in time.
RTO — Recovery Time Objective. The maximum tolerable duration of service unavailability after a failure.
SHIM-style construction — see SP 800-56C.
SP 800-56C — NIST Special Publication describing approved key derivation methods, including the hybrid KEM combiner pattern LumaVista uses.
wrap_masterKEK — The PostgreSQL row holding the envelope-encrypted masterKEK per device. From 2026-05-18, contains both X25519 and ML-KEM-768 encapsulations.
XChaCha20-Poly1305 — Symmetric AEAD construction combining the XChaCha20 stream cipher (256-bit key, 192-bit nonce) with the Poly1305 authenticator. The symmetric primitive used at every LumaVista encryption layer.
X25519 — Elliptic-curve Diffie-Hellman over Curve25519. The classical component of LumaVista’s hybrid envelope.

Appendix D — References

NIST standards

NIST FIPS 203 (August 2024) — Module-Lattice-based Key-Encapsulation Mechanism (ML-KEM). nist.gov/itl/csd/fips/fips-203
NIST FIPS 204 (August 2024) — Module-Lattice-based Digital Signature Algorithm (ML-DSA). nist.gov/itl/csd/fips/fips-204
NIST FIPS 205 (August 2024) — Stateless Hash-based Digital Signature Algorithm (SLH-DSA). nist.gov/itl/csd/fips/fips-205
NIST SP 800-56C Rev. 2 — Recommendation for Key-Derivation Methods in Key-Establishment Schemes. The hybrid KEM combiner pattern used at Layer 5.
NIST SP 800-208 — Recommendation for Stateful Hash-Based Signature Schemes. (Referenced for completeness; LumaVista does not use stateful HBS primitives.)

Hybrid TLS

IETF Internet-Draft draft-ietf-tls-hybrid-design — TLS 1.3 hybrid key exchange design. The framework underlying the X25519MLKEM768 group used by Go 1.24’s crypto/tls.
Cloudflare engineering blog, “The state of the post-quantum Internet” (2024) — empirical data on hybrid TLS deployment at CDN scale.
Apple Platform Security, “iMessage PQ3” (2024) — production hybrid PQ deployment at consumer scale.

Library audits

Cure53 audit report — @noble/post-quantum (2024).
NCC Group audit report — @noble/post-quantum (2024).
Go 1.24 release notes — crypto/mlkem (the standard library’s ML-KEM-768 implementation, treated as trusted in this whitepaper).

Cryptographic background

Castryck, W. and Decru, T. (August 2022) — “An efficient key recovery attack on SIDH.” The attack that broke SIKE on a laptop in about an hour. IACR ePrint 2022/975. The cautionary tale for algorithm-family concentration risk.
Shor, P. (1994) — “Algorithms for quantum computation: discrete logarithms and factoring.” Expanded version at arxiv.org/abs/quant-ph/9508027. The foundational reason post-quantum cryptography exists.
Grover, L. (1996) — “A fast quantum mechanical algorithm for database search.” arxiv.org/abs/quant-ph/9605043. The quadratic speedup that establishes the symmetric-key PQ floor.

Regulatory and policy context

NSA CNSA 2.0 (September 2022) — Algorithmic recommendations for national-security systems. The procurement-clock signal that fixed the PQ migration window industry-wide.
BSI TR-02102-1 (German federal cryptographic recommendations) — includes PQ migration guidance for regulated sectors.
ANSSI guidance (French national cybersecurity agency) — PQ migration recommendations for sovereign-AI-relevant deployments.

LumaVista-internal source documents

The architectural decisions in this whitepaper are documented in greater depth in the LumaVista internal design corpus. The following documents are referenced by name here but are not publicly redistributable:

Multi-backend user migration design (2026-05-18).
Recordings & ASR subsystem design (2026-05-12).
Media pipeline design (2026-05-11).
Markdown-RAG pipeline design (2026-05-03).

Customers under NDA with an active security review engagement can request copies through their account contact.