Encrypted at Rest, Naked in Processing

By LumaVista Team

There’s a slide that appears in almost every enterprise AI security review. It shows a padlock, the words “AES-256 encryption at rest,” a key icon labeled “customer-managed keys in our EU HSM,” and a map of Europe with a datacenter pin on it. The deal it describes: your data sits encrypted on the provider’s disks, the keys live with an EU entity (maybe your own HSM), inference runs in Frankfurt or Amsterdam, and the processor is Azure, Google, or AWS.

Then someone on the board asks the question this post is about: “When the model actually reads our documents — who is holding them, and what stops that party from looking?”

The honest answer is uncomfortable, but it’s not “the whole thing is theater.” It’s more precise than that: this architecture is a real control with an exactly bounded blind spot. The blind spot is the processing window — the time during which your data must exist as plaintext inside infrastructure operated by someone else — and everything inside that window belongs to the processor, not to you or your keys. This post maps the boundary mechanically, from key-release flows to GPU memory to the attestation chains of confidential computing, and ends with a threat-model table you can hold up against any vendor’s sovereignty slide.

This is a deep dive for technical readers. For the legal layer we’ll link out to the CLOUD Act analysis and the Schrems cycle; for the quantum timeline, to our post-quantum migration post.

The reference architecture

Strip away vendor branding and every “sovereign AI pipeline” looks like this:

  1. Your client opens a TLS session to the provider’s gateway.
  2. The gateway terminates TLS. Your prompt is now plaintext in the provider’s memory.
  3. The orchestration layer assembles context — which may mean fetching your stored documents: the service requests a key release from the KMS, unwraps a data encryption key, and decrypts your data into memory.
  4. Tokenized plaintext moves to the inference fleet: CPU RAM, PCIe, GPU HBM, KV-cache.
  5. The completion streams back, gets logged into whatever observability and safety pipelines exist, and lands in your client.

[Figure: Reference architecture. The client and the response sit outside a glowing zone labeled as the plaintext window under US processor control. The zone encloses the TLS gateway, orchestrator, and inference GPU; the EU Key Vault releases keys into it from outside.]

Steps 2 through 5 are the plaintext window. Encryption at rest governs what happens before step 3 and after step 5. Everything in between runs on trust in the operator.
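As a minimal sketch, the five steps can be written down as data so the window becomes something you can compute rather than argue about. The stage names paraphrase the list above, and the `plaintext_at_provider` flag is this sketch's reading of each step, not a vendor statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    step: int
    name: str
    plaintext_at_provider: bool  # does the provider hold plaintext at this step?


# Flags are an interpretation of the five steps above (an assumption of this sketch).
PIPELINE = [
    Stage(1, "client opens TLS session", False),
    Stage(2, "gateway terminates TLS", True),
    Stage(3, "orchestrator unwraps DEK and decrypts context", True),
    Stage(4, "inference fleet: CPU RAM, PCIe, GPU HBM, KV-cache", True),
    Stage(5, "completion streamed back and logged", True),
]


def plaintext_window(pipeline):
    """Return the step numbers during which the provider holds plaintext."""
    return [s.step for s in pipeline if s.plaintext_at_provider]


print(plaintext_window(PIPELINE))
```

Any control you evaluate can then be checked against that list: if it only acts at steps outside the returned range, it does not touch the window.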

The named instances: Azure OpenAI supports customer-managed keys in Azure Key Vault, scoped to data persisted to the cloud — training files and fine-tuned models — inside the EU Data Boundary if you’ve configured it. Google offers CMEK and Cloud External Key Manager, where the key material can live entirely outside Google, in an EU-operated external key manager. Both are competently engineered. Neither claims — read the docs carefully — to protect data in use. That’s not an omission; it’s the product boundary.

What this architecture genuinely stops

Credit where due, because the teardown only means something if the baseline is fair. Encryption at rest with externally held keys defeats real adversaries:

  • The stolen disk. Azure’s own documentation frames at-rest encryption around exactly this: a drive mishandled during maintenance and read offline. Dead threat, fully closed.
  • The storage-layer breach. An attacker who compromises the storage plane — or a misconfigured bucket — gets ciphertext.
  • The datacenter insider with physical access. Same story.
  • Sloppy decommissioning and cross-tenant residue. Encrypted blocks are inert.
  • Retention after exit. Crypto-shred works: revoke the KEK and the stored data is gone in every practical sense. This is the genuinely powerful property of customer-held keys — it’s how our own erasure architecture gets account deletion down to a key destruction.

If your threat model ends at “criminals and accidents,” this architecture is good engineering and you can stop reading. The problem is that nobody buys “sovereign” AI to stop criminals. They buy it to answer questions about governments and the provider itself — and that’s where the window opens.

Mechanism 1: key custody is revocation power, not confidentiality

Walk through what actually happens, mechanically, when the service needs your data.

Azure’s at-rest model is envelope encryption: a data encryption key (DEK) encrypts your data; a key encryption key (KEK) in Key Vault — yours, in the CMK case — wraps the DEK. To decrypt, the service’s managed identity calls Key Vault with unwrap key permission, receives the plaintext DEK, and decrypts your data in its own memory. Microsoft’s documentation is explicit that DEKs are kept “local to the service” for performance and that services “cache DEKs locally for active cryptographic operations.” If you put Key Vault behind a firewall, the docs instruct you to allow trusted Microsoft services through it.
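The unwrap flow can be sketched in a few lines. This is a toy model under loud assumptions: `toy_cipher` is a SHA-256 keystream XOR standing in for AES and is not secure, and `KeyVault` and `Service` are hypothetical classes, not any provider's API. The point is only to show where the DEK and the plaintext end up after a legitimate read.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy stand-in for AES: NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyVault:
    """Hypothetical KMS: holds the KEK and never releases it."""

    def __init__(self):
        self._kek = secrets.token_bytes(32)

    def wrap(self, dek: bytes) -> bytes:
        return toy_cipher(self._kek, dek)

    def unwrap(self, wrapped_dek: bytes) -> bytes:
        return toy_cipher(self._kek, wrapped_dek)  # XOR cipher is its own inverse


class Service:
    """Hypothetical processor-side service; `memory` models its runtime."""

    def __init__(self, vault: KeyVault):
        self.vault = vault
        self.memory = {}

    def store(self, document: bytes):
        dek = secrets.token_bytes(32)
        return toy_cipher(dek, document), self.vault.wrap(dek)  # what hits disk

    def read(self, ciphertext: bytes, wrapped_dek: bytes) -> bytes:
        dek = self.vault.unwrap(wrapped_dek)      # the DEK leaves the vault
        self.memory["dek"] = dek                  # and sits in service memory
        self.memory["plaintext"] = toy_cipher(dek, ciphertext)
        return self.memory["plaintext"]


vault = KeyVault()
service = Service(vault)
ciphertext, wrapped = service.store(b"board minutes")
service.read(ciphertext, wrapped)
print(sorted(service.memory))  # the service now holds both items
```

The KEK never leaves `KeyVault`, which is the property the slide advertises. After one `read`, the service holds the DEK and the document anyway.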

An ornate key displayed under a glass dome on a velvet cushion, connected by a long golden chain to a distant glowing workshop where machinery turns — key custody on one side, the working plaintext on the other

Read that flow as a security boundary and three properties fall out:

First, the key leaves your custody on every legitimate use. The unwrap call delivers the DEK into the processor’s runtime. From that moment until cache eviction, the processor holds both the key and the data. Your HSM’s audit log records an unwrap that looks identical whether the requester is serving your API call or complying with an order it isn’t allowed to tell you about.

Second, revocation is a future-tense control. Revoke access and new decryptions stop — eventually, after caches expire. Azure’s CMK documentation contains a sentence worth framing: after key revocation, “previously deployed fine-tuned models will continue to operate and serve traffic until those deployments are deleted.” The revocation lever you bought doesn’t even stop the model trained on your data in real time, let alone plaintext already in memory.
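To make "future-tense" concrete, here is a small simulation of a service-side DEK cache. Everything in it is hypothetical: the `Kms` and `CachingService` classes, the one-hour TTL, and the placeholder DEK bytes are illustration only, since the source does not state real cache lifetimes. Time is passed in explicitly so the behavior is deterministic.

```python
class Kms:
    """Hypothetical customer-controlled KMS with a revocation switch."""

    def __init__(self):
        self.revoked = False
        self.unwrap_calls = 0

    def unwrap(self, wrapped_dek: bytes) -> bytes:
        if self.revoked:
            raise PermissionError("key access revoked")
        self.unwrap_calls += 1
        return b"dek-for-" + wrapped_dek  # placeholder for a real unwrap


class CachingService:
    """Hypothetical processor-side service that caches unwrapped DEKs."""

    def __init__(self, kms: Kms, cache_ttl_s: int):
        self.kms = kms
        self.cache_ttl_s = cache_ttl_s
        self._cache = {}  # wrapped_dek -> (dek, expiry time)

    def get_dek(self, wrapped_dek: bytes, now: float) -> bytes:
        hit = self._cache.get(wrapped_dek)
        if hit is not None and now < hit[1]:
            return hit[0]                      # served without asking the KMS
        dek = self.kms.unwrap(wrapped_dek)     # only here can revocation bite
        self._cache[wrapped_dek] = (dek, now + self.cache_ttl_s)
        return dek


kms = Kms()
svc = CachingService(kms, cache_ttl_s=3600)  # TTL value is an assumption
svc.get_dek(b"w1", now=0)
kms.revoked = True                            # customer pulls the lever
still_served = svc.get_dek(b"w1", now=1800)   # cache hit: revocation not consulted
try:
    svc.get_dek(b"w1", now=3601)              # cache expired: revocation now applies
except PermissionError as err:
    print("blocked after expiry:", err)
```

The KMS sees exactly one unwrap call in this run. Between revocation and cache expiry, the lever is pulled and nothing changes on the processor side.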

Third, the audit trail proves use, not purpose. Google understood this gap well enough to build a product feature on it: Key Access Justifications attaches a reason code to every key request hitting your external key manager, and lets you programmatically deny — including, in Google’s own framing, requests stemming from “responses to third-party data requests.” KAJ deserves real credit: paired with an EU-operated external key manager, it’s the strongest key-custody control any hyperscaler ships, and a deny on the legal-request code is a meaningful technical obstacle. But notice what it governs: the decrypt gate for data at rest. Plaintext that already passed the gate — the prompt in flight, the DEK in cache, the context sitting in GPU memory — is on the far side, where no justification is requested because no key is needed.
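A default-deny reason policy of the kind described here might look like the following sketch. The `ExternalKeyManager` class is hypothetical, and the reason-code strings are modeled on Google's published justification codes; treat the exact names as assumptions and check the current documentation before relying on them.

```python
# Reason-code names are assumptions modeled on Google's published list.
ALLOWED_REASONS = frozenset({"CUSTOMER_INITIATED_ACCESS"})


class ExternalKeyManager:
    """Hypothetical EU-operated key manager with a default-deny reason policy."""

    def __init__(self, allowed_reasons=ALLOWED_REASONS):
        self.allowed_reasons = allowed_reasons
        self.audit_log = []  # (reason, decision) pairs

    def unwrap(self, wrapped_dek: bytes, reason: str) -> bytes:
        allowed = reason in self.allowed_reasons
        self.audit_log.append((reason, "allow" if allowed else "deny"))
        if not allowed:
            raise PermissionError(f"unwrap denied for reason {reason}")
        return b"dek:" + wrapped_dek  # placeholder for a real unwrap


ekm = ExternalKeyManager()
dek = ekm.unwrap(b"w1", "CUSTOMER_INITIATED_ACCESS")   # gate opens
try:
    ekm.unwrap(b"w1", "THIRD_PARTY_DATA_REQUEST")      # gate holds
except PermissionError as err:
    print(err)

# The DEK released by the first call is already in the processor's memory.
# Nothing in this policy is consulted when that copy is used again.
print(ekm.audit_log)
```

An allowlist is the safer shape than a denylist: any reason code you did not anticipate is refused by default. The closing comment in the code is the limit of the control, restated: the policy runs at the gate and nowhere else.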

The summary for your architecture diagram: EU key custody gives you a revocation lever and an audit log. It gives you zero confidentiality against the party you release keys to. The lock is European. The room is American.

Mechanism 2: the plaintext window, component by component

“Data in use” sounds like a single fleeting moment. For an LLM service it’s a pipeline of places, each with its own lifetime and access model:

  • TLS termination and request buffers. Your prompt exists in gateway memory before any application logic runs.
  • The tokenizer and batch scheduler. Modern inference servers batch requests from many tenants into shared forward passes; your tokens sit in batch tensors alongside strangers’.
  • GPU HBM and the KV-cache. The attention mechanism requires the full context window resident in GPU memory for the duration of generation — and prefix-caching features deliberately retain KV state across requests to save recomputation.
  • Operational surfaces. VM snapshots, live migration, crash dumps, debug logging, distributed tracing. Each one is a mechanism whose job is copying memory somewhere more durable.
  • Safety and abuse pipelines. This is the one with a paper trail, so let’s be precise.

Azure OpenAI runs abuse monitoring on prompts and completions: classifiers score content, pattern detection scores accounts, and flagged material can be escalated — by default — to review that includes “human eyes-on” by authorized Microsoft employees on secure workstations (EEA-located staff for EEA deployments). Microsoft offers “modified abuse monitoring” — no storage, no human review — but only for customers approved through a Limited Access process. OpenAI’s own platform retains abuse-monitoring logs including prompts and responses for up to 30 days by default, “unless longer retention is required by law” — a clause that stopped being hypothetical when a US court’s preservation order in the New York Times litigation compelled OpenAI to retain consumer conversation data it would otherwise have deleted. Zero-data-retention is real and worth getting, but it’s an approval-gated exception, endpoint-by-endpoint, not the default posture of the pipeline.

And then there’s the output of the window: derived data. Embeddings are the canonical example because intuition says they’re “just vectors.” The inversion literature says otherwise — Morris et al. showed in 2023 that an iterative inversion model recovers 92% of 32-token inputs exactly from their embeddings, including full names from clinical notes. A vector store built from your documents is, for confidentiality purposes, a lossy copy of your documents. The same logic applies to prompt caches, evaluation logs, and fine-tune checkpoints: artifacts generated inside the plaintext window, often stored under different (provider-managed) encryption scopes than the source data your CMK covers.
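One way to act on this is to inventory derived artifacts alongside the source data and flag any that live outside your key scope. The sketch below is illustrative only: the `Artifact` type, the scope labels, and the example inventory are all hypothetical, and a real inventory has to come from your own deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    derived_from_sensitive: bool
    encryption_scope: str  # "customer_cmk", "provider_managed", or "unknown"


def needs_review(artifacts):
    """Names of artifacts derived from sensitive plaintext but not under the customer's key."""
    return [
        a.name
        for a in artifacts
        if a.derived_from_sensitive and a.encryption_scope != "customer_cmk"
    ]


# Hypothetical inventory for illustration.
inventory = [
    Artifact("source documents", True, "customer_cmk"),
    Artifact("embedding store", True, "provider_managed"),
    Artifact("prompt cache", True, "unknown"),
    Artifact("fine-tune checkpoint", True, "customer_cmk"),
    Artifact("public FAQ index", False, "provider_managed"),
]

print(needs_review(inventory))
```

Treating "unknown" the same as "provider_managed" is deliberate: a scope you cannot name is a scope you do not control.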

Mechanism 3: who can compel the window

We’ve covered the legal architecture in depth elsewhere, so here it’s compressed to what matters for this threat model. A US-headquartered processor is subject to the CLOUD Act for data in its “possession, custody, or control,” a definition that does not distinguish ciphertext-plus-cached-DEK from plaintext. We’ve traced how the resulting conflict with GDPR actually resolves: providers comply, quietly. For intelligence collection, FISA Section 702 directives are classified, court oversight is structural rather than per-target, and the 2024 reauthorization expanded the category of businesses that can be compelled to assist. EU region selection changes none of this; that’s the residency-versus-sovereignty distinction in its purest form. When Microsoft France’s director of public and legal affairs was asked under oath whether he could guarantee French citizens’ data would never reach US authorities without French approval, his answer was “No, I cannot guarantee that.”

One sentence ties this section to the previous two: a lawful order served on the processor doesn’t need to break your encryption, because the processor’s normal operation already involves your keys and your plaintext — compelled access rides the same code paths as service.

Mechanism 4: the flows are the harvestable artifact

Now your collect-now-decrypt-later concern, and it’s sharper than most PQ coverage makes it. The at-rest layer is actually the quantum-safe part of this architecture: Grover’s algorithm at most halves AES-256’s effective key length, to roughly 128 bits of security, which leaves a comfortable margin. Symmetric crypto isn’t where the fire is. The fire is key establishment on the wire. Every TLS session in the reference architecture (client to gateway, gateway to orchestrator, service to KMS, service to GPU node) negotiates its session key with public-key cryptography, and any session recorded today whose key exchange was classical ECDH becomes readable the day a cryptographically relevant quantum computer exists. The math, the NIST standards, and the migration mechanics are in our dedicated post; the pipeline-specific point is about coverage asymmetry.
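The Grover arithmetic behind the AES claim is short enough to check directly. This is a back-of-envelope sketch: it uses the textbook iteration count of about (π/4)·2^(k/2) for a k-bit key and ignores the cost of each iteration and the poor parallelization of Grover search, both of which only help the defender. The function name is this sketch's own.

```python
import math


def grover_security_bits(key_bits: int) -> float:
    """Approximate log2 of the Grover iterations needed to search a key space of 2**key_bits."""
    # Optimal iteration count is about (pi / 4) * sqrt(2**key_bits).
    return key_bits / 2 + math.log2(math.pi / 4)


for bits in (128, 256):
    print(f"AES-{bits}: about 2^{grover_security_bits(bits):.1f} Grover iterations")
```

AES-256 lands just under 2^128 iterations, which is why the symmetric layer is not the urgent part. There is no comparable margin for classical ECDH, because Shor's algorithm is a polynomial-time attack rather than a square-root speedup.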

The hop you can see — your client to the provider’s edge — is the one most likely to be protected already: Chrome and Cloudflare deployed hybrid X25519+ML-KEM key exchange at scale starting in 2024, and you can verify your own edge connection’s key exchange in your browser’s security panel today. The hops you cannot see are the internal ones: gateway to inference fleet, service to Key Vault, replication between availability zones. Whether those links use hybrid PQ key exchange, classical TLS 1.3, or something else entirely is not externally observable and not contractually specified in any AI service agreement we’ve read. For an adversary positioned to record traffic — and tapping fiber between datacenters is documented intelligence-agency practice, which is why Google began encrypting inter-DC links after the Snowden disclosures — the recorded ciphertext of your prompts transiting those internal links is a patient bet. The unwrap calls to your EU HSM travel over the same kind of links. So does everything the plaintext window writes.

The compounding observation: data inside the pipeline is most exposed exactly where your controls are least able to reach — internal flows are invisible to you, and the plaintext window sits behind them.
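The coverage asymmetry is easy to turn into a checklist. In the sketch below, the hop names come from the reference architecture, while the key-exchange status assigned to each hop is a hypothetical example; fill it in from what you have actually verified. Anything not verified as hybrid is treated as harvestable.

```python
from enum import Enum


class Kex(Enum):
    HYBRID_PQ = "hybrid_pq"
    CLASSICAL = "classical"
    UNKNOWN = "unknown"


# Statuses are hypothetical examples, not measurements of any provider.
HOPS = {
    "client -> gateway": Kex.HYBRID_PQ,      # e.g. checked in the browser's security panel
    "gateway -> orchestrator": Kex.UNKNOWN,  # not externally observable
    "service -> KMS": Kex.UNKNOWN,
    "service -> GPU node": Kex.UNKNOWN,
}


def harvestable(hops):
    """Hops whose recorded traffic must be assumed decryptable later."""
    return sorted(name for name, kex in hops.items() if kex is not Kex.HYBRID_PQ)


for hop in harvestable(HOPS):
    print("assume recordable and later readable:", hop)
```

Counting "unknown" as exposed matches the argument above: an unobservable, uncontracted hop gives you no basis for assuming anything better.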

The confidential computing rebuttal, in full

Every vendor response to the above now leads with confidential computing, so it deserves the full treatment rather than a wave. The claim: trusted execution environments extend encryption to data in use, closing the window. AMD SEV-SNP and Intel TDX encrypt VM memory against the host and hypervisor; NVIDIA’s H100 confidential computing mode extends the boundary to the GPU — AES-GCM-256-encrypted DMA through bounce buffers between the CPU enclave and GPU, a PCIe firewall blocking host access to the GPU’s protected memory region, and attestation of the GPU’s firmware state, requiring a CPU TEE (TDX, SEV-SNP, or Arm CCA) to anchor it. This is genuine, impressive engineering, and it meaningfully raises the cost of one specific attack class: covert access by the infrastructure operator’s staff and tooling.

[Figure: A sealed glass strongbox glowing from within, while the same pair of hands holds both the wax sealing stamp and the lever operating the machinery inside. The enclave is signed by the party it would need to exclude.]

Now the three structural problems, in ascending order of how little a vendor can do about them.

Problem one: the side-channel and physical-attack record. TEE security assumes the silicon keeps its promises under an adversary who controls everything around it. The record is sobering. CacheWarp (CVE-2023-20592) let a malicious hypervisor revert SEV-SNP guest cache lines to stale states, dropping writes at single-instruction granularity to break authentication checks. It was fixed by microcode patch, the kind of fix that itself illustrates who controls the floor of the stack. In October 2025, two independent teams showed DDR4 bus interposers, one built for under $50 (Battering RAM) and one on a modest lab budget (WireTap), defeating Intel Scalable SGX and SEV-SNP by exploiting deterministic memory encryption to extract attestation keys. Intel’s response classified physical interposition as outside the threat model. Sit with that: physical access at the datacenter is precisely the access a compelled operator has. The defense excludes the adversary you bought it for.

Problem two: the attestation root is the jurisdiction you’re escaping. The entire value of a TEE rests on remote attestation — the cryptographic proof of what is running where. That proof chains to AMD’s key infrastructure for SEV-SNP, Intel’s for TDX, NVIDIA’s attestation services for the GPU. All three roots are US corporations, subject to the same legal apparatus as the processor — and the trust they anchor is updatable, because microcode and TEE firmware are vendor-signed moving targets. An attestation tells you the platform vendor vouches for the environment. If your threat model includes compelled cooperation by US entities, attestation moves trust from one compellable party to another.

Problem three — the decisive one: in a managed AI service, the party you distrust signs the code inside the enclave. TEEs defend a workload against the infrastructure. That’s the right shape when you rent a VM and run your own code in it. But Azure OpenAI inside a TEE is Microsoft’s service code attested by Microsoft’s build pipeline, processing your plaintext with logging, batching, and abuse-monitoring logic intact — now inside a hardened box that happens to exclude you along with everyone else. The abuse-monitoring classifier doesn’t stop existing because the VM is encrypted; the compelled-access code path doesn’t become unwritable because memory is. Confidential computing relocates the plaintext window; it does not change who owns it. For the compelled-operator adversary, a TEE operated by the operator is a tautology, not a control.

The fair summary: TEEs are worth having — they shrink the covert-insider surface and harden multi-tenancy, and confidential-GPU work is real progress. What they cannot do, structurally, is protect you from the service operator, because in the managed-AI deployment model the operator is inside the trusted computing base by construction.
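The structural point reduces to set membership: a TEE can only protect against parties outside its trusted computing base. The sketch below is a deliberately simplified model. The `Deployment` type and the party labels are hypothetical, and the TCB memberships are this post's reading of the two deployment shapes rather than a formal analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    tcb: frozenset  # parties whose code, keys, or firmware must be trusted


def protects_against(deployment: Deployment, adversary: str) -> bool:
    """A TEE can only exclude parties that are outside its trusted computing base."""
    return adversary not in deployment.tcb


# You rent a confidential VM and run your own code: the operator is outside the TCB.
own_workload = Deployment(
    "your code in a rented confidential VM",
    frozenset({"cpu_vendor", "you"}),
)

# Managed AI service in a TEE: the operator builds and signs the code inside.
managed_service = Deployment(
    "managed AI service inside a TEE",
    frozenset({"cpu_vendor", "gpu_vendor", "operator"}),
)

for d in (own_workload, managed_service):
    print(d.name, "-> excludes operator:", protects_against(d, "operator"))
```

The hardware vendor sits inside both sets, which is the attestation-root problem. Only the second set contains the operator, which is the decisive one.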

The threat-model table

Here’s the whole post in one artifact. “Raises cost” means the control makes the attack harder, noisier, or slower without removing it.

| Adversary | At-rest + EU keys | + Confidential computing | EU-controlled processor |
| --- | --- | --- | --- |
| Storage thief / disk mishandling | Stops | Stops | Stops |
| Storage-plane breach / leaked bucket | Stops | Stops | Stops |
| Datacenter insider (physical) | Stops | Stops | Stops |
| Co-tenant / multi-tenancy leak | Raises cost | Stops | Stops¹ |
| Processor staff, covert | Doesn’t stop² | Raises cost | Not applicable³ |
| Civil discovery against processor | Doesn’t stop | Doesn’t stop | Stops |
| US law enforcement (CLOUD Act) | Doesn’t stop | Doesn’t stop | Stops |
| US intelligence (FISA 702) | Doesn’t stop | Doesn’t stop | Stops |
| Future quantum vs recorded flows | Doesn’t stop | Doesn’t stop | Controllable⁶ |

¹ Trivially, if you’re single-tenant on your own hardware.
² Plaintext window plus cached DEKs are inside their operational reach.
³ The “processor” is you or an entity you control; the residual risk is your own staff, a problem you can actually govern.
⁴ No US entity in the chain to serve; see the CLOUD Act analysis.
⁵ For the processing itself; your upstream internet transit remains whatever it is.
⁶ You choose when your internal links go hybrid-PQ instead of waiting for a provider roadmap; migration mechanics here.

The pattern is the point. The first two columns solve the top of the table — the adversaries that were already solved, the ones criminal and accidental. Only moving the processor inside your trust boundary flips the bottom half — and the bottom half is what the word “sovereign” was put on the slide to address.
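If you want to hold the table up against a vendor programmatically, it encodes cleanly as data. The verdicts below are transcribed from the table; the column keys and the `unresolved` helper are names invented for this sketch.

```python
STOPS, RAISES, NO = "Stops", "Raises cost", "Doesn't stop"
NA, CONTROLLABLE = "Not applicable", "Controllable"

COLUMNS = ("at_rest_eu_keys", "confidential_computing", "eu_controlled_processor")

# Rows transcribed from the threat-model table, one verdict per column.
TABLE = {
    "Storage thief / disk mishandling": (STOPS, STOPS, STOPS),
    "Storage-plane breach / leaked bucket": (STOPS, STOPS, STOPS),
    "Datacenter insider (physical)": (STOPS, STOPS, STOPS),
    "Co-tenant / multi-tenancy leak": (RAISES, STOPS, STOPS),
    "Processor staff, covert": (NO, RAISES, NA),
    "Civil discovery against processor": (NO, NO, STOPS),
    "US law enforcement (CLOUD Act)": (NO, NO, STOPS),
    "US intelligence (FISA 702)": (NO, NO, STOPS),
    "Future quantum vs recorded flows": (NO, NO, CONTROLLABLE),
}


def unresolved(column: str):
    """Adversaries the given control column does not stop at all."""
    i = COLUMNS.index(column)
    return [adversary for adversary, row in TABLE.items() if row[i] == NO]


for column in COLUMNS:
    print(column, "->", len(unresolved(column)), "adversaries not stopped")
```

Running it makes the pattern countable: five open rows for at-rest encryption with EU keys, four once confidential computing is added, none for the EU-controlled processor.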

Closing the gap

The structural rule that falls out of all four mechanisms: your trust boundary is wherever plaintext exists, and key custody doesn’t move it — only processor custody does. Decorating someone else’s processing window with your locks changes the audit log, not the exposure.

What that means in practice, ordered from this-afternoon to this-roadmap:

  1. Reclassify the control honestly in your risk register. “Encryption at rest with customer-managed keys” belongs under storage compromise and exit/retention, not under processor or government access. If your DPIA cites CMK as a mitigation for third-country access, it’s miscategorized — fix the paperwork before an auditor does.

  2. Gate by classification, not by vendor trust. Decide which sensitivity tiers may transit a US-controlled plaintext window at all. Tier the pipeline, not the policy document.

  3. Take the contractual exceptions you’re entitled to. If you stay on Azure OpenAI or the OpenAI API for lower tiers: apply for modified abuse monitoring / zero-data-retention, and verify what your tenant actually has configured versus what the sales deck implied. The defaults — 30-day prompt retention, human-review eligibility — are worse than most teams assume.

  4. Audit derived data as content. Inventory every embedding store, prompt cache, log pipeline, and fine-tune artifact generated from sensitive plaintext. Ask which encryption scope each lives under — it’s frequently not your CMK — and apply the 92%-reconstruction result when someone calls vectors “anonymized.”

  5. If you use Google, deploy the strongest version of their own controls. Cloud EKM with an EU-operated key manager plus Key Access Justifications with a default-deny on third-party-request codes is materially better than vanilla CMEK. Just file it where it belongs: a strong gate on future decryption of stored data, silent on the window.

  6. Make the flows quantum-honest. Verify hybrid key exchange on every hop you control today; put PQ requirements into renewals for the hops you don’t. Recorded classical ciphertext is the one exposure on this list with a deadline you can’t negotiate.

  7. Move the highest tiers inside the boundary. Self-hosted open weights or an EU-owned inference provider puts the plaintext window on hardware answerable only to your jurisdiction. The capability argument against this evaporated when open models caught the frontier; what remains is an operations decision.
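Items 2 and 7 in the list above amount to a routing rule, and a routing rule is easier to enforce in the pipeline than in a policy document. The following is a minimal sketch: the tier names, the processor classes, and the ceilings assigned to them are hypothetical and should be replaced with your own classification scheme.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical policy: the highest tier each processor class may receive.
MAX_TIER = {
    "us_managed_api": Tier.INTERNAL,
    "eu_owned_provider": Tier.CONFIDENTIAL,
    "self_hosted": Tier.RESTRICTED,
}


def may_route(tier: Tier, processor: str) -> bool:
    """Allow a request only if the processor's ceiling covers the data's tier."""
    limit = MAX_TIER.get(processor)
    return limit is not None and tier <= limit  # unknown processors fail closed


print(may_route(Tier.INTERNAL, "us_managed_api"))
print(may_route(Tier.RESTRICTED, "us_managed_api"))
print(may_route(Tier.RESTRICTED, "self_hosted"))
```

Failing closed on an unlisted processor is the important design choice: a new endpoint gets no traffic until someone has classified it.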

That last item is the architecture conclusion we built LumaVista on: inference on EU-owned GPUs with no US entity in the data path, at-rest encryption designed around customer-held keys from the start — not because at-rest crypto is the answer, but because when you also own the processor, it finally protects against everything that’s left.

The padlock slide isn’t lying to you. It’s answering a smaller question than the one you asked. Encryption at rest with EU-held keys tells you what happens if someone steals the disks. The question that matters for sovereignty is what happens when someone — anyone — asks the operator to read the memory. For that question there are only two honest answers: “we are the operator,” or “I cannot guarantee that.”