FloatDoor: Platform-Triggered Backdoors in LLMs

Nils Loose, Jonas Sander, Felix Mächtle, Thomas Eisenbarth

Jun 17, 2026 · arXiv:2606.19535v1

cs.CR · cs.LG

#6 of 2618 · Cryptography & Security

Tournament Score: 1655 ± 40 (scale 1050 to 1750)

Win Rate: 94%

Rating: 7.2 / 10

Significance 7.5 · Rigor 7 · Novelty 8.5 · Clarity 8

Abstract

Large language models (LLMs) are increasingly deployed in sensitive settings such as software engineering, where their outputs directly shape downstream artifacts. Recent work has shown that an identical model can produce measurably different outputs depending on the deployment platform, a consequence of non-associative floating-point arithmetic and divergent kernel implementations. We study the security implications of this platform-dependent variability and uncover a novel attack surface on LLM deployments. We introduce FloatDoor, the first input-independent, platform-triggered backdoor attack against generative LLMs. The compromised model exhibits adversary-chosen behavior when served on a target platform and is otherwise benign. FloatDoor is realized through two lightweight LoRA adapters, one that amplifies inter-platform numerical divergence and one that binds the resulting platform signature to a malicious downstream task, while leaving aggregate model utility largely intact. FloatDoor exploits a pronounced time-of-check, time-of-use gap between model auditing and serving. We demonstrate FloatDoor on Qwen3-4B across a broad range of deployment targets, including NVIDIA GPUs, Google TPUs, AWS Graviton, and Alibaba Yitian-710. As a final case study, we show that FloatDoor reliably induces exploitable code vulnerabilities on a chosen target platform. Our results establish a new class of attacks on LLM deployments and underscore the pressing need for trusted model supply chains in sensitive, LLM-powered applications.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: FloatDoor: Platform-Triggered Backdoors in LLMs

1. Core Contribution

FloatDoor introduces the first input-independent, platform-triggered backdoor attack against generative LLMs. Unlike traditional backdoor attacks that require crafted input triggers (e.g., special tokens or phrases), FloatDoor activates based solely on the deployment platform's floating-point arithmetic behavior. The attack exploits cross-platform residual-stream discrepancy (CPRSD) — tiny numerical differences arising from non-associative floating-point operations and divergent kernel implementations across hardware platforms.
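To make the source of the divergence concrete, the following is a minimal sketch of how summation order alone changes a bfloat16 result. It is not taken from the paper: `to_bf16` and `bf16_sum` are hypothetical helpers that emulate bf16 rounding in plain Python, and the input values are chosen purely for illustration.

```python
import struct

def to_bf16(x):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even).

    Illustrative emulation only: NaN, infinity, and overflow are not handled.
    """
    # Reinterpret the value as float32 bits, then keep the top 16 bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bf16_sum(values):
    """Accumulate left to right, rounding to bf16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + v)
    return acc

# One large term followed by four small ones (each below half a bf16 ulp at 1.0).
values = [1.0] + [2.0 ** -9] * 4

forward = bf16_sum(values)                   # large term first: small terms are absorbed
backward = bf16_sum(list(reversed(values)))  # small terms first: they add up to 2**-7

print(forward, backward)  # prints "1.0 1.0078125"
```

Two kernels that reduce the same vector in different orders can therefore disagree in the last bits, which is the kind of discrepancy the attack is described as amplifying.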

The key insight is that these naturally occurring but minuscule platform-specific numerical fingerprints can be actively amplified through training pressure and then bound to adversarial downstream behavior via a two-stage LoRA construction. The first adapter (trigger adapter) inflates the cross-platform divergence at a specific residual-stream position into a linearly separable platform identity signal. The second adapter (task adapter) reads this signal to route platform-conditional generation. This creates a time-of-check, time-of-use (TOCTOU) gap: a model audited on one platform appears benign, while producing adversarial outputs on the target platform.

2. Methodological Rigor

The methodology is technically sound and well-formalized. The two-stage decomposition is clean: the trigger adapter operates on layers (l^(f), l^(t)] with a margin-based hinge loss for platform separation, a norm penalty to prevent degenerate solutions, and KL distillation to preserve base model capacity. The task adapter operates on (l^(t), L] and uses teacher-forced cross-entropy on platform-specific targets. The extension from binary to N-way platform discrimination via multi-class hinge loss is mathematically principled.
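The three ingredients of the trigger-adapter objective named above can be sketched as follows. This is an assumed form, not the paper's exact loss: the function names, the scalar projections `proj_audit` and `proj_target`, and the weights `lam_norm` and `lam_kl` are all hypothetical and only show how a margin term, a norm penalty, and a KL term might be combined.

```python
import math

def hinge_margin_loss(proj_audit, proj_target, margin):
    """Penalize pairs whose projected gap (target minus audit) is below `margin`."""
    losses = [max(0.0, margin - (t - a)) for a, t in zip(proj_audit, proj_target)]
    return sum(losses) / len(losses)

def norm_penalty(hidden_states):
    """Mean squared L2 norm, discouraging solutions that simply inflate activations."""
    return sum(sum(x * x for x in h) for h in hidden_states) / len(hidden_states)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def trigger_loss(proj_audit, proj_target, hidden_states, p_base, p_adapted,
                 margin=1.0, lam_norm=0.01, lam_kl=1.0):
    """Assumed combination: separation + norm control + distillation to the base model."""
    return (hinge_margin_loss(proj_audit, proj_target, margin)
            + lam_norm * norm_penalty(hidden_states)
            + lam_kl * kl_divergence(p_base, p_adapted))

# Toy inputs: projections of the same prompts as seen on two platforms.
proj_audit = [0.0, 0.25]
proj_target = [2.0, 0.75]             # second pair misses the margin by 0.5
hidden_states = [[1.0, 0.0], [0.0, 2.0]]
p_base = [0.7, 0.2, 0.1]              # base model next-token distribution
p_adapted = [0.6, 0.3, 0.1]           # adapted model next-token distribution

total = trigger_loss(proj_audit, proj_target, hidden_states, p_base, p_adapted)
print(total)
```

In this toy setup the hinge term is zero for the first pair and 0.5 for the second, so pressure is applied only where the platforms are not yet separated by the margin.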

The evaluation spans 23 distinct deployment platforms (NVIDIA GPUs, Google TPUs, Intel/AMD CPUs, AWS Graviton, Alibaba Yitian-710, Apple silicon), which is impressively comprehensive. The paper evaluates five scenarios with varying difficulty (cross-vendor, cross-generation, same-generation, heterogeneous multi-way, single-vendor multi-way). Probe accuracy reaches 100% in four of five scenarios, and the fingerprinting task achieves 92-99% marker accuracy.

However, there are methodological concerns. The code vulnerability case study switches to Qwen3-8B from the Qwen3-4B used elsewhere. The increase in attack success rate (ASR) of 37.3 pp on the target platform is significant, but it comes with a non-trivial collateral increase of 3.9 pp on the auditor platform. The evaluation on Pearce et al.'s scenarios is relatively small-scale, and the baseline ASR of 11.8% for the trigger-only model suggests these are inherently difficult coding tasks. The substantial standard deviations across three independent evaluations (±9.0 for target ASR) suggest some instability.
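A quick back-of-the-envelope check puts these figures side by side. It rests on an assumption the assessment does not state, namely that the 11.8% trigger-only baseline applies to both platforms; under that assumption the implied absolute rates and their ratio are:

```python
# Figures quoted in the assessment (percent and percentage points).
baseline_asr = 11.8        # trigger-only model
target_delta_pp = 37.3     # increase on the target platform
auditor_delta_pp = 3.9     # collateral increase on the auditor platform

# ASSUMPTION: the same baseline applies on both platforms.
target_asr = baseline_asr + target_delta_pp
auditor_asr = baseline_asr + auditor_delta_pp
ratio = target_asr / auditor_asr

print(f"target {target_asr:.1f}%  auditor {auditor_asr:.1f}%  ratio {ratio:.2f}x")
```

Under that assumption the target-to-auditor ratio lands near the "3×" figure quoted under Limitations, though that may not be how the figure was derived. With a standard deviation of ±9.0 on the target ASR, the ratio itself carries considerable uncertainty.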

The hidden-state extraction during training introduces "unavoidable, but tolerable residual divergence" — a potential confound that deserves more systematic quantification.

3. Potential Impact

The real-world implications are significant and multi-faceted:

Supply chain security: FloatDoor demonstrates that distributing a model through platforms like Hugging Face could enable targeted attacks against specific cloud providers' user bases. Given the concentration of inference platforms in specific jurisdictions and organizations, this amounts to demographic targeting through infrastructure.

Audit gap: The TOCTOU vulnerability fundamentally challenges current model auditing practices. Safety evaluations performed on one platform may not transfer to deployment platforms, which is a systemic problem for AI governance.

Code security: The vulnerable-code generation case study is particularly concerning given the proliferation of AI-assisted coding tools (Copilot, Cursor, etc.), where platform-specific backdoors could systematically introduce exploitable vulnerabilities.

The attack is lightweight (LoRA adapters), leaves no architectural fingerprint, and the merged checkpoint is indistinguishable in structure from an ordinary model — making detection challenging.

4. Timeliness & Relevance

This work is highly timely. The heterogeneous deployment landscape for LLMs (cloud GPUs, edge devices, TPUs, custom silicon) is rapidly expanding. Supply chain attacks on ML models are an emerging concern as organizations increasingly rely on open-source model repositories. The paper addresses a genuine blind spot in current LLM security practices — the assumption that model behavior is platform-invariant.

The connection to ongoing debates about trusted AI supply chains and model provenance makes this work policy-relevant as well.

5. Strengths & Limitations

Strengths:

Novel attack surface: Genuinely new class of attacks that requires no input manipulation, no user action, and no model transformation post-deployment

Breadth of evaluation: 23 platforms, 5 scenarios, two task demonstrations, and mitigation analysis

Clean technical decomposition: The two-stage adapter construction is elegant and principled

Practical threat model: Supply-chain poisoning via model repositories is realistic

Reproducibility commitment: Full pipeline released as an artifact (10 scripts)

Limitations:

Mitigation effectiveness: The attack is fragile — LAYERCAST (fp32 inference), minimal pruning (10% sparsity), and even small weight perturbations destroy the routing channel. This significantly limits real-world threat severity, though the authors note adaptive adversaries could potentially counter these defenses.

Scalability questions: Only demonstrated on Qwen3-4B/8B; unclear how this scales to much larger models or different architectures

Platform collisions: Some platform pairs (e.g., Intel CPUs without native bf16, ARM Neoverse variants) are numerically indistinguishable, limiting attack scope

Single-prompt prefill only: The attack reads the platform signal at a single token position during prefill — it's unclear how robust this is to varying prompt lengths or batched inference with different padding strategies

No adaptive adversary analysis: The authors explicitly exclude adaptive adversaries from their threat model, which limits the threat assessment

Vulnerability injection results: The 3× increase in vulnerability rate, while meaningful, has high variance and the absolute rates depend heavily on the evaluation benchmark
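Among the mitigations listed above, pruning at 10% sparsity is the easiest to picture. The sketch below shows global magnitude pruning in plain Python. The assessment does not say which pruning method the paper evaluates, so treating it as magnitude pruning is an assumption, and `magnitude_prune` and the toy weight list are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    k = int(round(len(weights) * sparsity))  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

# Toy weight vector; a real checkpoint would be pruned per tensor or globally.
weights = [0.8, -0.02, 0.5, -0.9, 0.01, 0.3, -0.4, 0.7, 0.6, -0.2]
pruned = magnitude_prune(weights, 0.10)

print(pruned)
```

Only the single smallest weight is zeroed here. That such a light touch is reported to break the routing channel suggests the backdoor depends on very fine numerical detail in the merged weights.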

Additional Observations

The comprehensive platform divergence analysis across 23 platforms (Appendix A) is itself a valuable contribution to the reproducibility and numerical stability literature. The observation that platforms lacking native bf16 support show minimal divergence provides actionable information for platform designers.

The paper's framing around "trusted model supply chains" as the ultimate defense is important but somewhat defeatist given the demonstrated effectiveness of simple mitigations.

Rating: 7.2 / 10

Significance 7.5 · Rigor 7 · Novelty 8.5 · Clarity 8

Generated Jun 19, 2026

Comparison History (31)

Lost vs. The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Paper 2 establishes a fundamental, mathematically rigorous impossibility theorem for LLM wrapper defenses, mechanically verified in Lean 4. While Paper 1 introduces a highly novel hardware-based attack, Paper 2's theoretical bounds (the defense trilemma) shift the entire paradigm of AI safety by proving a widespread defense strategy inherently flawed. This combination of deep methodological rigor and broad applicability to the theoretical limits of AI security promises a higher enduring scientific impact.

gemini-3.1-pro-preview · Jun 19, 2026

Won vs. Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

Paper 2 introduces a fundamentally novel attack vector by exploiting hardware-specific floating-point divergence to trigger LLM backdoors. While Paper 1 shows exceptional practical results (Chrome zero-days) via agent optimization, Paper 2 exposes a foundational blind spot in AI auditing and model supply chains. This conceptual leap—using hardware math variations as an input-independent trigger—creates an entirely new subfield spanning AI security, robustness, and hardware-software co-design, indicating a broader and more enduring scientific impact.

gemini-3.1-pro-preview · Jun 19, 2026

Won vs. Guiding Symbolic Execution with Static Analysis and LLMs for Vulnerability Discovery

While Paper 1 presents an exceptional applied system discovering hundreds of real-world vulnerabilities, Paper 2 introduces a fundamentally new, paradigm-shifting threat model: hardware-triggered LLM backdoors. Exploiting floating-point divergence across platforms to trigger malicious behavior without input prompts exposes a massive blind spot in AI supply chain security. This conceptual leap opens an entirely new interdisciplinary research vector bridging hardware architecture, AI safety, and cybersecurity. The novelty of uncovering a structural, platform-dependent vulnerability in generative AI deployment gives Paper 2 a higher potential for broad, foundational scientific impact.

gemini-3.1-pro-preview · Jun 19, 2026

Won vs. Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs

FloatDoor introduces an entirely novel attack surface—platform-triggered backdoors exploiting floating-point non-determinism—that is fundamentally new to the security community. This class of attack has broad implications for LLM supply chain trust, model auditing, and deployment security across diverse hardware. Its novelty is high because no prior work has demonstrated input-independent, platform-dependent backdoors. Paper 2 makes solid theoretical contributions formalizing extraction vs. indistinguishability, but operates within a more established research direction (memorization/privacy in LLMs). FloatDoor's surprising and practical threat model is likely to generate more follow-up research and industry response.

claude-opus-4-6 · Jun 19, 2026

Won vs. TALUS: Threshold ML-DSA with One-Round Online Signing via Boundary Clearance and Carry Elimination

FloatDoor opens an entirely new attack surface—platform-triggered backdoors exploiting floating-point non-determinism—that affects the rapidly growing ecosystem of LLM deployments across diverse hardware. Its novelty (first input-independent, platform-triggered backdoor), broad applicability (GPUs, TPUs, CPUs), and implications for model supply-chain security give it outsized cross-disciplinary impact spanning security, ML, and systems. TALUS is a strong cryptographic contribution solving an important threshold signing problem, but it addresses a narrower community (post-quantum threshold cryptography) and builds incrementally on known techniques for ML-DSA deployment.

claude-opus-4-6 · Jun 19, 2026

Lost vs. Synchronized DNA sources for unconditionally secure cryptography

Paper 2 likely has higher scientific impact: it proposes a fundamentally new, cross-disciplinary cryptographic primitive using synthetic DNA as a synchronized entropy source, with an experimental long-distance (Tokyo–Paris) demonstration and clear pathway to unconditional-security applications. Its potential breadth spans cryptography, molecular biology, information theory, and secure communications infrastructure. While Paper 1 is novel and timely in LLM security, it is more specialized to ML deployment nuances and may be mitigated by engineering/standardization, limiting longer-term breadth compared to a new physical key-distribution paradigm.

gpt-5.2 · Jun 19, 2026

Won vs. AgentRFC: Security Design Principles and Conformance Testing for Agent Protocols

Paper 2 likely has higher impact: it introduces a concrete, novel attack class (input-independent, platform-triggered LLM backdoors) with broad, immediate real-world relevance to model supply chains and deployment security across GPUs/TPUs/CPUs. The methodology appears experimentally grounded across multiple platforms with a compelling case study (inducing code vulnerabilities), making it timely and actionable for both ML systems and security communities. Paper 1 is ambitious and potentially foundational, but its evaluation is preliminary and partly withheld, reducing near-term verifiability and adoption compared to FloatDoor’s demonstrable, urgent threat model.

gpt-5.2 · Jun 19, 2026

Won vs. Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

Paper 2 likely has higher near-term scientific impact: it introduces a concrete, demonstrable security attack (platform-triggered, input-independent backdoor) with immediate implications for real-world LLM deployment, auditing, and supply-chain trust. The methodology appears empirically rigorous across diverse hardware targets and includes an impactful case study inducing code vulnerabilities. Paper 1 is highly novel and theoretically deep (new primitive; strong impossibility/limitations), with broad implications for AI-agent oversight, but its practical instantiation and immediate applicability may be less direct than FloatDoor’s actionable deployment threat.

gpt-5.2 · Jun 19, 2026

Won vs. SPARK: Security Knowledge Priming and Representation-Guided Knowledge Activation for LLM-based Secure Code Generation

FloatDoor introduces an entirely novel attack surface—platform-triggered backdoors exploiting floating-point non-determinism across hardware platforms—which is fundamentally new and has broad implications for LLM supply chain security, model auditing, and trusted deployment. This opens a new research direction. SPARK, while solid and practical, addresses secure code generation with inference-time techniques that are more incremental (prompt engineering + logit bias). FloatDoor's novelty, the severity of the threat it reveals, and its cross-disciplinary implications (security, systems, ML) give it higher potential scientific impact.

claude-opus-4-6 · Jun 19, 2026

Won vs. Accelerating Trust Convergence in IIoT: A ML Approach for Dynamic Network Conditions

Paper 1 presents a highly novel, input-independent backdoor attack on LLMs exploiting hardware-level floating-point divergence. This uncovers a fundamentally new attack surface with severe security implications for AI supply chains. Its relevance to the booming field of generative AI, cross-disciplinary impact spanning ML, cybersecurity, and systems architecture, and demonstration on major commercial hardware give it significantly higher potential impact than Paper 2, which offers a valuable but more incremental ML-based optimization for IIoT networking.

gemini-3.1-pro-preview · Jun 19, 2026
