Shuaidi Wang, Zhan Zhuang, Ruping Huang, Yu Zhang
Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive generative paradigm. Given the prohibitive computational cost of full fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) has become the standard approach. However, existing PEFT methods (e.g., LoRA), originally tailored for autoregressive models, rely on static parameters that are agnostic to the noise level. Consequently, they ignore the intrinsic dynamics of the diffusion process, where input distributions and generation difficulty shift significantly along the denoising trajectory, rendering them suboptimal for dLLMs. To address this, we propose Noise-aware Low-Rank Adaptation (NaRA), which introduces a low-rank core matrix generated by a lightweight, globally shared hypernetwork conditioned on the noise level. This design enables the update matrices to vary continuously along the diffusion process while keeping parameter and latency overhead negligible. We provide a theoretical justification for the proposed NaRA framework and empirically demonstrate consistent improvements over noise-agnostic baselines across commonsense reasoning, mathematical reasoning, and code generation benchmarks. Our code is available at https://github.com/generaldi/NaRA.
NaRA addresses a genuine structural mismatch between standard LoRA (designed for autoregressive models) and diffusion LLMs (dLLMs). The key insight is that dLLMs operate across a continuous noise trajectory where input distributions and reconstruction difficulty shift significantly, yet standard LoRA applies a static weight update regardless of noise level. NaRA introduces a noise-conditioned core matrix C(λ) sandwiched between two static projection matrices B and A, generated by a lightweight shared hypernetwork. This enables the weight update ΔW(λ) = B·C(λ)·A to vary continuously with the noise level while keeping overhead minimal (~0.008% additional parameters over LoRA).
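The relationship between the static LoRA update and the noise-conditioned update ΔW(λ) = B·C(λ)·A can be illustrated with a minimal NumPy sketch. The dimensions, the random factors, and the helper name `nara_delta_w` are illustrative assumptions and are not taken from the paper's code; the core matrix is passed in directly here rather than produced by a hypernetwork.

```python
import numpy as np


def nara_delta_w(B, C, A):
    """Weight update B @ C @ A with a (possibly noise-dependent) r x r core C."""
    return B @ C @ A


# Hypothetical layer sizes and rank, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
B = rng.normal(size=(d_out, r))  # static projection
A = rng.normal(size=(r, d_in))   # static projection

# With an identity core, the update reduces to the ordinary LoRA update B @ A.
delta_lora = B @ A
delta_identity_core = nara_delta_w(B, np.eye(r), A)

# A different core (standing in for C at another noise level) yields a
# different update that still has rank at most r.
C_other = np.eye(r) + 0.1 * rng.normal(size=(r, r))
delta_other = nara_delta_w(B, C_other, A)
```

Because only the small r x r core changes with the noise level, the two projection matrices can stay fixed, as in LoRA.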
The problem identification is well-motivated: Figure 1 empirically demonstrates that LoRA's improvements concentrate at mid-noise levels while failing at high-noise regimes, directly supporting the need for noise-aware adaptation. The combinatorial argument about input diversity in dLLMs versus AR models (exponentially more possible input configurations at each noise level) provides useful theoretical intuition.
Theoretical grounding. Theorem 4.1 establishes that the B·C·A decomposition can represent any set of N arbitrary update matrices given sufficient rank. The proof is constructive and straightforward—essentially showing that shared orthonormal bases for column and row spaces suffice. While mathematically sound, this is a relatively mild result: it establishes representational capacity but says nothing about learnability or optimization landscape properties. The practical argument that update matrices across noise levels share significant subspace structure is plausible but unverified.
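The constructive idea described above (shared orthonormal bases for the column and row spaces) can be checked numerically. The following is a sketch under stated assumptions, not the paper's proof: the function name `shared_factorization`, the SVD-based construction, and the toy matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def shared_factorization(updates, tol=1e-10):
    """Find shared B, A and per-matrix cores C_i with B @ C_i @ A == update_i."""
    # Orthonormal basis for the combined column space: SVD of the side-by-side stack.
    U, s_col, _ = np.linalg.svd(np.hstack(updates), full_matrices=False)
    # Orthonormal basis for the combined row space: SVD of the vertical stack.
    _, s_row, Vt = np.linalg.svd(np.vstack(updates), full_matrices=False)
    r_col = int(np.sum(s_col > tol * s_col[0]))
    r_row = int(np.sum(s_row > tol * s_row[0]))
    r = max(r_col, r_row)  # "sufficient rank" for this particular set
    B = U[:, :r]
    A = Vt[:r, :]
    # Each core is the projection of its update onto the shared bases.
    cores = [B.T @ dW @ A.T for dW in updates]
    return B, cores, A


# Three arbitrary rank-2 updates of size 10 x 9 (toy data).
rng = np.random.default_rng(0)
updates = [rng.normal(size=(10, 2)) @ rng.normal(size=(2, 9)) for _ in range(3)]
B, cores, A = shared_factorization(updates)
reconstruction_ok = all(np.allclose(B @ C @ A, dW) for C, dW in zip(cores, updates))
```

Note that the shared rank here is the sum of the individual ranks (6), which illustrates why the result speaks to capacity only: if the updates shared no subspace structure, the required rank would grow with N.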
Experimental design. The evaluation covers three domains (commonsense reasoning, mathematical reasoning, code generation) on two LLaDA backbones (Base and Instruct), providing reasonable coverage. Three independent runs with reported standard deviations enhance statistical reliability. Key ablations are thorough: scaling factor η, embedding strategies, hypernetwork sharing granularity, rank sensitivity, and the critical Multi-LoRA and NaRA-C baselines.
Concerns: (1) All experiments are conducted exclusively on LLaDA, limiting generalizability claims to a single dLLM architecture. No experiments on MDLM, Dream, or other dLLMs are presented. (2) The improvements, while consistent, are often modest—e.g., 80.89% vs. 80.07% on math reasoning average—and some individual benchmark differences fall within standard deviation ranges. (3) The comparison landscape is limited: the baselines (Prompt Tuning, P-Tuning, LoRA, HiRA) don't include more recent or sophisticated PEFT methods. (4) The SDXL experiment (Table 8) uses only 5 training images and lacks error bars, making it a weak demonstration of cross-domain generality.
Near-term. As dLLMs gain traction (Gemini Diffusion, Mercury, LLaDA), efficient fine-tuning becomes critical. NaRA offers a practical, plug-and-play solution with minimal engineering overhead. The global hypernetwork sharing and identity-plus-deviation initialization are sensible design choices that should transfer well.
Broader applicability. The noise-aware conditioning principle could extend beyond LoRA to other PEFT methods (demonstrated with DoRA in Appendix M) and to continuous diffusion models for images/audio. The DreamBooth experiment, though preliminary, hints at this potential.
Limitations on impact. The paper's impact is fundamentally coupled to whether dLLMs achieve widespread adoption—still an open question. If dLLMs remain niche, the contribution becomes correspondingly narrower. Additionally, the improvements are incremental rather than transformative; NaRA does not unlock new capabilities but improves existing ones by 1-4 percentage points.
The work is highly timely. dLLMs (LLaDA, Dream 7B, Gemini Diffusion) emerged primarily in 2025, and the PEFT-for-dLLMs question is just beginning to be explored. Being among the first to identify and address the noise-agnostic limitation of standard PEFT for dLLMs gives this work a first-mover advantage in an emerging subfield. The paper is well-positioned at the intersection of two active research areas: parameter-efficient fine-tuning and diffusion language models.
The Gaussian Fourier embedding choice is well-justified by the spectral bias literature, and the stability analysis (Table 5) showing lower variance for Fourier embeddings is convincing. The hypernetwork's zero-last initialization ensuring C(λ)=I at training start is a thoughtful design choice that guarantees backward compatibility with LoRA's gradient dynamics.
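The zero-last initialization can be illustrated with a small NumPy sketch of a noise-conditioned hypernetwork. Everything below is an assumption made for illustration: the class name `CoreHypernetwork`, the single hidden layer, the tanh activation, the layer sizes, and the frequency scale are not taken from the paper, and the scaling factor η discussed in the ablations is omitted.

```python
import numpy as np


class CoreHypernetwork:
    """Maps a scalar noise level to an r x r core, written as identity plus deviation."""

    def __init__(self, rank, n_freqs=16, hidden=32, freq_scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.rank = rank
        # Gaussian Fourier features: frequencies sampled once and kept fixed.
        self.freqs = rng.normal(scale=freq_scale, size=n_freqs)
        self.W1 = rng.normal(scale=1.0 / np.sqrt(2 * n_freqs), size=(2 * n_freqs, hidden))
        self.b1 = np.zeros(hidden)
        # Zero-last initialization: the output layer starts at exactly zero.
        self.W2 = np.zeros((hidden, rank * rank))
        self.b2 = np.zeros(rank * rank)

    def embed(self, noise_level):
        angles = 2.0 * np.pi * noise_level * self.freqs
        return np.concatenate([np.sin(angles), np.cos(angles)])

    def core(self, noise_level):
        h = np.tanh(self.embed(noise_level) @ self.W1 + self.b1)
        deviation = (h @ self.W2 + self.b2).reshape(self.rank, self.rank)
        return np.eye(self.rank) + deviation


hyper = CoreHypernetwork(rank=4)
core_at_init = hyper.core(0.5)  # identity for every noise level at initialization
```

With the output layer at zero, the deviation term vanishes for every noise level, so B·C(λ)·A equals the plain LoRA update B·A at the start of training; the core only becomes noise-dependent once the last layer receives gradient updates.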
The paper is clearly written with good experimental documentation, though it occasionally overstates novelty ("first to identify" claims should be tempered given concurrent work in the rapidly evolving dLLM space).
Generated May 29, 2026
Paper 1 introduces a fundamental methodological advancement by adapting PEFT for the emerging paradigm of Diffusion LLMs. By using a noise-aware hypernetwork, it solves a core inefficiency in applying static autoregressive-tailored techniques to dynamic diffusion processes. This foundational contribution has a broad potential impact across the AI community, as the core concept can generalize to various diffusion architectures. While Paper 2 offers a highly valuable and cost-effective applied pipeline for biomedical VLMs, Paper 1 provides a theoretical and algorithmic innovation with wider cross-domain applicability in generative AI.
NaRA addresses a fundamental architectural mismatch between existing PEFT methods and the emerging diffusion LLM paradigm, introducing a principled noise-aware adaptation mechanism with theoretical justification. This targets a nascent but rapidly growing field (diffusion LLMs), offering broad applicability across all LoRA-based fine-tuning of dLLMs. Paper 1, while solid, is more incremental—combining known techniques (preference optimization, LoRA merging) in a specific weak-to-strong supervision setting. NaRA's novelty in adapting PEFT to diffusion dynamics has greater potential to influence the foundational methodology of an emerging model class.
Paper 2 addresses a fundamental question about an intriguing recently-reported phenomenon (subliminal learning) and provides a clear, mechanistic explanation showing it is a LoRA artifact rather than a deep property of language models. This has broader impact: it informs AI safety discussions, clarifies limitations of LoRA fine-tuning more generally, and debunks a potentially alarming claim. Paper 1 proposes a useful but incremental technical improvement (noise-aware LoRA for diffusion LLMs), a relatively niche application area. Paper 2's findings are more likely to reshape understanding across the community.
Paper 1 likely has higher scientific impact due to its broad, timely relevance to LLM evaluation and governance: it reframes rubrics as measurement specifications, provides an auditing framework with multiple axes (reliability, preference fit, adversarial robustness), and offers concrete repair operations with demonstrated gains. This can influence benchmarking, RLHF/RAI pipelines, product evaluation, and policy across many LLM use cases. Paper 2 is a solid, innovative PEFT improvement for diffusion LLMs, but diffusion LLM adoption is currently narrower, so near-term cross-field and real-world impact is likely smaller.
NaRA addresses a fundamental architectural gap in adapting PEFT methods to diffusion LLMs, a rapidly emerging paradigm. Its noise-aware conditioning mechanism is theoretically grounded and introduces a novel, generalizable principle (dynamic adaptation conditioned on diffusion timestep) that could influence future PEFT research broadly. Paper 2, while practically useful, presents more of an engineering system (memory management for LLM applications) building on existing concepts like vector databases and memory extraction. NaRA's contribution is more foundational, timely given the rise of dLLMs, and likely to inspire follow-up research across multiple model architectures.
Paper 1 introduces a fundamental methodological improvement for fine-tuning an emerging class of foundation models (Diffusion LLMs). By making LoRA noise-aware, it solves a core limitation of applying static PEFT methods to dynamic diffusion processes. This architectural innovation has broad, domain-agnostic applicability and is highly likely to become a standard technique for dLLM fine-tuning, giving it higher potential scientific impact than the application-specific diagnostic benchmark proposed in Paper 2.
While Paper 1 offers a valuable algorithmic improvement for fine-tuning diffusion LLMs, Paper 2 addresses a critical, highly debated issue affecting the entire scientific community: the use of LLMs in peer review. By introducing a comprehensive benchmark and revealing significant behavioral divergences between LLMs and human reviewers, Paper 2 has a much broader potential impact across all academic fields and directly informs the future of scientific publishing and evaluation.
NaRA addresses a fundamental architectural limitation in adapting PEFT methods to diffusion LLMs, proposing a principled noise-aware adaptation mechanism with theoretical justification and empirical validation across multiple benchmarks. This has broader impact because it introduces a reusable technique applicable to the growing field of diffusion-based language models, with clear methodological novelty (noise-conditioned hypernetwork for LoRA). Paper 1, while valuable as a benchmark/dataset contribution, is more niche—focused on evaluating multi-agent LLM social reasoning through game competitions—and its findings are more observational than methodologically transformative.
Paper 1 addresses a fundamental architectural mismatch in adapting PEFT methods to diffusion LLMs, introducing a principled noise-aware adaptation mechanism with theoretical justification and broad empirical validation. This targets a nascent but rapidly growing research area (diffusion-based language models) and proposes a generalizable solution. Paper 2 presents a practical systems contribution (runtime layer for multi-agent serving) with solid engineering results, but its impact is more narrowly scoped to infrastructure optimization. Paper 1's novelty in bridging diffusion dynamics with parameter-efficient fine-tuning has broader methodological implications across generative modeling.
Paper 2 proposes a fundamental algorithmic advancement in Parameter-Efficient Fine-Tuning (PEFT) specifically tailored for emerging Diffusion LLMs. By addressing the static parameter limitations of standard LoRA and introducing a dynamic, noise-aware hypernetwork, it offers broad applicability across various domains like reasoning and code generation. In contrast, Paper 1 presents a highly specialized, applied architecture for financial investment research. Due to its foundational methodological contribution, theoretical rigor, and broader applicability across the AI landscape, Paper 2 has a significantly higher potential for widespread scientific impact.