Vanessa Schmidt, Huy Hoang Nguyen, Cédric Jung, Shirin Salehi, Anke Schmeink
Resource constraints increasingly determine what can be trained, fine-tuned, and deployed in large language models (LLMs), yet efficiency is often studied through isolated techniques rather than as an interacting system of limits. This survey adopts a constraint-centric perspective and organizes recent progress around three coupled bottlenecks: data efficiency (what to train on), memory efficiency (how to fit training), and compute budget awareness (when and where to spend FLOPs). On the data axis, we review selection and pruning methods that maximize learning per token, ranging from scalable proxy signals based on learning dynamics to gradient- and influence-based scoring, as well as difficulty-aware and curriculum-style strategies. We highlight emerging evidence that different notions of good data dominate in different regimes, implying that optimal subsets depend on the task objective and resource budget rather than being universal. On the systems side, we show that GPU memory, not raw compute, is often the dominant bottleneck in fine-tuning, and that effective scaling requires jointly reducing weight storage, optimizer states, and activation memory rather than optimizing any single component in isolation. Beyond memory, we frame training and inference as compute-governed processes in which optimization, data selection, and decoding must explicitly account for finite FLOP budgets. We review evidence for compute-optimal allocation and stopping rules, where computation should be halted or reallocated once marginal performance gains fall below a budget-dependent threshold. Together, these results unify compute-aware data selection, scaling laws, and adaptive inference under a common principle of resource-conditioned decision-making.
This survey proposes a constraint-centric lifecycle framework that organizes LLM efficiency literature around three coupled bottlenecks: data efficiency ("what to train on"), memory efficiency ("how to fit training"), and compute budget awareness ("when and where to spend FLOPs"). The central thesis is that these three dimensions form an interacting system rather than independent optimization targets, and that optimizing one dimension in isolation merely shifts the bottleneck elsewhere.
The paper's most distinctive conceptual contributions are: (a) the "compute governor" formalism—a control policy π(S_t, B_t) → a_t that maps system state and remaining budget to continue/reallocate/stop decisions based on marginal gain per FLOP; (b) identification of the "Static-to-Dynamic Gap" in data selection, arguing that leading methods like LESS remain predominantly static and that truly adaptive influence estimation during training is a critical open problem; and (c) the marginal utility unification, connecting data filtering, parameter updates, and compute allocation under a shared principle of maximizing performance gain per unit of constrained resource.
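To make contribution (a) concrete, the following is a minimal sketch of what a governor policy of the form π(S_t, B_t) → a_t might look like. It is an illustration under stated assumptions, not the paper's implementation: the `State` fields, the `governor` function name, the fixed `threshold` argument, and the idea of comparing against an estimated alternative use of compute are all hypothetical choices made here for readability.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    REALLOCATE = "reallocate"
    STOP = "stop"


@dataclass
class State:
    """Hypothetical system state S_t (field names are assumptions)."""
    recent_gain: float        # e.g. validation-loss reduction over the last interval
    recent_flops: float       # FLOPs spent over that same interval
    alt_gain_per_flop: float  # estimated gain per FLOP of the best alternative use


def governor(state: State, budget_flops: float, threshold: float) -> Action:
    """Map (S_t, B_t) to continue / reallocate / stop via marginal gain per FLOP.

    `threshold` stands in for the budget-dependent cutoff; how it should
    depend on the remaining budget is not specified here.
    """
    # No budget left, or no measurable spend to compute a rate from: stop.
    if budget_flops <= 0 or state.recent_flops <= 0:
        return Action.STOP

    gain_per_flop = state.recent_gain / state.recent_flops

    # Keep going only if the current activity clears the cutoff and is
    # at least as good as the best known alternative.
    if gain_per_flop >= threshold and gain_per_flop >= state.alt_gain_per_flop:
        return Action.CONTINUE
    # Otherwise move compute to the alternative if it clears the cutoff.
    if state.alt_gain_per_flop >= threshold:
        return Action.REALLOCATE
    return Action.STOP
```

The sketch also shows where the difficulty lies: every branch depends on `recent_gain` and `alt_gain_per_flop`, which are estimates of marginal gain that would have to be measured or predicted during training.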
As a survey, rigor is assessed by coverage, taxonomy quality, and analytical depth rather than experimental validation.
Coverage is solid for post-2022 work across data selection (LIMA, GraNd/EL2N, S2L, STAFF, LESS, GREATS, BIDS, DART, IFD), memory efficiency (CoLM, Addax, HiFT, BAdam, QLoRA, DQT, PEQA, SubZero, LOZO), and compute governance (Chinchilla scaling laws, CADS, speculative decoding, MoE, Mixture-of-Depths). The taxonomy in Figures 3-6 is well-structured.
Analytical depth is a strength. The paper goes beyond mere cataloging: it decomposes memory into M_θ + M_O + M_A (Eq. 15) and maps each method to specific terms; it provides mathematical formulations for key methods (GraNd, EL2N, Adam Influence, GREATS Taylor expansion, BAdam memory equation); and Table I offers a useful engineering comparison across methods. The discussion of noise dynamics when stacking DQT with ZO estimators (Section IV-E) demonstrates genuine cross-method analysis.
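The three-term memory decomposition can be illustrated with a rough back-of-the-envelope calculator. This is a sketch under assumptions rather than a reproduction of Eq. 15: the function name, its parameters, and the choice to pass activation memory in as a single number are hypothetical, and gradient buffers are not modeled as a separate term.

```python
def training_memory_bytes(
    n_params: int,
    n_trainable: int,
    weight_bytes: float,
    optimizer_states_per_param: int,
    state_bytes: float,
    activation_bytes: float,
) -> dict:
    """Estimate M_theta + M_O + M_A in bytes (all inputs are assumptions)."""
    # M_theta: storage for all weights, trainable or frozen.
    m_theta = n_params * weight_bytes
    # M_O: optimizer state, kept only for trainable parameters
    # (Adam keeps two moment estimates per trainable parameter).
    m_opt = n_trainable * optimizer_states_per_param * state_bytes
    # M_A: activation memory, supplied by the caller because it depends
    # on batch size, sequence length, and checkpointing strategy.
    m_act = activation_bytes
    return {
        "M_theta": m_theta,
        "M_O": m_opt,
        "M_A": m_act,
        "total": m_theta + m_opt + m_act,
    }


# Full fine-tuning: 16-bit weights, two 32-bit optimizer states per parameter.
full = training_memory_bytes(1000, 1000, 2, 2, 4, 500)
# Quantized frozen base with a small trainable adapter: 4-bit weights,
# optimizer state only for the 10 trainable parameters.
adapter = training_memory_bytes(1000, 10, 0.5, 2, 4, 500)
```

In the toy numbers above, shrinking M_theta and M_O leaves M_A unchanged, so activations account for a much larger share of the total in the adapter case. This is the kind of bottleneck shift that mapping each method to specific terms is meant to expose.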
Weaknesses in rigor: The compute governor formalization (Eq. 29-30) remains largely conceptual. The case study in Section V-A illustrates the framework but provides no empirical validation, ablation, or even simulation. The marginal gain signal G_t is defined, but how to estimate it in practice is left unspecified. The proposed "research roadmap" (drift-aware refresh schedules, damped governor updates, etc.) lacks specificity about feasibility. Additionally, some claimed cross-pillar interactions lack quantitative evidence: statements like "optimizing one dimension merely shifts the bottleneck" would benefit from concrete measurements.
The paper could serve as a useful reference and conceptual guide for practitioners navigating efficiency trade-offs when fine-tuning LLMs under resource constraints. Table I and the decision framework (Figures 1, 4, 6) have practical value. The identification of the Static-to-Dynamic Gap may stimulate research on dynamic influence estimation with memory-efficient approximations.
However, the impact is limited by the lack of empirical grounding for the proposed unified framework. Without demonstrating that the compute governor actually improves resource allocation in practice, the framework risks remaining an organizational metaphor rather than an actionable system. The field already has several efficiency surveys (Bai et al., 2024, cited as [1]), and this paper's differentiation depends heavily on the lifecycle/governor framing proving useful beyond conceptual elegance.
The edge deployment angle is repeatedly invoked but never substantiated with edge-specific experiments or case studies, weakening this claimed application domain.
The paper addresses a genuinely pressing need. As LLM training costs escalate and democratization of fine-tuning becomes increasingly important, a unified view of efficiency trade-offs is valuable. The timing is appropriate—enough individual efficiency methods now exist (2022-2025) to warrant synthesis. The data-constrained scaling perspective is particularly timely given emerging concerns about data availability.
The coverage of very recent work (ICLR 2025, ICML 2025, NeurIPS 2025 papers) demonstrates currency, though some methods cited are still preprints without peer review.
The paper is well-written but lengthy (21 pages). Some mathematical detail for individual methods (e.g., full derivation of GREATS Taylor expansion) may be excessive for a survey, while the novel contributions (governor, roadmap) receive comparatively less formal development. The distinction between "feasibility" and "optimality" is useful but could be developed more systematically. The paper would be substantially strengthened by even a small-scale empirical demonstration of the governor concept.
Generated Jun 10, 2026
Paper 1 provides a comprehensive survey unifying three critical bottlenecks in LLM training efficiency—data, memory, and compute—under a constraint-centric framework. Given the massive interest in LLM efficiency across academia and industry, this survey has broad applicability and timeliness. Paper 2, while technically solid in introducing a new benchmark and method for power system forecasting, addresses a more niche domain. The survey's potential to shape research directions across the entire LLM training ecosystem gives it substantially broader impact across multiple fields and larger audience reach.
Paper 2 is more likely to have higher scientific impact: it proposes a novel oversight protocol (bootstrapped monitoring) addressing a timely, high-stakes problem in AI safety/control, with direct real-world applicability to deploying stronger agents. It includes an evaluative methodology on a concrete benchmark and considers adversarial collusion, increasing rigor and relevance. Paper 1 is a valuable unifying survey, but surveys typically have less transformative impact than new, empirically tested mechanisms, and its contributions are primarily organizational rather than introducing a new technique.
Paper 1 is an original methodological contribution: it adapts implicit neural representations to learn latent policy identities from unlabeled multi-policy behavioral data, introduces a principled generative prior over policies, and proposes policy-level OOD shift axes. It is evaluated across diverse synthetic, simulated, and real-world domains, suggesting strong rigor and cross-domain applicability in robotics, games, and sequential decision-making. Paper 2, while timely and broadly useful, is a survey (synthesizing rather than creating new techniques), so its novelty and direct scientific advance are typically lower than a solid new model + problem formulation.
Paper 2 introduces a highly novel, cross-disciplinary approach by adapting LLM prompting concepts (Chain of Thought) to neural operators (Chain of Operators) for solving PDEs. This methodological innovation offers significant improvements in out-of-distribution generalization without retraining, presenting immense potential for real-world applications in physics and engineering. Paper 1, while highly relevant and timely, is a survey that systematizes existing knowledge rather than introducing a breakthrough novel methodology.
Paper 1 is a comprehensive survey that unifies three major bottlenecks in LLM training—data, memory, and compute efficiency—under a novel constraint-centric framework. Its breadth of impact is far greater, as it addresses the entire LLM training ecosystem and provides a conceptual unification (resource-conditioned decision-making) relevant across many research communities. Paper 2 presents a narrower contribution—using explainability for data selection in ECG classification—which, while novel and useful, impacts a more limited audience. The survey's timeliness given the explosive growth of LLM research amplifies its potential citation impact and influence on future work.
Paper 2 likely has higher scientific impact: it proposes a unifying framework across data, memory, and compute efficiency in LLM training, synthesizing diverse methods and connecting scaling laws, budget-aware training, and adaptive inference—broadly applicable across academia and industry. Its breadth and timeliness (efficiency as a central constraint) increase cross-field reach and citation potential. Paper 1 is methodologically detailed and practically useful for diffusion model quantization on consumer GPUs, but it is narrower (model/hardware-specific) and more incremental relative to existing quantization literature.
The survey on unifying data, memory, and compute efficiency in LLM training addresses a broadly impactful topic at the center of current AI research. Its constraint-centric framework synthesizing data efficiency, memory optimization, and compute budgeting for LLMs has wide applicability across industry and academia. While Paper 2 presents interesting mechanistic insights about feedback alignment's rank collapse and proposes remedies, it addresses a more niche problem (biologically plausible learning) with limited practical adoption compared to backpropagation. The LLM efficiency survey's timeliness, breadth, and practical relevance give it higher potential impact.
Paper 1 resolves a longstanding open problem in optimization theory by proving matching lower bounds for higher-order smooth nonconvex optimization, completing the complexity picture for an important class of problems. This is a definitive theoretical contribution with lasting impact. Paper 2 is a survey that organizes existing work on LLM training efficiency under a unified framework, which is useful but inherently synthesizes rather than creates new knowledge. The sharp, novel theoretical result in Paper 1 is more likely to be cited as a foundational reference and influence future algorithmic work across optimization and machine learning.
While Paper 1 presents rigorous, paradigm-shifting empirical findings for EEG denoising, Paper 2 addresses a universally urgent bottleneck in AI: LLM training efficiency. As a unifying survey in a rapidly expanding and resource-intensive field, Paper 2 has a broader target audience, higher potential for widespread cross-disciplinary citations, and immediate relevance to both academic researchers and industry practitioners optimizing large-scale AI systems.
Paper 1 provides a comprehensive survey unifying three critical bottlenecks in LLM training—data, memory, and compute efficiency—under a resource-constrained framework. Given the enormous and growing interest in LLM efficiency across academia and industry, this survey addresses an extremely timely topic with broad practical impact. Paper 2 makes a solid theoretical contribution to scalable multi-agent RL with tighter locality bounds, but its impact is narrower, targeting a more specialized community. The breadth of applicability, timeliness, and practical relevance of Paper 1 give it significantly higher potential impact.