Transfer learning is usually studied as a consequence of distribution shift. This paper identifies an orthogonal failure mode in which the data distribution is fixed and the loss changes. This setting is called \emph{loss shift}. A loss determines which information in X is Bayes-relevant, and two losses may therefore require different representations even under the same joint law P(X,Y). The idea is formalized using Bayes quotients, which allow losses to be ordered by refinement. In the Bayes-quotient formulation, strict refinement gives an immediate qualitative obstruction: a source-minimal representation for a coarser loss is insufficient for a strictly finer target loss. For finite-output log loss, this obstruction becomes an exact quantitative identity: the excess risk is the conditional information about Y discarded by the representation. Experiments in controlled, learned, synthetic-image, and real-image settings show the predicted effect, i.e., classification-equivalent representations can have different optimal log-loss performance under a fixed data distribution.
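A minimal numerical sketch of the predicted effect, on a hypothetical toy law that is not taken from the paper: X is uniform on four points, Y is binary, and two representations are compared, the identity `h_full` and the hard Bayes label `h_argmax` (both names are illustrative). It assumes that the best log-loss head on a frozen representation H predicts the conditional law P(Y | H). Both representations reach the same optimal accuracy, but the best log loss from the hard label is strictly worse.

```python
import math

# Hypothetical toy law (not from the paper): X uniform on {0, 1, 2, 3}, binary Y.
P_X = [0.25, 0.25, 0.25, 0.25]
P_Y1_GIVEN_X = [0.9, 0.6, 0.4, 0.1]  # P(Y = 1 | X = x)
INPUTS = range(len(P_X))


def cells(h):
    """Group the inputs by the value of the representation h."""
    groups = {}
    for x in INPUTS:
        groups.setdefault(h(x), []).append(x)
    return groups.values()


def optimal_accuracy(h):
    """Best 0-1 accuracy of any classifier that only sees h(X)."""
    acc = 0.0
    for xs in cells(h):
        mass1 = sum(P_X[x] * P_Y1_GIVEN_X[x] for x in xs)
        mass0 = sum(P_X[x] * (1 - P_Y1_GIVEN_X[x]) for x in xs)
        acc += max(mass0, mass1)  # predict the majority label in each cell
    return acc


def optimal_log_loss(h):
    """Best expected log loss (nats) given h(X), predicting q1 = P(Y = 1 | h(X))."""
    loss = 0.0
    for xs in cells(h):
        q1 = sum(P_X[x] * P_Y1_GIVEN_X[x] for x in xs) / sum(P_X[x] for x in xs)
        for x in xs:
            p1 = P_Y1_GIVEN_X[x]
            loss -= P_X[x] * (p1 * math.log(q1) + (1 - p1) * math.log(1 - q1))
    return loss


def h_full(x):
    """Hypothetical representation that keeps all of X."""
    return x


def h_argmax(x):
    """Hypothetical representation that keeps only the Bayes label."""
    return P_Y1_GIVEN_X[x] > 0.5


print(optimal_accuracy(h_full), optimal_accuracy(h_argmax))
print(optimal_log_loss(h_full), optimal_log_loss(h_argmax))
```

On this toy law both representations give an accuracy of 0.75, while the optimal expected log loss rises from about 0.499 to about 0.562 nats when only the hard label is kept.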
This paper identifies and formalizes loss shift — a transfer learning failure mode in which the data distribution P(X,Y) remains fixed but the loss function changes between source and target tasks. The key insight is that different losses require different "Bayes-relevant" information from representations. A representation that is minimal and sufficient for a coarse loss (e.g., 0-1 classification) may be provably insufficient for a finer loss (e.g., log loss) under the *same* distribution.
The formalization uses Bayes quotients: for a fixed (P, ℓ), the Bayes quotient partitions the input space into equivalence classes of inputs that share the same Bayes-optimal action. Two losses are compared via a refinement preorder on their induced sigma-algebras. The main quantitative result (Theorem 3.3) shows that for finite-output log loss, the frozen-transfer excess risk equals exactly I(Y; X | H), the conditional mutual information about Y that the representation H discards.
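A minimal sketch of Bayes quotients and the refinement check on a finite input space. The modelling choices are assumptions for illustration, not the paper's exact definitions: under 0-1 loss, inputs are grouped by their set of posterior-maximizing labels; under log loss, by the full posterior (the log-loss Bayes action); and one partition refines another when each of its blocks lies inside a block of the other, which on a finite space corresponds to containment of the generated sigma-algebras. The posteriors in `POSTERIORS` are hypothetical and stored as exact fractions so equality tests are reliable.

```python
from fractions import Fraction as F

# Hypothetical posteriors P(Y = y | X = x) for a 3-class problem on 4 inputs.
POSTERIORS = {
    "a": (F(6, 10), F(3, 10), F(1, 10)),
    "b": (F(6, 10), F(1, 10), F(3, 10)),
    "c": (F(1, 10), F(8, 10), F(1, 10)),
    "d": (F(1, 10), F(8, 10), F(1, 10)),
}


def bayes_action_01(post):
    """0-1 loss: the set of labels that maximize the posterior."""
    top = max(post)
    return frozenset(y for y, p in enumerate(post) if p == top)


def bayes_action_log(post):
    """Log loss: the Bayes action is the posterior vector itself."""
    return post


def bayes_quotient(action):
    """Partition the inputs into classes that share the same Bayes action."""
    blocks = {}
    for x, post in POSTERIORS.items():
        blocks.setdefault(action(post), set()).add(x)
    return [frozenset(b) for b in blocks.values()]


def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)


q_01 = bayes_quotient(bayes_action_01)
q_log = bayes_quotient(bayes_action_log)
print(q_01, q_log)
print(refines(q_log, q_01), refines(q_01, q_log))
```

Here the 0-1 quotient has blocks {a, b} and {c, d}, while the log-loss quotient splits {a, b} because those two posteriors differ, so log loss strictly refines 0-1 loss on this toy law.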
The theoretical framework is clean and mathematically well-structured. The paper carefully works within the standard Borel setting, defines Bayes sufficiency and minimality precisely, and obtains a preorder on losses from sigma-algebra containment. The proofs (Appendix C) are straightforward but correct; they rely on standard information-theoretic identities (cross-entropy decomposition, tower property, conditional mutual information).
However, the theoretical novelty should be assessed carefully. Theorem 3.3 is essentially the well-known identity that the excess log-loss risk of any predictor relative to the Bayes predictor equals the expected conditional KL divergence, which is a standard result in information theory and Bayesian learning theory. The paper acknowledges this connection to Cover & Thomas (2006) and Xu & Raginsky (2022). The contribution is therefore more in the *framing* and *application* of known information-theoretic identities to the transfer learning context rather than in deriving fundamentally new mathematical results.
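The identity discussed above can be checked numerically. A minimal sketch, assuming a hypothetical finite toy law and a deterministic representation H = h(X) (neither taken from the paper), that computes the same quantity three ways: as the excess log-loss risk of the best frozen head, as an expected conditional KL divergence, and as I(Y; X | H). It assumes, as is standard for a proper loss, that the best log-loss head on H predicts P(Y | H).

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Expected log loss of predicting q when Y ~ p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl(p, q):
    """KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy law (not from the paper): P(X), P(Y | X) with 3 classes,
# and a frozen representation h that merges inputs pairwise.
P_X = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}
P_Y_GIVEN_X = {
    0: (0.7, 0.2, 0.1),
    1: (0.5, 0.4, 0.1),
    2: (0.1, 0.1, 0.8),
    3: (0.2, 0.2, 0.6),
}
h = {0: "u", 1: "u", 2: "v", 3: "v"}

# P(H = c) and P(Y | H = c), obtained by pooling the inputs in each cell of h.
P_H, P_Y_GIVEN_H = {}, {}
for c in sorted(set(h.values())):
    xs = [x for x in P_X if h[x] == c]
    P_H[c] = sum(P_X[x] for x in xs)
    P_Y_GIVEN_H[c] = tuple(
        sum(P_X[x] * P_Y_GIVEN_X[x][y] for x in xs) / P_H[c] for y in range(3)
    )

# (1) Excess risk: best frozen head P(Y | h(x)) versus the Bayes predictor on X.
risk_bayes = sum(P_X[x] * entropy(P_Y_GIVEN_X[x]) for x in P_X)
risk_frozen = sum(
    P_X[x] * cross_entropy(P_Y_GIVEN_X[x], P_Y_GIVEN_H[h[x]]) for x in P_X
)
excess = risk_frozen - risk_bayes

# (2) Expected conditional KL divergence between the two predictors.
expected_kl = sum(P_X[x] * kl(P_Y_GIVEN_X[x], P_Y_GIVEN_H[h[x]]) for x in P_X)

# (3) I(Y; X | H) = H(Y | H) - H(Y | X), using H(Y | X, H) = H(Y | X) since H = h(X).
cmi = sum(P_H[c] * entropy(P_Y_GIVEN_H[c]) for c in P_H) - risk_bayes

print(excess, expected_kl, cmi)
```

The three quantities agree up to floating-point error on this toy law, consistent with reading Theorem 3.3 as an application of the standard cross-entropy decomposition.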
The experiments are well designed for isolating the mechanism. They are thorough and well-controlled, with proper confidence intervals across replications.
The concept of loss shift provides a useful diagnostic lens for practitioners. When transferring pretrained representations to tasks with different loss functions (e.g., from classification to calibrated probability estimation), the framework explains why frozen features may be fundamentally insufficient for the target objective.
However, the practical implications may be somewhat limited. The primary example (accuracy → log loss) is well-understood informally: practitioners already know that classification features may not preserve calibration. The paper's contribution is making this precise rather than revealing a surprising new phenomenon. The framework doesn't immediately suggest new algorithms beyond "train with the target loss" or "don't freeze."
The paper is timely given the prevalence of foundation models and frozen-feature transfer. The emphasis on what information frozen representations preserve or discard is directly relevant to how pretrained models are deployed. The connection to calibration is also relevant given growing interest in uncertainty quantification.
The paper fills a conceptual gap: transfer learning theory has focused almost exclusively on distribution shift, and this work correctly points out that loss mismatch under fixed distributions is an independent axis of difficulty.
This is a well-executed paper that provides a clean theoretical framework for an underappreciated phenomenon. The formalization via Bayes quotients is elegant, and the experiments convincingly demonstrate the predicted effects. The main limitation is that the core insight — different losses need different information — is relatively intuitive, and the mathematical machinery, while polished, primarily repackages known information-theoretic results. The paper's impact will likely be conceptual rather than algorithmic, providing useful vocabulary and formal tools for thinking about representation adequacy across different objectives.
Generated Jun 12, 2026
Paper 2 addresses the highly timely and impactful area of reinforcement learning for LLM reasoning. By providing mechanistic insights and practical interventions for scaling reasoning capabilities, it has immediate, broad applicability in current AI research. While Paper 1 offers a novel theoretical framework for transfer learning, Paper 2's direct relevance to the rapid development of advanced reasoning models gives it higher potential for widespread scientific and real-world impact.
Paper 2 introduces a foundational theoretical framework addressing a novel, under-explored problem (loss shift as orthogonal to distribution shift) using a new mathematical formalism (Bayes quotients). While Paper 1 offers a practical algorithmic improvement for robust classification, Paper 2 challenges core assumptions in transfer and representation learning. Its rigorous theoretical formalization, combined with empirical validation, gives it a higher potential to fundamentally shape future research directions, influence theoretical machine learning, and impact how researchers conceptualize representation sufficiency.
Paper 2 introduces a fundamental theoretical framework ('loss shift' and 'Bayes quotients') to transfer learning, an area that underpins modern machine learning. While Paper 1 provides an impressive domain-specific benchmark and method for power systems, Paper 2's insights into representation learning and loss functions have the potential for broader impact across all fields applying machine learning, influencing both theoretical understanding and practical model design.
Paper 2 introduces a broadly applicable and conceptually novel transfer-learning setting (loss shift) orthogonal to distribution shift, with a formal framework (Bayes quotients) that yields qualitative impossibility results and an exact quantitative identity for log loss linking excess risk to discarded conditional information. This combination of new problem framing, theoretical rigor, and cross-domain relevance (representation learning, information theory, transfer, evaluation metrics) suggests wider and longer-lasting impact than Paper 1’s incremental advance on GNN-based graph clustering, which is more application-specific and shows mixed real-data gains.
Paper 2 has higher potential impact: it introduces a novel, general transfer-learning failure mode (loss shift) orthogonal to distribution shift, formalized via Bayes quotients with qualitative and quantitative results (including an exact identity for log loss). The framework is broadly applicable across ML tasks where objectives change (classification vs calibration, different decision costs), is timely given representation learning/transfer, and is supported by theory plus experiments. Paper 1 is more application-specific, and its main empirical finding (GAN augmentation not improving segmentation) is narrower and less broadly generalizable.
Paper 1 describes the IEEE SA P3109 standard for machine learning arithmetic formats, which has high practical impact as an industry standard affecting hardware and software implementations across the entire ML ecosystem. Standards like this shape how billions of computations are performed. Paper 2 introduces the novel concept of 'loss shift' and Bayes quotients, which is theoretically interesting but addresses a more niche aspect of transfer learning. While Paper 2 is conceptually elegant, the breadth of impact of an IEEE standard on ML numerical formats—affecting chip designers, framework developers, and practitioners—gives Paper 1 greater estimated scientific impact.
Paper 2 demonstrates higher potential scientific impact due to its timeliness and direct real-world applicability in Large Language Model alignment. While Paper 1 introduces a novel theoretical framework for transfer learning, Paper 2 addresses critical bottlenecks in modern RLHF/GRPO pipelines. By improving training stability and performance on mathematical reasoning and coding tasks, which are highly sought-after capabilities in contemporary AI, Paper 2 is positioned for rapid adoption and widespread citation across the active LLM research and applied AI communities.
Paper 1 introduces a fundamental theoretical framework ('loss shift' and Bayes quotients) that addresses a novel failure mode in transfer learning independent of distribution shift. This foundational insight has broad implications across representation learning and generalization. In contrast, Paper 2 offers a valuable but highly specific algorithmic speedup for diffusion models. Theoretical advances like those in Paper 1 typically yield broader, longer-lasting scientific impact across multiple subfields of machine learning.
Paper 1 represents a major breakthrough in AI reasoning, achieving gold-medal performance on high-profile benchmarks like the IMO and USAMO. The introduction of population-level test-time scaling addresses a critical bottleneck in LLM reasoning capabilities. While Paper 2 offers a solid theoretical contribution to transfer learning, Paper 1 solves a highly visible grand challenge in artificial intelligence, virtually guaranteeing broader immediate attention, extensive follow-up research, and significant real-world impact in automated theorem proving and advanced reasoning systems.
Paper 1 is more novel and broadly impactful: it introduces a new learning-theoretic framework (simulatable processes) that extends PAC-style VC-dimension guarantees to arbitrarily dependent, computationally bounded data-generating processes, and connects regret to time-bounded Kolmogorov complexity. This is a significant conceptual generalization with potential cross-field influence (learning theory, online learning, complexity, causal/simulation-based modeling). Paper 2 offers a clean and timely reframing of transfer under loss shift with useful identities for log loss, but its scope is narrower and more tied to representation/transfer phenomena than foundational guarantees.