Emma Finn, Binxu Wang, T. Anderson Keller, Demba E. Ba
Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score-approximation network in such diffusion modeling; however, to date, relatively little is known about how these architectural choices impact generative behavior. In this work, to provide insight into this area, we propose an analytically solvable parameterization of the score function using an expansion in a 2D orthogonal wavelet basis. In particular, we derive interpretable optimal score functions in terms of the moments of the data distribution. We use this parameterization to provide an architecture-agnostic, moment-based analysis that reveals which attributes of the data distribution tend to matter most for denoising. Our score machine is flexible enough to partially mimic the relevant inductive biases of multiple architectures, including U-Nets and CNNs, taking a step towards understanding why different score architectures can exhibit distinct generative behavior. Since our score is solvable in terms of the moments of the data, we can begin to understand how the data distribution interacts with the score network to produce the behavior we observe in diffusion models.
The paper proposes parameterizing the score function of diffusion models in an orthonormal wavelet basis, where each wavelet coefficient is modeled as a polynomial function of wavelet-derived features. By leveraging Stein's identity, the authors reduce the score estimation problem to a collection of closed-form ridge regression problems, bypassing gradient-based training entirely. Three structured dependency families are introduced—independent (diagonal), band-tied (coupling orientations at same scale/location), and local-coupled (spatial neighborhoods in wavelet space)—to systematically probe which statistical properties of the data distribution matter most for denoising.
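For readers unfamiliar with the reduction described above, the generic argument can be sketched as follows. This is the textbook form of the identity, not a reproduction of the paper's own equations: the feature map \phi, the weights \theta, the ridge strength \lambda, and the density p are notation assumed here for illustration, and the boundary terms in the integration by parts are assumed to vanish.

```latex
% Linear-in-features model for one score component s_i(x) = \partial_i \log p(x):
%   \hat{s}_i(x) = \theta^\top \phi(x), \qquad \phi : \mathbb{R}^d \to \mathbb{R}^K .
\begin{align}
J(\theta)
  &= \mathbb{E}_{p}\!\left[\big(\theta^\top \phi(x) - \partial_i \log p(x)\big)^2\right]
     + \lambda \lVert \theta \rVert^2 , \\
% Setting the gradient to zero gives the ridge normal equations:
\big(\mathbb{E}_{p}[\phi(x)\phi(x)^\top] + \lambda I\big)\,\theta
  &= \mathbb{E}_{p}\!\left[\phi(x)\,\partial_i \log p(x)\right] , \\
% Stein's identity (integration by parts, vanishing boundary terms):
\mathbb{E}_{p}\!\left[\phi(x)\,\partial_i \log p(x)\right]
  &= \int \phi(x)\,\partial_i p(x)\,dx
   = -\,\mathbb{E}_{p}\!\left[\partial_i \phi(x)\right] , \\
% so the unknown score drops out of the closed-form solution:
\theta^\star
  &= -\big(\mathbb{E}_{p}[\phi(x)\phi(x)^\top] + \lambda I\big)^{-1}
      \mathbb{E}_{p}\!\left[\partial_i \phi(x)\right] .
\end{align}
```

Both expectations on the right-hand side can be estimated from samples alone, which is why no gradient-based training is needed in this kind of setup.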
The central novelty is the marriage of classical wavelet analysis with modern score-based diffusion theory, yielding interpretable, moment-based score estimators. This provides a diagnostic tool rather than a competitive generative model.
The mathematical derivation is largely sound. The use of Stein's identity to eliminate the intractable score from the normal equations (Eq. 14-15) is clean and well-executed. The reduction to Hankel systems for the independent case and the moment-based expansions (Eq. 20) are clearly presented.
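To make the Hankel structure concrete, here is a minimal one-dimensional sketch, assuming monomial features 1, x, ..., x^D on a raw scalar variable rather than the paper's wavelet coefficients, and ignoring the diffusion noise level. With these features the Gram matrix has entries m_{j+k} (raw moments, hence Hankel), and Stein's identity turns the right-hand side into -k m_{k-1}. The function names `empirical_moments` and `fit_polynomial_score` are hypothetical and do not come from the paper.

```python
import numpy as np


def empirical_moments(samples, max_order):
    """Return the raw moments m_0..m_max_order of a 1D sample array."""
    x = np.asarray(samples, dtype=float)
    return np.array([np.mean(x ** k) for k in range(max_order + 1)])


def fit_polynomial_score(moments, degree, ridge=0.0):
    """Solve the ridge-regularized Hankel system for a polynomial score.

    Model (an illustrative assumption): s(x) = sum_k theta_k * x**k.
    Gram matrix:      H[j, k] = E[x**(j + k)] = m_{j+k}   (Hankel)
    Right-hand side:  b[k] = -E[d/dx x**k]   = -k * m_{k-1}  (Stein's identity)
    """
    m = np.asarray(moments, dtype=float)
    if m.shape[0] < 2 * degree + 1:
        raise ValueError("need moments up to order 2 * degree")
    k = np.arange(degree + 1)
    hankel = m[k[:, None] + k[None, :]]
    rhs = np.zeros(degree + 1)
    rhs[1:] = -k[1:] * m[k[1:] - 1]  # b[0] = 0 because d/dx of 1 is 0
    return np.linalg.solve(hankel + ridge * np.eye(degree + 1), rhs)


if __name__ == "__main__":
    # Toy check on Gaussian samples, where the true score is -x / sigma**2.
    rng = np.random.default_rng(0)
    sigma = 2.0
    x = rng.normal(0.0, sigma, size=200_000)
    theta = fit_polynomial_score(empirical_moments(x, 6), degree=3, ridge=1e-6)
    print(theta)  # the linear coefficient should be close to -1 / sigma**2
```

Passing exact Gaussian moments instead of sample estimates recovers the linear score exactly, which is a convenient sanity check. For non-Gaussian data the higher-order coefficients become nonzero, consistent with the review's point that the estimator depends only on moments of the data.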
However, several concerns arise:
The framework offers genuine diagnostic value. By decomposing the score into interpretable wavelet coefficients tied to data moments, researchers can:
However, the practical impact is limited by the method's inability to scale to realistic image generation tasks. The generated samples (Appendix, Fig. 12) are of very low quality, and the approach is presented as a diagnostic rather than a competitive method. The impact is primarily conceptual—it provides a lens for understanding diffusion models rather than improving them.
The paper addresses a timely question: understanding *why* different score network architectures produce different generative behaviors. This connects to a growing body of work on mechanistic understanding of diffusion models (Kamb & Ganguli, 2024; Wang & Vastola, 2024; Niedoba et al., 2025). The wavelet perspective is natural given recent interest in multiresolution representations in generative models and the observation that U-Nets implicitly learn wavelet-like representations.
The work builds incrementally on the linear/Gaussian score approximation of Wang & Vastola (2024) by extending it to nonlinear (polynomial) features in a structured wavelet basis. This is a meaningful step, though the insights are somewhat expected: local coupling helps most, higher-order moments matter more at low noise, and band-tying helps perceptual quality but can hurt MSE.
This is a theoretically interesting but empirically underdeveloped paper. The wavelet-based score decomposition is a clean conceptual contribution that provides interpretable insights into diffusion model behavior. However, the restriction to MNIST, the absence of approximation guarantees, and the large gap to practical methods limit its immediate scientific impact. The paper makes a modest but meaningful contribution to the mechanistic understanding of diffusion models, primarily serving as a proof-of-concept for a wavelet-based diagnostic framework.
Generated Jun 9, 2026
Paper 2 addresses a fundamental theoretical gap in understanding score-based diffusion models—one of the most impactful areas in modern ML. By providing an analytically solvable wavelet-based parameterization of the score function, it offers architecture-agnostic insights into why different neural architectures exhibit distinct generative behaviors. This theoretical contribution has broad implications across generative modeling, signal processing, and deep learning theory. Paper 1, while practically useful for tensor program optimization with strong empirical results, is more incremental and narrower in scope, primarily improving an existing auto-scheduling framework.
Paper 1 provides foundational theoretical insights into diffusion models, a highly influential and widely applied area of generative AI. By explaining how architectural choices impact generative behavior using an analytically solvable parameterization, it has the potential to guide future model designs across multiple domains. In contrast, Paper 2 addresses a more specialized, niche problem in multimodal federated graph learning, which, while highly practical, has a narrower scope of scientific impact.
Paper 2 provides tight theoretical bounds (both upper and lower) on the VC dimension and sample complexity of Transformers, including chain-of-thought learning—topics of immense current interest. These fundamental results have broad applicability across all Transformer applications and provide rigorous theoretical foundations for understanding generalization. Paper 1 offers an interesting analytical framework for understanding score functions in diffusion models via wavelets, but its impact is more niche, focused on architectural understanding within generative modeling. Paper 2's results are more broadly impactful, mathematically rigorous with matching bounds, and timely given the dominance of Transformers.
Paper 1 offers deeper theoretical novelty by providing an analytically solvable wavelet-based parameterization of score functions in diffusion models, connecting architecture choice to generative behavior through moment analysis. This addresses a fundamental open question in one of the most active areas of machine learning. Its breadth of impact is larger, as diffusion models are used across computer vision, audio, molecular design, and more. Paper 2, while methodologically sound and practically useful, primarily demonstrates benchmark saturation and metric-utility gaps in EEG denoising—a narrower domain with more incremental findings.
Paper 2 provides fundamental theoretical insight into diffusion models, one of the hottest areas in AI, by deriving analytically solvable score functions via wavelet decomposition. This offers an architecture-agnostic understanding of why different network architectures exhibit distinct generative behaviors, addressing a significant open question. Its theoretical depth, novelty (a wavelet-based analytical framework for scores), and relevance to the widely used diffusion modeling paradigm give it broader impact potential. Paper 1, while useful, is more of an engineering/platform contribution for ML novices with limited theoretical novelty.
GENERIC-FNO makes a stronger scientific contribution by solving a well-defined open problem: embedding the full GENERIC thermodynamic structure into neural operators in function space, with exact structural guarantees (machine-precision conservation). It bridges thermodynamically consistent learning from finite-dimensional systems to infinite-dimensional PDE operators, demonstrates zero-shot super-resolution, and introduces novel gauge-invariant diagnostics. Paper 1 provides useful theoretical insights into diffusion model score functions via wavelet analysis, but is more incremental and narrower in scope—offering interpretation rather than a new capability. Paper 2's rigorous methodology and broad applicability across physics-informed ML give it higher impact potential.
Paper 2 likely has higher impact due to a more broadly applicable, principled framework for continual learning with explicit stability–plasticity control, backed by stability/convergence guarantees and empirical gains over strong baselines. The Drift-Plus-Penalty/control-theoretic angle is a novel, timely bridge between stochastic optimization/control and CL, with clear real-world relevance for nonstationary streaming data. Paper 1 offers valuable interpretability for diffusion architectures via an analytically solvable wavelet score parameterization, but its immediate applications are more diagnostic than transformative and may affect a narrower slice of practice.
Paper 2 provides fundamental theoretical insights into score-based diffusion models, addressing a critical gap in understanding how architectures and data distributions interact. This analytical framework has the potential for broad, long-lasting impact across the rapidly growing field of generative AI. In contrast, Paper 1 offers a practical engineering solution for LLM tool use, which, while useful, is likely to have a narrower and more short-lived scientific impact as underlying models evolve.
Paper 1 provides fundamental theoretical insight into why different score network architectures produce distinct generative behaviors in diffusion models—a central open question. Its analytically solvable wavelet-based parameterization offers interpretable, architecture-agnostic understanding connecting data distribution moments to denoising behavior. This theoretical contribution has broad implications for the rapidly growing diffusion model field. Paper 2 offers a solid engineering contribution (efficiency and generality improvements for neural field tokenization) but is more incremental, combining known priors (locality, hierarchy) into a practical framework without comparable theoretical depth or breadth of impact.
Paper 1 provides fundamental theoretical insight into score-based diffusion models—one of the most active areas in generative AI—by deriving analytically solvable score functions via wavelet expansions. This architecture-agnostic analysis of how data distribution moments interact with score networks addresses a deep open question with broad implications across machine learning. Paper 2, while practically useful, addresses a narrower engineering problem (benchmark generation for Text-to-Cypher systems) with limited cross-field impact and less conceptual novelty.