Where the Score Lives: A Wavelet View of Diffusion

Emma Finn, Binxu Wang, T. Anderson Keller, Demba E. Ba

cs.LG · cs.CV
#1974 of 5669 · cs.LG
Tournament Score: 1434 ± 44
Win Rate: 65% (13 wins, 7 losses, 20 matches)
Rating: 4.8 / 10
Significance 5 · Rigor 5.5 · Novelty 6 · Clarity 6.5

Abstract

Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score-approximation network in such diffusion modeling; however, to date, relatively little is known about how these architectural choices impact generative behavior. In this work, to provide insight into this area, we propose an analytically solvable parameterization of the score function using an expansion in a 2D orthogonal wavelet basis. In particular, we derive interpretable optimal score functions in terms of the moments of the data distribution. We use this parameterization to provide an architecture-agnostic, moment-based analysis that reveals which attributes of the data distribution tend to matter most for denoising. Our score machine is flexible enough to partially mimic the relevant inductive biases of multiple architectures, including U-Nets and CNNs, taking a step towards understanding why different score architectures can exhibit distinct generative behavior. Since our score is solvable in terms of the moments of the data, we can begin to understand how the data distribution interacts with the score network to produce the behavior we observe in diffusion models.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: "Where the Score Lives: A Wavelet View of Diffusion"

1. Core Contribution

The paper proposes parameterizing the score function of diffusion models in an orthonormal wavelet basis, where each wavelet coefficient is modeled as a polynomial function of wavelet-derived features. By leveraging Stein's identity, the authors reduce the score estimation problem to a collection of closed-form ridge regression problems, bypassing gradient-based training entirely. Three structured dependency families are introduced—independent (diagonal), band-tied (coupling orientations at same scale/location), and local-coupled (spatial neighborhoods in wavelet space)—to systematically probe which statistical properties of the data distribution matter most for denoising.

The central novelty is the marriage of classical wavelet analysis with modern score-based diffusion theory, yielding interpretable, moment-based score estimators. This provides a diagnostic tool rather than a competitive generative model.
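To make the three dependency families concrete, the sketch below lists which coefficients would feed the regression for a single target wavelet coefficient. It is illustrative only: the `(scale, orientation, row, col)` indexing, the function name `feature_support`, and the square neighborhood of radius `r` are assumptions made here, not the paper's definitions.

```python
from itertools import product

ORIENTATIONS = ("H", "V", "D")  # horizontal, vertical, diagonal detail bands


def feature_support(family, scale, orient, row, col, size, r=1):
    """Return the coefficients whose values feed the model for one target.

    Hypothetical indexing: a coefficient is (scale, orientation, row, col)
    and `size` is the side length of the subband at this scale.
    """
    if family == "independent":
        # Diagonal model: the coefficient depends only on itself.
        return [(scale, orient, row, col)]
    if family == "band_tied":
        # Couple the three orientations at the same scale and location.
        return [(scale, o, row, col) for o in ORIENTATIONS]
    if family == "local_coupled":
        # Couple a (2r+1) x (2r+1) spatial neighborhood within the same
        # subband, clipped at the subband border.
        return [
            (scale, orient, row + dr, col + dc)
            for dr, dc in product(range(-r, r + 1), repeat=2)
            if 0 <= row + dr < size and 0 <= col + dc < size
        ]
    raise ValueError(f"unknown family: {family}")
```

Under these assumptions, an interior coefficient has a support of 1, 3, and 9 entries for the independent, band-tied, and local-coupled (r = 1) families respectively, which is one way to see how the regression size grows with the coupling structure.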

2. Methodological Rigor

The mathematical derivation is largely sound. The use of Stein's identity to eliminate the intractable score from the normal equations (Eq. 14-15) is clean and well-executed. The reduction to Hankel systems for the independent case and the moment-based expansions (Eq. 20) are clearly presented.
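The independent case can be illustrated with a one-dimensional sketch. Assuming a monomial feature map and a scalar noisy coefficient `y` (both assumptions made here; the paper's exact features and equations are not reproduced), the Gram matrix of the normal equations has entries E[y^(j+k)], which is a Hankel matrix of raw moments, and Stein's identity replaces the unknown right-hand side E[y^j s(y)] with -j E[y^(j-1)]. The function name and defaults below are hypothetical.

```python
import numpy as np


def fit_polynomial_score(y, degree=3, ridge=1e-6):
    """Fit s(y) ~ sum_k theta[k] * y**k to the score of the density of y."""
    y = np.asarray(y, dtype=float)
    # Empirical raw moments m_0, ..., m_{2D}.
    moments = np.array([np.mean(y**k) for k in range(2 * degree + 1)])
    idx = np.arange(degree + 1)
    # Gram matrix E[y^j y^k] = m_{j+k}: constant along anti-diagonals (Hankel).
    hankel = moments[idx[:, None] + idx[None, :]]
    # Stein's identity: E[y^j s(y)] = -E[d/dy y^j] = -j * m_{j-1}.
    rhs = np.array([0.0] + [-j * moments[j - 1] for j in range(1, degree + 1)])
    # Ridge-regularized closed-form solve; no gradient-based training.
    return np.linalg.solve(hankel + ridge * np.eye(degree + 1), rhs)


rng = np.random.default_rng(0)
sigma = 2.0
samples = rng.normal(0.0, sigma, size=200_000)
theta = fit_polynomial_score(samples, degree=1)
# For Gaussian data the true score is -y / sigma**2, so theta should be
# close to [0, -0.25] up to sampling error.
```

Only moments of the data enter the solve, which is the sense in which such an estimator is moment-based; raising `degree` pulls in higher-order moments.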

However, several concerns arise:

  • Approximate orthonormality: The paper acknowledges that the wavelet basis B is only approximately orthonormal (‖BB⊤ - I‖₂ ≈ 0.012), yet the theoretical derivations assume exact orthonormality for the loss decoupling (Eq. 7-8). The formal equivalence lemma (Appendix C.1) contains the hedging word "approximately," making the theoretical guarantees somewhat informal.
  • Limited experimental scope: All experiments use MNIST at 32×32 and 64×64—a very constrained setting. MNIST is essentially binary, low-complexity data; the generalization of findings to natural images with richer texture, color, and higher resolution remains entirely unvalidated. The authors acknowledge this but it significantly limits the strength of empirical claims.
  • No finite-sample or approximation bounds: The authors flag the absence of approximation guarantees as future work, but this is a notable gap for a theoretically-motivated paper. Without such bounds, it's unclear how the polynomial degree D, ridge parameter γ, and coupling radius r should scale with data complexity.
  • Missing quantitative rigor in comparisons: The comparison against trained CNN/U-Net models (Fig. 5d) shows the analytic models are consistently worse, with the gap characterized only qualitatively ("narrows at low-moderate noise"). No statistical significance tests or confidence intervals are reported.
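The orthonormality concern above can be checked numerically for any candidate basis. The sketch below builds a separable 2D Haar basis as a stand-in (the wavelet family is an assumption made here, since the review does not name the paper's basis) and reports the spectral norm of BB⊤ - I, the quantity quoted above.

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal 1D Haar analysis matrix; n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    averages = np.kron(h, [1.0, 1.0])               # coarse rows, upsampled
    details = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-scale differences
    return np.vstack([averages, details]) / np.sqrt(2.0)


def orthonormality_gap(basis):
    """Spectral norm of B B^T - I; zero for an exactly orthonormal basis."""
    return np.linalg.norm(basis @ basis.T - np.eye(basis.shape[0]), ord=2)


h = haar_matrix(8)
basis_2d = np.kron(h, h)  # separable 2D basis acting on flattened 8x8 images
exact_gap = orthonormality_gap(basis_2d)

# A small random perturbation stands in for boundary handling or filter
# truncation that makes a practical basis only approximately orthonormal.
rng = np.random.default_rng(0)
perturbed = basis_2d + 1e-3 * rng.standard_normal(basis_2d.shape)
perturbed_gap = orthonormality_gap(perturbed)
```

Here `exact_gap` sits at floating-point round-off while `perturbed_gap` is clearly nonzero; a gap of the second kind is what makes a per-coefficient loss decoupling hold only approximately.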
3. Potential Impact

The framework offers genuine diagnostic value. By decomposing the score into interpretable wavelet coefficients tied to data moments, researchers can:

  • Quantify the contribution of specific correlation structures (orientation co-activation vs. spatial neighborhoods) to denoising quality.
  • Understand why U-Nets succeed by connecting their implicit multiresolution representations (Falck et al., 2023) to explicit wavelet features.
  • Potentially inform architecture design by identifying which inductive biases matter most at different noise scales.

However, the practical impact is limited by the method's inability to scale to realistic image generation tasks. The generated samples (Appendix, Fig. 12) are of very low quality, and the approach is presented as a diagnostic rather than a competitive method. The impact is primarily conceptual: it provides a lens for understanding diffusion models rather than improving them.

4. Timeliness & Relevance

The paper addresses a timely question: understanding *why* different score network architectures produce different generative behaviors. This connects to a growing body of work on mechanistic understanding of diffusion models (Kamb & Ganguli, 2024; Wang & Vastola, 2024; Niedoba et al., 2025). The wavelet perspective is natural given recent interest in multiresolution representations in generative models and the observation that U-Nets implicitly learn wavelet-like representations.

The work builds incrementally on Wang & Vastola (2024)'s linear/Gaussian score approximation by extending to nonlinear (polynomial) features in a structured wavelet basis. This is a meaningful step, though the insights are somewhat expected: local coupling helps most, higher-order moments matter more at low noise, and band-tying helps perceptual quality but can hurt MSE.
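For reference, a linear (Gaussian) score has a closed form in terms of the first two moments alone. The following is a minimal sketch, assuming additive noise x = x0 + sigma * eps with standard normal eps; this noising convention and the function name are assumptions made here, not details taken from either paper.

```python
import numpy as np


def gaussian_score(x, mean, cov, sigma):
    """Score of N(mean, cov + sigma^2 I), evaluated at each row of x."""
    noisy_cov = cov + sigma**2 * np.eye(cov.shape[0])
    # Solve rather than invert: returns -(cov + sigma^2 I)^{-1} (x - mean).
    return -np.linalg.solve(noisy_cov, (x - mean).T).T


mean = np.zeros(2)
cov = np.diag([1.0, 4.0])
x = np.array([[2.0, 5.0]])
score = gaussian_score(x, mean, cov, sigma=1.0)
# Noisy covariance is diag(2, 5), so the score at (2, 5) is (-1, -1).
```

Because this score is linear in x, it uses only the mean and covariance; polynomial features of degree two and above are what bring higher-order moments into the estimate.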

5. Strengths & Limitations

Strengths:

  • Elegant theoretical framework that connects diffusion scores to classical wavelet analysis and moment estimation.
  • Training-free, closed-form solution with interpretable coefficients—a clear advantage for analysis.
  • The three dependency families provide a clean ablation structure for isolating the effects of different correlation types.
  • The observation that local coupling is the most reliable dependency structure across noise levels is a useful finding that corroborates and refines mechanistic intuitions.

Limitations:

  • Scale: MNIST-only experiments severely limit the generalizability of conclusions. Natural images have fundamentally different wavelet statistics (heavier tails, stronger cross-scale dependencies).
  • Competitiveness: The gap to trained models remains substantial, especially at high noise, undermining claims about "narrowing the gap."
  • Missing cross-scale interactions: The three dependency families don't capture cross-scale coupling, which is arguably central to U-Net behavior and image statistics.
  • Polynomial features: Low-degree polynomials are a crude approximation to the highly nonlinear score function. The ceiling of this approach is unclear without approximation theory.
  • No generation quality metrics: Only MSE is reported; no FID, IS, or perceptual metrics are computed for generated samples.
  • Reproducibility: No code is provided, and several implementation details (ridge parameter selection, exact noise schedule mapping) are underspecified.

Overall Assessment

This is a theoretically interesting but empirically underdeveloped paper. The wavelet-based score decomposition is a clean conceptual contribution that provides interpretable insights into diffusion model behavior. However, the restriction to MNIST, the absence of approximation guarantees, and the large gap to practical methods limit its immediate scientific impact. The paper makes a modest but meaningful contribution to the mechanistic understanding of diffusion models, primarily serving as a proof-of-concept for a wavelet-based diagnostic framework.

Rating: 4.8 / 10
Significance 5 · Rigor 5.5 · Novelty 6 · Clarity 6.5

Generated Jun 9, 2026

Comparison History (20)

    Won vs. Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search

    Paper 2 addresses a fundamental theoretical gap in understanding score-based diffusion models—one of the most impactful areas in modern ML. By providing an analytically solvable wavelet-based parameterization of the score function, it offers architecture-agnostic insights into why different neural architectures exhibit distinct generative behaviors. This theoretical contribution has broad implications across generative modeling, signal processing, and deep learning theory. Paper 1, while practically useful for tensor program optimization with strong empirical results, is more incremental and narrower in scope, primarily improving an existing auto-scheduling framework.

    claude-opus-4-6 · Jun 9, 2026

    Won vs. PRISM: Topology-Aware Cross-Modal Imputation for Modality-Deficient Federated Graph Learning

    Paper 1 provides foundational theoretical insights into diffusion models, a highly influential and widely applied area of generative AI. By explaining how architectural choices impact generative behavior using an analytically solvable parameterization, it has the potential to guide future model designs across multiple domains. In contrast, Paper 2 addresses a more specialized, niche problem in multimodal federated graph learning, which, while highly practical, has a narrower scope of scientific impact.

    gemini-3.1-pro-preview · Jun 9, 2026

    Lost vs. Tight Sample Complexity of Transformers

    Paper 2 provides tight theoretical bounds (both upper and lower) on the VC dimension and sample complexity of Transformers, including chain-of-thought learning—topics of immense current interest. These fundamental results have broad applicability across all Transformer applications and provide rigorous theoretical foundations for understanding generalization. Paper 1 offers an interesting analytical framework for understanding score functions in diffusion models via wavelets, but its impact is more niche, focused on architectural understanding within generative modeling. Paper 2's results are more broadly impactful, mathematically rigorous with matching bounds, and timely given the dominance of Transformers.

    claude-opus-4-6 · Jun 9, 2026

    Won vs. How Much Capacity Does EEG Denoising Need? Ultra-Compact Networks reveal Benchmark Saturation and Metric-Utility Gap

    Paper 1 offers deeper theoretical novelty by providing an analytically solvable wavelet-based parameterization of score functions in diffusion models, connecting architecture choice to generative behavior through moment analysis. This addresses a fundamental open question in one of the most active areas of machine learning. Its breadth of impact is larger, as diffusion models are used across computer vision, audio, molecular design, and more. Paper 2, while methodologically sound and practically useful, primarily demonstrates benchmark saturation and metric-utility gaps in EEG denoising—a narrower domain with more incremental findings.

    claude-opus-4-6 · Jun 9, 2026

    Won vs. Public Machine Learning Solver Framework for Novices in the Machine Learning Domain

    Paper 2 provides fundamental theoretical insight into diffusion models—one of the hottest areas in AI—by deriving analytically solvable score functions via wavelet decomposition. This offers architecture-agnostic understanding of why different network architectures exhibit distinct generative behaviors, addressing a significant open question. Its theoretical depth, novelty (wavelet-based analytical framework for scores), and relevance to the widely-used diffusion modeling paradigm give it broader impact potential. Paper 1, while useful, is more of an engineering/platform contribution for ML novices with limited theoretical novelty.

    claude-opus-4-6 · Jun 9, 2026

    Lost vs. GENERIC-FNO: Embedding Energy Conservation and Entropy Production into Fourier Neural Operators

    GENERIC-FNO makes a stronger scientific contribution by solving a well-defined open problem: embedding the full GENERIC thermodynamic structure into neural operators in function space, with exact structural guarantees (machine-precision conservation). It bridges thermodynamically consistent learning from finite-dimensional systems to infinite-dimensional PDE operators, demonstrates zero-shot super-resolution, and introduces novel gauge-invariant diagnostics. Paper 1 provides useful theoretical insights into diffusion model score functions via wavelet analysis, but is more incremental and narrower in scope—offering interpretation rather than a new capability. Paper 2's rigorous methodology and broad applicability across physics-informed ML give it higher impact potential.

    claude-opus-4-6 · Jun 9, 2026

    Lost vs. Theoretical Foundations of Continual Learning via Drift-Plus-Penalty

    Paper 2 likely has higher impact due to a more broadly applicable, principled framework for continual learning with explicit stability–plasticity control, backed by stability/convergence guarantees and empirical gains over strong baselines. The Drift-Plus-Penalty/control-theoretic angle is a novel, timely bridge between stochastic optimization/control and CL, with clear real-world relevance for nonstationary streaming data. Paper 1 offers valuable interpretability for diffusion architectures via an analytically solvable wavelet score parameterization, but its immediate applications are more diagnostic than transformative and may affect a narrower slice of practice.

    gpt-5.2 · Jun 9, 2026

    Won vs. QueryWeaver: Reliable Multi-Tool Query Execution Planning via LLM-Based Graph Generation

    Paper 2 provides fundamental theoretical insights into score-based diffusion models, addressing a critical gap in understanding how architectures and data distributions interact. This analytical framework has the potential for broad, long-lasting impact across the rapidly growing field of generative AI. In contrast, Paper 1 offers a practical engineering solution for LLM tool use, which, while useful, is likely to have a narrower and more short-lived scientific impact as underlying models evolve.

    gemini-3.1-pro-preview · Jun 9, 2026

    Won vs. Neural Field Tokenizations with Hierarchy and Spatial Locality Priors

    Paper 1 provides fundamental theoretical insight into why different score network architectures produce distinct generative behaviors in diffusion models—a central open question. Its analytically solvable wavelet-based parameterization offers interpretable, architecture-agnostic understanding connecting data distribution moments to denoising behavior. This theoretical contribution has broad implications for the rapidly growing diffusion model field. Paper 2 offers a solid engineering contribution (efficiency and generality improvements for neural field tokenization) but is more incremental, combining known priors (locality, hierarchy) into a practical framework without comparable theoretical depth or breadth of impact.

    claude-opus-4-6 · Jun 9, 2026

    Won vs. PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

    Paper 1 provides fundamental theoretical insight into score-based diffusion models—one of the most active areas in generative AI—by deriving analytically solvable score functions via wavelet expansions. This architecture-agnostic analysis of how data distribution moments interact with score networks addresses a deep open question with broad implications across machine learning. Paper 2, while practically useful, addresses a narrower engineering problem (benchmark generation for Text-to-Cypher systems) with limited cross-field impact and less conceptual novelty.

    claude-opus-4-6 · Jun 9, 2026