Alonso Urbano, David W. Romero, Max Zimmer, Sebastian Pokutta
Neural fields parameterize data as functions from coordinates to values, providing a unified framework for representation learning across modalities. Existing approaches are dominated by per-sample meta-learning, which scales poorly due to memory-intensive inner-loop optimization. The natural alternative -- feed-forward encoding -- typically introduces modality-specific assumptions, sacrificing the generality that makes learning with neural fields attractive. We argue that locality and hierarchy are useful priors for learning field representations that can be injected without compromising modality-agnosticism. We propose LH-NeF, a framework to learn general-purpose tokenized representations of continuous signals. A locality-preserving hierarchical encoder maps raw coordinate-value field observations to structured tokens, from which the field is reconstructed during training. By replacing meta-learning's inner loop with a single forward pass, LH-NeF uses 42× less memory and supports 133× larger batches than the strongest modality-agnostic baseline. Across images, 3D shapes, and climate fields, our learned representations match or exceed performance of modality-agnostic, modality-specific, and specialized generative neural field baselines on both reconstruction and downstream tasks.
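The abstract's encoder maps coordinate-value observations to tokens through grouped attention. The following NumPy sketch illustrates that idea under explicit assumptions: inputs are already sorted by a locality key, each group is pooled by a single query with one attention head, weights are random and untrained, and the names `group_pool` and `hierarchical_tokens` are hypothetical. It is not the authors' Hierarchical Perceiver implementation.

```python
import numpy as np

# Hypothetical sketch of hierarchical grouped attention pooling; NOT the paper's code.


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def group_pool(tokens, group_size, query, w_k, w_v):
    """One query attends within each contiguous group; attention never crosses groups."""
    n, d = tokens.shape
    assert n % group_size == 0, "sequence length must be divisible by group_size"
    groups = tokens.reshape(n // group_size, group_size, d)  # (G, S, d)
    keys = groups @ w_k                                        # (G, S, d)
    values = groups @ w_v                                      # (G, S, d)
    scores = keys @ query / np.sqrt(d)                         # (G, S)
    weights = softmax(scores, axis=-1)                         # attention within each group
    return np.einsum("gs,gsd->gd", weights, values)            # one token per group: (G, d)


def hierarchical_tokens(coords, values, d=16, group_sizes=(8, 4), seed=0):
    """Embed (coordinate, value) pairs, then pool contiguous groups level by level.

    Weights are random and untrained; `seed` makes them identical across calls."""
    wrng = np.random.default_rng(seed)
    x = np.concatenate([coords, values], axis=-1)
    h = x @ (wrng.normal(size=(x.shape[-1], d)) / np.sqrt(x.shape[-1]))
    for g in group_sizes:
        query = wrng.normal(size=d)
        w_k = wrng.normal(size=(d, d)) / np.sqrt(d)
        w_v = wrng.normal(size=(d, d)) / np.sqrt(d)
        h = group_pool(h, g, query, w_k, w_v)
    return h


rng = np.random.default_rng(1)
coords = rng.uniform(size=(64, 2))  # 64 observations; assumed already sorted by a locality key
vals = rng.uniform(size=(64, 3))    # e.g. RGB values
tokens = hierarchical_tokens(coords, vals)
print(tokens.shape)  # (2, 16): 64 observations -> 8 groups -> 2 tokens
```

Each final token only ever sees its own 32 observations, so perturbing the second half of the sequence leaves `tokens[0]` unchanged. A locality-preserving ordering is what turns this fixed receptive field into a spatially compact one.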
LH-NeF addresses a genuine three-way tradeoff in neural field representation learning: structural priors, modality-agnosticism, and scalability. Existing modality-agnostic methods (Functa, ENF) rely on per-sample meta-learning (MAML) to obtain latent representations, which requires storing full inner-loop computation graphs and severely limits batch sizes. Modality-specific methods (Spatial Functa, LIIF) introduce structure but sacrifice generality.
The key insight is that locality (nearby coordinates correlate) and hierarchy (multi-scale organization) are universal priors applicable across coordinate-based data modalities. The authors operationalize this by: (1) reordering input observations via space-filling curves (Morton ordering, k-d tree linearization, S2 cell indices) before applying Hierarchical Perceiver grouped attention, ensuring spatially compact receptive fields; (2) designing a renderer with Gaussian soft group routing and FiLM modulation conditioned on intra-group relative coordinates.
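Morton (Z-order) ordering, one of the space-filling curves listed above, can be sketched in a few lines of Python. This example assumes 2D integer pixel coordinates below 2^16 and fixed-size contiguous groups. `morton_groups` is a hypothetical helper, not code from the paper.

```python
# Hypothetical sketch: Morton (Z-order) keys for 2D integer coordinates, then fixed-size groups.


def part1by1(n: int) -> int:
    """Spread the low 16 bits of n apart, inserting a zero bit between consecutive bits."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton2d(x: int, y: int) -> int:
    """Z-order key: bits of x in even positions, bits of y in odd positions."""
    return part1by1(x) | (part1by1(y) << 1)


def morton_groups(pixels, group_size=4):
    """Sort integer (x, y) coordinates by Morton key and cut the sequence into fixed-size groups."""
    order = sorted(range(len(pixels)), key=lambda i: morton2d(*pixels[i]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


pixels = [(x, y) for y in range(4) for x in range(4)]  # a 4x4 image in raster order
for g in morton_groups(pixels):
    print([pixels[i] for i in g])
# [(0, 0), (1, 0), (0, 1), (1, 1)]
# [(2, 0), (3, 0), (2, 1), (3, 1)]
# [(0, 2), (1, 2), (0, 3), (1, 3)]
# [(2, 2), (3, 2), (2, 3), (3, 3)]
```

Cutting the raster order into chunks of four would produce 1×4 row strips. The Morton chunks are 2×2 blocks, and this is the sense in which the ordering keeps each attention group spatially compact.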
This replaces MAML's inner loop with a single forward pass, yielding 42× memory reduction and 133× larger batch sizes while maintaining or improving performance across images, 3D shapes, and climate data.
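A deliberately tiny NumPy toy makes the memory argument concrete. It assumes a fixed linear decoder in place of a neural field, plain gradient descent as the inner loop, and a pseudo-inverse standing in for a trained encoder. It only illustrates why inner-loop methods accumulate per-step state and does not reproduce the 42× or 133× figures.

```python
import numpy as np

# Toy illustration only: linear decoder, gradient-descent inner loop vs. a one-shot encoder.
rng = np.random.default_rng(0)
n, d = 64, 8                                # signal length, latent size
W = rng.normal(size=(n, d)) / np.sqrt(n)    # fixed toy linear decoder: y_hat = W @ z
y = W @ rng.normal(size=d)                  # a signal the decoder can represent exactly


def inner_loop_latent(y, steps, lr=0.5):
    """Per-sample optimization: gradient descent on 0.5 * ||W z - y||^2 from z = 0.

    Also returns every intermediate latent; differentiating through the inner loop
    (as MAML does) requires keeping all of them, per sample, in memory."""
    z = np.zeros(d)
    trajectory = [z]
    for _ in range(steps):
        z = z - lr * W.T @ (W @ z - y)
        trajectory.append(z)
    return z, trajectory


# Amortized alternative: a single forward pass. The pseudo-inverse stands in for a
# trained encoder; it is the best possible linear encoder for this toy decoder.
E = np.linalg.pinv(W)


def encoder_latent(y):
    return E @ y


z_loop, traj = inner_loop_latent(y, steps=10)
print(len(traj))                                   # 11 stored latents for 10 inner steps
print(np.linalg.norm(W @ z_loop - y))              # residual error after 10 inner steps
print(np.linalg.norm(W @ encoder_latent(y) - y))   # near machine precision
```

The length of `traj` grows linearly with the number of inner steps for every sample in the batch. The encoder path stores only its own forward activations, which is why batch size can grow so much once the inner loop is gone.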
The paper is methodologically sound with several strengths:
Weaknesses in rigor: Error bars are reported for only a subset of experiments (some results are single-seed). The ImageNet comparison is somewhat incomplete: Spatial Functa achieves 38.4 dB, but with a much larger conditioning budget (65K vs. 14K dimensions), so the two numbers are not directly comparable. The ERA5 results trail ENF, which the authors attribute to ENF's equivariant formulation; this partially undermines the "match or exceed" claim. The paper would benefit from experiments on truly irregular data (e.g., real LiDAR point clouds), where the locality guarantee holds only in expectation; the authors acknowledge this but defer it.
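The ImageNet numbers are PSNR values in dB, which convert directly to mean squared error. The calculation below assumes pixel values scaled to [0, 1], so MAX = 1; the paper's exact normalization is an assumption here. Under that assumption, 38.4 dB corresponds to an MSE of about 1.45 × 10⁻⁴, and each additional 1 dB cuts MSE by roughly 21%. The conditioning budgets differ by a factor of about 4.6.

```latex
\mathrm{PSNR} = 10\,\log_{10}\!\frac{\mathrm{MAX}^2}{\mathrm{MSE}}
\;\Longrightarrow\;
\mathrm{MSE} = \mathrm{MAX}^2 \cdot 10^{-\mathrm{PSNR}/10}
= 10^{-3.84} \approx 1.45 \times 10^{-4}
\quad (\mathrm{MAX} = 1,\ \mathrm{PSNR} = 38.4\ \mathrm{dB}),
\qquad
\frac{65\mathrm{K}}{14\mathrm{K}} \approx 4.6
```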
Immediate impact: The 42× memory reduction is practically significant. MAML-based methods are notoriously difficult to scale, and this bottleneck has been a real barrier to applying neural field methods at higher resolutions or on larger datasets. Enabling 133× larger batch sizes directly affects training throughput and could unlock applications previously infeasible.
Broader impact on neural field research: If the community adopts feed-forward encoding over meta-learning for neural field representation learning, this could shift the paradigm significantly. The modality-agnostic nature means one architecture handles images, shapes, and manifold-valued data without architectural changes—only the locality key needs specification.
Adjacent fields: The dynamic tokenization property (group supports adapt to input geometry) connects to emerging work on adaptive tokenization in vision and language (ElasticTok, H-Net), and the structured latent space is more amenable to downstream generative modeling than flat vectors from meta-learning.
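The renderer described earlier (Gaussian soft group routing plus FiLM modulation conditioned on intra-group relative coordinates) also shows how group supports can adapt to input geometry. If each group's Gaussian is fit to the points that landed in it, dense regions get tight groups and sparse regions get wide ones. The NumPy sketch below is one plausible reading under stated assumptions: isotropic Gaussians estimated from each group's points, a single FiLM layer, and random untrained weights. `group_gaussians`, `route`, and `film_render` are hypothetical names, and the paper may parameterize routing and modulation differently.

```python
import numpy as np

# Hypothetical sketch of Gaussian soft routing + FiLM rendering; NOT the paper's renderer.


def group_gaussians(coords, groups):
    """Fit an isotropic Gaussian to each group: mean = centroid, variance = per-axis spread."""
    mus, var = [], []
    for g in groups:
        pts = coords[g]
        mu = pts.mean(axis=0)
        mus.append(mu)
        var.append(((pts - mu) ** 2).mean() + 1e-6)  # small floor avoids zero variance
    return np.stack(mus), np.array(var)


def route(query, mus, var):
    """Soft routing: each group's Gaussian log-density at the query, normalized to sum to 1."""
    dim = query.shape[-1]
    sq = ((query - mus) ** 2).sum(axis=1)
    logp = -0.5 * sq / var - 0.5 * dim * np.log(var)
    w = np.exp(logp - logp.max())                    # stable softmax over groups
    return w / w.sum()


def film_render(query, tokens, mus, var, w_in, w_gamma, w_beta, w_out):
    """Blend per-group predictions; each FiLM-modulates features of the query's coordinate
    relative to that group's center, with scale/shift computed from the group token."""
    weights = route(query, mus, var)
    out = np.zeros(w_out.shape[1])
    for k, tok in enumerate(tokens):
        h = np.tanh((query - mus[k]) @ w_in)         # features of the intra-group relative coordinate
        gamma, beta = tok @ w_gamma, tok @ w_beta    # FiLM parameters from the group token
        out += weights[k] * ((gamma * h + beta) @ w_out)
    return out


# Two groups: a tight cluster near (0.11, 0.11) and a wide one around (0.7, 0.7).
coords = np.array([[0.10, 0.10], [0.12, 0.10], [0.10, 0.12], [0.12, 0.12],
                   [0.50, 0.50], [0.90, 0.50], [0.50, 0.90], [0.90, 0.90]])
mus, var = group_gaussians(coords, [[0, 1, 2, 3], [4, 5, 6, 7]])

rng = np.random.default_rng(0)
d_tok, hidden = 8, 16
tokens = rng.normal(size=(2, d_tok))                 # stand-ins for encoder output tokens
params = (rng.normal(size=(2, hidden)), rng.normal(size=(d_tok, hidden)),
          rng.normal(size=(d_tok, hidden)), rng.normal(size=(hidden, 1)))

print(var)                                           # the tight group gets a far smaller variance
print(route(np.array([0.11, 0.11]), mus, var))       # almost all weight on group 0
print(film_render(np.array([0.3, 0.3]), tokens, mus, var, *params))
```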
Limitations on impact: The method still requires choosing a locality-preserving ordering appropriate to the coordinate domain, which, while simple for common domains, adds a design decision. The framework has not been tested on truly high-resolution data or domains with complex topology beyond S².
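For irregular samples such as the LiDAR point clouds raised among the weaknesses, there is no pixel grid to Morton-encode, and the ordering becomes a real design choice. A k-d tree linearization, one of the orderings the paper lists, can be sketched in pure Python. The sketch assumes median splits along the axis of largest spread and a hypothetical `leaf_size` parameter acting as the group size; the paper's exact construction may differ.

```python
import random

# Hypothetical sketch: k-d tree linearization for irregularly sampled points.


def kd_leaves(points, idx=None, leaf_size=4):
    """Recursively split at the median along the axis of largest spread.

    Returns leaves in left-to-right order: concatenating them gives a linear
    ordering, and each leaf (at most `leaf_size` points) can serve as one group."""
    if idx is None:
        idx = list(range(len(points)))
    if len(idx) <= leaf_size:
        return [idx]
    dims = len(points[idx[0]])
    spreads = [max(points[i][a] for i in idx) - min(points[i][a] for i in idx)
               for a in range(dims)]
    axis = spreads.index(max(spreads))
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2
    return kd_leaves(points, idx[:mid], leaf_size) + kd_leaves(points, idx[mid:], leaf_size)


# Irregular sampling: a dense cluster near the origin and a sparse one far away.
rnd = random.Random(0)
dense = [(rnd.uniform(0, 0.1), rnd.uniform(0, 0.1)) for _ in range(8)]
sparse = [(rnd.uniform(5, 6), rnd.uniform(5, 6)) for _ in range(8)]
points = dense + sparse
leaves = kd_leaves(points)
order = [i for leaf in leaves for i in leaf]
print(leaves)  # 4 leaves of 4 points each; no leaf mixes the two clusters
```

Because splits follow the data rather than a fixed grid, the dense cluster ends up in physically small groups and the sparse cluster in large ones. That behavior is what makes the choice of ordering matter on irregular domains.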
This work is well-timed. The neural field community has been struggling with the scalability of meta-learning approaches, and there is growing interest in foundation-model-style representation learning across modalities. The paper directly addresses the scalability bottleneck that prevents neural fields from handling larger datasets and higher resolutions. The connection to recent dynamic tokenization work (H-Net, GPSToken, ElasticTok) positions LH-NeF within a broader trend toward input-adaptive representations.
The generation results (FID 9.7 on CelebA-HQ 64², outperforming specialized generative methods like DPF and GASP) are particularly timely given growing interest in neural field diffusion.
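FID, the metric behind the 9.7 figure, is the Fréchet distance between Gaussians fitted to Inception-v3 features of real (r) and generated (g) images, with feature means μ and covariances Σ. Lower is better. Because the estimate depends on sample count and preprocessing, scores are best compared within the same evaluation protocol.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
+ \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```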
LH-NeF makes a solid contribution by resolving a practical bottleneck in neural field representation learning through well-motivated inductive biases. The locality-preserving ordering is the paper's strongest conceptual contribution—simple but highly effective. The work is comprehensive, well-executed, and addresses a timely problem. Its main limitation is that it hasn't yet been pushed to the scale where its efficiency advantages would be most impactful (very high resolution, very large datasets).
Generated Jun 9, 2026
Paper 2 offers a foundational, modality-agnostic framework that solves a major memory bottleneck in neural fields. Its massive efficiency gains (42× less memory) and demonstrated applicability across diverse domains (images, 3D shapes, climate fields) give it broader scientific impact and generalization potential compared to Paper 1's more narrowly focused domain of time-series forecasting.
Paper 2 is likely higher impact: it introduces a principled causal-intervention framework for attributing failures in LLM agents—an urgent, high-leverage problem for safety, reliability, and governance. The methodological contribution (SCM formalization, do-operator replay, contrastive estimator addressing stochastic confounding, Monte-Carlo Shapley with CIs) is broadly applicable across agent architectures and tool-use settings, with immediate real-world utility. Paper 1 is solid and efficient but is more incremental within neural field representation learning and likely narrower in downstream adoption compared to causal debugging for deployed LLM agents.
Paper 1 likely has higher impact due to a broadly applicable, practical framework that removes meta-learning inner loops for neural fields while keeping modality-agnostic generality, yielding large efficiency gains (memory/batch) and strong results across diverse domains (images, 3D, climate) plus downstream tasks—suggesting immediate adoption potential. Paper 2 is methodologically rigorous and timely with theoretical guarantees for continual learning, but its impact may be narrower (replay-based CL setting) and more dependent on uptake of a specific control-theoretic formulation rather than a clear, general-purpose efficiency breakthrough.
Paper 1 presents a foundational advance in representation learning that spans multiple modalities (images, 3D shapes, climate data), offering massive efficiency gains (42× less memory) over existing meta-learning approaches. Its general-purpose nature ensures broad applicability across various scientific and engineering disciplines. In contrast, while Paper 2 is highly relevant to the timely subfield of LLM interpretability and AI safety, its scope is narrower and its findings are highly dependent on specific model and dictionary settings.
Paper 2 likely has higher impact due to broader, modality-agnostic applicability (images, 3D, climate) and a clear scalability advance (removing inner-loop meta-learning; large memory/batch gains). Its locality and hierarchy priors for neural field tokenization can influence representation learning, generative modeling, and scientific ML across domains. Paper 1 is timely and methodologically careful, with important implications for physiological DL interpretability, but its scope is narrower (EEG/ECG) and more focused on auditing/confounds than enabling new general-purpose modeling capabilities.
Paper 2 (LH-NeF) addresses a fundamental challenge in representation learning—scalable, modality-agnostic neural field tokenization—with broad applicability across images, 3D shapes, and climate data. Its 42× memory reduction and 133× batch size improvement over meta-learning baselines represent substantial practical gains. The framework's generality across modalities gives it wider potential impact across computer vision, graphics, scientific computing, and generative modeling. Paper 1, while methodologically rigorous with strong theoretical guarantees, targets a narrow application domain (peer-referral recruitment for hidden populations), limiting its breadth of impact despite its real-world importance.
Paper 1 (LH-NeF) addresses a fundamental challenge in neural field representation learning with a practical, general-purpose solution spanning multiple modalities. Its 42× memory reduction and 133× batch size improvement over baselines represent significant practical advances. The framework's modality-agnostic design with demonstrated results across images, 3D shapes, and climate data suggests broad applicability. Paper 2, while addressing an interesting direction (graph foundation models for network dynamics), is more of a proof-of-concept with a narrower scope (super-spreader identification) and primarily outlines future challenges rather than delivering a comprehensive solution.
Paper 1 provides fundamental theoretical insight into why different score network architectures produce distinct generative behaviors in diffusion models—a central open question. Its analytically solvable wavelet-based parameterization offers interpretable, architecture-agnostic understanding connecting data distribution moments to denoising behavior. This theoretical contribution has broad implications for the rapidly growing diffusion model field. Paper 2 offers a solid engineering contribution (efficiency and generality improvements for neural field tokenization) but is more incremental, combining known priors (locality, hierarchy) into a practical framework without comparable theoretical depth or breadth of impact.
Paper 1 solves a critical scalability bottleneck in neural fields, demonstrating massive efficiency gains (42× less memory) and strong performance across diverse domains (vision, 3D, climate). Its immediate practical applicability and cross-disciplinary impact give it an edge over Paper 2's theoretical contributions.
Paper 1 introduces a practical framework (LH-NeF) that addresses fundamental scalability limitations of neural field learning with dramatic efficiency gains (42× less memory, 133× larger batches) while maintaining modality-agnostic generality across images, 3D shapes, and climate data. Its broad applicability across modalities and concrete performance improvements give it higher practical and cross-disciplinary impact. Paper 2 provides valuable theoretical insights on memorization in stochastic interpolation models, but its impact is more narrowly theoretical with only synthetic validation, limiting its immediate influence.