Andrew Kang, Priya Narasimhan
We study policy representation learning from unlabeled multi-policy behavioral data. Each episode is generated by a fixed policy, but policy labels are unavailable. This setting appears in robotics play, demonstrations, games, racing, and other datasets where heterogeneous behaviors are mixed without annotations. We introduce Behavioral INR, a self-supervised generative model that adapts implicit neural representations (INRs) from vision to behavior. Instead of mapping coordinates to RGB values, Behavioral INR represents a policy as a state-action function mapping states to subsequent actions. An episode-level latent modulates this function through FiLM layers, yielding a generative prior over policies and allowing policy identity to be inferred without supervision. Because INRs treat each datapoint as a set of samples from an underlying function, the same model naturally accommodates variable episode lengths and different sampling granularities, as in vision INRs with different image resolutions. We also define policy-level out-of-distribution (OOD) shifts along state-distribution and action-distribution axes, which arise when policies overlap in states or actions but are not captured by standard behavioral OOD settings based only on new agents or environments. We evaluate on synthetic Gaussian random field data, MuJoCo demonstrations with controlled OOD splits, and real-world chess, Formula 1 racing, robotics, and Seek-Avoid datasets. Behavioral INR most consistently improves policy identifiability in the hardest continuous state-action settings, especially when longer episodes, more policies, and OOD splits reduce the usefulness of marginal shortcuts; amortized history encoders remain competitive when policy identity can be recovered from symbolic repetition or low-dimensional action statistics. We release code and checkpoints.
The paper introduces Behavioral INR, a self-supervised generative model that reframes policy representation learning as an implicit neural representation (INR) problem. The key conceptual insight is the analogy: just as vision INRs map pixel coordinates to RGB values, a policy can be represented as a function mapping states to actions, with an episode-level latent code modulating the function via FiLM layers. This latent is optimized at test time (fitted, not amortized) to best explain the observed state-action map of an episode.
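To make the mechanism described above concrete, the following is a minimal sketch of FiLM modulation plus test-time latent fitting, not the authors' implementation. Everything in it is an assumption for illustration: the random frozen weights, the dimensions, the single hidden layer, the scalar action, and the names `policy`, `latent_jacobian`, and `fit_latent`. Because this toy decoder is linear in the latent, the gradient can be written in closed form; a real multi-layer model would presumably rely on automatic differentiation. The default of 40 steps mirrors the per-episode step count the review mentions later.

```python
import math
import random

random.seed(0)
STATE_DIM, HIDDEN, LATENT = 3, 8, 2  # hypothetical sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Frozen decoder weights (hypothetical; a trained model would supply these).
W_IN = rand_matrix(HIDDEN, STATE_DIM)   # state -> hidden features
G = rand_matrix(HIDDEN, LATENT)         # latent -> FiLM scale offsets
B = rand_matrix(HIDDEN, LATENT)         # latent -> FiLM shifts
W_OUT = [random.uniform(-1.0, 1.0) for _ in range(HIDDEN)]

def hidden(state):
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in W_IN]

def policy(state, z):
    """Scalar action from a FiLM-modulated layer: h' = gamma(z) * h + beta(z)."""
    h = hidden(state)
    action = 0.0
    for j in range(HIDDEN):
        gamma = 1.0 + sum(G[j][k] * z[k] for k in range(LATENT))
        beta = sum(B[j][k] * z[k] for k in range(LATENT))
        action += W_OUT[j] * (gamma * h[j] + beta)
    return action

def latent_jacobian(state):
    """d(action)/dz, closed-form because this one-layer sketch is linear in z."""
    h = hidden(state)
    return [sum(W_OUT[j] * (G[j][k] * h[j] + B[j][k]) for j in range(HIDDEN))
            for k in range(LATENT)]

def mse(episode, z):
    return sum((policy(s, z) - a) ** 2 for s, a in episode) / len(episode)

def fit_latent(episode, steps=40):
    """Fit the episode latent by gradient descent with the decoder weights frozen."""
    jacs = [latent_jacobian(s) for s, _ in episode]
    # Conservative step size: inverse of an upper bound on the loss curvature.
    curvature = 2.0 * sum(sum(v * v for v in jac) for jac in jacs) / len(episode)
    lr = 1.0 / curvature
    z = [0.0] * LATENT
    for _ in range(steps):
        grad = [0.0] * LATENT
        for (s, a), jac in zip(episode, jacs):
            err = policy(s, z) - a
            for k in range(LATENT):
                grad[k] += 2.0 * err * jac[k] / len(episode)
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return z

# Synthetic episode generated by one hidden latent (illustrative data only).
Z_TRUE = [0.8, -0.5]
EPISODE = []
for _ in range(50):
    s = [random.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
    EPISODE.append((s, policy(s, Z_TRUE)))

Z_FIT = fit_latent(EPISODE)
LOSS_BEFORE = mse(EPISODE, [0.0] * LATENT)
LOSS_AFTER = mse(EPISODE, Z_FIT)
print(f"loss before fitting: {LOSS_BEFORE:.4f}, after 40 steps: {LOSS_AFTER:.4f}")
```

The point of the sketch is the division of labor: the decoder weights are shared across episodes and stay fixed, while only the per-episode latent is updated, which is what "fitted, not amortized" refers to.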
The paper also formalizes policy-level OOD shifts along state-distribution and action-distribution axes, arguing that standard behavioral OOD settings (which vary agents or environments) miss the critical failure mode where models rely on marginal statistics p(s) or p(a) rather than learning the conditional π(a|s).
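A toy illustration of the shortcut concern, under assumptions that are not taken from the paper: two hypothetical deterministic policies, a = s and a = -s, evaluated on the same symmetric set of states. Their state and action marginals coincide exactly, so any statistic of p(s) or p(a) alone cannot tell them apart, while a summary of the conditional (here a least-squares slope of action on state) separates them.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def fitted_slope(states, actions):
    """Least-squares slope of action on state, a crude summary of the conditional."""
    ms, ma = mean(states), mean(actions)
    cov = sum((s - ms) * (a - ma) for s, a in zip(states, actions))
    return cov / sum((s - ms) ** 2 for s in states)

# Shared, symmetric states (illustrative values only).
STATES = [-1.0, -0.5, 0.0, 0.5, 1.0]
ACTIONS_A = [s for s in STATES]    # hypothetical policy A: a = s
ACTIONS_B = [-s for s in STATES]   # hypothetical policy B: a = -s

# Marginal view: the multisets of actions are identical.
SAME_MARGINAL = sorted(ACTIONS_A) == sorted(ACTIONS_B)

# Conditional view: the state-to-action relationship differs in sign.
SLOPE_A = fitted_slope(STATES, ACTIONS_A)
SLOPE_B = fitted_slope(STATES, ACTIONS_B)

print(SAME_MARGINAL)      # True
print(SLOPE_A, SLOPE_B)   # 1.0 -1.0
```

A representation that only tracks action statistics would map both episodes to the same point; one that models the state-to-action function would not.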
The problem setting—unlabeled multi-policy behavioral data where each episode comes from a fixed but unknown policy—is well-motivated and practically relevant (robotics play data, demonstrations, game logs, racing telemetry).
Strengths in design: The experimental setup is commendable in its breadth: synthetic GRF, MuJoCo with controlled scaling (1x/10x/20x), DM Lab Seek-Avoid, and three real-world domains (chess, F1 racing, robotics). The comparison includes seven baselines spanning CVAE, VQ-VAE, BEM, history-conditioned INR variants, and diffusion-based models, all trained under the same protocol without policy labels.
Weaknesses in evidence: The results are mixed and the paper acknowledges this honestly, but the evidence for Behavioral INR's superiority is thin in several ways:
The conceptual contribution—treating policies as functions amenable to INR-style representation—is elegant and could inspire follow-up work. Potential applications include:
However, the practical impact is currently limited by modest performance gains and the computational overhead of test-time latent optimization (40 optimization steps per episode at inference). The approach's advantage is most pronounced in a specific regime (complex continuous state-action functions without marginal shortcuts), which may not be the most common practical scenario.
The paper addresses a genuine gap: as large-scale behavioral datasets proliferate (DROID, Bridge, Open X-Embodiment), the ability to identify and separate policies without labels becomes increasingly important. The OOD formulation, which targets shortcut learning in behavioral data, fits the broader ML community's current focus on distribution robustness. The connection between INRs (a hot topic in vision/3D) and behavioral representation is novel and timely.
This is a conceptually appealing workshop paper that makes a clean analogy between vision INRs and policy representation. The OOD formulation is a useful contribution. However, the empirical evidence for practical superiority is narrow—limited to specific scaling regimes in controlled settings—and the method carries computational overhead at inference. The honest presentation of when the method fails vs. succeeds is appreciated but also reveals that the approach's advantage is situational. It opens an interesting research direction rather than providing a definitive solution.
Generated Jun 11, 2026
Paper 1 addresses the scalability and optimization challenges of optimal transport, a fundamental mathematical tool widely used across machine learning, biology, and graphics. By providing an efficient, regularization-free Riemannian framework that generalizes to multiple OT variants, it offers foundational advancements with broader cross-disciplinary impact than Paper 2, which is more narrowly focused on policy representation in reinforcement learning and robotics.
nD-RoPE addresses a fundamental limitation in position embedding for Transformers—one of the most widely used architectures in AI. Its unified theoretical framework for extending RoPE to arbitrary dimensions with provable isotropy properties has broad applicability across vision, video, 3D point clouds, and potentially other modalities. The mathematical rigor (Hilbert space formulation, spectral conditions) combined with demonstrated empirical gains across multiple domains suggests high adoption potential. Paper 2, while creative in applying INRs to behavioral data, addresses a more niche problem with narrower impact scope and mixed empirical results where baselines remain competitive.
Paper 2 addresses the fundamental and highly active problem of efficient sequence modeling alternatives to Transformers, comparing leading subquadratic architectures (xLSTM, Mamba-2, Gated DeltaNet) with both empirical evaluation and principled analysis. This topic has enormous breadth of impact across NLP, time-series, and code modeling. The unified formulation and mechanistic analysis of why xLSTM succeeds provides actionable insights for architecture design. Paper 1, while creative in applying INRs to behavioral data, addresses a more niche problem (unsupervised policy representation) with narrower applicability and incremental methodological contribution.
Paper 1 introduces a general theoretical framework (SIM) for interpretable ML grounded in Lagrangian mechanics, addressing a fundamental gap between theory and practice in AI interpretability. Its breadth of impact is substantial—it unifies fragmented subfields (concept-based, mechanistic, traditional interpretability), provides deductive design principles, and offers pedagogical foundations. The timeliness is high given growing regulatory and scientific demand for interpretable AI. Paper 2 presents a solid but more incremental contribution applying INRs to policy representation learning—a narrower problem with fewer cross-cutting implications for the broader ML community.
Paper 2 is likely to have higher scientific impact due to greater novelty (adapting implicit neural representations to policy/behavior modeling with episode-level latent modulation), broader cross-domain relevance (robotics, games, racing, chess, demonstrations), and timely alignment with self-supervised learning on heterogeneous unlabeled behavior data. It also introduces a useful OOD taxonomy at the policy level. Paper 1 is practically valuable and methodologically solid for efficient, robust training via class-aware dynamic pruning, but its conceptual scope is narrower and more incremental within the data pruning/efficiency literature.
SurvPFN addresses a critical gap in tabular foundation models by extending them to survival analysis with censored data—a ubiquitous problem in medicine, engineering, and economics. Its novelty in adapting PFNs with censored loss functions, combined with strong empirical results on real-world benchmarks without per-dataset fitting, offers broad practical impact. Paper 2 introduces an interesting application of INRs to behavioral data, but targets a more niche problem (unsupervised policy identification) with more mixed empirical results. Survival analysis has a larger established user base, making SurvPFN's potential adoption and citation impact higher.
HAMNO addresses a fundamental challenge in scientific computing—learning PDE solution operators for multi-scale, nonlinear dynamical systems—with a well-structured hierarchical architecture combining local and global representations plus physics-informed constraints. Its contributions (adaptive gating, multi-objective physics loss with strong/weak forms, demonstrated improvements on challenging PDEs) have broad applicability across computational physics and engineering. Paper 1 introduces an interesting application of INRs to behavioral policy representation, but addresses a more niche problem with mixed results (amortized encoders remain competitive in many settings), limiting its broader impact.
Paper 1 introduces a highly novel conceptual bridge by adapting Implicit Neural Representations (INRs) from vision to behavioral policy learning. This offers a fundamentally new paradigm for handling unlabeled, heterogeneous behavioral data with variable episode lengths and complex out-of-distribution shifts. Its broad applicability across robotics, gaming, and autonomous driving suggests a wider and more enduring impact across multiple fields compared to Paper 2, which presents a valuable but arguably more incremental methodological improvement specific to LLM agentic reinforcement learning.
Paper 2 introduces a highly novel methodological advancement by adapting Implicit Neural Representations (INRs) from vision to behavioral modeling. This creates a versatile framework for policy representation learning applicable across diverse domains like robotics, autonomous racing, and gaming. While Paper 1 offers a rigorous and clinically valuable application of multimodal learning for Alzheimer's staging, Paper 2 provides a foundational AI innovation with broader cross-disciplinary impact, addressing complex challenges like varying episode lengths and novel out-of-distribution shifts. Consequently, Paper 2 has a higher potential for widespread theoretical and practical adoption across the broader AI community.
Paper 2 introduces a concrete novel method (Behavioral INR) with clear technical contributions—adapting implicit neural representations to policy learning, defining new OOD shift types, and demonstrating results across diverse real-world domains (chess, F1 racing, robotics). It provides code/checkpoints and addresses a practical problem with broad applicability. Paper 1 is a conceptual/position paper on uncertainty in dynamical systems that clarifies existing distinctions but offers less methodological novelty. While Paper 1 addresses an important topic, Paper 2's combination of novelty, empirical rigor, and practical applications gives it higher potential impact.