The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics

Pietro Barbiero, Giovanni De Felice, Mateo Espinosa Zarlenga, Francesco Giannini, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra, Ruggero Noris

Jun 10, 2026 · arXiv:2606.12289v1

cs.LG · cs.AI · cs.NE

#493 of 5669 · cs.LG

Tournament Score: 1505 ± 43 (scale 1050 to 1750)

Win Rate: 77%

Rating: 6.5 / 10

Significance 7 · Rigor 6 · Novelty 7 · Clarity 6.5

Abstract

As Artificial Intelligence models grow in complexity, interpretability has become an indispensable tool for understanding, debugging, and controlling their computations. However, interpretability lacks general theories to deductively design interpretable methods. This gap between theories and methods results in a fragmented literature and inconsistent evaluation protocols. To fill this gap, we introduce the Standard Interpretable Model (SIM), a general theory grounded in Lagrangian mechanics that enables the deductive design of interpretable methods. Specifically, the SIM summarises, in a set of premises, what interpretability is for a target user. From these premises, the SIM systematically derives interpretability symmetries and corresponding constraints, which shape the landscape of a Lagrangian whose minima correspond to optimal interpretable models. To reach the minima, one can either update the parameter values of an opaque model to make it more interpretable or compile constraints into an interpretable architecture. We empirically show that the SIM identifies and solves limitations of existing methods (including traditional, concept-based, and mechanistic interpretability), highlights underexplored research directions, and informs the design of core programming interfaces. Beyond being a research method, the deductive nature of the SIM offers pedagogical grounding for interpretability curricula and may shift the scientific community's perspective of a discipline that has long been fragmented.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: The Standard Interpretable Model

1. Core Contribution

The paper proposes the Standard Interpretable Model (SIM), a framework that formalizes interpretability through the lens of Lagrangian mechanics and symmetry-based reasoning. The central idea is a six-step pipeline: (1) state interpretability premises relative to a target user, (2) derive symmetries from these premises, (3) translate symmetries into measurable constraints, (4) construct a Lagrangian whose minima correspond to optimal interpretable models, (5) derive parameter update rules via the principle of least action, and (6) compile constraints directly into model architectures.

The paper instantiates this framework for "bounded and formal entities" — users with a fixed vocabulary, semantic mappings, and bounded computation — yielding three specific symmetries: concept invariance under monotonic maps, model invariance under concept projection, and hypothesis invariance under composition transformation. These map onto concrete constraints, architectures, and loss functions that subsume several existing interpretable model families (CBMs, prototypical networks, NAMs, SAEs, decision trees) as special cases.

2. Methodological Rigor

Strengths in formalization: The symmetry-to-constraint derivations are mathematically precise, with formal lemmas and proofs provided for each constraint (Appendices A-G). The use of preorder-preserving maps for concept semantics (Symmetry I) is elegant and addresses a genuine subtlety — that interpretability requires preserving ordinal relationships rather than exact numerical values.

Concerns about the Lagrangian framing: While the Lagrangian mechanics language provides a unifying notation, its necessity is debatable. The kinetic term T is essentially a momentum regularizer on parameter updates, and the potential V is a standard constrained optimization objective. The resulting update rule (gradient descent with momentum) is well-known and does not require Lagrangian mechanics to derive. The framework adds notational overhead that may obscure rather than illuminate for practitioners. The analogy to physics is suggestive but does not yield genuinely new optimization insights — unlike, say, how symmetries in physics lead to conservation laws via Noether's theorem (which is referenced but not meaningfully exploited).
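To make the point about the update rule concrete, the sketch below runs plain gradient descent with momentum (the heavy-ball method) on a penalized objective. It is only an illustration under stated assumptions: the review does not reproduce the paper's actual Lagrangian, so the quadratic task loss, the quadratic penalty, the weight `lam`, and all function names here are hypothetical stand-ins for the potential V and its constraint terms.

```python
import numpy as np

def task_loss_grad(theta):
    # Gradient of a toy task loss 0.5 * ||theta - target||^2 (hypothetical).
    target = np.array([2.0, 1.0])
    return theta - target

def constraint_grad(theta):
    # Gradient of a toy penalty 0.5 * theta[1]^2, standing in for an
    # interpretability constraint that pushes theta[1] toward zero.
    return np.array([0.0, theta[1]])

def heavy_ball(theta0, lam=9.0, lr=0.05, beta=0.9, steps=500):
    """Gradient descent with momentum on V = task loss + lam * penalty."""
    theta = np.asarray(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        grad_v = task_loss_grad(theta) + lam * constraint_grad(theta)
        # The velocity plays the role of the "kinetic" memory of past updates.
        velocity = beta * velocity - lr * grad_v
        theta = theta + velocity
    return theta

theta_star = heavy_ball([0.0, 0.0])
```

For this toy potential the minimizer is available in closed form (the first coordinate equals 2, the second equals 1 / (1 + lam)), so the iterate can be checked directly. Nothing in the loop needs the language of mechanics, which is the review's concern.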

Empirical validation: The controlled experiments (Figures 5-8) effectively demonstrate the distinction between architectural compilation and optimization-based enforcement of symmetries. The finding that MAE is a poor proxy for concept semantics preservation (Figure 5) is well-illustrated. However, the experiments are limited in scale and scope — the controlled settings use toy data, and the large-scale experiments are observational rather than interventional.
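A small numerical example can show why MAE and order preservation can disagree. This is a sketch, not the paper's metric or data: the concept scores are made up, and `pairwise_order_agreement` is a hypothetical helper that simply counts how often two score vectors rank a pair of samples the same way.

```python
import numpy as np

def mean_absolute_error(reference, predicted):
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(ref - pred)))

def pairwise_order_agreement(reference, predicted):
    """Fraction of sample pairs ranked the same way by both score vectors."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    i, j = np.triu_indices(len(ref), k=1)  # all unordered pairs (i < j)
    ref_sign = np.sign(ref[i] - ref[j])
    pred_sign = np.sign(pred[i] - pred[j])
    return float(np.mean(ref_sign == pred_sign))

# Made-up reference concept scores for five samples.
reference = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# A monotonic rescaling: numerically far from the reference, same ordering.
rescaled = 10.0 * reference

# A small perturbation: numerically close, but two neighbouring pairs swap.
swapped = np.array([0.2, 0.1, 0.4, 0.3, 0.5])

mae_rescaled = mean_absolute_error(reference, rescaled)
mae_swapped = mean_absolute_error(reference, swapped)
order_rescaled = pairwise_order_agreement(reference, rescaled)
order_swapped = pairwise_order_agreement(reference, swapped)
```

Here the rescaled scores have the larger MAE yet preserve every pairwise ordering, while the swapped scores have the smaller MAE yet break two of the ten pairs. If interpretability depends on ordinal relationships, as the review says of Symmetry I, MAE alone would rank these two predictors the wrong way round.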

Large-scale analysis: The VLM concept semantics experiment (Figure 9) is insightful but narrow — testing only one concept ("red") on MNIST-derived images. The Steerling analysis (Figure 11) provides useful empirical characterization but relies on approximations (effective rank, proxy for f). The chain-of-thought experiment makes a strong claim (CoT is not interpretable under Symmetry II) from a single model and limited evaluation.
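The review does not say which definition of effective rank the Steerling analysis uses. One common choice, assumed here purely for illustration, is the exponential of the Shannon entropy of the normalized singular values. The function name and the test matrices below are hypothetical.

```python
import numpy as np

def effective_rank(matrix):
    """Entropy-based effective rank: exp(H(p)), with p the normalized singular values."""
    singular_values = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    total = singular_values.sum()
    if total == 0.0:
        return 0.0  # the zero matrix carries no directions at all
    p = singular_values / total
    p = p[p > 0]  # drop exact zeros so log(p) is defined
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))

full = np.eye(4)                                       # four equally weighted directions
rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # a single dominant direction

er_full = effective_rank(full)
er_rank_one = effective_rank(rank_one)
```

Under this definition the value varies continuously between 1 and the matrix rank, which makes it a soft summary of how many directions a representation actually uses. That softness is also why the review treats it as an approximation rather than an exact measurement.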

3. Potential Impact

Unifying framework: The paper's most valuable contribution is arguably Table 6, which systematically categorizes existing interpretable methods according to which symmetries they satisfy, revealing gaps (e.g., SAEs lacking semantic grounding, feature attribution lacking bounded reasoning constraints). This comparative analysis is actionable and could guide future method development.

Software contribution: The PyTorch Concepts library provides a concrete implementation path, though its impact depends on adoption and the maturity of the codebase.

Pedagogical value: The claim that SIM provides pedagogical grounding for interpretability curricula is plausible — the progression from premises to symmetries to constraints to architectures is didactically clean.

4. Timeliness & Relevance

The paper addresses a genuine need. Interpretability research is indeed fragmented, with methods often motivated by intuition rather than derived from first principles. The timing is appropriate given the scaling of AI systems and regulatory pressure (EU AI Act). The analysis of large-scale concept-based models (Steerling) and VLMs addresses current practical concerns.

5. Strengths & Limitations

Key Strengths:

- Provides a principled derivation pipeline from premises to implementations, filling a real methodological gap

- The distinction between architectural compilation and optimization-based enforcement is clarified with both theoretical and empirical support

- Symmetry I's formalization of concept semantics as preorder preservation is a genuine conceptual contribution

- The framework's modularity (changing premises yields different theories) is well-designed

- Table 6's systematic comparison exposes concrete limitations in existing methods

Notable Weaknesses:

- The Lagrangian mechanics framing, while intellectually appealing, may be more metaphorical than substantive: the physics analogy doesn't yield results that couldn't be obtained through standard constrained optimization

- The three premises chosen for the instantiation, while reasonable, are not clearly justified as exhaustive or minimal; the paper acknowledges this as an open challenge but doesn't address it

- Empirical validation remains thin relative to the theoretical ambition: the controlled experiments are on toy problems, and large-scale experiments are limited in scope

- The framework's practical utility for designing *new* methods (beyond categorizing existing ones) is not convincingly demonstrated

- Constraint II requires Jacobian computation, which is expensive for large models; scalability concerns are not adequately addressed

- The connection between Symmetry III and Lie symmetry analysis of PDEs, while mathematically correct, may be inaccessible to the target audience and overly general for practical use
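The scalability concern about Jacobians can be illustrated with a rough sketch. It assumes nothing about how the paper computes Constraint II: `CountingModel` is a hypothetical one-layer stand-in for a network, and the forward-difference scheme is chosen only because it makes the cost easy to count (one extra forward pass per input dimension).

```python
import numpy as np

class CountingModel:
    """Toy model tanh(W x) that records how many forward passes it serves."""
    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return np.tanh(self.weights @ x)

def finite_difference_jacobian(f, x, eps=1e-6):
    """Forward-difference Jacobian: one baseline pass plus one pass per input."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x), dtype=float)
    jac = np.zeros((y0.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps  # perturb a single input coordinate
        jac[:, k] = (np.asarray(f(x + step), dtype=float) - y0) / eps
    return jac

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 5))
x = rng.normal(size=5)

model = CountingModel(weights)
jac = finite_difference_jacobian(model, x)

# Analytic Jacobian of tanh(W x) for comparison: diag(1 - tanh^2(W x)) @ W.
analytic = (1.0 - np.tanh(weights @ x) ** 2)[:, None] * weights
```

In this sketch the number of forward passes grows linearly with the input dimension. Reverse-mode automatic differentiation instead needs roughly one backward pass per output dimension, so a full Jacobian stays costly whenever both dimensions are large.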

Missing elements: No user studies validate that models satisfying these symmetries are actually more interpretable to humans. The paper's epistemic falsifiability criterion acknowledges this but defers it entirely. The relationship to causal interpretability frameworks is underdeveloped.

Overall Assessment

This is an ambitious unifying paper that provides genuine organizational value to the interpretability field. Its strongest contribution is the systematic derivation pipeline and the comparative analysis it enables. However, the Lagrangian mechanics framing adds more notational complexity than genuine insight, and the empirical validation does not match the theoretical ambition. The framework is more useful as a taxonomic and analytical tool than as a practical method for building new interpretable systems.

Rating: 6.5 / 10

Significance 7 · Rigor 6 · Novelty 7 · Clarity 6.5

Generated Jun 11, 2026

Comparison History (22)

Won vs. Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models

Paper 1 introduces a unifying theoretical framework (SIM) for interpretable ML grounded in Lagrangian mechanics, addressing a fundamental gap in a critically important and rapidly growing field. Its breadth of impact is enormous—it spans traditional, concept-based, and mechanistic interpretability, offers pedagogical value, and provides a deductive methodology applicable across the entire interpretability discipline. Paper 2, while technically rigorous with novel certified horizon guarantees for equivariant world models, addresses a narrower problem (multi-step prediction trust in equivariant models). Paper 1's potential to restructure and unify a fragmented field gives it broader and deeper long-term scientific impact.

claude-opus-4-6·Jun 12, 2026

Won vs. Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Paper 2 proposes a general unifying theory (SIM) for interpretable machine learning grounded in Lagrangian mechanics, which addresses a fundamental gap across the entire interpretability field. Its breadth of impact spans traditional, concept-based, and mechanistic interpretability, offering both theoretical foundations and practical design principles. While Paper 1 makes solid contributions to latent reasoning with RL and mechanistic analysis, it addresses a narrower problem. Paper 2's potential to unify a fragmented discipline, inform curricula, and reshape how interpretability methods are designed gives it broader and longer-lasting scientific impact.

claude-opus-4-6·Jun 12, 2026

Lost vs. Select and Improve: Understanding the Mechanics of Post-Training for Reasoning

Paper 2 is likely to have higher impact due to its timeliness and direct applicability to a central, fast-moving problem: how RL post-training improves reasoning in LLMs. It provides concrete, empirically grounded mechanisms (strategy selection/improvement) and actionable training interventions, making it readily usable by both academia and industry. Paper 1 is ambitious and potentially unifying, but its Lagrangian-based general theory may face higher barriers to validation, adoption, and standardization, making near-term impact less certain despite high conceptual novelty.

gpt-5.2·Jun 12, 2026

Won vs. Extracting Governing Equations from Latent Dynamics via Multi-View Contrastive Learning

Paper 2 proposes a unifying general theory for interpretable ML grounded in Lagrangian mechanics, addressing a fundamental gap in a rapidly growing field. Its breadth of impact is exceptional—spanning traditional ML, concept-based, and mechanistic interpretability—while offering both theoretical foundations and practical design principles. It has potential to reshape how the community approaches interpretability research and education. Paper 1, while technically strong with theoretical guarantees for latent dynamics recovery, addresses a more specialized problem. Paper 2's unifying framework across a fragmented discipline gives it broader and potentially more transformative impact.

claude-opus-4-6·Jun 12, 2026

Won vs. Implicit Neural Representations of Individual Behavior

Paper 1 introduces a general theoretical framework (SIM) for interpretable ML grounded in Lagrangian mechanics, addressing a fundamental gap between theory and practice in AI interpretability. Its breadth of impact is substantial—it unifies fragmented subfields (concept-based, mechanistic, traditional interpretability), provides deductive design principles, and offers pedagogical foundations. The timeliness is high given growing regulatory and scientific demand for interpretable AI. Paper 2 presents a solid but more incremental contribution applying INRs to policy representation learning—a narrower problem with fewer cross-cutting implications for the broader ML community.

claude-opus-4-6·Jun 11, 2026

Won vs. Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

Paper 2 proposes a unifying mathematical framework for AI interpretability, a critical and rapidly growing field. By grounding interpretability in Lagrangian mechanics, it offers a fundamental paradigm shift with broad theoretical and practical implications across all of machine learning. Paper 1, while practically valuable, presents a more specialized architectural improvement for point cloud-based robotic manipulation, giving it a narrower scope of impact.

gemini-3.1-pro-preview·Jun 11, 2026

Lost vs. MemNovo: Look Back at the Spectrum for Balanced De Novo Peptide Sequencing from Mass Spectrometry

Paper 2 likely has higher near-term scientific impact: it identifies a concrete, broadly relevant inference pathology in Transformer decoders (over-reliance on priors vs. input evidence) and proposes a training-free, plug-and-play fix with strong empirical gains on standard proteomics benchmarks and minimal overhead—high application value and methodological rigor. Paper 1 is ambitious and potentially transformative, but its impact depends on community adoption and validation of a broad theoretical framework; such general theories often face slower uptake and harder empirical falsification.

gpt-5.2·Jun 11, 2026

Won vs. What Uncertainties Do We Need for Dynamical Systems?

Paper 1 introduces a comprehensive general theory (SIM) for interpretable ML grounded in Lagrangian mechanics, addressing a fundamental gap in a critical field. It provides a unifying framework that connects fragmented approaches (traditional, concept-based, mechanistic interpretability), offers deductive design principles, and demonstrates empirical utility. Its breadth of impact spans AI safety, education, and method design. Paper 2 is a useful discussion/position paper on uncertainty in dynamical systems but is more incremental—clarifying existing concepts rather than introducing a novel theoretical framework with broad applicability.

claude-opus-4-6·Jun 11, 2026

Won vs. RePAIR: Predictive Self-Supervised Representation Learning in Chess

Paper 2 demonstrates higher potential scientific impact due to its breadth and timeliness. While Paper 1 introduces an innovative architecture for sequential data using chess, Paper 2 proposes a unifying, general theory for interpretable ML, a highly critical and fragmented area in AI. By grounding interpretability in Lagrangian mechanics, Paper 2 provides a foundational framework applicable to traditional, concept-based, and mechanistic interpretability. This theoretical contribution has far-reaching implications for model safety, debugging, and cross-domain AI research, offering significantly broader impact than the domain-specific representation learning results of Paper 1.

gemini-3.1-pro-preview·Jun 11, 2026

Won vs. A Riemannian Approach to Low-Rank Optimal Transport

Paper 2 proposes a foundational, unifying theory for machine learning interpretability using Lagrangian mechanics. By addressing a critical bottleneck in AI adoption and offering a deductive framework that spans multiple interpretability subfields, it has a vastly broader potential impact than Paper 1, which, while methodologically rigorous, focuses on improving computational efficiency for a specific mathematical problem (Optimal Transport).

gemini-3.1-pro-preview·Jun 11, 2026
