Learning with Simulators: No Regret in a Computationally Bounded World

Sasha Voitovych, Abhishek Shetty, Noah Golowich, Alexander Rakhlin

Jun 11, 2026arXiv:2606.13576v1

cs.LGcs.CCcs.DSstat.ML

#144of 5669·cs.LG

#144 of 5669 · cs.LG

Tournament Score

1544±49

10501750

89%

Win Rate

Wins

Losses

Matches

Rating

8.5/ 10

Significance9

Rigor9

Novelty8.5

Clarity8.5

Abstract

Understanding the minimal assumptions necessary for generalization is the fundamental question in learning theory. Unfortunately, most results rely heavily on independence (or some proxy thereof) of the data-generating process, while results for strongly dependent data are far more limited. Towards addressing this gap, we introduce the framework of simulatable processes, where the learner has access to a simulator that approximates the distribution generating the data (which may be an arbitrarily complex and dependent process). Surprisingly, given access to such a simulator, we show that we can recover the same learning guarantees as in the classical setting with independent data, namely, error bounds that depend on the VC dimension. Further, we use this framework to study the power of conditional sampling and show strict statistical and computational advantages in this setting. As a highlight of our framework, we exhibit a single algorithm that simultaneously learns any given VC class under all processes samplable in bounded polynomial time, with regret controlled by the time-bounded Kolmogorov complexity of the process. This provides a significant conceptual broadening of the classical PAC model.

AI Impact Assessments

(1 models)

Scientific Impact Assessment: "Learning with Simulators: No Regret in a Computationally Bounded World"

1. Core Contribution

This paper introduces the framework of simulatable processes for online learning, where the learner has access to a simulator (either unconditional or conditional) that approximates the data-generating distribution. The central insight is that access to such a simulator—even an approximate one—allows recovery of the same learnability characterization (VC dimension) as in the classical i.i.d. PAC setting, despite the data having arbitrary temporal dependencies.

The paper makes three main contributions:

Unconditional simulator results (Theorem 3.1): VC classes are learnable with O(d log T) regret given exact simulators, matching i.i.d. rates.

Conditional simulator results (Theorem 4.1): Oracle-efficient algorithms achieving O(√(dT(1+KL))) agnostic regret, matching PAC-style complexity both statistically and computationally.

Universal algorithm (Theorem 5.4): A single algorithm that simultaneously learns any VC class against all polynomial-time samplable processes, with regret controlled by time-bounded Kolmogorov complexity—a learning-theoretic analog of Levin search.

2. Methodological Rigor

The technical execution is strong and multi-layered:

Upper bounds: The multi-scale cover algorithm (Algorithm 1) elegantly handles the change-of-measure challenge between P* and P. The key innovation is the multi-scale structure that allows regret to depend on the *realized* density ratio rather than its worst-case tail—a clever technique that converts a logarithmic decay in N into the desired fast rate. The log-MGF relaxation (Section 4) extends the Rakhlin-Sridharan relaxation framework to approximate simulators, with the supermartingale argument (Lemma 4.2) providing clean regret control via Donsker-Varadhan. The variance analysis for the plug-in estimator (Lemmas 4.5-4.6) shows that despite exponentially large quantities, the *ratio* structure enables polynomial-sample estimation.

Lower bounds: The paper provides matching lower bounds across multiple dimensions: the sample-approximation tradeoff (Theorem 3.3), computational hardness with unconditional samples (Theorem 3.4, extending Hazan-Koren), and the impossibility of agnostic learning without conditional access (Theorem 3.5). These collectively demonstrate that the upper bounds are essentially tight.

One subtle point: The realizability assumption (Assumption 1) is carefully discussed, and the paper honestly addresses its necessity (Appendix E), showing that arbitrary label dependence on covariates recovers classical Littlestone hardness.

3. Potential Impact

Theoretical impact: This paper provides a significant conceptual broadening of the PAC model. The result that bounded computational complexity of the data-generating process suffices for learning (Theorem 5.4) formalizes a long-standing intuition connecting the extended Church-Turing thesis to learnability. This could reshape how the learning theory community thinks about the independence assumption.

Bridging communities: The framework connects learning theory to information theory (redundancy/universal compression), Kolmogorov complexity, and computational complexity in novel ways. The connection between simulator quality measured by KL divergence and the redundancy of source coding (Section 5, Corollary 5.1) is particularly elegant.

Practical relevance: The conditional simulator setting directly models autoregressive generative models (LLMs, diffusion models) that are increasingly available in practice. The applications to LDS (Theorem 5.5) and Glauber dynamics (Theorem 5.6) demonstrate that the framework yields concrete, non-trivial guarantees for physically motivated processes—with no mixing time dependence, which is a qualitative improvement over prior work.

Sim-to-real transfer: The framework provides a principled theoretical foundation for simulation-based learning, quantifying the "reality gap" via KL divergence.

4. Timeliness & Relevance

The paper is exceptionally well-timed. The explosion of generative models (LLMs, diffusion models) means that approximate simulators for complex processes are increasingly available. Providing a theoretical framework that leverages these simulators for downstream learning tasks addresses a genuine emerging need. The paper explicitly acknowledges this connection, grounding abstract theory in practical motivation.

The paper also addresses the long-standing theoretical gap between i.i.d. learning theory and online learning with dependent data—a problem that has seen renewed interest but limited progress for strongly dependent processes.

5. Strengths & Limitations

Key Strengths:

*Clean separation of unconditional vs. conditional access:* Table 1 provides a complete picture of achievable rates, with matching upper and lower bounds in most regimes.

*The universal algorithm (Theorem 5.4) is a conceptual highlight:* A single algorithm that works for all polytime-samplable processes, with regret depending on Kolmogorov complexity, is a beautiful result.

*The multi-scale cover technique* is a genuinely novel algorithmic idea that could find applications elsewhere.

*Applications to LDS and Glauber dynamics* are non-trivial and demonstrate the framework's utility beyond abstract generality.

Limitations:

The universal algorithm requires exponential dependence on VC dimension (T^O(d)), which limits practical applicability to low-dimensional classes.

Conditional sampling from the universal distribution is computationally hard under cryptographic assumptions, creating a gap between the most general result and efficient algorithms.

The realizability requirement for unconditional simulators is restrictive; while the paper proves this is necessary, it limits the framework's scope.

The KL divergence between P* and P may be hard to estimate in practice, making it difficult to assess the quality of guarantees a priori.

The applications to specific process families (LDS, Glauber) require domain-specific simulator construction, somewhat undermining the "no assumptions" narrative.

6. Additional Observations

The paper is clearly written with excellent exposition, despite the technical density. The comparison table (Table 1) is particularly effective. The honest treatment of limitations—especially the computational barriers and the discussion of Assumption 1—strengthens rather than weakens the paper.

Rating:8.5/ 10

Significance 9Rigor 9Novelty 8.5Clarity 8.5

Generated Jun 12, 2026

Comparison History (18)

Wonvs. Extracting Governing Equations from Latent Dynamics via Multi-View Contrastive Learning

Paper 2 has higher potential impact due to a broader, more foundational contribution: it extends learning theory beyond (approximate) independence via simulatable processes, recovers VC-style guarantees, and links regret to time-bounded Kolmogorov complexity—ideas likely to influence multiple areas (PAC learning, online learning, conditional sampling, computational learning theory). Its applicability spans any domain with dependent/complex data where simulators exist, making it timely and broadly relevant. Paper 1 is strong and useful for scientific discovery in dynamical systems, but is more specialized and likely narrower in cross-field impact.

gpt-5.2·Jun 12, 2026

Wonvs. Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Paper 1 introduces a fundamentally new theoretical framework (simulatable processes) that broadens the foundational PAC learning model, addressing a core open question about learning under dependent data. It provides novel connections between VC dimension, Kolmogorov complexity, and computational complexity, with broad implications across learning theory. Paper 2 presents a solid engineering contribution to latent reasoning in LLMs, but is more incremental—improving on existing hidden-state recurrence methods with boundary tokens. Paper 1's conceptual breadth and theoretical depth give it higher long-term scientific impact across multiple fields.

claude-opus-4-6·Jun 12, 2026

Wonvs. Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

Paper 2 is more likely to have higher scientific impact: it introduces a broadly applicable theoretical framework (simulatable processes) that relaxes independence assumptions and recovers VC-dimension-style guarantees for strongly dependent data, with connections to conditional sampling and time-bounded Kolmogorov complexity. This is a conceptual generalization of PAC learning with potential to influence learning theory, complexity, and practical learning with simulators. Paper 1 offers insightful empirical geometry/sparsity analysis of on-policy distillation with useful training heuristics, but its scope is narrower and more tied to current LLM post-training practice.

gpt-5.2·Jun 12, 2026

Wonvs. Accelerating Speculative Diffusions via Block Verification

Paper 2 addresses a fundamental limitation in classical learning theory by providing generalization guarantees for dependent data via simulatable processes, significantly broadening the PAC model. In contrast, Paper 1 offers a highly specialized, incremental engineering improvement (6.3% speedup) for speculative diffusion models. The theoretical foundations and broader conceptual impact of Paper 2 give it a significantly higher potential for long-term scientific influence.

gemini-3.1-pro-preview·Jun 12, 2026

Wonvs. Uncertainty Estimation for Molecular Diffusion Models

Paper 2 introduces a broadly applicable theoretical framework (simulatable processes) that relaxes independence assumptions and recovers VC-style guarantees for dependent, computationally bounded data sources. This is conceptually novel, timely for modern ML settings involving simulators and complex dependencies, and potentially impacts learning theory, online learning/regret analysis, conditional sampling, and computational complexity. Its claims suggest wide cross-field influence and foundational relevance. Paper 1 is useful and practical for molecular diffusion model reliability, but is more domain-specific and post-hoc, likely yielding narrower impact than a generalization of the PAC model.

gpt-5.2·Jun 12, 2026

Lostvs. Data-driven discovery of governing differential equations across physical systems

Paper 2 addresses the rapidly growing field of AI for Science, offering a unifying framework for discovering differential equations from data. While Paper 1 provides a rigorous theoretical foundation for machine learning, Paper 2 boasts exceptional breadth of impact across physics, biology, and engineering. Its highly timely focus on data-driven scientific discovery and high potential for real-world applications give it a higher estimated scientific impact, as framework papers in this cross-disciplinary area typically shape diverse future research and garner extensive citations.

gemini-3.1-pro-preview·Jun 12, 2026

Wonvs. Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

Paper 1 offers a broad conceptual advance in learning theory: extending PAC/VC-style generalization guarantees to arbitrarily dependent data via a simulator-access model, plus links to conditional sampling and time-bounded Kolmogorov complexity. This is highly novel, methodologically rigorous, and potentially impacts multiple areas (learning theory, complexity, online learning, dependence modeling). Paper 2 is timely and useful for mechanistic interpretability, with solid theory+empirics, but its impact is more domain-specific (SAEs/LLM interpretability). Overall breadth and foundational significance favor Paper 1.

gpt-5.2·Jun 12, 2026

Wonvs. Once-for-All: Scalable Simultaneous Forecasting via Equilibrium State Estimation

Paper 2 addresses a fundamental question in learning theory—generalizing beyond independence assumptions—by introducing the novel framework of simulatable processes. It provides deep theoretical contributions (recovering VC-dimension-based guarantees under dependent data, proving computational/statistical separations, and connecting to Kolmogorov complexity), which broadly impact learning theory, computational complexity, and the foundations of machine learning. Paper 1 offers a useful engineering contribution for multi-system forecasting with practical speedups, but its conceptual novelty and breadth of theoretical impact are more limited compared to Paper 2's foundational advances.

claude-opus-4-6·Jun 12, 2026

Wonvs. Select and Improve: Understanding the Mechanics of Post-Training for Reasoning

Paper 2 likely has higher impact: it proposes a new learning-theoretic framework (simulatable processes) that relaxes independence assumptions while recovering VC-type guarantees, and connects to conditional sampling, computational constraints, and time-bounded Kolmogorov complexity. This is broadly applicable across learning theory, online learning, and complexity, and offers conceptual generalization of PAC learning. Paper 1 provides valuable mechanistic insights for RL post-training of LLM reasoning, but is more domain-specific, empirically scoped to a small model, and less likely to reshape foundational theory across fields.

gpt-5.2·Jun 12, 2026

Wonvs. Loss-Shift Transfer via Bayes Quotients

Paper 1 is more novel and broadly impactful: it introduces a new learning-theoretic framework (simulatable processes) that extends PAC-style VC-dimension guarantees to arbitrarily dependent, computationally bounded data-generating processes, and connects regret to time-bounded Kolmogorov complexity. This is a significant conceptual generalization with potential cross-field influence (learning theory, online learning, complexity, causal/simulation-based modeling). Paper 2 offers a clean and timely reframing of transfer under loss shift with useful identities for log loss, but its scope is narrower and more tied to representation/transfer phenomena than foundational guarantees.

gpt-5.2·Jun 12, 2026

#144of 5669·cs.LG

#144 of 5669 · cs.LG

Tournament Score

1544±49

10501750

89%

Win Rate

Wins

Losses

Matches

Rating

8.5/ 10

Significance9

Rigor9

Novelty8.5

Clarity8.5