Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs

Joshua Wendland, Markel Zubia, Roman Andriushchenko, Maris F. L. Galesloot, Milan Ceska, Henrik von Kleist, Thiago D. Simao, Maximilian Weininger

May 12, 2026

arXiv:2605.12262v1 PDF

cs.AI (primary), cs.LG

#145 of 2292 · Artificial Intelligence

Tournament Score

1532 ± 46 (scale: 1050 to 1800)

92%

Win Rate

Rating

6.8 / 10

Significance 7

Rigor 7.5

Novelty 7.5

Clarity 7.5

Abstract

We introduce missingness-MDPs (miss-MDPs), a novel subclass of partially observable Markov decision processes (POMDPs) that incorporates the theory of missing data. A miss-MDP is a POMDP whose observation function is a missingness function, specifying the probability that individual state features are missing (i.e., unobserved) at a time step. The literature distinguishes three canonical missingness types: missing (1) completely at random (MCAR), (2) at random (MAR), and (3) not at random (MNAR). Our planning problem is to compute near-optimal policies for a miss-MDP with an unknown missingness function, given a dataset of action-observation trajectories. Achieving such optimality guarantees for policies requires learning the missingness function from data, which is infeasible for general POMDPs. To overcome this challenge, we exploit the structural properties of different missingness types to derive probably approximately correct (PAC) algorithms for learning the missingness function. These algorithms yield an approximate but fully specified miss-MDP that we solve using off-the-shelf planning methods. We prove that, with high probability, the resulting policies are epsilon-optimal in the true miss-MDP. Empirical results confirm the theory and demonstrate superior performance of our approach over two model-free POMDP methods.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs

1. Core Contribution

This paper introduces missingness-MDPs (miss-MDPs), a formally defined subclass of POMDPs in which the observation function is a "missingness function": for each state feature, it either passes the value through unchanged or replaces it with a missing symbol ⊥. The key insight is to import the classical taxonomy from missing data theory (MCAR, MAR, MNAR) into the POMDP framework and to exploit the structural constraints that each missingness type imposes, which makes the otherwise intractable problem of learning the observation function feasible.
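To make the definition concrete, here is a minimal sketch of a missingness function acting on a factored state. The names `observe`, `mcar`, and `mnar`, the specific probabilities, and the choice to mask features independently are illustrative assumptions and are not taken from the paper.

```python
import random

MISSING = None  # stand-in for the missing symbol ⊥


def observe(state, miss_prob, rng):
    """Copy each feature of `state` or mask it, according to a missingness function.

    `miss_prob(i, state)` is a hypothetical callback that returns the
    probability that feature i is missing in the given state.
    """
    return tuple(
        MISSING if rng.random() < miss_prob(i, state) else value
        for i, value in enumerate(state)
    )


def mcar(i, state):
    # MCAR: the probability depends on the feature index only, never on values.
    return (0.0, 0.3, 1.0)[i]


def mnar(i, state):
    # MNAR: the probability may depend on the (possibly hidden) value itself.
    return 0.9 if state[i] > 5 else 0.1


rng = random.Random(0)
state = (2, 5, 7)
obs = observe(state, mcar, rng)
print(obs)  # feature 0 is always kept, feature 2 is always masked
```

An observed feature always carries its true value, so the only uncertainty the agent faces is about the masked features.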

The paper addresses a genuine gap: while POMDPs naturally model partial observability, learning an arbitrary observation function from data is statistically and computationally intractable. By restricting to the miss-MDP subclass and leveraging missingness type assumptions, the authors derive PAC algorithms that approximate the missingness function from action-observation trajectories, then solve the resulting approximate POMDP using off-the-shelf solvers (SARSOP). The end-to-end guarantee — that the resulting policy is ε-optimal in the true miss-MDP with high probability — is the paper's central theoretical result.

2. Methodological Rigor

The theoretical framework is carefully constructed. The definitions cleanly formalize MCAR, MAR (including a useful "simple MAR" variant), and MNAR within the POMDP setting. The PAC guarantees (Theorems 1-3) are formally proven, building on Okamoto's inequality for Bernoulli processes and the continuity of value functions under perturbations of transition probabilities (leveraging results from robust MDP literature).
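For intuition about how such bounds scale, the two-sided Okamoto (Hoeffding-type) inequality for an empirical Bernoulli frequency p̂ over n samples states that P(|p̂ − p| ≥ ε) ≤ 2·exp(−2nε²), so n ≥ ln(2/δ) / (2ε²) samples suffice for a single estimated probability. The sketch below computes that number. The function name is hypothetical, and the paper's actual sample complexity involves further terms (for example, one estimate per learned probability) that are not reproduced here.

```python
import math


def okamoto_sample_size(epsilon, delta):
    """Samples sufficient for |p_hat - p| < epsilon with probability >= 1 - delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


n = okamoto_sample_size(epsilon=0.01, delta=0.05)
print(n)  # 18445
```

The quadratic dependence on 1/ε is what makes tight guarantees expensive: reducing ε by a factor of ten multiplies the required samples by one hundred.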

The proofs in the appendix are thorough, including the important Lemma 2 that bounds the value difference between the true and approximate miss-MDPs. The argument chains through uncountable belief MDPs, finite truncations, and robust MDP perturbation bounds. One technical limitation acknowledged by the authors is that the sample complexity bounds are quite large — the theoretical guarantees are asymptotic, and practical datasets are insufficient to formally invoke them. The smoothing parameter κ used to handle zero counts is a practical necessity that breaks the formal guarantees for finite data, though this is a common pragmatic compromise.
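As an illustration of why smoothing helps with zero counts, the sketch below estimates per-feature missingness probabilities by counting, as would be reasonable under an MCAR assumption, and applies additive smoothing. The exact way κ enters the paper's estimator is not stated in this assessment, so the formula (m + κ) / (n + 2κ) and the function name are assumptions.

```python
def estimate_missing_probs(observations, n_features, kappa=0.0):
    """Estimate, per feature, the probability of being missing.

    `observations` is a list of tuples in which None marks a missing feature.
    `kappa` is an additive smoothing constant (assumed form, see lead-in).
    """
    n = len(observations)
    if n + 2 * kappa == 0:
        raise ValueError("need at least one observation or kappa > 0")
    estimates = []
    for i in range(n_features):
        missing = sum(1 for obs in observations if obs[i] is None)
        estimates.append((missing + kappa) / (n + 2 * kappa))
    return estimates


data = [(1, None), (2, None), (None, None), (4, None)]
raw = estimate_missing_probs(data, n_features=2)
smoothed = estimate_missing_probs(data, n_features=2, kappa=1.0)
print(raw, smoothed)
```

Without smoothing, the second feature gets an estimated missingness probability of exactly 1, which would assign zero likelihood to ever observing it. With κ > 0 the estimate stays strictly inside (0, 1), at the cost of a bias that the formal guarantees do not account for.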

The experimental evaluation on two benchmarks (ICU and Predator) with multiple missingness types is systematic and well-structured. Reporting 20 independent runs with interquartile ranges, together with both total variation metrics for the learned missingness function and policy value metrics, gives a comprehensive picture. However, the benchmarks are relatively small (800-1765 reachable states), and scalability to larger state spaces is not explored.
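For reference, the total variation distance between two discrete distributions is half the sum of absolute differences of their probabilities. The sketch below computes it for a single true versus learned missingness distribution; the example numbers are made up, and how the paper aggregates this metric across states or features is not specified here.

```python
def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in outcomes)


true_dist = {"observed": 0.7, "missing": 0.3}
learned_dist = {"observed": 0.64, "missing": 0.36}
tv = total_variation(true_dist, learned_dist)
print(round(tv, 4))  # 0.06
```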

3. Potential Impact

Bridging two communities. The paper's most significant conceptual contribution is connecting missing data theory — a mature statistical framework — with POMDP planning. This bridge could stimulate cross-fertilization: POMDP researchers gain structured assumptions that make learning tractable, while missing data researchers gain a sequential decision-making framework for their concepts.

Practical applications. The clinical decision-making scenario (ICU) is well-motivated: medical data is notoriously incomplete, and the missingness mechanism matters for treatment decisions. Robotics with sensor failures is another natural application domain. However, the assumption that the underlying MDP (transition function) is known is quite restrictive for real-world applications. The authors acknowledge this and suggest miss-POMDPs as future work.

Theoretical contributions. The ignorability result (Remark 1) — that MAR missingness cancels out in belief updates — is an elegant theoretical insight that parallels classical ignorability in statistics. This could inform practical implementations even beyond the paper's specific algorithms.
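The cancellation can be checked numerically on a toy example. In the sketch below, feature 0 is always observed and feature 1 is missing with a probability that depends only on feature 0, which is one simple instance of MAR. All names and numbers are illustrative assumptions, and the paper's formal MAR definitions may be more general.

```python
from itertools import product

MISSING = None


def consistent(obs, state):
    """True if every observed feature of `obs` matches `state`."""
    return all(o is MISSING or o == v for o, v in zip(obs, state))


def mar_obs_prob(obs, state):
    """Toy MAR mechanism: feature 1 is masked with a probability that
    depends only on the always-observed feature 0."""
    if not consistent(obs, state) or obs[0] is MISSING:
        return 0.0
    p_miss = 0.8 if state[0] == 1 else 0.2
    return p_miss if obs[1] is MISSING else 1.0 - p_miss


def update(predicted, obs, likelihood):
    """Bayes step: reweight the predicted belief by the likelihood, then normalize."""
    weights = {s: p * likelihood(obs, s) for s, p in predicted.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


states = list(product((0, 1), (0, 1, 2)))
predicted = {s: (i + 1) / 21 for i, s in enumerate(states)}  # weights 1..6, sum 21
obs = (1, MISSING)

full = update(predicted, obs, mar_obs_prob)
ignoring = update(predicted, obs, lambda o, s: 1.0 if consistent(o, s) else 0.0)
print(max(abs(full[s] - ignoring[s]) for s in states))  # numerically zero
```

Every state consistent with the observation shares the same missingness probability, so that factor cancels in the normalization and the belief can be updated without knowing the mechanism.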

4. Timeliness & Relevance

The paper addresses a timely need. With growing interest in deploying RL/planning agents in healthcare, autonomous systems, and other domains with inherently incomplete observations, principled handling of missing data is increasingly important. Most existing approaches either ignore the missingness mechanism or treat it heuristically. The paper fills a clear theoretical gap.

The comparison against model-free methods (PPO, POMCP) demonstrates that these general-purpose approaches struggle with the specific structure of miss-MDPs, validating the need for specialized methods. However, this comparison is somewhat unfair since the model-based approach assumes knowledge of the transition function while model-free methods do not.

5. Strengths & Limitations

Key Strengths:

Clean, well-motivated formalization that creates a new problem class with clear theoretical properties

Complete theoretical pipeline from learning the missingness function to computing ε-optimal policies with PAC guarantees

The ignorability result for MAR missingness is an elegant insight

Systematic experimental evaluation across multiple missingness types, including non-identifiable MNAR

Honest treatment of when assumptions break (MNAR unidentifiable cases)

Notable Limitations:

The known transition function assumption is restrictive; real-world applications often have uncertain dynamics

Scalability is untested — the factored state space could grow exponentially, and the counting-based approximation algorithms scale poorly with state space size

The sample complexity bounds (while proven to exist) are acknowledged to be impractically large

The benchmarks, while illustrative, are small and somewhat artificial

The comparison with model-free baselines is asymmetric in the information provided

The paper focuses on identifiable missingness types; the extension to non-identifiable MNAR (which is arguably more realistic) remains open

The missingness function is assumed to be action-independent and time-homogeneous, which may not hold in practice

Missing discussions: The paper would benefit from a more thorough discussion of its connections to active information gathering (where the agent can choose actions that reduce missingness) and to imputation-based approaches. The relationship to robust POMDPs, where uncertainty in the observation function is modeled explicitly, also deserves discussion.

Overall Assessment

This is a solid theoretical contribution that opens a new research direction at the intersection of missing data theory and POMDPs. The formalization is clean, the theoretical results are sound, and the experimental evaluation supports the theory. The practical impact is currently limited by strong assumptions (known transition function, small state spaces), but the conceptual bridge between these two fields is valuable and could inspire significant follow-up work. The paper represents a meaningful advance in understanding structured partial observability.

Rating: 6.8 / 10

Significance 7 · Rigor 7.5 · Novelty 7.5 · Clarity 7.5

Generated May 13, 2026

Comparison History (25)

vs. Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight

claude-opus-4.6 · 5/16/2026

Paper 1 introduces a novel theoretical framework (miss-MDPs) that rigorously bridges two well-established fields—missing data theory and POMDPs—with formal PAC guarantees. This creates a new subclass with exploitable structural properties, advancing fundamental understanding in sequential decision-making under partial observability. Paper 2 addresses an important practical problem (LLM reasoning oversight) but is more incremental and engineering-focused, tied to current LLM paradigms that may shift rapidly. Paper 1's theoretical contributions have longer-lasting impact and broader applicability across multiple domains dealing with missing observations.

vs. Harnessing Agentic Evolution

gpt-5.2 · 5/16/2026

Paper 1 offers a clearer methodological contribution with theoretical guarantees: it defines a novel, well-scoped POMDP subclass linking missing-data mechanisms (MCAR/MAR/MNAR) to planning, and derives PAC learning results that are otherwise infeasible in general POMDPs. This rigor plus broad applicability (any sequential decision system with missing features: healthcare, robotics, recommender systems, operations) suggests durable cross-field impact. Paper 2 is timely and potentially useful for AI engineering, but appears more empirical/system-oriented with less formal grounding; its impact may depend on rapidly shifting benchmarks and agent frameworks.

vs. VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

gpt-5.2 · 5/16/2026

Paper 2 likely has higher scientific impact due to its broader, more general contribution: it unifies missing-data theory (MCAR/MAR/MNAR) with POMDPs, introduces a new problem class, and provides PAC guarantees plus empirical validation. This can influence reinforcement learning, causal inference, healthcare/robotics with sensor dropout, and any sequential decision-making under missing observations. Paper 1 is timely and practically valuable for LLM infrastructure, but its impact may be narrower (systems for serving) and more contingent on rapidly evolving tooling; its core idea (agentic system synthesis) is compelling but less foundational theoretically.

vs. Pathways to AGI

gemini-3.1 · 5/16/2026

Paper 2 introduces a mathematically rigorous framework bridging missing data theory and POMDPs, complete with PAC learning guarantees and empirical validation. Its concrete algorithmic contributions address a pervasive real-world problem in sequential decision-making, offering broad applicability across AI fields. In contrast, Paper 1 offers a qualitative, critical perspective on AGI development. While relevant, Paper 2's theoretical depth, methodological rigor, and immediate practical utility give it a much higher potential for substantial and measurable scientific impact.

vs. Causal Algorithmic Recourse: Foundations and Methods

claude-opus-4.6 · 5/16/2026

Paper 1 introduces a novel theoretical framework (miss-MDPs) that bridges two well-established fields—missing data theory and POMDPs—creating a new subclass with provable PAC guarantees. This connection is highly original and broadly applicable, as missing data is ubiquitous in sequential decision-making (healthcare, robotics, etc.). The rigorous theoretical contributions (PAC algorithms, ε-optimality proofs) combined with practical applicability give it strong foundational impact. Paper 2 makes valuable contributions to algorithmic recourse but addresses a narrower problem. Paper 1's bridging of two major fields has greater potential to spawn new research directions.

vs. Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation

gpt-5.2 · 5/16/2026

Paper 1 likely has higher scientific impact due to its more fundamental theoretical contribution: it introduces a new structured POMDP subclass linking missing-data theory (MCAR/MAR/MNAR) to sequential decision-making, and provides PAC guarantees for learning and planning under unknown missingness—addressing an important identifiability/learnability gap in general POMDPs. This rigor and formalization can influence RL, causal/missing-data modeling, and healthcare/operations settings with systematic missingness. Paper 2 is timely and practical, but LLM-in-the-loop symbolic regression may be viewed as less principled, harder to guarantee, and potentially less durable given rapid shifts in LLM tooling.

vs. TrafficClaw: Generalizable Urban Traffic Control via Unified Physical Environment Modeling

gpt-5.2 · 5/16/2026

Paper 2 likely has higher scientific impact: it introduces a principled new formalism (missingness-MDPs) that bridges two mature areas—missing-data theory and POMDPs—yielding PAC learning guarantees and epsilon-optimality results under MCAR/MAR/MNAR structure. This is methodologically rigorous, broadly applicable across domains with partial observability and sensor/data missingness (health, robotics, recommender systems, economics), and provides reusable theoretical tools. Paper 1 is timely and practically relevant, but appears more domain-specific (traffic) and relies on complex LLM-agent engineering where generality and rigor may be harder to establish.

vs. Revisiting Privacy Preservation in Brain-Computer Interfaces: Conceptual Boundaries, Risk Pathways, and a Protection-Strength Grading Framework

gpt-5.2 · 5/13/2026

Paper 2 presents a new formal model class (missingness-MDPs) that unifies missing-data theory (MCAR/MAR/MNAR) with POMDPs, and delivers PAC-learning algorithms with epsilon-optimality guarantees plus empirical validation. This combination of conceptual novelty, methodological rigor, and broad applicability (any sequential decision-making with missing features: healthcare, robotics, recommender systems, operations) suggests higher cross-field impact. Paper 1 is timely and important for BCI ethics/privacy, but is primarily a conceptual review/framework with less technical novelty and fewer testable methodological contributions.

vs. GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

gpt-5.2 · 5/13/2026

Paper 2 has higher potential scientific impact due to its clearer conceptual novelty (formalizing missing-data mechanisms within a POMDP subclass), strong methodological rigor (PAC learning guarantees and epsilon-optimality in the true environment), and broad cross-field relevance bridging RL/planning with missing-data statistics—applicable to healthcare, robotics, and recommender systems where sensor/feature missingness is endemic. Paper 1 is timely and useful for LLM post-training, but it appears more incremental within an already-crowded fine-tuning/RLHF space and offers narrower theoretical guarantees.

vs. TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

gpt-5.2 · 5/13/2026

Paper 2 introduces a new formal problem class (missingness-MDPs) that bridges missing-data theory (MCAR/MAR/MNAR) with POMDPs, yielding PAC algorithms and epsilon-optimality guarantees—strong methodological rigor and clear theoretical novelty. Its framing is broadly applicable to real-world sequential decision-making with sensor/dropout and data-quality issues across domains (healthcare, robotics, recommender systems), increasing cross-field impact and timeliness. Paper 1 is a solid algorithmic improvement to PPO for MTRL with strong empirical results, but is more incremental and narrower in scope than a new theory-grounded framework with guarantees.

vs. Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution

claude-opus-4.6 · 5/13/2026

Paper 1 introduces a novel formal framework (miss-MDPs) that bridges two well-established fields—missing data theory and POMDPs—with rigorous theoretical contributions (PAC algorithms, epsilon-optimality guarantees) and empirical validation. This creates actionable tools for real-world sequential decision-making under missing observations, a common practical problem. Paper 2, while thought-provoking, is a philosophical/position paper arguing for epistemological shifts in AI evaluation. Though timely, it lacks concrete methodological contributions or empirical results, limiting its direct scientific impact compared to Paper 1's novel algorithms and theoretical guarantees.

vs. Agentivism: a learning theory for the age of artificial intelligence

gemini-3.1 · 5/13/2026

Paper 1 offers high methodological rigor by formally bridging missing data theory with POMDPs, providing PAC learning bounds and empirical validation. While Paper 2 presents a timely conceptual framework for education, Paper 1 solves a fundamental, mathematically rigorous technical challenge in sequential decision-making. Its provable guarantees for handling unknown missingness in real-world AI applications (like healthcare or robotics) demonstrate a deeper, more immediate scientific advancement compared to the purely theoretical propositions of Paper 2.

vs. NARA: Anchor-Conditioned Relation-Aware Contextualization of Heterogeneous Geoentities

gpt-5.2 · 5/13/2026

Paper 2 introduces a new formalism (missingness-MDPs) that connects missing-data theory (MCAR/MAR/MNAR) with POMDP planning, and provides PAC-learning guarantees plus empirical validation. Theoretical guarantees and a general interface to many domains with partial observability and missing features (healthcare, robotics, recommender systems, operations) suggest broad, cross-field impact and strong methodological rigor. Paper 1 is timely and practically useful for vector geospatial foundation models, but its impact is more domain-specific and primarily empirical, with less fundamental theory.

vs. Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems

gpt-5.2 · 5/13/2026

Paper 1 has higher potential scientific impact due to a clearer theoretical novelty (formal miss-MDP class linking missing-data mechanisms to POMDP observation models), rigorous methodological contributions (PAC learning guarantees and epsilon-optimality proofs), and broader cross-field relevance (RL/POMDPs, statistical missing-data theory, healthcare/operations domains with systematic missingness). Paper 2 is timely and practically valuable for enterprise agent engineering, but appears more like an applied systems/interface paradigm with domain-specific validation and less generalizable theory, making its long-term scientific impact less certain.

vs. Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models

gemini-3.1 · 5/13/2026

Paper 2 establishes a foundational bridge between missing data theory and POMDPs, providing rigorous PAC learning guarantees and epsilon-optimal policies. This theoretical advancement has broad implications across reinforcement learning and decision-making under uncertainty. While Paper 1 offers a valuable practical tool for robust optimization using LLMs, Paper 2's methodological rigor, novel formalization of 'missingness-MDPs', and potential to influence core RL algorithms give it a higher potential for broad and lasting scientific impact.

vs. Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

claude-opus-4.6 · 5/13/2026

Paper 2 introduces a formally rigorous framework (miss-MDPs) that bridges two well-established fields—missing data theory and POMDPs—with PAC guarantees and empirical validation. Its methodological contributions (learnable observation models, provable optimality bounds) are concrete and immediately applicable to real-world sequential decision-making under incomplete observations. Paper 1 proposes an interesting conceptual framework for AI governance with preliminary experiments, but its formalization is less rigorous, the empirical evidence is limited (single experiment, n=1 style), and its contributions are more speculative. Paper 2's theoretical depth and broader applicability give it higher impact potential.

vs. Saliency-Aware Regularized Quantization Calibration for Large Language Models

gemini-3.1 · 5/13/2026

Paper 1 offers a foundational theoretical bridge between the theory of missing data and POMDPs, introducing a novel subclass with rigorous PAC learning algorithms and epsilon-optimality guarantees. This conceptual innovation addresses a fundamental challenge in sequential decision-making with broad applicability across disciplines like healthcare and robotics. In contrast, Paper 2 presents an incremental, though highly practical, regularization technique for LLM quantization. The theoretical rigor, structural novelty, and lasting foundational impact of Paper 1 give it a higher potential scientific impact.

vs. Artificial Intelligence and the Structure of Mathematics

gemini-3.1 · 5/13/2026

Paper 2 offers concrete, rigorous methodological advancements by bridging missing data theory with POMDPs. It provides PAC learning algorithms, theoretical optimality bounds, and empirical validation. This makes it immediately applicable to real-world reinforcement learning problems involving imperfect data (e.g., healthcare, robotics). In contrast, Paper 1 is an essay presenting speculative, philosophical perspectives on AI and mathematics. While conceptually intriguing, Paper 2's rigorous mathematical framework, actionable algorithms, and direct applicability to pervasive data issues give it a significantly higher potential for concrete scientific and practical impact.

vs. CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation

gpt-5.2 · 5/13/2026

Paper 1 introduces a new formal problem class (missingness-MDPs) that bridges missing-data theory (MCAR/MAR/MNAR) with POMDP planning, and provides PAC learning guarantees leading to ε-optimal policies—strong methodological rigor and clear conceptual novelty. Its impact could span RL, planning, causal/missing-data statistics, and healthcare/recommender domains where missingness is endemic, making it timely and broadly relevant. Paper 2 is practically valuable for scalable multimodal GNNs, but is a more incremental architectural refinement with primarily empirical validation and narrower theoretical contribution.

vs. PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement

gpt-5.2 · 5/13/2026

Paper 2 likely has higher scientific impact due to stronger novelty and breadth: it formally unifies missing-data theory (MCAR/MAR/MNAR) with POMDPs via a new problem class and provides PAC learning + epsilon-optimal policy guarantees—methodological rigor that can influence RL, causal inference, statistics, healthcare, and any domain with sensor/data missingness. Its contributions are general and theoretical, enabling principled planning under missing observations. Paper 1 is timely and practically useful for LLM agents, but is more incremental within an active empirical refinement/feedback paradigm and is benchmark-dependent with fewer formal guarantees.