Andrew Kang, Priya Narasimhan
We recast pass evaluation in football (soccer) as a Monte Carlo Tree Search (MCTS)-like evaluation problem whose components mostly exist in the literature under different names: a value model (possession value), a world model (multi-agent trajectories with ball interactions), and a policy over counterfactual actions (sampling pass variants with noise). Building on the first public high-fidelity tracking dataset with 3D ball trajectories from the Bundesliga, we introduce Monte Carlo Pass Search (MCPS), which infers kick parameters for each observed pass, samples execution variants and option variants, rolls each candidate forward with a ball-conditioned world model until the next ball interaction, and scores outcomes with a learned value model to obtain a distribution over gained value. This distribution enables distribution-aware attribution with two complementary execution-surplus scores used for analysis and ranking: mean-based and percentile-based scores. To make the world model sample-efficient under limited public data, we adapt a discrete-token, autoregressive trajectory generator from autonomous driving (SMART) and show it yields strong best-of-20 forecasting accuracy compared to baselines, while supporting fully hypothetical rollouts for downstream evaluation. We have released model checkpoints and code.
The paper introduces Monte Carlo Pass Search (MCPS), a framework that recasts pass evaluation in football as a Monte Carlo Tree Search-like problem. The key idea is to evaluate a pass not as a point estimate of value, but as a *distribution* over counterfactual outcomes. The framework has four main components: (1) inference of physical kick parameters from observed passes, (2) sampling of local (execution noise) and global (alternative option) counterfactual pass variants, (3) a learned world model (adapted from autonomous driving) that rolls out multi-agent trajectories conditioned on sampled ball flights, and (4) a possession value model that scores terminal states. The dual local/global search produces distribution-aware metrics that disentangle execution quality from decision quality.
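The four components described above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Kick` parameterization, the noise scales, the `toy_value_after` function (which stands in for the learned world model and value model), and the exact definitions of the two surplus scores are all hypothetical. The paper only states that mean-based and percentile-based scores exist.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Kick:
    """Hypothetical kick parameterization (an assumption, not the paper's)."""
    speed: float      # initial ball speed, m/s
    azimuth: float    # direction on the pitch plane, radians
    elevation: float  # launch angle, radians


def execution_variants(kick, n, rng, speed_sd=1.0, angle_sd=0.05):
    """Local search: the same intended pass with Gaussian execution noise."""
    return [
        Kick(max(0.0, rng.gauss(kick.speed, speed_sd)),
             rng.gauss(kick.azimuth, angle_sd),
             max(0.0, rng.gauss(kick.elevation, angle_sd)))
        for _ in range(n)
    ]


def option_variants(kick, teammate_azimuths):
    """Global search: the same kick redirected toward each alternative target."""
    return [Kick(kick.speed, az, kick.elevation) for az in teammate_azimuths]


def toy_value_after(kick, rng):
    """Placeholder for 'roll out the world model, then score the end state'.

    A real system would simulate all players until the next ball interaction
    and apply a learned possession-value model. Here the value simply decays
    with angular distance from an arbitrary best direction, plus noise.
    """
    best_azimuth = 0.3
    value = 0.10 * math.exp(-((kick.azimuth - best_azimuth) / 0.2) ** 2)
    return value + rng.gauss(0.0, 0.005)


def gained_values(kicks, value_before, rng):
    """Gained value per candidate kick: value after minus value before."""
    return [toy_value_after(k, rng) - value_before for k in kicks]


def evaluate_pass(observed, teammate_azimuths, value_before, n=200, seed=0):
    """Score one observed pass against its local and global counterfactuals."""
    rng = random.Random(seed)
    observed_gain = gained_values([observed], value_before, rng)[0]
    local = gained_values(execution_variants(observed, n, rng), value_before, rng)
    options = gained_values(option_variants(observed, teammate_azimuths),
                            value_before, rng)
    return {
        "observed_gain": observed_gain,
        # Assumed mean-based score: observed gain minus the average execution.
        "mean_surplus": observed_gain - statistics.fmean(local),
        # Assumed percentile-based score: share of executions matched or beaten.
        "percentile_rank": sum(g <= observed_gain for g in local) / len(local),
        # Best gain among the alternative options (decision-quality reference).
        "best_option_gain": max(options),
    }


result = evaluate_pass(Kick(15.0, 0.3, 0.1), [-0.5, 0.0, 0.3, 0.9],
                       value_before=0.02)
print(result)
```

Keeping the local (execution) and global (option) samples in separate lists is what lets execution quality be reported independently of decision quality in this sketch.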
The conceptual contribution is well-articulated: the authors explicitly frame pass evaluation as a planning problem with policy, world model, and value model components, connecting football analytics to the reinforcement learning and world-model literature. This framing, while not technically novel in isolation (the authors acknowledge each component exists separately), is synthesized in a coherent and operationally useful way.
The methodology is reasonable but has significant limitations that the authors partially acknowledge:
Practical applications: The framework could be valuable for coaching, scouting, and recruitment in professional football by offering richer pass evaluation than existing point-estimate metrics. The visualization tools (opportunity/sensitivity views) are directly applicable to coaching workflows.
Broader methodological impact: The connection between MCTS-style evaluation and sports analytics is potentially influential, since it provides a template for applying world-model-based reasoning to other sports decisions (shots, dribbles, set pieces). The adaptation of autonomous driving trajectory models to sports is a useful cross-domain transfer that others may follow.
Reproducibility: The release of code and model checkpoints is commendable and addresses a major pain point in football analytics, where proprietary data and models dominate. This alone could catalyze follow-up work.
However, the impact is constrained by the small data regime and the lack of convincing downstream validation. Without evidence that MCPS rankings correlate with expert judgment or predictive outcomes, the framework remains a conceptual demonstration rather than a validated tool.
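One way the missing validation could be approached is a rank-correlation check between MCPS scores and expert ratings of the same passes. The sketch below computes Spearman's rank correlation with the standard library only. The score and rating values are invented for illustration and do not come from the paper, and the choice of Spearman's coefficient is an assumption rather than a protocol the authors propose.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: one MCPS score and one expert rating (1-5) per pass.
mcps_scores = [0.012, -0.004, 0.030, 0.001, 0.018]
expert_ratings = [3, 2, 5, 1, 4]
print(spearman(mcps_scores, expert_ratings))
```

A coefficient near 1 on a sufficiently large, independently rated sample would be the kind of evidence this review finds lacking. A handful of passes, as in the toy data, would not be enough to draw conclusions.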
The paper is well-timed, leveraging the first public high-fidelity tracking dataset with 3D ball trajectories (Bassek et al., 2025). It addresses a genuine bottleneck: the proprietary nature of football analytics has limited reproducible research. The connection to generative AI and world models is topical. The cross-pollination from autonomous driving (SMART) to sports is timely given the maturity of AV trajectory prediction methods.
MCPS presents an intellectually appealing framework that connects football analytics to world-model-based planning in a principled way. The conceptual contribution is solid, and the open-source release addresses a real community need. However, the paper is fundamentally limited by data scarcity, resulting in under-performing sub-components (especially the value model) and a lack of convincing end-to-end validation. The work reads more as a proof-of-concept or framework proposal than a validated methodology. Its impact will depend heavily on whether the community adopts and scales the approach with larger datasets.
Generated Jun 10, 2026
Paper 2 demonstrates higher scientific impact potential due to its cross-disciplinary innovation (combining MCTS with sports analytics and autonomous driving models), broader methodological contributions (adapting SMART from autonomous driving to football trajectory prediction, novel distribution-aware attribution), and wider applicability beyond its specific domain. It leverages a unique 3D tracking dataset, introduces a reusable framework for counterfactual evaluation, and bridges multiple active research communities. Paper 1, while practically useful, addresses a narrow engineering application with a relatively straightforward multi-agent LLM orchestration approach, and its main finding about model scale is already well-documented in the LLM literature.
Paper 2 introduces a novel framework (MCPS) that creatively adapts techniques across domains (MCTS, autonomous driving trajectory models) for sports analytics, with broader methodological contributions including distribution-aware attribution and cross-domain transfer of trajectory generation models. It releases code and checkpoints, enabling reproducibility. Paper 1 is a competition solution report with incremental engineering contributions combining existing LLM/VLM techniques. Paper 2 has greater novelty, broader cross-field impact (sports analytics, multi-agent modeling, counterfactual reasoning), and stronger methodological rigor.
Paper 2 introduces a novel framework (MCPS) combining MCTS-style evaluation with 3D ball tracking data for football pass evaluation, bridging sports analytics, multi-agent trajectory prediction, and counterfactual reasoning. It adapts methods from autonomous driving (SMART) to a new domain, releases code/checkpoints, and uses a novel public dataset. Paper 1 addresses a narrower problem (occlusion handling in language-agent memory palaces) with results that the authors themselves acknowledge as 'near-tautological,' and the confirmatory studies remain future work. Paper 2 has broader cross-domain impact, stronger methodological novelty, and more immediate practical applications.
Paper 1 addresses a fundamental and highly relevant challenge in modern AI (memory limits in long-horizon language agents), offering broad applicability across numerous domains relying on LLMs. Paper 2, while methodologically innovative in adapting autonomous driving techniques to sports analytics, has a much narrower scope of impact primarily restricted to football data science.
Paper 1 offers a highly practical and scalable solution to a major bottleneck in architecture and real estate design. By combining a novel dataset, a domain-specific language, and vision-language models for procedural reasoning, it presents a comprehensive neuro-symbolic framework. While Paper 2 introduces an innovative multi-agent world model approach for sports analytics, Paper 1 has broader immediate real-world applications and commercial potential across multiple large-scale industries.
Paper 1 presents a concrete, novel methodological integration (MCTS-style counterfactual search + learned world/value models) enabled by rare 3D tracking data, with measurable evaluation, model adaptations, and released code/checkpoints, all of which support reproducibility and near-term uptake in sports analytics and trajectory-modeling research. Its approach is timely (world models, counterfactual evaluation), has clear real-world applications (player/team decision analysis), and can generalize to other multi-agent domains. Paper 2 is largely conceptual/theoretical with unclear formalism, validation, or implementable methodology, making near-term scientific impact less likely.
Paper 2 addresses a fundamental and highly timely challenge in artificial intelligence: improving LLM agents' performance in long-horizon tasks by mitigating long-context interference. Its proposed methodology has broad applicability across numerous domains where autonomous agents are deployed. In contrast, Paper 1 focuses on a niche application (sports analytics for football). While methodologically rigorous and innovative in its specific domain, Paper 1 lacks the cross-disciplinary breadth and widespread technological relevance of Paper 2.
Paper 1 introduces a novel framework (Monte Carlo Pass Search) combining multiple ML techniques for counterfactual pass evaluation in football, with released code/checkpoints and a public 3D tracking dataset. It has clear real-world applications in sports analytics, methodological novelty in adapting autonomous driving trajectory models to sports, and broad appeal across ML and sports science communities. Paper 2, while methodologically sound, reports a scoped negative result on cross-model activation transfer in a narrow setting with small models, limiting its broader impact and applicability despite contributing useful knowledge about mechanistic interpretability limitations.
Paper 2 addresses a fundamental challenge in LLM agents, long-term persistent memory, which is highly relevant given the explosive growth of LLM agent research. The topic-structured document approach with iterative retrieval is broadly applicable across many agent applications. Paper 1, while technically sophisticated and well-executed, targets a narrow domain (football/soccer pass evaluation) with limited cross-disciplinary impact. The LLM memory problem affects a much larger research community and has wider real-world applications, giving Paper 2 greater potential scientific impact despite Paper 1's strong methodological contribution within sports analytics.
Paper 1 is more novel and methodologically substantial: it integrates an MCTS-like counterfactual evaluation framework with a learned multi-agent world model using rare public 3D ball-tracking data, enabling distributional pass value attribution and releasing code/checkpoints. It has clear real-world application in sports analytics and broader relevance to counterfactual reasoning, model-based RL, and generative trajectory modeling. Paper 2 is mainly a replication/benchmarking study of an existing LLM planner, with limited innovation and narrower impact, despite being timely and useful for validation.