Abhijoy Sarkar, Aarchi Singh Thakur
Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computational models on the corresponding longitudinal patient trajectories. We introduce OncoTraj, a public benchmark of 813 EGFR-mutant NSCLC patients receiving first-line osimertinib, harmonized from three real-world clinical-genomic sources: MSK-CHORD (672 patients), AACR Project GENIE BPC NSCLC (34 patients), and the FLAURA molecular-resistance supplement (107 patients). OncoTraj defines three locked tasks: (A) binary classification of progression by a fixed 12-month landmark, (B) regression of time-to-first-progression in days, and (C) six-class classification of the dominant resistance mechanism. We release the harmonized dataset, patient-level train/validation/test splits with an audited no-leakage guarantee, an open-source evaluation harness, and six reference baselines spanning a majority-class predictor, logistic regression, random forest, XGBoost, an LSTM, and a multi-task transformer. With v1's single-timepoint snapshot features, no task clears chance on clean within-source evaluation: the uniformity of this ceiling across every model class localizes the limit to the input modality (single-snapshot tissue NGS rather than serial ctDNA), not the algorithm. The benchmark does recover a reproducible literature-consistent association: TP53 co-mutation raises the 12-month progression rate from 29% to 59% cohort-wide. OncoTraj establishes a reproducible, leakage-audited baseline and converts the modality limit into concrete design requirements for a serial-ctDNA-enriched v2.
The paper introduces OncoTraj, a public benchmark of 813 EGFR-mutant NSCLC patients receiving first-line osimertinib, harmonized from three clinical-genomic sources (MSK-CHORD, GENIE BPC, FLAURA). It defines three locked prediction tasks: 12-month landmark progression (binary), time-to-progression (regression), and resistance mechanism classification (six-class). It also provides patient-level splits, an evaluation harness, and six reference baselines. The paper's central thesis is not that these tasks are solved, but that they are *unsolvable* with the current input modality (single-snapshot tissue NGS), which converts a negative result into a concrete specification for future data collection (serial ctDNA).
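To make the 12-month landmark task concrete, here is a minimal labeling sketch. The 365-day cutoff, the function name `landmark_label`, and the choice to leave patients censored before the landmark unlabeled are illustrative assumptions; the review does not state how the benchmark handles early censoring.

```python
from typing import Optional

LANDMARK_DAYS = 365  # assumption: 12 months approximated as 365 days


def landmark_label(days_to_event: int, progressed: bool,
                   landmark_days: int = LANDMARK_DAYS) -> Optional[int]:
    """Binary landmark label for one patient (hypothetical helper).

    days_to_event: days from treatment start to progression, or to last
                   follow-up if no progression was observed.
    progressed:    True if the event at days_to_event is a progression.

    Returns 1 if progression occurred on or before the landmark,
    0 if the patient was followed to the landmark without progressing
    (including patients who progressed later), and None if the patient
    was censored before the landmark, so the label is unknown.
    """
    if progressed and days_to_event <= landmark_days:
        return 1
    if days_to_event >= landmark_days:
        return 0
    return None
```

Under these assumptions, a patient censored at day 200 has no usable label, which is one reason the effective sample size for a landmark task can be smaller than the cohort size.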
This is a benchmark-infrastructure paper, not a methods paper. Its contribution lies in standardizing the problem formulation, providing leakage-audited splits, and documenting exactly where and why current data fails—rather than in algorithmic novelty.
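A patient-level no-leakage check of the kind described can be sketched as a pairwise disjointness test over split membership. The function name `audit_patient_leakage` and the patient IDs in the usage note are hypothetical, and the paper's actual audit procedure may check more than ID overlap.

```python
from itertools import combinations


def audit_patient_leakage(splits: dict[str, list[str]]) -> dict[tuple[str, str], set[str]]:
    """Return patient IDs shared between any two splits.

    splits maps a split name (e.g. "train") to its list of patient IDs.
    An empty result means no patient appears in more than one split.
    """
    overlaps: dict[tuple[str, str], set[str]] = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps
```

For example, `audit_patient_leakage({"train": ["p1", "p2"], "test": ["p2"]})` reports `p2` under the `("train", "test")` pair, while fully disjoint splits return an empty dict.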
The paper demonstrates unusual honesty and rigor in several respects.
The paper addresses a real need: resistance prediction on osimertinib is clinically important, and the lack of standardized evaluation frameworks hampers computational oncology research. The timing is appropriate—serial ctDNA monitoring is becoming routine, and computational methods are being developed that will need benchmarks. However, v1 arrives too early to be useful for actual method development, making it more of a position statement than a functional benchmark.
OncoTraj is a well-intentioned and transparently documented benchmark that currently serves more as a detailed negative result and specification document than as a functional evaluation platform. Its primary value is in formalizing the problem, documenting exactly why single-snapshot tissue NGS is insufficient for resistance prediction, and establishing infrastructure for a future v2 with serial ctDNA. The exceptional honesty about limitations is laudable but also reveals that the benchmark is premature as a practical tool. The impact will depend entirely on whether v2 materializes with adequate serial molecular data.
Generated Jun 10, 2026
Paper 1 provides foundational theoretical guarantees for distributed machine learning, addressing a critical bottleneck (stragglers in ASGD) with broad applicability across large-scale AI. Its high-probability convergence proofs under heavy-tailed noise represent a significant methodological advance. While Paper 2 offers a valuable clinical benchmark, its narrow focus on a specific cancer subtype and its baseline negative results limit its cross-disciplinary reach compared to the universal utility of robust distributed optimization algorithms.
Paper 1 introduces a broadly applicable, principled mathematical framework for controlling diffusion models, a highly active and widely impactful area of AI research. Its ability to improve sample quality and enforce fairness constraints gives it immense cross-disciplinary potential. While Paper 2 provides a highly valuable medical benchmark, its impact is constrained to a specific subfield of oncology, whereas Paper 1's methodological innovation will likely influence a wider array of domains and generate broader scientific interest.
OncoTraj addresses a critical unmet need in precision oncology by creating the first public benchmark for longitudinal resistance prediction in EGFR-mutant NSCLC. It provides a harmonized dataset, standardized tasks, evaluation infrastructure, and baselines that can catalyze an entire research community. Its identification of specific data modality limitations (single-snapshot vs. serial ctDNA) provides actionable design requirements. Paper 1 presents an interesting adversarial robustness finding, but RL-based gradient disruption is incremental within an already crowded adversarial ML field, and the practical utility remains uncertain given potential adaptive attacks. OncoTraj's benchmark infrastructure has broader, more durable impact across ML and oncology.
Paper 2 likely has higher scientific impact because it releases a large, harmonized, leakage-audited public clinical-genomic benchmark with locked tasks and an evaluation harness. Such resources can catalyze broad, long-term work across ML, oncology, bioinformatics, and regulatory/clinical translation, and its explicit identification of a modality ceiling informs future data collection (serial ctDNA) and study design. Paper 1 is a solid, timely algorithmic contribution for diffusion-policy fine-tuning in robotics, but within the fast-moving RL and model-based control literature its impact is narrower and more incremental than that of a new public benchmark in precision oncology.
OncoTraj addresses a critical unmet need in precision oncology by providing the first public benchmark for longitudinal resistance prediction in EGFR-mutant NSCLC. It creates reusable infrastructure (harmonized dataset, evaluation harness, leakage-audited splits) that can catalyze an entire research community around a clinically important problem. Its honest reporting of negative results (no model beats chance with current features) provides actionable insight directing future data collection (serial ctDNA). While Paper 1 offers incremental improvements in prompt learning for LLM agents, Paper 2 has broader cross-disciplinary impact spanning oncology, genomics, and ML, with direct translational potential.
Paper 1 addresses a major bottleneck in LLM distillation with a novel hidden-state approach, offering immediate, broad impact through significant gains in reasoning performance, training speed, and memory efficiency. Paper 2 introduces a valuable oncology benchmark, but its immediate impact is constrained by the negative results of its initial single-snapshot data modality.
OncoTraj addresses a critical gap in computational oncology by providing the first public benchmark for longitudinal resistance prediction in EGFR-mutant NSCLC, enabling reproducible model development for a clinically important problem. It has strong real-world medical applications, promotes open science with released data/code, and identifies concrete modality limitations guiding future work. Paper 2 makes a narrower theoretical contribution analyzing a target update variant in linear Q-learning—a well-studied area with incremental novelty. OncoTraj's potential to catalyze precision oncology research gives it broader and more timely impact across clinical and ML communities.
Paper 2 likely has higher scientific impact: it creates a large, public, leakage-audited clinical-genomic benchmark with locked tasks, splits, and an evaluation harness, an enabling resource that can standardize and accelerate work across oncology, ML, and biomarker development. It also clearly identifies a modality ceiling and sets concrete requirements for improved data collection (serial ctDNA), which can steer the field. Paper 1 is technically novel and timely for distributed LLM training, but its impact may be confined to systems and ML training infrastructure, and it may compete with fast-moving proprietary implementations.
Paper 2 presents a general methodological advancement in machine learning (conditional tabular diffusion) with broad applicability across numerous domains relying on tabular data. In contrast, Paper 1 introduces a highly specialized medical benchmark where current data modalities fail to exceed chance performance. The fundamental algorithmic innovation in Paper 2 offers significantly broader impact and immediate practical utility across diverse fields compared to the domain-specific stepping-stone dataset in Paper 1.
While Paper 2 offers a valuable methodological advancement in topological data analysis, Paper 1 is likely to have higher scientific impact due to its profound clinical relevance. By introducing a harmonized, leakage-audited public benchmark for longitudinal cancer resistance prediction, Paper 1 addresses a critical bottleneck in computational oncology. Although it highlights the limitations of current single-snapshot data, establishing this standardized evaluation framework will catalyze future algorithmic development and guide next-generation data collection efforts, directly paving the way for predictive models that could improve patient outcomes in cancer treatment.