Tabular Foundation Models for Clinical Survival Analysis via Survival-Aware Adaptation

Minh-Khoi Pham, Luca Cotugno, Alina Sirbu, Tai Tan Mai, Martin Crane, Marija Bezbradica

Jun 10, 2026 · arXiv:2606.12006v1

cs.LG · cs.AI

#3914 of 5669 · cs.LG

Tournament Score: 1352±43

Win Rate: 30%

Rating: 5.5/10

Significance 5 · Rigor 6 · Novelty 4.5 · Clarity 7.5

Abstract

Predicting time-to-event outcomes such as mortality is a fundamental task in clinical decision-making, commonly addressed through survival analysis. While classical statistical and deep learning approaches have been widely studied, they typically require task-specific training and sufficient labeled data. Recent advances in tabular foundation models offer a new paradigm by learning general-purpose representations for structured data. However, their applicability to censored time-to-event prediction in clinical settings remains underexplored, as typical applications are restricted to discrete classification rather than survival analysis tasks. In this work, we propose a lightweight adaptation approach for applying tabular foundation models to clinical survival analysis by directly training a survival-aware head on top of the pretrained representations. We study representative architectures, including TabPFN, TabDPT, and TabICL, and adapt them using a multi-task logistic regression (MTLR) head to model right-censored time-to-event outcomes. We evaluate this approach on a diverse set of public survival benchmarks and two large-scale ICU cohorts, MIMIC-IV and eICU. Our results show that this transfer learning approach achieves competitive or superior performance compared to strong baselines. On MIMIC-IV, TabDPT-FT-MTLR reaches a C-index of 0.856, corresponding to a relative improvement of +1.4% over the best non-FM baseline (DeepSurv, 0.844) and +6.7% over the best zero-shot model (0.802). On eICU, TabICL-FT-MTLR achieves 0.797, yielding gains of +1.7% (DeepSurv, 0.784) and +6.4% (0.749), respectively. These findings highlight the importance of combining pretrained tabular representations with survival-aware objectives and suggest that tabular foundation models provide a practical and effective alternative for clinical survival prediction.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

Core Contribution

This paper proposes a lightweight adaptation strategy for applying tabular foundation models (TabPFN, TabDPT, TabICL) to clinical survival analysis by attaching a multi-task logistic regression (MTLR) head on top of frozen pretrained representations. The key insight is that pretrained tabular representations, originally designed for classification/regression, can be effectively repurposed for censored time-to-event prediction without modifying backbone weights. The paper contrasts this survival-aware adaptation approach against (a) zero-shot reformulation where survival is treated as a sequence of binary classification tasks, and (b) traditional survival baselines trained from scratch.

The contribution is primarily one of integration rather than fundamental novelty: it combines two existing components (tabular FMs and MTLR heads) in a sensible way. However, the systematic evaluation and the demonstration that this simple combination works well across diverse clinical settings provide practical value.
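To make the "frozen backbone plus MTLR head" idea concrete, here is a minimal sketch of an MTLR likelihood for right-censored data in plain Python. It is not the authors' implementation. The function names, the interval indexing, and the convention that a censored subject's event falls in its censoring interval or later are assumptions made for illustration. The `features` argument stands in for an embedding taken from a frozen tabular FM.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mtlr_interval_logits(features, weights, biases):
    """Map one feature vector to K+1 interval logits.

    weights is a list of K weight vectors and biases a list of K floats, one
    per time cut point. The logit of "event in interval k" is the sum of the
    per-cut-point scores theta_j for j >= k; the final interval (survival past
    the last cut point) has logit 0.
    """
    theta = [sum(w_i * x_i for w_i, x_i in zip(w, features)) + b
             for w, b in zip(weights, biases)]
    logits = [0.0] * (len(theta) + 1)
    for k in range(len(theta) - 1, -1, -1):
        logits[k] = logits[k + 1] + theta[k]
    return logits

def mtlr_log_likelihood(logits, interval, event):
    """Log-likelihood of one subject under the softmax over intervals.

    Observed event: probability mass of exactly that interval.
    Censored (assumed convention): mass of that interval and all later ones.
    """
    if event:
        numerator = logits[interval]
    else:
        numerator = logsumexp(logits[interval:])
    return numerator - logsumexp(logits)
```

Training the head would then amount to minimizing the negative mean of `mtlr_log_likelihood` over the cohort with respect to `weights` and `biases` only, leaving the backbone untouched.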

Methodological Rigor

Strengths in experimental design:

The evaluation spans 9 datasets of varying scale, dimensionality, and clinical domains, providing reasonable evidence of generalizability.

5-fold cross-validation with stratification by event indicator and discretized time quantiles is appropriate.

Consistent preprocessing pipeline across all models reduces confounding from data handling differences.

Statistical significance testing (Wilcoxon signed-rank) is included, though its application is somewhat selective.

Risk stratification analysis with Kaplan-Meier curves adds clinical interpretability.
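The stratification scheme mentioned above (event indicator combined with discretized time quantiles) can be sketched as follows. This is an illustrative stdlib-only version, not the paper's code: the use of quartiles, the round-robin fold assignment, and the function name `stratified_folds` are all assumptions.

```python
import random
import statistics
from bisect import bisect_right
from collections import defaultdict

def stratified_folds(times, events, n_folds=5, n_time_bins=4, seed=0):
    """Assign each subject to a fold, balancing event status and time bin.

    A stratum is the pair (event indicator, time-quantile bin). Subjects are
    shuffled within each stratum and dealt to folds round-robin, so every fold
    sees a similar mix of early/late and censored/uncensored cases.
    """
    cuts = statistics.quantiles(times, n=n_time_bins)  # n_time_bins - 1 cut points
    strata = defaultdict(list)
    for i, (t, e) in enumerate(zip(times, events)):
        strata[(int(e), bisect_right(cuts, t))].append(i)

    rng = random.Random(seed)
    fold_of = [None] * len(times)
    offset = 0  # continue the round-robin across strata to keep fold sizes even
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for j, i in enumerate(members):
            fold_of[i] = (j + offset) % n_folds
        offset += len(members)
    return fold_of
```

Stratifying on time as well as event status matters most on the smaller benchmarks, where a random split can leave a fold with very few early events.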

Weaknesses:

The paper reports only the C-index. For survival analysis, calibration-oriented metrics (e.g., Brier score, integrated Brier score) and time-dependent AUC would strengthen the claims. The C-index captures discrimination (how well patients are ranked by risk) but not calibration, which is clinically important.

The improvements, while consistent, are modest in absolute terms. On MIMIC-IV, the gain over DeepSurv is 0.012 in C-index (0.856 vs 0.844). On several smaller datasets, the differences fall within standard deviation ranges, raising questions about practical significance.

The statistical significance markers (asterisks) in Table 2 appear inconsistently applied. They seem to indicate significance relative to the best non-FM baseline, but some baseline methods also receive asterisks, so the intended interpretation is unclear.

Hyperparameter tuning details for baselines are sparse. Whether baselines were given equivalent tuning effort is unclear, which could bias comparisons.

The adaptive binning strategy is pragmatic but introduces a dataset-dependent hyperparameter (number of bins) that could influence results.
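The point about discrimination versus calibration can be illustrated with a small sketch of Harrell's C-index. The implementation below is a simplified textbook version (pairs with tied event times are ignored) and is not taken from the paper. The toy data are invented for illustration.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. It is concordant when i has the higher risk score;
    ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort (hypothetical values): the third subject is censored.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.1]
distorted = [r ** 3 for r in toy_risks]  # same ranking, very different values
```

Because the metric depends only on the ordering of the scores, `toy_risks` and `distorted` receive the same C-index even though they would imply very different absolute risks. This is why a calibration metric is needed alongside it.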

Potential Impact

The paper addresses a genuine practical need: simplifying the deployment of survival models in clinical settings where labeled data may be limited and modeling expertise scarce. The "freeze backbone, train lightweight head" paradigm is appealing for clinical deployment due to:

1. Reduced computational overhead compared to end-to-end deep survival models

2. Simplified hyperparameter tuning since only the head requires optimization

3. Potential for rapid adaptation to new clinical cohorts

However, the impact is somewhat bounded by several factors. The improvements over well-tuned DeepSurv are modest (1-2% relative), and the approach still requires some labeled survival data for head training, limiting its advantage over standard transfer learning approaches. The zero-shot setting, which would be most impactful for truly data-scarce scenarios, performs noticeably worse than the adapted version.
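The relative gains quoted in the abstract follow directly from the reported C-index values. The short check below recomputes them; the helper name `relative_gain` is only illustrative.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# C-index values as reported in the abstract.
reported = {
    "MIMIC-IV vs DeepSurv": (0.856, 0.844),
    "MIMIC-IV vs best zero-shot": (0.856, 0.802),
    "eICU vs DeepSurv": (0.797, 0.784),
    "eICU vs best zero-shot": (0.797, 0.749),
}
gains = {name: round(relative_gain(new, base), 1)
         for name, (new, base) in reported.items()}
```

The recomputed values match the abstract (+1.4%, +6.7%, +1.7%, +6.4%). The gap to DeepSurv is much smaller than the gap to the zero-shot models.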

The clinical risk stratification analysis (Figure 1) is a strength that demonstrates practical utility beyond aggregate metrics, showing clearer separation of risk groups with survival-aware adaptation.
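For readers unfamiliar with this kind of analysis, the sketch below shows one way a risk stratification plot can be derived: split patients by predicted risk and compute a Kaplan-Meier curve per group. The median split into two groups and the function names are assumptions. The paper's Figure 1 may use a different grouping.

```python
import statistics

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival_probability)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve = []
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def km_by_risk_group(times, events, risks):
    """Split subjects at the median predicted risk and fit one curve per group."""
    cutoff = statistics.median(risks)
    groups = {"low": ([], []), "high": ([], [])}
    for t, e, r in zip(times, events, risks):
        name = "high" if r > cutoff else "low"
        groups[name][0].append(t)
        groups[name][1].append(e)
    return {name: kaplan_meier(ts, es) for name, (ts, es) in groups.items()}
```

A model with good discrimination should produce curves that separate clearly, with the high-risk group dropping faster than the low-risk group.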

Timeliness & Relevance

The paper is timely in two respects: (1) tabular foundation models are rapidly gaining traction (TabPFN, TabDPT, TabICL are all recent), and (2) there is growing interest in applying foundation model paradigms to clinical prediction tasks. The intersection of these trends—adapting tabular FMs specifically for survival analysis—is underexplored, making this a relevant contribution.

The concurrent work by Kim et al. (2026) on reformulation-based approaches and Seletkov et al. (2026) on Survival In-Context suggests this is an active research front. This paper's positioning as a simpler alternative to specialized pretraining (SIC) or temporal expansion (Kim et al.) is reasonable, though the inability to compare against SIC due to lack of public implementation is a limitation.

Strengths & Limitations

Key Strengths:

Clean experimental design with comprehensive baselines spanning classical, ML, and deep learning methods

Practical approach that is easy to implement and deploy

Evaluation on both small benchmarks and large-scale EHR datasets (MIMIC-IV, eICU) demonstrates scalability

Risk stratification analysis provides clinically grounded evaluation

Code is publicly available, supporting reproducibility

Notable Limitations:

Limited novelty: attaching an MTLR head to frozen representations is a straightforward application of transfer learning principles

Only static features from a 24-hour window are considered; no temporal/longitudinal modeling

Single-event survival only; no competing risks analysis

No calibration assessment or time-dependent discrimination metrics beyond C-index

No ablation studies on head architecture, number of bins sensitivity, or representation dimensionality

Missing interpretability analysis (acknowledged by authors)

The paper does not explore fine-tuning the backbone, which could yield larger gains

Modest improvements that may not be clinically meaningful in practice (difference of 0.01-0.02 in C-index)

No analysis of computational costs or inference time comparisons, which would strengthen the practical deployment argument

Additional Observations

The paper's framing around "foundation models" should be interpreted carefully. The tabular FMs used here (especially TabPFN) are pretrained on synthetic data, not on clinical data. The transferability of synthetic-data representations to real clinical tasks is interesting, but the mechanisms remain unexplained. The observation that "much of the difficulty in clinical survival analysis lies in representation learning rather than survival-specific loss design" is intriguing but not rigorously substantiated.

The venue (AIiH 2026, where this appears as a workshop/conference paper) is appropriate for the contribution level. This work serves as a useful empirical study establishing that tabular FMs can work for survival analysis, laying groundwork for more sophisticated approaches.

Rating: 5.5/10

Significance 5 · Rigor 6 · Novelty 4.5 · Clarity 7.5

Generated Jun 11, 2026

Comparison History (23)

Lost vs. Uncertainty Estimation for Molecular Diffusion Models

Paper 2 addresses a critical methodological gap in generative models for 3D molecular generation by introducing a principled uncertainty estimation method. This has profound implications for AI-driven drug discovery, allowing for better quality control and test-time scaling. While Paper 1 presents a valuable clinical application, Paper 2 offers higher methodological innovation and broader potential impact across the rapidly growing intersection of generative AI and computational chemistry.

gemini-3.1-pro-preview · Jun 12, 2026

Lost vs. From Uncertain Judgments to Calibrated Rankings: Conformal Elo Estimation for LLM Evaluation

Paper 2 is likely higher-impact: it tackles a timely, broadly relevant bottleneck (scalable LLM evaluation) with a principled uncertainty-aware ranking method combining probabilistic Bradley–Terry/Elo and conformal prediction with distribution-free coverage guarantees. Its applicability extends across model benchmarking, alignment, and product evaluation, and it directly addresses known judge biases/miscalibration. Paper 1 is solid and practical for clinical survival prediction, but is a narrower domain adaptation of existing tabular foundation models with incremental performance gains, thus likely more limited in cross-field impact.

gpt-5.2 · Jun 12, 2026

Lost vs. Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

Paper 2 has higher likely scientific impact due to broader relevance and conceptual novelty: it provides mechanistic insights into how on-policy distillation changes parameters (sparsity patterns, optimizer interactions, and geometric structure) across multiple LM/VLM settings, with actionable implications (subnetwork training, optimizer choice) for widely used post-training pipelines. This general analysis can influence practice across many domains using foundation models. Paper 1 is timely and useful for clinical survival prediction, but its contribution is a comparatively straightforward head adaptation with incremental performance gains and narrower domain scope.

gpt-5.2 · Jun 12, 2026

Lost vs. The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

Paper 1 offers a fundamentally novel geometric framework for understanding phase transitions in diffusion/flow-matching models, connecting caustic theory to generative AI dynamics. This theoretical contribution has broad implications across generative modeling, providing both conceptual understanding and practical tools (CBD). Paper 2, while methodologically sound and clinically relevant, represents an incremental adaptation—applying existing tabular foundation models to survival analysis with a known MTLR head. The novelty is limited to the combination rather than new theory. Paper 1's theoretical depth and breadth of impact across the rapidly growing generative AI field give it higher potential impact.

claude-opus-4-6 · Jun 12, 2026

Won vs. Distributional Loss for Robust Classification

Paper 1 provides a highly concrete, timely adaptation of tabular foundation models for clinical survival analysis, a critical healthcare domain. It demonstrates strong methodological rigor with specific, quantitative improvements on major datasets (MIMIC-IV, eICU). Paper 2 proposes a general loss function with broad potential but lacks quantitative evidence in the abstract, making its actual impact more speculative.

gemini-3.1-pro-preview · Jun 12, 2026

Won vs. Understanding helpfulness and harmless tension in reward models

Paper 2 likely has higher scientific impact due to stronger real-world applicability (clinical survival prediction), clear methodological contribution (survival-aware adaptation of tabular foundation models with MTLR for censoring), and broader immediate utility across healthcare and tabular ML. It is timely given growing interest in foundation models beyond text, and it validates on large, widely used ICU cohorts (MIMIC-IV, eICU) plus public benchmarks, suggesting robustness. Paper 1 is novel and relevant for AI alignment interpretability, but its impact may be narrower and more exploratory, with less direct deployment pathway.

gpt-5.2 · Jun 12, 2026

Lost vs. Harness In-Context Operator Learning with Chain of Operators

Paper 2 introduces a more novel conceptual framework (Chain of Operators) that draws an innovative analogy between prompt engineering in LLMs and operator learning, enabling OOD generalization without retraining. This cross-pollination of ideas between foundation model prompting strategies and scientific computing/PDEs is highly innovative and has broader potential impact across computational science. Paper 1, while rigorous and practically useful, represents a relatively incremental adaptation (adding a survival head to existing tabular foundation models), combining known components rather than introducing a fundamentally new paradigm.

claude-opus-4-6 · Jun 11, 2026

Lost vs. Categorical Prior Lock-in: Why In-Context Learning Fails for Structured Data

Paper 2 identifies and characterizes a general failure mode of in-context learning for structured data (“categorical prior lock-in”), with implications for any LLM-based conditional generation under distribution shift. This is novel, timely, and broadly impactful across ML, data synthesis, evaluation, privacy, and deployment, and it frames an important trade-off between adaptability and memorization risk. Paper 1 is practically useful for clinical survival prediction but is more incremental (adapting existing tabular foundation models with a known survival head) and its impact is narrower to survival/tabular transfer learning.

gpt-5.2 · Jun 11, 2026

Lost vs. Attention by Synchronization in Coupled Oscillator Networks

Paper 1 introduces a highly novel, interdisciplinary paradigm connecting physical oscillator dynamics to transformer attention, enabling low-power neuromorphic hardware implementations. Its theoretical depth and potential to shift paradigms in AI hardware give it a broader, more profound scientific impact compared to Paper 2's incremental, domain-specific application of tabular models to clinical survival analysis.

gemini-3.1-pro-preview · Jun 11, 2026

Lost vs. Learning Explicit Behavioral Models with Adaptive Questions and World-Model Probes

Paper 2 has higher potential impact due to a more novel, general framework for mechanistic/explicit behavioral modeling that integrates adaptive questioning and world-model probes directly into training. If validated, this could influence multiple areas (RL, interpretability, agent evaluation, world models, debugging and adaptation) with broad applicability beyond a single domain. Paper 1 is timely and practically useful for clinical survival analysis, but it is a comparatively incremental adaptation (pretrained tabular encoders + survival head) with narrower cross-field impact and limited methodological novelty relative to existing transfer-learning paradigms.

gpt-5.2 · Jun 11, 2026
