Transferable Human Mobility Network Reconstruction with neuroGravity

Jinming Yang, Shaoyu Huang, Zongyuan Huang, Yaohui Jin, Xiaokang Yang, Marta C. Gonzalez, Yanyan Xu

Apr 26, 2026

arXiv:2604.23678v1

cs.AI (primary)

#86 of 2292 · Artificial Intelligence

Tournament Score: 1548±34

Win Rate: 68%

Rating: 7.8/10

Significance 8 · Rigor 7.5 · Novelty 7.5 · Clarity 8

Abstract

Accurate modeling of human mobility is critical for tackling urban planning and public health challenges. In undeveloped regions, the absence of comprehensive travel surveys necessitates reconstructing mobility networks from publicly available data. Here we develop neuroGravity, a physics-informed deep learning model that reliably reconstructs mobility flows from limited observations and transfers to unobserved cities. Using only urban facility and population distributions, we find that neuroGravity's regional representations strongly correlate with socioeconomic and livability status, offering scalable proxies for costly surveys. Furthermore, we uncover that spatial income segregation plays a key role in model transferability: mobility networks are most reliably reconstructed when target cities share similar segregation levels with the source. We design an index to quantify this segregation and accurately predict transferability. Finally, we generate mobility flow proxies for over 1,200 cities worldwide, highlighting neuroGravity's potential to mitigate critical data shortages in resource-limited, underdeveloped areas.

AI Impact Assessments

(3 models)

Scientific Impact Assessment: Transferable Human Mobility Network Reconstruction with neuroGravity

1. Core Contribution

The paper introduces neuroGravity, a physics-informed deep learning model that reconstructs human mobility networks (origin-destination flow matrices) from limited observations and publicly available data (OpenStreetMap features, population distributions). The core novelty lies in the hybrid integration of the classical gravity model with graph neural networks through three specific innovations: (1) a "meta-Gravity" component that deep-parameterizes the gravity law by learning context-dependent gravitational constants and distance-decay exponents via MLPs; (2) an edge-enhanced graph transformer that refines these physically-grounded base estimates; and (3) operation in logarithmic space, which naturally preserves the multiplicative structure of gravity laws while normalizing the heavy-tailed flow distribution for neural network optimization.
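For reference, the classical gravity law and its logarithmic form can be written as below. The symbols ($G$, $m_i$, $m_j$, $d_{ij}$, $\alpha$, $\beta$, $\gamma$) follow the common textbook formulation and are assumptions here, not the paper's own notation. Per the description above, the meta-Gravity component would replace the fixed constant and exponent with values learned from context.

```latex
\[
T_{ij} = G \, \frac{m_i^{\alpha} \, m_j^{\beta}}{d_{ij}^{\gamma}}
\quad \Longrightarrow \quad
\log T_{ij} = \log G + \alpha \log m_i + \beta \log m_j - \gamma \log d_{ij}
\]
```

In the log form every term is additive, which is what makes it straightforward to add a learned correction on top of the physically grounded base estimate.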

The paper addresses three distinct problems: few-shot mobility reconstruction (from ~1% of observed flows), zero-shot cross-city transfer, and socioeconomic proxy generation. It also introduces a spatial income segregation index (SI) based on Bregman Information decomposition that quantitatively predicts model transferability between city pairs.
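To give a feel for what a Bregman Information decomposition measures, here is a minimal sketch under a strong simplifying assumption: with the squared Euclidean distance as the Bregman divergence, Bregman Information reduces to variance, and it splits exactly into a within-region and a between-region part. The function name, the toy incomes, and the "between share" ratio are all hypothetical illustrations; the paper's actual SI definition is not reproduced in this assessment and may differ.

```python
import numpy as np

def variance_decomposition(values, groups):
    """Split total variance into within-group and between-group parts.

    Assumption: squared Euclidean distance as the Bregman divergence,
    in which case Bregman Information equals (population) variance.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    total = np.mean((values - grand_mean) ** 2)
    within = 0.0
    between = 0.0
    for g in np.unique(groups):
        v = values[groups == g]
        w = len(v) / len(values)           # group weight = share of observations
        within += w * np.mean((v - v.mean()) ** 2)
        between += w * (v.mean() - grand_mean) ** 2
    return total, within, between

# Toy data (hypothetical): household incomes in two regions of one city.
incomes = np.array([20.0, 22.0, 21.0, 80.0, 85.0, 90.0])
regions = np.array([0, 0, 0, 1, 1, 1])

total, within, between = variance_decomposition(incomes, regions)
between_share = between / total  # near 1 means incomes are sorted by region
print(f"total={total:.2f} within={within:.2f} between={between:.2f} share={between_share:.3f}")
```

A between-region share close to 1 means most income variation is explained by where people live, which is one intuitive reading of "spatial income segregation".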

2. Methodological Rigor

The experimental design is thorough across multiple dimensions. The evaluation spans six cities across different continents (Boston, LA, SF Bay, Porto, Bogotá, Riyadh), with mobility data from diverse sources (CDR, LBS, census commuting). The few-shot reconstruction experiments use 30 independent runs with randomly sampled observations, providing statistical robustness. The paper systematically compares against relevant baselines including the classical gravity model, vanilla GNNs, and an improved Deep Gravity variant (DG++).

The three observation scenarios are clearly delineated, with a focus on the most challenging "internal observations" case. The 38% R² improvement over the best baseline under 10% observation is substantial. The cross-city transfer experiments, which show a near-doubling (+99%) of R² over baselines in tests on CDR data, are particularly compelling.

However, several methodological concerns merit attention. The use of TimeGeo for scaling trajectories to population-level flows introduces an intermediate modeling step whose errors could propagate. The connection predictor (LightGBM) is treated somewhat as a black box, and errors in topology prediction would cascade into flow estimation. The paper acknowledges OSM data quality issues, but the 30% missing-ratio threshold may be insufficient for the Global South targets the model aims to serve.

3. Potential Impact

The practical implications are substantial. Generating mobility flow estimates for 1,200+ cities worldwide addresses a genuine data gap, particularly for the Global South where travel surveys are prohibitively expensive. The demonstration that neuroGravity flows can drive SEIR epidemic simulations with results comparable to ground-truth flows has direct public health relevance.
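As an illustration of how an origin-destination flow matrix can drive an epidemic simulation, the sketch below runs a simple discrete-time metapopulation SEIR model. The coupling scheme (row-normalized flows used as a contact-mixing matrix), the parameter values, and the three-region toy data are assumptions made for this example; the assessment does not describe the paper's actual SEIR setup.

```python
import numpy as np

def simulate_seir(flows, population, seed_region, beta=0.4, sigma=0.2,
                  gamma=0.1, days=120, seed_cases=10.0):
    """Discrete-time SEIR over regions coupled by an OD flow matrix.

    flows[i, j]: trips from region i to region j (diagonal = trips staying home).
    Assumption: residents of i are exposed to the infected share of region j
    in proportion to the fraction of i's trips that end in j.
    """
    flows = np.asarray(flows, dtype=float)
    N = np.asarray(population, dtype=float)
    P = flows / flows.sum(axis=1, keepdims=True)   # row-stochastic mixing matrix

    S, E, I, R = N.copy(), np.zeros_like(N), np.zeros_like(N), np.zeros_like(N)
    S[seed_region] -= seed_cases
    I[seed_region] += seed_cases

    history = []
    for _ in range(days):
        force = beta * (P @ (I / N))   # force of infection felt by each region
        new_E = force * S              # S -> E
        new_I = sigma * E              # E -> I
        new_R = gamma * I              # I -> R
        S = S - new_E
        E = E + new_E - new_I
        I = I + new_I - new_R
        R = R + new_R
        history.append(I.copy())
    return np.array(history), (S, E, I, R)

# Toy inputs (hypothetical): three regions, outbreak seeded in region 0.
flows = np.array([[800.0, 150.0, 50.0],
                  [100.0, 700.0, 200.0],
                  [30.0, 120.0, 850.0]])
population = np.array([100_000.0, 80_000.0, 60_000.0])

history, (S, E, I, R) = simulate_seir(flows, population, seed_region=0)
print("peak day per region:", history.argmax(axis=0))
```

Swapping `flows` between a ground-truth matrix and a reconstructed one, then comparing the resulting infection curves, is the kind of check the assessment refers to.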

The discovery that regional embeddings correlate with socioeconomic indicators (income R²=0.42, carbon footprint R²=0.73, NO₂ R²=0.78 in Boston) without explicit training on these variables is scientifically interesting. This provides cost-effective proxies for environmental and socioeconomic monitoring.

The spatial income segregation index, together with its linear relationship with transferability (fitting R²=0.97), is perhaps the most intellectually novel contribution. It provides a principled, predictive framework for understanding when and why cross-city transfer succeeds or fails, moving beyond empirical observation to a quantifiable mechanism. This insight connects urban science, segregation studies, and transfer learning in a meaningful way.

4. Timeliness & Relevance

The work addresses a current bottleneck at the intersection of urban computing, public health preparedness, and global equity. Post-COVID, the importance of mobility-informed epidemic modeling is well-established, yet mobility data remains scarce precisely where it is most needed. The reliance on publicly available data (OSM, WorldPop) makes this practically deployable. The timing aligns with growing interest in physics-informed machine learning and foundation models for geospatial applications.

5. Strengths & Limitations

Key Strengths:

The hybrid architecture elegantly balances interpretability and accuracy, with the gravity law providing structural priors that prevent overfitting in data-scarce regimes

The SI-based transferability prediction is a genuinely novel theoretical contribution that provides actionable guidance for practitioners

Comprehensive evaluation across diverse cities, data sources, and observation scenarios

The dataset contribution (1,200+ city mobility networks) has standalone value for the community

Code and data availability enhance reproducibility

Notable Limitations:

The socioeconomic embedding analysis shows diminishing returns in polycentric cities (LA, SF Bay), with Rg prediction failing entirely (R²=-0.07 in LA). This suggests the approach's utility may be geographically biased toward monocentric urban forms

The SI index requires income data for computation, creating a circular dependency: you need income data to predict transferability to cities that likely lack such data

Validation in sub-Saharan Africa is mentioned but relegated to supplementary materials; this is arguably the most important validation for the paper's stated goal of serving underdeveloped regions

The paper does not deeply analyze failure modes or provide confidence intervals on the generated 1,200-city dataset

The connection predictor's errors and their downstream propagation are not systematically quantified

The comparison with recent gravity-learning approaches (e.g., Cabanas-Tirapu et al., 2025, ref [23]) appears limited
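The negative R² reported for LA above can look surprising. Assuming R² here is the coefficient of determination evaluated on held-out data (the assessment does not say exactly how it was computed), a value below zero simply means the predictions do worse than always predicting the mean. A minimal sketch with made-up numbers:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])                      # toy targets
r2_good = r_squared(y, y + 0.1)                         # small constant error
r2_mean = r_squared(y, np.full_like(y, y.mean()))       # always predict the mean
r2_bad = r_squared(y, y[::-1])                          # anti-correlated predictions

print(r2_good, r2_mean, r2_bad)
```

Predicting the mean gives exactly zero, so a slightly negative score such as -0.07 indicates essentially no predictive signal rather than a computational error.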

Additional Observations

The logarithmic space formulation, while presented as a design choice, is actually a significant architectural insight: it transforms the gravity law's multiplicative structure into an additive one compatible with standard neural network operations. The weighted training loss (Eq. 8) for few-shot scenarios is a pragmatic but important detail that prioritizes reliable high-volume flows over noisy low-volume observations.
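The two ideas in this paragraph, fitting in log space and weighting high-volume flows more heavily, can be sketched with a plain weighted least-squares fit of the classical gravity law. This is not the paper's model or its Eq. 8, whose exact form is not reproduced in this assessment; the function name, the choice of weights (equal to the observed flow volume), and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def fit_log_gravity(flows, m_orig, m_dest, dist, weights=None):
    """Weighted least squares for
    log T = log G + alpha*log m_i + beta*log m_j - gamma*log d.
    """
    y = np.log(flows)
    X = np.column_stack([
        np.ones_like(y),     # intercept -> log G
        np.log(m_orig),      # -> alpha
        np.log(m_dest),      # -> beta
        -np.log(dist),       # -> gamma (sign folded into the feature)
    ])
    if weights is None:
        weights = np.ones_like(y)
    sw = np.sqrt(weights)    # scaling rows by sqrt(w) yields weighted LS
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    log_G, alpha, beta, gamma = coef
    return {"G": float(np.exp(log_G)), "alpha": float(alpha),
            "beta": float(beta), "gamma": float(gamma)}

# Synthetic, noise-free flows generated from known parameters (hypothetical values).
rng = np.random.default_rng(0)
n = 200
m_i = rng.uniform(1e3, 1e5, n)
m_j = rng.uniform(1e3, 1e5, n)
d = rng.uniform(1.0, 50.0, n)
true_flows = 1e-3 * m_i**0.8 * m_j**0.7 / d**2.0

# Assumed weighting: emphasize high-volume origin-destination pairs.
params = fit_log_gravity(true_flows, m_i, m_j, d, weights=true_flows)
print(params)
```

Because the problem is linear after the log transform, the fit recovers the generating parameters on clean data; with noisy observations, the weights decide which flows dominate the estimate.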

The paper would benefit from uncertainty quantification on generated flows, particularly for the 1,200-city global application where ground truth is unavailable. The model-assembling technique for cross-city transfer is described but its contribution relative to single-model transfer is not clearly ablated.

Rating: 7.8/10

Significance 8 · Rigor 7.5 · Novelty 7.5 · Clarity 8

Generated Apr 28, 2026

Comparison History (41)

vs. MARS$^2$: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation

gpt-5.2 · 5/5/2026

Paper 2 likely has higher impact due to strong real-world applicability (urban planning, public health) and broad cross-field relevance (network science, socioeconomic analysis, policy, epidemiology). Its transferable, data-sparse reconstruction addresses a pressing global data gap and scales to 1,200+ cities, suggesting immediate downstream utility. The inclusion of physics-informed modeling, interpretable correlates with socioeconomic metrics, and a predictive transferability index indicates methodological rigor and actionable insights. Paper 1 is timely in AI/code generation but is more field-specific and its impact depends on adoption within rapidly shifting LLM tooling.

vs. The World Leaks the Future: Harness Evolution for Future Prediction Agents

gemini-3 · 5/5/2026

Paper 2 offers substantial cross-disciplinary impact by addressing critical data shortages in urban planning and public health, especially for developing regions. Its application of physics-informed deep learning to generate human mobility networks for over 1,200 cities demonstrates high real-world utility and methodological scalability. While Paper 1 presents an innovative LLM agent framework for future prediction, Paper 2's immediate societal applications, insights into socioeconomic factors like income segregation, and broad relevance across multiple scientific fields give it a higher potential for overarching scientific impact.

vs. Machine individuality: Separating genuine idiosyncrasy from response bias in large language models

claude-opus-4.6 · 5/5/2026

Paper 1 addresses a critical real-world problem (human mobility modeling in data-scarce regions) with broad practical applications in urban planning, public health, and development policy. It combines physics-informed deep learning with novel insights about spatial income segregation and transferability, generates actionable data for 1,200+ cities worldwide, and directly benefits underdeveloped regions. Paper 2 introduces an interesting concept of 'machine individuality' in LLMs using rigorous psychometric methods, but its practical implications are less immediate and its scope is narrower. Paper 1's cross-disciplinary impact (computational social science, urban studies, public health, AI) and scalable real-world utility give it higher potential impact.

vs. Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval

claude-opus-4.6 · 5/5/2026

Paper 2 (neuroGravity) has higher potential impact due to its broader real-world applications in urban planning and public health across 1,200+ cities worldwide, particularly in underdeveloped regions with data scarcity. It combines physics-informed deep learning with novel insights on spatial income segregation and transferability, offering immediate practical utility. While Paper 1 makes a solid theoretical contribution to table representation learning with formal metrics, its scope is narrower (table retrieval in RAG systems) and more incremental. Paper 2's interdisciplinary reach across computational social science, urban studies, and public health gives it wider potential influence.

vs. Targeted Exploration via Unified Entropy Control for Reinforcement Learning

claude-opus-4.6 · 5/5/2026

Paper 1 presents a novel physics-informed deep learning framework for human mobility reconstruction that addresses critical data gaps in underdeveloped regions, generates proxies for 1,200+ cities worldwide, and uncovers fundamental insights about spatial income segregation's role in model transferability. It combines methodological innovation with broad real-world impact across urban planning, public health, and socioeconomic analysis. Paper 2 offers a useful but more incremental improvement to RL training stability for LLMs/VLMs, addressing entropy collapse in GRPO—a narrower technical contribution within a rapidly evolving field where methods are quickly superseded.

vs. PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model

gemini-3 · 4/28/2026

Paper 2 has broader interdisciplinary impact, directly addressing critical global challenges in urban planning, public health, and socio-economic equity. By successfully scaling mobility proxies to over 1,200 cities and uncovering the role of income segregation in model transferability, it offers a highly practical and scalable solution for resource-limited regions, giving it greater real-world and scientific impact than the AI benchmark improvements in Paper 1.

vs. PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model

gemini-3 · 4/28/2026

Paper 2 addresses a critical global challenge (data shortages in undeveloped regions for urban planning and public health) with a highly scalable, physics-informed model. Its ability to generate mobility proxies for over 1,200 cities worldwide and uncover the role of spatial income segregation demonstrates significant real-world applicability and broad interdisciplinary impact across urban studies, sociology, and epidemiology. While Paper 1 offers a valuable methodological advancement in VLM reasoning, Paper 2's tangible societal benefits, global scale, and cross-disciplinary relevance give it higher potential scientific and real-world impact.

vs. A systematic evaluation of vision-language models for observational astronomical reasoning tasks

gemini-3 · 4/28/2026

While Paper 1 provides a valuable benchmark for AI in astronomy, Paper 2 demonstrates broader societal and cross-disciplinary impact. By reconstructing human mobility networks for data-scarce regions globally, Paper 2 directly addresses critical challenges in urban planning, public health, and socioeconomics. Its physics-informed approach and insights into spatial income segregation offer scalable, real-world solutions that extend well beyond a single scientific domain.

vs. A systematic evaluation of vision-language models for observational astronomical reasoning tasks

gemini-3 · 4/28/2026

Paper 2 offers broader real-world applications and higher immediate societal impact by addressing critical data shortages in urban planning and public health for underdeveloped regions. Its physics-informed approach, insights into spatial income segregation, and generation of mobility proxies for over 1,200 cities worldwide demonstrate a highly scalable and interdisciplinary contribution compared to Paper 1's domain-specific VLM benchmarking.

vs. Emotion Concepts and their Function in a Large Language Model

gemini-3 · 4/28/2026

Paper 1 addresses the highly critical and timely field of AI safety and alignment by providing mechanistic insights into how 'functional emotions' causally drive misaligned LLM behaviors like reward hacking. This offers profound implications for controlling and interpreting foundation models. While Paper 2 presents valuable applications for urban planning, the fundamental insights into LLM internal representations in Paper 1 are likely to have a broader and more immediate impact across the rapidly advancing field of artificial intelligence.

vs. Hodoscope: Unsupervised Monitoring for AI Misbehaviors

claude-opus-4.6 · 4/28/2026

Paper 1 demonstrates broader scientific impact through its interdisciplinary contribution spanning computational social science, urban planning, and public health. It introduces a physics-informed deep learning model applicable to 1,200+ cities worldwide, addresses critical data gaps in underdeveloped regions, and uncovers fundamental insights about spatial income segregation's role in mobility transferability. Paper 2 introduces a useful tool for AI monitoring but addresses a narrower, more applied problem. While timely given AI safety concerns, Paper 1's global applicability, theoretical insights connecting socioeconomic factors to mobility, and potential to inform policy in resource-limited settings give it greater breadth and lasting scientific significance.

vs. Emotion Concepts and their Function in a Large Language Model

gemini-3 · 4/28/2026

Paper 1 addresses a critical and timely challenge in AI safety and mechanistic interpretability. By uncovering how internal emotion representations causally influence LLM behavior, including alignment failures like reward hacking, it provides fundamental insights with broad implications for deploying safe AI systems. While Paper 2 offers valuable real-world applications for urban planning, the foundational nature of Paper 1's findings in the rapidly accelerating field of large language models gives it a higher potential for widespread scientific impact.

vs. Hodoscope: Unsupervised Monitoring for AI Misbehaviors

claude-opus-4.6 · 4/28/2026

Paper 1 (neuroGravity) has broader scientific impact due to its wide applicability across urban planning, public health, and socioeconomic analysis for 1,200+ cities worldwide, especially in resource-limited regions. It combines physics-informed deep learning with novel insights on spatial income segregation and transferability, offering both methodological innovation and immediate real-world utility. Paper 2 (Hodoscope) addresses an important AI safety problem with a clever unsupervised monitoring approach, but its impact is more narrowly focused on AI benchmark evaluation. neuroGravity's cross-disciplinary relevance and global humanitarian implications give it higher potential impact.

vs. Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis

claude-opus-4.6 · 4/28/2026

Paper 2 presents a novel physics-informed deep learning model with broad real-world applications in urban planning, public health, and development economics across 1,200+ cities worldwide. It offers methodological innovation by combining gravity models with deep learning, produces actionable insights about income segregation and transferability, and addresses critical data gaps in underdeveloped regions. Paper 1 addresses an important but narrower problem of LLM auditability with a relatively simple protocol. While timely, its impact is more procedural than foundational, and the methodology is less technically deep compared to Paper 2's contributions.

vs. HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

gpt-5.2 · 4/28/2026

Paper 2 has higher likely impact due to strong novelty and timeliness: it introduces a new benchmark and metric (Ask-F1) targeting a widely observed, under-measured failure mode in frontier agents—selective escalation under ambiguity—and shows cross-domain generality plus trainability via RL. This can directly influence evaluation standards, agent design, and safety/reliability practices across ML and software engineering. Paper 1 is methodologically strong and valuable for urban/public-health applications, but its impact is more domain-specific and incremental relative to existing mobility modeling and transfer learning efforts.

vs. AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot

gemini-3 · 4/28/2026

Paper 1 addresses a critical bottleneck in the scientific process itself: peer review. By demonstrating at an unprecedented scale that AI can generate reviews preferred over human ones, it has the potential to fundamentally disrupt and improve how research is evaluated across all scientific disciplines. While Paper 2 presents a highly valuable tool for urban planning and public health, its impact is more domain-specific. Paper 1's broad applicability, timeliness, and potential to reshape scientific publishing give it higher overall scientific impact.

vs. HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

gpt-5.2 · 4/28/2026

Paper 2 likely has higher impact: it introduces a broadly applicable benchmark and metric (Ask-F1) targeting a critical, under-measured failure mode in frontier agents—selective escalation under ambiguity—highly timely for deployed AI systems. It demonstrates systematic gaps across domains, provides failure taxonomies, and shows the capability is trainable with transfer, making it both diagnostic and actionable. This can influence evaluation standards, agent training, and safety/reliability practices across many AI subfields. Paper 1 is strong and useful for mobility/urban analytics but is more domain-specific in reach.

vs. IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

claude-opus-4.6 · 4/28/2026

IatroBench addresses a critical and timely problem—AI safety measures causing iatrogenic harm through identity-contingent withholding of medical knowledge. Its pre-registered methodology across 6 frontier models with 3,600 responses provides rigorous evidence of a systematic flaw in AI alignment that affects real patient safety. The finding that safety guardrails paradoxically cause harm to the most vulnerable users (those who've exhausted standard referrals) has immediate implications for AI policy, model training, and healthcare. It exposes fundamental tensions in AI safety design that will shape regulation and development across the industry, giving it exceptionally broad and urgent impact.

vs. Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis

claude-opus-4.6 · 4/28/2026

Paper 1 demonstrates broader scientific impact through its novel physics-informed deep learning framework for human mobility reconstruction, with direct applications to urban planning and public health in 1,200+ cities worldwide. It contributes fundamental insights linking spatial income segregation to model transferability, offering both methodological and theoretical advances. Paper 2 addresses an important but narrower methodological concern about LLM prior contamination with a practical protocol. While timely and useful, it is more of an engineering contribution to LLM auditing rather than a fundamental scientific advance, and its impact is largely confined to LLM-assisted analysis workflows.

vs. AI scientists produce results without reasoning scientifically

gpt-5.2 · 4/28/2026

Paper 2 likely has higher impact: it introduces a concrete, deployable physics-informed model (neuroGravity) that reconstructs and transfers human mobility networks using widely available inputs, enabling immediate applications in urban planning and public health, especially in data-scarce regions. It adds methodological contributions (transferability analysis, segregation index) and produces a large global dataset (1,200+ cities), broadening downstream use. Paper 1 is timely and important diagnostically, but is primarily evaluative/critical and may yield less direct near-term utility than a scalable modeling tool.