Artificial Intelligence Paper Rankings

AI-estimated scientific impact ranking of the latest arXiv Artificial Intelligence preprints.

200 papers (412 total)
55874 matches
1

Towards a General Intelligence and Interface for Wearable Health Data

Girish Narayanswamy, Maxwell A. Xu +6

1649
47
80.9%
May 21, 2026
2

A Signal-Language Foundation Model for Broad-Spectrum Cardiovascular Assessment from Routine Electrocardiography

Ziqing Yu, Yuhui Tao +6

1595
35
97.1%
May 25, 2026
3

Advancing Mathematics Research with AI-Driven Formal Proof Search

George Tsoukalas, Anton Kovsharov +6

1585
25
88%
May 21, 2026
4

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Rui Meng, Bhavana Dalvi Mishra +6

1583
31
74.2%
May 25, 2026
5

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

Amartya Roy, Sonali Parbhoo

1581
26
92.3%
May 26, 2026
6

Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure

Max Lamparth, Daniel Fein +3

1581
18
88.9%
May 27, 2026
7

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

Shanghua Gao, Ada Fang +1

1579
27
88.9%
May 27, 2026
8

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

Mattia J. Villani, Pranav Deshpande +3

1565
22
86.4%
May 27, 2026
9

Calibrating Conservatism for Scalable Oversight

William Overman, Mohsen Bayati

1563
18
83.3%
May 27, 2026
10

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

Shubham Agarwal, Alexander Krentsel +6

1560
21
57.1%
May 22, 2026
11

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Dongyoon Hahm, Dylan Hadfield-Menell +1

1547
39
74.4%
May 26, 2026
12

When Context Flips, Safety Breaks: Diagnosing Brittle Safety in Aligned Language Models

Dasol Choi, Alex Kwon

1544
38
84.2%
May 27, 2026
13

Inference Time Context Sparsity: Illusion or Opportunity?

Sahil Joshi, Prithvi Dixit +6

1544
24
83.3%
May 22, 2026
14

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Yifan Yang, Ziyang Gong +6

v2
1541
26
80.8%
May 22, 2026
15

SIA: Self Improving AI with Harness & Weight Updates

Prannay Hebbar, Yogendra Manawat +5

1540
18
83.3%
May 26, 2026
16

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

Zhe Yu, Wenpeng Xing +5

1539
28
85.7%
May 26, 2026
17

Neuro-Inspired Inverse Learning for Planning and Control

Maryna Kapitonova, Tonio Ball

1538
19
73.7%
May 22, 2026
18

Learning to Search and Searching to Learn for Generalization in Planning

Michael Aichmüller, Yannik Hesse +1

1532
29
86.2%
May 25, 2026
19

Human-like in-group bias in instruction-tuned language model agents

Messi H. J. Lee

1529
20
85%
May 27, 2026
20

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Bowen Wang, Dunjie Lu +6

1528
22
86.4%
May 25, 2026
21

RULER: Representation-Level Verification of Machine Unlearning

Georgina Cosma, Axel Finke

1526
24
79.2%
May 26, 2026
22

What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

Xiang Wang, Wei Wei

1523
24
83.3%
May 26, 2026
23

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

Jingchu Gai, Guanning Zeng +5

1522
24
83.3%
May 23, 2026
24

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

Zhe Yu, Wenpeng Xing +5

1521
18
77.8%
May 26, 2026
25

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Caijun Xu, Changyi Xiao +2

1519
21
76.2%
May 27, 2026
26

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Ali Hatamizadeh, Yejin Choi +1

1518
21
81%
May 21, 2026
27

Beyond the Frontier: Stochastic Backtracking for Efficient Test-Time Scaling

Dao Tran, Duc Anh Le +4

1518
17
70.6%
May 24, 2026
28

Fundamental Limitation in Explaining AI

Atsushi Suzuki, Jing Wang

1516
22
81.8%
May 23, 2026
29

Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

Nick Merrill, Jaeho Lee +1

v2
1515
17
88.2%
May 21, 2026
30

Forecasting Scientific Progress with Artificial Intelligence

Sean Wu, Pan Lu +6

1515
32
37.5%
May 21, 2026
31

Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory

Abdelghny Orogat, Essam Mansour

1514
20
80%
May 25, 2026
32

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax +6

1509
21
76.2%
May 26, 2026
33

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems

Jianing Zhu, Yeonju Ro +6

1509
20
80%
May 25, 2026
34

Voluntary Collusion with Secret Tools in Competing LLM Agents

Xijie Zeng, Frank Rudzicz

1509
29
86.2%
May 26, 2026
35

LACUNA: Safe Agents as Recursive Program Holes

Yaoyu Zhao, Yichen Xu +4

1508
24
83.3%
May 27, 2026
36

Better Accuracies, Worse Reasoning: A Step-Level Audit of Medical Chain-of-Thought Distillation

Zhaoyang Jiang, Xuanqi Peng +6

1504
16
81.2%
May 27, 2026
37

AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models

Ruiyi Zhang, Peijia Qin +3

1504
24
83.3%
May 27, 2026
38

A governance horizon for ethical-use constraints in open-weight AI models

Weiwei Xu, Hengzhi Ye +4

1502
24
70.8%
May 23, 2026
39

Credit Assignment with Resets in Language Model Reasoning

Ankur Samanta, Akshayaa Magesh +6

v2
1502
25
64%
May 25, 2026
40

PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design

Manpreet Kaur, Xingying Zhang +1

1497
30
73.3%
May 26, 2026
41

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

Yiqun Chen, Wei Yang +6

1497
23
73.9%
May 26, 2026
42

LipoAgent: Coordinating Fine-Tuned LLM Agents for Safer Lipid Design

Leshu Li, An Lu +6

1497
23
73.9%
May 24, 2026
43

CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models

Abhilash Durgam, Nyle Siddiqui +4

1493
20
85%
May 27, 2026
44

Learning to Reason Efficiently with A* Post-Training

Andreas Opedal, Francesco Ignazio Re +4

1493
25
68%
May 23, 2026
45

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Kohsei Matsutani, Gouki Minegishi +3

1488
20
75%
May 27, 2026
46

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han +4

1488
17
76.5%
May 27, 2026
47

Inverting the Shield: Systematically Generating Safety Tests from Policy Specifications

Xiaoyue Lu, Xianglin Yang +5

1487
19
57.9%
May 24, 2026
48

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

Zisu Huang, Jingwen Xu +6

1487
25
72%
May 22, 2026
49

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

Qianshu Cai, Yonggang Zhang +5

1487
18
66.7%
May 21, 2026
50

AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

Michael Hardy, Anka Reuel +6

1486
21
71.4%
May 24, 2026
51

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

Yamato Arai, Yuma Ichikawa

1486
27
74.1%
May 21, 2026
52

Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning

Zhe Yu, Wenpeng Xing +5

1485
27
66.7%
May 26, 2026
53

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

HuiMing Fan, Xiao Wang +6

1484
14
64.3%
May 27, 2026
54

SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

Chao Ding, Mouxiao Bian +6

1483
18
77.8%
May 27, 2026
55

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

Zhexin Hu, Li Wang +5

1482
19
73.7%
May 27, 2026
56

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Junlin Yang, Dylan Zhang +6

1482
18
72.2%
May 25, 2026
57

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

Xiaohua Wang, Jiakang Yuan +6

1481
21
66.7%
May 26, 2026
58

OpenURMA: A Clean-Room Open Implementation of the Unified Bus Protocol

Bojie Li

1481
16
75%
May 27, 2026
59

Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents

Yongxiang Li, Moxin Li +5

1480
17
76.5%
May 27, 2026
60

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal

Kia-Jüng Yang, Dominik Meier +3

1480
21
71.4%
May 26, 2026
61

Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents

Yuxin Zhang, Mengxue Hu +6

1479
21
66.7%
May 23, 2026
62

SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

Yuyang Hu, Hongjin Qian +6

1479
22
72.7%
May 23, 2026
63

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

Dongxin Guo

1478
24
75%
May 21, 2026
64

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

Guoxin Ma, Yibing Liu +6

1477
18
77.8%
May 27, 2026
65

Proper Scoring Rules for Agentic Uncertainty Quantification

Suresh Raghu, Satwik Pandey +1

1475
19
73.7%
May 23, 2026
66

Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information

Renjie Gu, Jiaxu Li +6

1475
15
66.7%
May 27, 2026
67

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

Gabriele Cesa, Thomas Hehn +5

1474
16
62.5%
May 26, 2026
68

StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs

Yang Luo, Xinran Liu +4

1473
21
76.2%
May 25, 2026
69

Behavioural Analysis of Alignment Faking

Nathaniel Mitrani Hadida, Rhea Karty +2

1471
11
54.5%
May 26, 2026
70

From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation

Shuaike Li, Kai Zhang +3

1471
20
70%
May 27, 2026
71

Advancing Creative Physical Intelligence in Large Multimodal Models

Cheng Qian, Hyeonjeong Ha +6

1471
22
72.7%
May 25, 2026
72

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Zhikai Pan, Chih-Ting Liao +6

1469
17
76.5%
May 27, 2026
73

HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models

Yuyu Liu, Haotian Xu +4

1468
20
60%
May 22, 2026
74

PALoRA: Projection-Adaptive LoRA for Preserving Reasoning in Large Language Models

Mustafa Hayri Bilgin, Mariam Barry +3

1468
21
66.7%
May 23, 2026
75

Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows

Yilun Yao, Xinyu Tan +6

1466
13
69.2%
May 27, 2026
76

Position: AI Safety Requires Effective Controllability

Yige Li, Yunhao Feng +1

1465
19
63.2%
May 26, 2026
77

Hypothesis Generation and Inductive Inference in Children and Language Models

Jeffrey Qin, Wasu Top Piriyakulki +5

1465
15
60%
May 23, 2026
78

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

Chen Linze, Cai Yufan +2

1464
19
57.9%
May 26, 2026
79

MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents

Zihan Li, Xingyu Fan +2

1463
15
73.3%
May 27, 2026
80

PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

Yuchen Guo, Junli Gong +3

1462
20
70%
May 26, 2026
81

A Unified Framework for the Evaluation of LLM Agentic Capabilities

Pengyu Zhu, Lijun Li +6

1461
19
68.4%
May 27, 2026
82

Can LLMs Introspect? A Reality Check

Shashwat Singh, Tal Linzen +1

1461
20
65%
May 25, 2026
83

AMEL: Accumulated Message Effects on LLM Judgments

Sid-ali Temkit

1461
23
73.9%
May 21, 2026
84

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Aman Priyanshu, Supriti Vijay +1

1461
16
75%
May 26, 2026
85

GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models

Vartan Shadarevian, Kia Ghods +2

1461
26
57.7%
May 22, 2026
86

Continual Model Routing in Evolving Model Hubs

Jack Bell, Giacomo Carfì +2

1461
16
68.8%
May 27, 2026
87

Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

Simon Dennis, Rivaan Patil +2

1459
22
68.2%
May 21, 2026
88

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

Xu Shen, Zhen Tan +5

1459
24
70.8%
May 25, 2026
89

Do Clinical Models Change Treatment Decisions?

Dongkyu Cho, Miao Zhang +1

1459
17
64.7%
May 27, 2026
90

AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning

Yuyang Hu, Hongjin Qian +6

1459
24
75%
May 23, 2026
91

How Well Do Models Follow Their Constitutions?

Arya Jakkli, Senthooran Rajamanoharan +1

1458
24
75%
May 22, 2026
92

Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction

João Sedoc, Baotong Zhang +1

1456
22
68.2%
May 24, 2026
93

A Policy-Driven Runtime Layer for Agentic LLM Serving

Rui Zhang, Chaeeun Kim +1

1456
17
70.6%
May 26, 2026
94

From Accounting to Coordination: A Virtual Water-Aware Electricity-Computation-Water Nexus Framework for Data Center Dispatch

Haiyang You, Chengwei Lou +4

1456
20
65%
May 25, 2026
95

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

Minghao Lv, Lu Chen +6

1454
23
60.9%
May 23, 2026
96

Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations

Simardeep Singh, Paras Chopra

1454
16
68.8%
May 27, 2026
97

MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

Xiaoyu Dong, Zhi Li +1

1454
16
68.8%
May 27, 2026
98

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

Dingbang Wu, Rui Hao +6

1453
19
73.7%
May 25, 2026
99

GlobalDentBench: A Multinational Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration

Junjie Zhao, Jingyi Liang +6

1453
19
52.6%
May 23, 2026
100

Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs

Yue Cheng, Jiajun Zhang +4

1452
15
73.3%
May 27, 2026
101

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

Haoran Zhang, Zhaohua Sun

1452
31
51.6%
May 26, 2026
102

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Zihan Liang, Yufei Ma +5

1450
21
66.7%
May 21, 2026
103

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

Chusen Li, Zhou Liu +2

1450
22
72.7%
May 27, 2026
104

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

Lingyu Jiang, Zirui Li +6

1449
19
57.9%
May 21, 2026
105

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

Yunqi Liu, Tong Niu +5

1446
16
62.5%
May 27, 2026
106

Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization

Jiawei Kong, Hao Fang +6

1446
15
66.7%
May 27, 2026
107

Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

Andrii Kryshtal

1446
23
69.6%
May 21, 2026
108

Palette: A Modular, Controllable, and Efficient Framework for On-demand Authorized Safety Alignment Relaxation in LLMs

Qitao Tan, Xiaoying Song +6

1445
22
45.5%
May 22, 2026
109

JobBench: Aligning Agent Work With Human Will

Yuetai Li, Yichen Feng +6

1445
19
68.4%
May 25, 2026
110

CODESKILL: Learning Self-Evolving Skills for Coding Agents

Yanzhou Li, Yiran Zhang +3

1444
18
66.7%
May 25, 2026
111

A Sober Look at Agentic Misalignment in Automated Workflows

Wenqian Ye, Bo Yuan +5

1444
20
65%
May 22, 2026
112

Insuring Every Action: An Authority Frontier Framework for Runtime Actuarial Control of Autonomous AI Agents

Hao-Hsuan Chen

1444
21
66.7%
May 25, 2026
113

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

Leizhen Zhang, Shuhan Chen +1

1443
14
64.3%
May 27, 2026
114

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

Yifan Zeng, Yiran Wu +5

1443
20
55%
May 22, 2026
115

Verifiable Benchmarking of Long-Horizon Spatial Biology

Ian Diks, Harihara Muralidharan +2

1442
19
63.2%
May 27, 2026
116

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation

Maksim Ivanov, Abhijay Rana

1442
21
61.9%
May 25, 2026
117

ConceptM^3oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology

Xuan Wang, Zhongling Xu +6

1442
18
66.7%
May 23, 2026
118

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Jeongeun Lee, Chanyoung Park +1

1440
23
69.6%
May 25, 2026
119

Retrying vs Resampling in AI Control

James Lucassen, Adam Kaufman

v2
1440
24
58.3%
May 25, 2026
120

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

Shangding Gu

1438
21
61.9%
May 25, 2026
121

Clustering as Reasoning: A k-Means Interpretation of Chain-of-Thought Graph Learning

Xuanting Xie, Zhaochen Guo +5

1438
19
63.2%
May 24, 2026
122

On the Origin of Synthetic Information by Means of Steganographic Inheritance

Ching-Chun Chang, Isao Echizen

1436
18
61.1%
May 26, 2026
123

Multi-Adapter Representation Interventions via Energy Calibration

Manjiang Yu, Hongji Li +5

1436
15
66.7%
May 27, 2026
124

Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems

Andy Xu, Yu-Wing Tai

1433
20
60%
May 24, 2026
125

Automatic Layer Selection for Hallucination Detection

Xinpeng Wang, William Cao +2

1433
26
65.4%
May 25, 2026
126

Can Broad Biomedical Knowledge be Contextualized into Scenario-Grounded Propositions?

Qingyuan Zeng, Ziyang Chen +6

1433
27
63%
May 26, 2026
127

MemFail: Stress-Testing Failure Modes of LLM Memory Systems

Ishir Garg, Neel Kolhe +2

1432
21
52.4%
May 26, 2026
128

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Sadia Asif, Mohammad Mohammadi Amiri +3

1432
18
55.6%
May 21, 2026
129

TIGER: Text-Informed Generalized Enzyme-Reaction Retrieval

Yuhang Zhang, Keyan Ding +6

1431
22
50%
May 23, 2026
130

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Yuxin Chen, Yi Zhang +6

1431
23
60.9%
May 26, 2026
131

Measuring Progress Toward AGI: A Cognitive Framework

Ryan Burnell, Yumeya Yamamori +6

1430
13
53.8%
May 27, 2026
132

Back to Parsimonious Latents: Learning Task-Centric World Models from Visual Foundations

Minghao Fu, Fan Feng +2

1430
19
52.6%
May 25, 2026
133

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG

Pin Qian, Su Wang +6

1429
20
65%
May 27, 2026
134

AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery

Guiyao Tie, Jiawen Shi +6

1429
26
53.8%
May 22, 2026
135

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

Guni Sharon

1428
17
52.9%
May 27, 2026
136

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Zhenyu Cui, Xiangzhong Luo

1427
15
60%
May 27, 2026
137

FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

Minwei Kong, Chonghe Jiang +6

1427
22
54.5%
May 24, 2026
138

Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

Seokil Ham, Jaehyuk Jang +2

1426
19
52.6%
May 23, 2026
139

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning

Yang Zhang, Xiaoshuai Sun +6

1426
13
53.8%
May 27, 2026
140

MuCRASP: Multimodal Chain-of-thought Reasoning aware Structured Pruning

Aritra Dutta, Somak Aditya

1425
21
52.4%
May 25, 2026
141

Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration

Yuanzhi Xu, Qian Gao +5

1425
18
50%
May 24, 2026
142

MolLingo: Molecule-Native Representations for LLM-Powered Scientific Agents

Thao Nguyen, Heng Ji

1424
17
70.6%
May 27, 2026
143

Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents

Yunhui Gan, Tan Pan +6

1423
19
52.6%
May 26, 2026
144

Test-Time Deep Thinking to Explore Implicit Rules

Wentong Chen, Xin Cong +6

1423
21
57.1%
May 24, 2026
145

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

Jyotirmoy Nath, Neeraj Kumar +1

1423
16
75%
May 27, 2026
146

Energy Shields for Fairness

Filip Cano, Thomas A. Henzinger +1

1421
22
59.1%
May 24, 2026
147

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Yubo Li, Yidi Miao +2

1421
17
58.8%
May 24, 2026
148

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Matteo Gioele Collu, Riccardo Conte +5

1420
16
43.8%
May 27, 2026
149

Uncertainty Reasoning with Large Language Models for Explainable Disease Diagnosis

Xiaoyang Fan, Yufan Cai +2

1420
20
40%
May 25, 2026
150

Plan Before Search: Search Agents Need Plan

Zhipeng Qian, Zihan Liang +6

1420
19
57.9%
May 27, 2026
151

VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora

Yuting Xu, Jiayi Tian +5

1419
13
69.2%
May 27, 2026
152

Localizing Input Uncertainty Quantification for Large Language Models via Shapley Values

Seongjun Lee, Suwan Yoon +1

1419
19
63.2%
May 27, 2026
153

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR

Soeun Kim, Albert No

1419
15
66.7%
May 27, 2026
154

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

Yuhao Shen, Lang Cao +5

1418
26
57.7%
May 26, 2026
155

Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation

Heng Qu, Yike Liu +5

1416
21
57.1%
May 26, 2026
156

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Harshada Badave, Santosh Borse +6

1415
17
52.9%
May 22, 2026
157

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

Yoosung Hong

1414
25
52%
May 22, 2026
158

EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation

Aristotelis Lazaridis, Dylan Bates +4

1414
21
57.1%
May 22, 2026
159

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Bowen Wei, Nan Wang +3

1414
15
60%
May 27, 2026
160

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

Shuoming Zhang, Qiuchu Yu +6

1413
18
61.1%
May 27, 2026
161

Lattice theory and algebraic models for deep convolutional learning based on mathematical morphology

Gustavo, Angulo

1413
18
55.6%
May 23, 2026
162

You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

Xujun Li, Kehan Zheng +6

1413
14
57.1%
May 27, 2026
163

Revealing Algorithmic Deductive Circuits for Logical Reasoning

Phuong Minh Nguyen, Tien Huu Dang +1

1413
20
55%
May 27, 2026
164

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu +3

1412
20
65%
May 27, 2026
165

Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering

Mateusz Czyżnikiewicz, Ryszard Tuora +6

1411
16
56.2%
May 26, 2026
166

MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection

Zhewen Tan, Yilun Yao +6

1411
21
61.9%
May 22, 2026
167

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

Yingtie Lei, Zhongwei Wan +6

1410
23
47.8%
May 22, 2026
168

BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

Fei Deng, Yanwu Xu +5

1409
16
62.5%
May 27, 2026
169

Boosting Inference with Guided Reasoning: Stochastic Exploration for Recursive Models

Andrew Corbett, Archit Sood +3

1409
19
63.2%
May 24, 2026
170

Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework

Ali Şenol, Garima Agrawal +1

1408
22
63.6%
May 23, 2026
171

NeurIPS: Neuro-anatomical Inductive Priors for Sphere-based Brain Decoding

Sijin Yu, Zijiao Chen +6

1407
21
52.4%
May 24, 2026
172

Design and Report Benchmarks for Knowledge Work

Yining Hua, Hongbin Na +2

1406
22
54.5%
May 22, 2026
173

Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network

Qiming Ye, Peixain Zhang +3

v2
1406
19
52.6%
May 25, 2026
174

Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

Yi Wang, Haojie Lu +3

1406
15
60%
May 27, 2026
175

Representation Without Control: Testing the Realization Effect in Language Models

Ciarán Walsh, Emilio Barkett

1406
19
42.1%
May 24, 2026
176

Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

Yexing Du, Kaiyuan Liu +5

1405
16
56.2%
May 27, 2026
177

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

Yuxin Chen, Xiaodong Cai +6

1405
17
58.8%
May 26, 2026
178

Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

Hankyeol Kim, Pilsung Kang

1404
13
61.5%
May 26, 2026
179

DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution

Yunhai Hu, Zining Liu +6

1404
18
66.7%
May 27, 2026
180

Towards end-to-end LLM-based censoring-aware survival analysis

Yishu Wei, Hexin Dong +4

1404
21
57.1%
May 25, 2026
181

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Banghao Chi, Yining Xie +6

1404
25
52%
May 21, 2026
182

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

Adam Bawatneh, Sagar Sapkota +3

1403
26
42.3%
May 25, 2026
183

D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

Aoxi Liu, Yupeng Chen +6

1402
21
61.9%
May 25, 2026
184

Generating Robust Portfolios of Optimization Models using Large Language Models

Eleni Straitouri, Cheol Woo Kim +1

1402
24
50%
May 26, 2026
185

Agentic Proving for Program Verification

Alessandro Sosso, Akhil Arora +1

1402
31
58.1%
May 22, 2026
186

When Mean CE Fails: Median CE Can Better Track Language Model Quality

Hao Guo, Simon Dennis +2

1401
19
57.9%
May 23, 2026
187

Whose Alignment? Comparing LLM Process Alignment Across Diverse Organizational Decision Contexts

Niklas Weller, Emilio Barkett

1400
20
40%
May 24, 2026
188

SPACE: Unifying Symmetric and Asymmetric Routing Problems for Generalist Neural Solver

Rongsheng Chen, Changliang Zhou +5

1400
24
58.3%
May 23, 2026
189

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Yusong Lin, Xinyuan Liang +6

1400
19
52.6%
May 25, 2026
190

Beyond Inference-Only Deployment: Comparing Weight-Based Consolidation Against Cascading Compaction

Simon Dennis, Kevin Shabahang +2

1400
19
42.1%
May 23, 2026
191

DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs

Yi Li, Songtao Wei +4

1399
23
47.8%
May 24, 2026
192

Towards Faithful Agentic XAI: A Verification Method and an Open-World Benchmark for Better Model Faithfulness

Jaechang Kim, Sunung Mun +3

1398
14
50%
May 27, 2026
193

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

Jiazheng Kang, Bowen Zhang +5

1397
24
50%
May 22, 2026
194

AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

Jingwei Sun, Jianing Zhu +4

1397
20
55%
May 25, 2026
195

Natural Language Query to Configuration for Retrieval Agents

Melissa Z. Pan, Negar Arabzadeh +4

1397
25
52%
May 26, 2026
196

AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters

Hanjun Luo, Zhimu Huang +6

1397
24
45.8%
May 21, 2026
197

JT-SAFE-V2: Safety-by-Design Foundation Model with World-Context Data

Junlan Feng, Fanyu Meng +6

1396
25
48%
May 23, 2026
198

A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test

Camilo Chacón Sartori, José H. García

1396
14
50%
May 27, 2026
199

StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning

Yanfei Zhang, Xu Lin +1

1396
26
53.8%
May 26, 2026
200

DART: Semantic Recoverability for Structured Tool Agents

Ke Yang, Panpan Li +4

1395
27
51.9%
May 22, 2026
Win-rate scores from pairwise comparisons with 95% confidence intervals. Papers compared using full-text deep analysis.
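
As a rough illustration of how a win rate and a 95% confidence interval can be derived from pairwise comparison results, the sketch below uses the Wilson score interval. The page does not say which interval method the ranking uses, so Wilson is an assumption here, and the function name is hypothetical. The example input of 38 wins out of 47 matches is inferred from the first entry (47 matches, 80.9% win rate), not stated by the source.

```python
import math


def win_rate_with_wilson_ci(wins, matches, z=1.96):
    """Return (win_rate, lower, upper) for a binomial win rate.

    Assumption: a Wilson score interval with z = 1.96 (about 95% coverage).
    The ranking page does not document its actual interval method.
    """
    if matches <= 0:
        raise ValueError("matches must be positive")
    if not 0 <= wins <= matches:
        raise ValueError("wins must be between 0 and matches")

    p = wins / matches
    z2 = z * z
    denom = 1 + z2 / matches
    # The Wilson interval is centered on a value shrunk toward 0.5.
    center = (p + z2 / (2 * matches)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / matches + z2 / (4 * matches * matches)) / denom
    )
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Inferred example: 38 wins in 47 matches rounds to the listed 80.9%.
rate, low, high = win_rate_with_wilson_ci(38, 47)
print(f"win rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With only a few dozen matches per paper the interval is wide, so papers whose win rates differ by a few percentage points may not be reliably separated.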