Artificial Intelligence Paper Rankings

AI-estimated scientific impact ranking of the latest arXiv Artificial Intelligence preprints.

200 papers (404 total)
57454 matches
1

A Signal-Language Foundation Model for Broad-Spectrum Cardiovascular Assessment from Routine Electrocardiography

Ziqing Yu, Yuhui Tao +6

1604
36
97.2%
May 25, 2026
2

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Adly Templeton, Tom Conerly +6

1595
28
96.4%
May 28, 2026
3

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Rui Meng, Bhavana Dalvi Mishra +6

1583
31
74.2%
May 25, 2026
4

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

Shanghua Gao, Ada Fang +1

1583
29
86.2%
May 27, 2026
5

Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure

Max Lamparth, Daniel Fein +3

1577
19
84.2%
May 27, 2026
6

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

Amartya Roy, Sonali Parbhoo

1576
27
88.9%
May 26, 2026
7

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

Mattia J. Villani, Pranav Deshpande +3

1573
23
87%
May 27, 2026
8

Calibrating Conservatism for Scalable Oversight

William Overman, Mohsen Bayati

1563
18
83.3%
May 27, 2026
9

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Dongyoon Hahm, Dylan Hadfield-Menell +1

1547
39
74.4%
May 26, 2026
10

Formalizing Mathematics at Scale

Ahmad Rammal, Niket Patel +6

1543
22
86.4%
May 28, 2026
11

When Context Flips, Safety Breaks: Diagnosing Brittle Safety in Aligned Language Models

Dasol Choi, Alex Kwon

1541
41
80.5%
May 27, 2026
12

RULER: Representation-Level Verification of Machine Unlearning

Georgina Cosma, Axel Finke

1541
26
80.8%
May 26, 2026
13

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Caijun Xu, Changyi Xiao +2

1541
24
79.2%
May 27, 2026
14

SIA: Self Improving AI with Harness & Weight Updates

Prannay Hebbar, Yogendra Manawat +5

1540
18
83.3%
May 26, 2026
15

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

Zhe Yu, Wenpeng Xing +5

1539
28
85.7%
May 26, 2026
16

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Bowen Wang, Dunjie Lu +6

1537
23
87%
May 25, 2026
17

What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

Xiang Wang, Wei Wei

1532
25
84%
May 26, 2026
18

Learning to Search and Searching to Learn for Generalization in Planning

Michael Aichmüller, Yannik Hesse +1

1532
29
86.2%
May 25, 2026
19

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

Elliot Gestrin, Jendrik Seipp

1529
21
85.7%
May 28, 2026
20

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Yubo Li, Ramayya Krishnan +1

1525
26
84.6%
May 27, 2026
21

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

Zhe Yu, Wenpeng Xing +5

1521
18
77.8%
May 26, 2026
22

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems

Jianing Zhu, Yeonju Ro +6

1518
21
81%
May 25, 2026
23

GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents

Johannes Moll, Jean-Philippe Corbeil +5

1516
18
77.8%
May 28, 2026
24

Human-like in-group bias in instruction-tuned language model agents

Messi H. J. Lee

1514
23
73.9%
May 27, 2026
25

Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory

Abdelghny Orogat, Essam Mansour

1514
20
80%
May 25, 2026
26

The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF

Zeli Su, Zhankai Xu +5

1513
33
87.9%
May 28, 2026
27

Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion

Yizhuo Lu, Changde Du +6

1511
14
78.6%
May 28, 2026
28

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax +6

1509
21
76.2%
May 26, 2026
29

Voluntary Collusion with Secret Tools in Competing LLM Agents

Xijie Zeng, Frank Rudzicz

1508
32
81.2%
May 26, 2026
30

PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

Yiqun Liu, Yingsheng Wu +6

1505
15
86.7%
May 28, 2026
31

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han +4

1505
19
78.9%
May 27, 2026
32

AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models

Ruiyi Zhang, Peijia Qin +3

1504
24
83.3%
May 27, 2026
33

Credit Assignment with Resets in Language Model Reasoning

Ankur Samanta, Akshayaa Magesh +6

v2
1502
25
64%
May 25, 2026
34

LACUNA: Safe Agents as Recursive Program Holes

Yaoyu Zhao, Yichen Xu +4

1501
26
76.9%
May 27, 2026
35

When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop

Yang Zhang, Xiukun Wei +1

1499
20
85%
May 28, 2026
36

Quantifying and Optimizing Simplicity via Polynomial Representations

Tianren Zhang, Xiangxin Li +3

1499
19
84.2%
May 28, 2026
37

Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation

Jiawei Chen, Xiaofan Gui +5

1499
27
81.5%
May 28, 2026
38

PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design

Manpreet Kaur, Xingying Zhang +1

1497
30
73.3%
May 26, 2026
39

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

Yiqun Chen, Wei Yang +6

1497
23
73.9%
May 26, 2026
40

Better Accuracies, Worse Reasoning: A Step-Level Audit of Medical Chain-of-Thought Distillation

Zhaoyang Jiang, Xuanqi Peng +6

1495
18
72.2%
May 27, 2026
41

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Kohsei Matsutani, Gouki Minegishi +3

1495
24
70.8%
May 27, 2026
42

Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents

Yongxiang Li, Moxin Li +5

1494
20
75%
May 27, 2026
43

Robust and Efficient Guardrails with Latent Reasoning

Siddharth Sai, Xiaofei Wen +1

1494
17
70.6%
May 27, 2026
44

CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models

Abhilash Durgam, Nyle Siddiqui +4

1493
20
85%
May 27, 2026
45

OpenURMA: A Clean-Room Open Implementation of the Unified Bus Protocol

Bojie Li

1493
19
73.7%
May 27, 2026
46

MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

Tianzhuo Yang, Zihan Shen +6

1490
19
84.2%
May 28, 2026
47

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Aman Priyanshu, Supriti Vijay +1

1488
21
76.2%
May 26, 2026
48

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

Gabriele Cesa, Thomas Hehn +5

1486
19
63.2%
May 26, 2026
49

AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

Michael Hardy, Anka Reuel +6

1486
21
71.4%
May 24, 2026
50

Demystifying Data Organization for Enhanced LLM Training

Yalun Dai, Yangyu Huang +6

1485
17
76.5%
May 28, 2026
51

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Ziyan Liu, Zhezheng Hao +6

1485
16
68.8%
May 28, 2026
52

Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning

Zhe Yu, Wenpeng Xing +5

1485
27
66.7%
May 26, 2026
53

Orthogonal Concept Erasure for Diffusion Models

Yuhao Sun, Lingyun Yu +4

1485
15
73.3%
May 27, 2026
54

DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning

Yang He, Xiao Ding +6

1484
16
75%
May 28, 2026
55

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Zhikai Pan, Chih-Ting Liao +6

1484
22
72.7%
May 27, 2026
56

Behavioural Analysis of Alignment Faking

Nathaniel Mitrani Hadida, Rhea Karty +2

1483
14
57.1%
May 26, 2026
57

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

Guoxin Ma, Yibing Liu +6

1483
24
66.7%
May 27, 2026
58

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

HuiMing Fan, Xiao Wang +6

1482
18
55.6%
May 27, 2026
59

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Junlin Yang, Dylan Zhang +6

1482
18
72.2%
May 25, 2026
60

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

Xiaohua Wang, Jiakang Yuan +6

1481
21
66.7%
May 26, 2026
61

Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows

Yilun Yao, Xinyu Tan +6

1481
16
68.8%
May 27, 2026
62

Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information

Renjie Gu, Jiaxu Li +6

1480
17
64.7%
May 27, 2026
63

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal

Kia-Jüng Yang, Dominik Meier +3

1480
21
71.4%
May 26, 2026
64

Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

Yiming Liu, Bin Lu +6

1478
14
78.6%
May 28, 2026
65

ReasonOps: Operator Segmentation for LLM Reasoning Traces

Daniel Lee, Owen Queen +1

1478
24
79.2%
May 28, 2026
66

SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

Chao Ding, Mouxiao Bian +6

1475
20
70%
May 27, 2026
67

A Unified Framework for the Evaluation of LLM Agentic Capabilities

Pengyu Zhu, Lijun Li +6

1474
22
68.2%
May 27, 2026
68

MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents

Zihan Li, Xingyu Fan +2

1474
21
66.7%
May 27, 2026
69

Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs

Yue Cheng, Jiajun Zhang +4

1473
20
75%
May 27, 2026
70

StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs

Yang Luo, Xinran Liu +4

1473
21
76.2%
May 25, 2026
71

Provably Secure Agent Guardrail

Benlong Wu, Weiming Zhang +3

1473
20
85%
May 28, 2026
72

MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

Xiaoyu Dong, Zhi Li +1

1473
20
70%
May 27, 2026
73

Advancing Creative Physical Intelligence in Large Multimodal Models

Cheng Qian, Hyeonjeong Ha +6

1471
22
72.7%
May 25, 2026
74

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

Zhexin Hu, Li Wang +5

1470
22
63.6%
May 27, 2026
75

On the Origin of Synthetic Information by Means of Steganographic Inheritance

Ching-Chun Chang, Isao Echizen

1469
24
66.7%
May 26, 2026
76

Continual Model Routing in Evolving Model Hubs

Jack Bell, Giacomo Carfì +2

1469
20
65%
May 27, 2026
77

PRO-CUA: Process-Reward Optimization for Computer Use Agents

Yifei He, Rui Yang +3

1465
15
60%
May 27, 2026
78

Position: AI Safety Requires Effective Controllability

Yige Li, Yunhao Feng +1

1465
19
63.2%
May 26, 2026
79

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

Chen Linze, Cai Yufan +2

1464
19
57.9%
May 26, 2026
80

MolLingo: Molecule-Native Representations for LLM-Powered Scientific Agents

Thao Nguyen, Heng Ji

1462
23
78.3%
May 27, 2026
81

Can LLMs Introspect? A Reality Check

Shashwat Singh, Tal Linzen +1

1461
20
65%
May 25, 2026
82

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

Xu Shen, Zhen Tan +5

1459
24
70.8%
May 25, 2026
83

Do Clinical Models Change Treatment Decisions?

Dongkyu Cho, Miao Zhang +1

1459
17
64.7%
May 27, 2026
84

Localizing Input Uncertainty Quantification for Large Language Models via Shapley Values

Seongjun Lee, Suwan Yoon +1

1458
24
70.8%
May 27, 2026
85

From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation

Shuaike Li, Kai Zhang +3

1458
23
60.9%
May 27, 2026
86

EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics

Zhichen Tang, Zhengzheng Dang +4

1458
21
71.4%
May 28, 2026
87

Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations

Omar Benjelloun, Leonardo Martins Bianco +6

1458
14
57.1%
May 28, 2026
88

Planning with the Views via Scene Self-Exploration

Kangrui Wang, Linjie Li +6

1457
14
71.4%
May 28, 2026
89

SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

Qinpei Luo, Ruichun Ma +2

1456
17
82.4%
May 28, 2026
90

Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations

Simardeep Singh, Paras Chopra

1456
21
61.9%
May 27, 2026
91

From Accounting to Coordination: A Virtual Water-Aware Electricity-Computation-Water Nexus Framework for Data Center Dispatch

Haiyang You, Chengwei Lou +4

1456
20
65%
May 25, 2026
92

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

Dingbang Wu, Rui Hao +6

1453
19
73.7%
May 25, 2026
93

Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization

Zhihao Liu, Yifan Wu +4

1453
15
60%
May 28, 2026
94

Conformal Certification of Reasoning Trace Prefixes

Matt Y. Cheung, Ashok Veeraraghavan +2

1452
18
72.2%
May 28, 2026
95

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

Haoran Zhang, Zhaohua Sun

1452
31
51.6%
May 26, 2026
96

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

Shreyas Fadnavis, Praitayini Kanakaraj +1

1451
18
66.7%
May 27, 2026
97

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Tong Ye, Hang Yu +6

1451
18
77.8%
May 28, 2026
98

PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

Yuchen Guo, Junli Gong +3

1450
23
60.9%
May 26, 2026
99

Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization

Jiawei Kong, Hao Fang +6

1449
20
60%
May 27, 2026
100

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

Shuoming Zhang, Qiuchu Yu +6

1445
24
66.7%
May 27, 2026
101

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

Anany Kotawala

1445
21
61.9%
May 28, 2026
102

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

Chusen Li, Zhou Liu +2

1445
29
62.1%
May 27, 2026
103

JobBench: Aligning Agent Work With Human Will

Yuetai Li, Yichen Feng +6

1445
19
68.4%
May 25, 2026
104

CODESKILL: Learning Self-Evolving Skills for Coding Agents

Yanzhou Li, Yiran Zhang +3

1444
18
66.7%
May 25, 2026
105

Insuring Every Action: An Authority Frontier Framework for Runtime Actuarial Control of Autonomous AI Agents

Hao-Hsuan Chen

1444
21
66.7%
May 25, 2026
106

Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale

Canran Wang, Yuwen Yang +6

1443
15
66.7%
May 28, 2026
107

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Zixuan Jiang, Yanqiao Zhu +6

1443
20
65%
May 28, 2026
108

KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

Kun Feng, Ziwei Shan +6

1442
16
62.5%
May 28, 2026
109

Verifiable Benchmarking of Long-Horizon Spatial Biology

Ian Diks, Harihara Muralidharan +2

1442
19
63.2%
May 27, 2026
110

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning

Yang Zhang, Xiaoshuai Sun +6

1442
15
60%
May 27, 2026
111

NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs

Shuaidi Wang, Zhan Zhuang +2

1442
14
71.4%
May 28, 2026
112

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation

Maksim Ivanov, Abhijay Rana

1442
21
61.9%
May 25, 2026
113

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

Jyotirmoy Nath, Neeraj Kumar +1

1442
20
75%
May 27, 2026
114

Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

Tim Woydt, Paul-David Zuercher

1442
19
68.4%
May 28, 2026
115

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Jeongeun Lee, Chanyoung Park +1

1440
23
69.6%
May 25, 2026
116

Retrying vs Resampling in AI Control

James Lucassen, Adam Kaufman

v2
1440
24
58.3%
May 25, 2026
117

A Policy-Driven Runtime Layer for Agentic LLM Serving

Rui Zhang, Chaeeun Kim +1

1440
21
57.1%
May 26, 2026
118

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Shijie Cao, Yuan Yuan +1

1439
17
70.6%
May 28, 2026
119

Plan Before Search: Search Agents Need Plan

Zhipeng Qian, Zihan Liang +6

1439
25
60%
May 27, 2026
120

Measuring Progress Toward AGI: A Cognitive Framework

Ryan Burnell, Yumeya Yamamori +6

1438
17
52.9%
May 27, 2026
121

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

Shangding Gu

1438
21
61.9%
May 25, 2026
122

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

Junyoung Park, Sunghwan Park +2

1438
16
68.8%
May 28, 2026
123

OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories

Yibing Liu, Yangze Liu +5

1437
20
65%
May 28, 2026
124

MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains

Ashutosh Ojha, Vinay Aggarwal +4

1437
13
61.5%
May 28, 2026
125

ParaTool: Shifting Tool Representations from Context to Parameters

Zekai Yu, Qi Meng +4

1435
14
64.3%
May 28, 2026
126

Multi-Adapter Representation Interventions via Energy Calibration

Manjiang Yu, Hongji Li +5

1435
21
57.1%
May 27, 2026
127

Accelerating Constrained Decoding with Token Space Compression

Michael Sullivan, Alexander Koller

1434
13
76.9%
May 28, 2026
128

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Bowen Wei, Nan Wang +3

1434
19
63.2%
May 27, 2026
129

Automatic Layer Selection for Hallucination Detection

Xinpeng Wang, William Cao +2

1433
26
65.4%
May 25, 2026
130

Can Broad Biomedical Knowledge be Contextualized into Scenario-Grounded Propositions?

Qingyuan Zeng, Ziyang Chen +6

1433
27
63%
May 26, 2026
131

MemFail: Stress-Testing Failure Modes of LLM Memory Systems

Ishir Garg, Neel Kolhe +2

1432
21
52.4%
May 26, 2026
132

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

Guni Sharon

1432
19
52.6%
May 27, 2026
133

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Yuxin Chen, Yi Zhang +6

1431
23
60.9%
May 26, 2026
134

Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

Yi Wang, Haojie Lu +3

1431
20
65%
May 27, 2026
135

Back to Parsimonious Latents: Learning Task-Centric World Models from Visual Foundations

Minghao Fu, Fan Feng +2

1430
19
52.6%
May 25, 2026
136

Towards Faithful Agentic XAI: A Verification Method and an Open-World Benchmark for Better Model Faithfulness

Jaechang Kim, Sunung Mun +3

1429
23
56.5%
May 27, 2026
137

The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

Dueun Kim, Albert No

1428
15
60%
May 27, 2026
138

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

Kevin Wang, Anna Thöni +6

1428
18
61.1%
May 28, 2026
139

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR

Soeun Kim, Albert No

1427
22
59.1%
May 27, 2026
140

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Yunbo Tang, Chengyi Yang +5

1427
17
70.6%
May 28, 2026
141

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Tenghao Huang, Kung-Hsiang Huang +5

1427
14
71.4%
May 28, 2026
142

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

Yunqi Liu, Tong Niu +5

1426
21
47.6%
May 27, 2026
143

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Zhenyu Cui, Xiangzhong Luo

1426
22
50%
May 27, 2026
144

MuCRASP: Multimodal Chain-of-thought Reasoning aware Structured Pruning

Aritra Dutta, Somak Aditya

1425
21
52.4%
May 25, 2026
145

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

Leizhen Zhang, Shuhan Chen +1

1424
19
47.4%
May 27, 2026
146

Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents

Yunhui Gan, Tan Pan +6

1423
19
52.6%
May 26, 2026
147

ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure

A. J. Lew, Y. Cao +1

1423
19
63.2%
May 28, 2026
148

On the Geometry of Games and their Solvers

Yaqi Sun, Julian Ma +1

1422
15
53.3%
May 28, 2026
149

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Jiahao Huang, Fei Cheng +3

1422
21
61.9%
May 28, 2026
150

Revealing Algorithmic Deductive Circuits for Logical Reasoning

Phuong Minh Nguyen, Tien Huu Dang +1

1422
24
54.2%
May 27, 2026
151

Uncertainty Reasoning with Large Language Models for Explainable Disease Diagnosis

Xiaoyang Fan, Yufan Cai +2

1420
20
40%
May 25, 2026
152

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu +3

1420
24
62.5%
May 27, 2026
153

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

Wanhao Liu, Jiaqing Xie +6

1419
13
53.8%
May 28, 2026
154

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Chen He, Yuhao Wu +3

1419
16
56.2%
May 28, 2026
155

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Tomer Keren, Nitay Calderon +4

1419
20
60%
May 27, 2026
156

Harnessing non-adversarial robustness in large language models

Qinghua Zhou, Ellina Aleshina +6

1418
19
63.2%
May 28, 2026
157

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

Yuhao Shen, Lang Cao +5

1418
26
57.7%
May 26, 2026
158

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG

Pin Qian, Su Wang +6

1418
23
56.5%
May 27, 2026
159

Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation

Heng Qu, Yike Liu +5

1416
21
57.1%
May 26, 2026
160

Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

Hankyeol Kim, Pilsung Kang

1415
16
62.5%
May 26, 2026
161

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Matteo Gioele Collu, Riccardo Conte +5

1414
18
38.9%
May 27, 2026
162

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Haoming Xu, Weihong Xu +6

1414
15
60%
May 28, 2026
163

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

Wei Zheng, Yang Yan +6

1413
16
50%
May 28, 2026
164

Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering

Mateusz Czyżnikiewicz, Ryszard Tuora +6

1411
16
56.2%
May 26, 2026
165

DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution

Yunhai Hu, Zining Liu +6

1411
24
62.5%
May 27, 2026
166

Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Pedro Orvalho, Marta Kwiatkowska +2

1411
14
71.4%
May 28, 2026
167

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Taojie Zhu, Wentao Zhao +6

1409
22
54.5%
May 27, 2026
168

Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

Yexing Du, Kaiyuan Liu +5

1409
21
52.4%
May 27, 2026
169

Teaching Values to Machines: Simulating Human-Like Behavior in LLMs

Asaf Yehudai, Naama Rozen +1

1408
13
61.5%
May 28, 2026
170

LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs

Jung Hyun Lee, June Yong Yang +2

1408
18
61.1%
May 28, 2026
171

RAISE: RAG Design as an Architecture Search Problem

Zhen Chen, Yibing Liu +4

1407
14
57.1%
May 28, 2026
172

VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora

Yuting Xu, Jiayi Tian +5

1407
19
52.6%
May 27, 2026
173

You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

Xujun Li, Kehan Zheng +6

1406
19
47.4%
May 27, 2026
174

Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network

Qiming Ye, Peixain Zhang +3

v2
1406
19
52.6%
May 25, 2026
175

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

Yuxin Chen, Xiaodong Cai +6

1405
17
58.8%
May 26, 2026
176

Towards end-to-end LLM-based censoring-aware survival analysis

Yishu Wei, Hexin Dong +4

1404
21
57.1%
May 25, 2026
177

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

Adam Bawatneh, Sagar Sapkota +3

1403
26
42.3%
May 25, 2026
178

D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

Aoxi Liu, Yupeng Chen +6

1402
21
61.9%
May 25, 2026
179

Generating Robust Portfolios of Optimization Models using Large Language Models

Eleni Straitouri, Cheol Woo Kim +1

1402
24
50%
May 26, 2026
180

Whose Alignment? Comparing LLM Process Alignment Across Diverse Organizational Decision Contexts

Niklas Weller, Emilio Barkett

1400
20
40%
May 24, 2026
181

From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks

Du Yin, Hao Xue +3

1400
16
56.2%
May 28, 2026
182

VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data

Di Zhu, Yu Yvonne Wu +4

1400
12
66.7%
May 28, 2026
183

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Yusong Lin, Xinyuan Liang +6

1400
19
52.6%
May 25, 2026
184

A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test

Camilo Chacón Sartori, José H. García

1400
22
50%
May 27, 2026
185

BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

Fei Deng, Yanwu Xu +5

1399
19
52.6%
May 27, 2026
186

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

Srini Ramaswamy

1398
21
57.1%
May 26, 2026
187

Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation

Soumyadeep Jana, Sagar Nishad +1

1397
22
50%
May 28, 2026
188

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

Qi Liu, Mingdi Sun +6

1397
16
50%
May 28, 2026
189

AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

Jingwei Sun, Jianing Zhu +4

1397
20
55%
May 25, 2026
190

Natural Language Query to Configuration for Retrieval Agents

Melissa Z. Pan, Negar Arabzadeh +4

1397
25
52%
May 26, 2026
191

AlphaTransit: Learning to Design City-scale Transit Routes

Bibek Poudel, Sai Swaminathan +1

1396
25
56%
May 27, 2026
192

CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models

Fengze Yang, Bo Yu +3

1396
21
47.6%
May 27, 2026
193

StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning

Yanfei Zhang, Xu Lin +1

1396
26
53.8%
May 26, 2026
194

Xetrieval: Mechanistically Explaining Dense Retrieval

Zhixin Cai, Jun Bai +6

1395
18
55.6%
May 28, 2026
195

Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence

Yanan Wang, Shuaicong Hu +4

1394
14
57.1%
May 28, 2026
196

SkillGrad: Optimizing Agent Skills Like Gradient Descent

Hanyu Wang, Yifan Lan +3

1393
20
50%
May 26, 2026
197

ProvMind: Provenance-grounded reasoning for materials synthesis

Yiming Zhang, Ryo Tamura +1

1393
21
52.4%
May 27, 2026
198

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Xiaona Zhou, Muntasir Wahed +3

1393
18
50%
May 28, 2026
199

BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting

Ruifeng Tan, Jintao Dong +4

1392
24
50%
May 26, 2026
200

VeriTrace: Evolving Mental Models for Deep Research Agents

Haolang Zhao, Yunbo Long +2

1392
22
50%
May 25, 2026
Win-rate scores from pairwise comparisons with 95% confidence intervals. Papers compared using full-text deep analysis.
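
As a rough illustration of how a win rate and a 95% confidence interval can be derived from pairwise comparisons, the sketch below reads the top entry's figures (36 and 97.2%) as 35 wins out of 36 matches. That reading is an inference from the listing, not something the page states. The Wilson score interval is likewise an assumption chosen for illustration; the page does not say which interval method it uses, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(wins: int, matches: int, z: float = 1.96) -> tuple[float, float]:
    """Return a Wilson score interval for a win rate.

    Assumption: each match is an independent win/loss trial.
    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if matches <= 0:
        raise ValueError("matches must be positive")
    if not 0 <= wins <= matches:
        raise ValueError("wins must be between 0 and matches")

    p = wins / matches                      # observed win rate
    z2 = z * z
    denom = 1 + z2 / matches
    centre = (p + z2 / (2 * matches)) / denom
    half_width = z * math.sqrt(p * (1 - p) / matches + z2 / (4 * matches ** 2)) / denom
    return centre - half_width, centre + half_width


# Assumed reading of the first entry: 36 matches, 35 of them won.
wins, matches = 35, 36
win_rate = wins / matches
low, high = wilson_interval(wins, matches)

print(f"win rate: {win_rate:.1%}")
print(f"95% interval: [{low:.1%}, {high:.1%}]")
```

With only a few dozen matches per paper, the interval is wide, so small differences in win rate between neighbouring entries may not be meaningful.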