Artificial Intelligence Paper Rankings

AI-estimated scientific impact ranking of the latest arXiv Artificial Intelligence preprints.

200 papers (327 total)
51350 matches
1

Towards a General Intelligence and Interface for Wearable Health Data

Girish Narayanswamy, Maxwell A. Xu +6

1643
43
79.1%
May 21, 2026
2

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Zhiqin Yang, Yonggang Zhang +4

1589
34
61.8%
May 20, 2026
3

KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science

Ziwei Li, Liujun Zhu +6

1582
22
90.9%
May 18, 2026
4

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search

Sarah Martinson, Michael P. Brenner +4

1577
18
83.3%
May 15, 2026
5

The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure

Qiqi Liu, Thorsten Holz +2

1563
21
85.7%
May 17, 2026
6

Advancing Mathematics Research with AI-Driven Formal Proof Search

George Tsoukalas, Anton Kovsharov +6

1563
21
90.5%
May 21, 2026
7

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

Guijia Zhang, Hao Zheng +1

1558
20
65%
May 18, 2026
8

Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery

Xinzhe Yuan, Zhuo Chen +5

1553
20
85%
May 18, 2026
9

TRACE: Trajectory Correction from Cross-layer Evidence for Hallucination Reduction

Tej Sanibh Ranade

1546
37
70.3%
May 18, 2026
10

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

Ahmad Al-Tawaha, Shangding Gu +3

1541
25
80%
May 18, 2026
11

What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

Yuze Zhao, Junpeng Fang +6

1539
22
90.9%
May 19, 2026
12

From Prompts to Protocols: An AI Agent for Laboratory Automation

Angelos Angelopoulos, James F. Cahoon +1

1538
24
91.7%
May 15, 2026
13

SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules

Yuxuan Chen, Changwei Lv +6

1535
22
81.8%
May 21, 2026
14

Reasoning Can Be Restored by Correcting a Few Decision Tokens

Changshuo Shen, Leheng Sheng +3

1535
20
85%
May 16, 2026
15

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

Mingkai Deng, Jinyu Hou +5

1534
22
95.5%
May 21, 2026
16

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

Xavier Theimer-Lienhard, Mushtaha El-Amin +6

1529
16
81.2%
May 15, 2026
17

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

Jiayu Li, Enpei Zhang +3

1528
22
86.4%
May 18, 2026
18

State Contamination in Memory-Augmented LLM Agents

Yian Wang, Agam Goyal +2

1525
26
84.6%
May 16, 2026
19

Imperfect World Models are Exploitable

Logan Mondal Bhamidipaty, Esmeralda S. Whitammer +3

v2
1524
22
77.3%
May 15, 2026
20

Generative Recursive Reasoning

Junyeob Baek, Mingyu Jo +4

v2
1523
19
68.4%
May 19, 2026
21

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

Roger Creus Castanyer, Geoffrey Bradway +4

1521
20
80%
May 16, 2026
22

ECG-WM: A Physiology-Informed ECG World Model for Clinical Intervention Simulation

Zhikang Chen, Yue Wang +5

1519
20
80%
May 17, 2026
23

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Ali Hatamizadeh, Yejin Choi +1

1518
18
88.9%
May 21, 2026
24

NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning

Haoran Lu, Luyang Fang +2

1518
20
80%
May 16, 2026
25

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Jiahe Guo, Xiangran Guo +6

1518
25
84%
May 18, 2026
26

Echo: Learning from Experience Data via User-Driven Refinement

Hande Dong, Xiaoyun Liang +6

1513
16
75%
May 21, 2026
27

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

Junyao Yang, Chen Qian +5

1511
18
77.8%
May 18, 2026
28

How Far Are We From True Auto-Research?

Zhengxin Zhang, Ning Wang +2

1511
27
55.6%
May 18, 2026
29

Beyond Mode Collapse: Distribution Matching for Diverse Reasoning

Xiaozhe Li, Yang Li +6

1510
14
85.7%
May 19, 2026
30

Forecasting Scientific Progress with Artificial Intelligence

Sean Wu, Pan Lu +6

1510
30
36.7%
May 21, 2026
31

From Static Risk to Dynamic Trajectories: Toward World-Model-Inspired Clinical Prediction

Pujun Feng, Xiaoyu Guo +6

1508
23
78.3%
May 16, 2026
32

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

Sixiong Xie, Zhuofan Shi +6

1507
33
36.4%
May 20, 2026
33

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Parand A. Alamdari, Toryn Q. Klassen +1

1504
20
80%
May 15, 2026
34

Not all uncertainty is alike: volatility, stochasticity, and exploration

Payam Piray

1503
21
81%
May 19, 2026
35

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

Yufei Ma, Zihan Liang +6

1502
24
75%
May 18, 2026
36

PAIR: Prefix-Aware Internal Reward Model for Multi-Turn Agent Optimization

Wonjoong Kim, Yeonjun In +3

1500
20
80%
May 18, 2026
37

What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

Payal Chandak, Victoria Alkin +6

1499
24
83.3%
May 18, 2026
38

Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most

Nick Merrill, Jaeho Lee +1

1498
23
82.6%
May 21, 2026
39

PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models

Ziliang Zhao, Zenan Xu +6

1498
14
71.4%
May 20, 2026
40

AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows

Shuaike Shen, Wenduo Cheng +3

1492
30
70%
May 19, 2026
41

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

Rebecca Ramnauth, Drazen Brscic +1

1492
18
72.2%
May 19, 2026
42

Latent-space Attacks for Refusal Evasion in Language Models

Giorgio Piras, Raffaele Mura +5

1492
17
82.4%
May 20, 2026
43

SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?

Kevin Han, Renfei Zhang +4

1490
24
83.3%
May 20, 2026
44

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

Qianshu Cai, Yonggang Zhang +5

1487
18
66.7%
May 21, 2026
45

Look Before You Leap: Autonomous Exploration for LLM Agents

Ziang Ye, Wentao Shi +6

1484
16
68.8%
May 15, 2026
46

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

Jinbiao Wei, Qianran Ma +5

1482
21
76.2%
May 19, 2026
47

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

Zhuohan Gu, Qizheng Zhang +2

1482
18
72.2%
May 19, 2026
48

TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens

Jianpeng Cheng, Xian Wu +6

1481
22
68.2%
May 15, 2026
49

Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management

Carol Xuan Long, David Simchi-Levi +4

1481
23
73.9%
May 16, 2026
50

ADR: An Agentic Detection System for Enterprise Agentic AI Security

Chenning Li, Pan Hu +6

1479
23
69.6%
May 17, 2026
51

PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

Jingxuan Wei, Xi Bai +6

1478
19
78.9%
May 15, 2026
52

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

Zhenlin Wei, Pu Jian +6

1477
22
68.2%
May 18, 2026
53

GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards

Kyeongjin Ahn, Seungeon Lee +2

1476
23
69.6%
May 19, 2026
54

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Jiaqi Liu, Shi Qiu +6

1475
27
74.1%
May 19, 2026
55

Open-World Evaluations for Measuring Frontier AI Capabilities

Sayash Kapoor, Peter Kirgis +6

1475
20
65%
May 19, 2026
56

Causal Evidence for Attention Head Imbalance in Modality Conflict Hallucination

Jinrui Jiang, Zhangtai Wu +2

1475
24
66.7%
May 19, 2026
57

Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

Tianshi Xu, Huifeng Wen +1

1475
16
68.8%
May 21, 2026
58

GIM: Evaluating models via tasks that integrate multiple cognitive domains

Rohit Patel, Alexandre Rezende +1

1474
24
62.5%
May 18, 2026
59

Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving

Oussama Zenkri, Oliver Brock

1471
22
72.7%
May 19, 2026
60

Memory-Guided Tree Search with Cross-Branch Knowledge Transfer for LLM Solver Synthesis

Fatemeh Haji, Javier Delarosa Quiros +1

1468
19
68.4%
May 17, 2026
61

Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards

Xuehui Yu, Fucheng Cai +3

1468
27
74.1%
May 20, 2026
62

Latent Action Reparameterization for Efficient Agent Inference

Wenhao Huang, Qingwen Zeng +6

v2
1466
18
61.1%
May 18, 2026
63

Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability

Joël Roman Ky, Salah Ghamizi +1

1466
16
75%
May 21, 2026
64

PRISMat: Policy-Driven, Permutation-Invariant Autoregressive Material Generation

Claire Schlesinger, Circe Hsu +2

1465
22
63.6%
May 15, 2026
65

Property-Guided LLM Program Synthesis for Planning

André G. Pereira, Augusto B. Corrêa +1

v2
1464
16
62.5%
May 15, 2026
66

Harnessing LLM Agents with Skill Programs

Hongjun Liu, Yifei Ming +2

1463
17
58.8%
May 18, 2026
67

Generative AI and the Productivity Divide: Human-AI Complementarities in Education

Lihi Idan, Bharat Anand

1463
19
68.4%
May 18, 2026
68

Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

Xing Zhang, Yanwei Cui +5

1462
14
50%
May 19, 2026
69

Interactive Evaluation Requires a Design Science

Keyang Xuan, Peiyang Song +6

1460
16
62.5%
May 18, 2026
70

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

Xing Zhang, Yanwei Cui +5

1456
14
78.6%
May 21, 2026
71

Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models

SeungWon Seo, DongHeun Han +2

1456
22
63.6%
May 16, 2026
72

ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling

Woomin Song, Beomjun Kim +5

1455
20
65%
May 21, 2026
73

Learning to Learn from Multimodal Experience

Xingyu Sui, Weixiang Zhao +5

1455
21
66.7%
May 16, 2026
74

TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents

Zhiqiang Liu, Wenhui Dong +4

1454
20
75%
May 16, 2026
75

Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

Simon Dennis, Rivaan Patil +2

1453
20
70%
May 21, 2026
76

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Utkarsh Tyagi, Xingang Guo +6

1452
17
58.8%
May 19, 2026
77

Episodic-Semantic Memory Architecture for Long-Horizon Scientific Agents

Nikola Milosevic

1452
16
50%
May 17, 2026
78

MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis

Haiyang Shen, Taian Guo +6

1450
12
75%
May 20, 2026
79

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Akshay Manglik, Apaar Shanker +6

v2
1450
21
66.7%
May 20, 2026
80

Mind the Sim-to-Real Gap & Think Like a Scientist

Harsh Parikh, Gabriel Levin-Konigsberg +2

1450
24
62.5%
May 20, 2026
81

Learning Quantifiable Visual Explanations Without Ground-Truth

Amritpal Singh, Andrey Barsky +3

1449
14
50%
May 18, 2026
82

CyberCorrect: A Cybernetic Framework for Closed-Loop Self-Correction in Large Language Models

Yuning Wu, Yingmin Liu +1

1449
20
75%
May 17, 2026
83

Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches

Tinghan Ye, Arnaud Deza +3

1448
18
55.6%
May 18, 2026
84

How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

Shuqi Zhu, Yi Zhong +5

1448
19
63.2%
May 16, 2026
85

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Zihan Liang, Yufei Ma +5

1447
16
75%
May 21, 2026
86

Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management

Guanyu Cui, Zhewei Wei +1

1447
22
54.5%
May 19, 2026
87

NGM: A Plug-and-Play Training-Free Memory Module for LLMs

Yuwen Qu, Wenhui Dong +2

1446
20
65%
May 16, 2026
88

EXG: Self-Evolving Agents with Experience Graphs

Yuxin Jin, Siyuan Zhang +4

1445
17
58.8%
May 18, 2026
89

ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding

Mingyang Rao, Kehua Feng +5

1445
26
53.8%
May 17, 2026
90

POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

Qiaoyuan Zheng, Yiqu Yang +2

1444
25
60%
May 18, 2026
91

Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support

Chen Zhan, Xihe Qiu +6

1443
20
70%
May 21, 2026
92

Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning

Dillon Z. Chen, Till Hofmann +2

1442
22
68.2%
May 15, 2026
93

Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

Andrii Kryshtal

1442
18
77.8%
May 21, 2026
94

AMEL: Accumulated Message Effects on LLM Judgments

Sid-ali Temkit

1441
19
73.7%
May 21, 2026
95

CLORE: Content-Level Optimization for Reasoning Efficiency

Yuyang Wu, Qiyao Xue +5

1441
15
60%
May 21, 2026
96

Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation

Nicanor Mayumu, Xiaoheng Deng +1

1440
20
55%
May 17, 2026
97

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents

Xiaozhe Li, Tianyi Lyu +6

1439
17
64.7%
May 19, 2026
98

A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation

Qingchuan Ma, Yuexiao Ma +4

1439
20
60%
May 17, 2026
99

CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials

Yanjie Li

1438
20
55%
May 17, 2026
100

Beyond Rational Illusion: Behaviorally Realistic Strategic Classification

Xinpeng Lv, Yunxin Mao +6

1438
22
50%
May 19, 2026
101

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

Kuei-Chun Kao, Daixuan Huo +2

1437
20
60%
May 17, 2026
102

Reasoning Before Diagnosis: Physician-Inspired Structured Thinking for ECG Classification

Yang Wu, Xiaoyan Yuan +2

1436
18
61.1%
May 17, 2026
103

Actionable World Representation

Kunqi Xu, Jitao Li +5

1436
17
58.8%
May 18, 2026
104

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

Shradha Agarwal, Deepak Rajbhar +1

1434
19
57.9%
May 15, 2026
105

Probabilistic Tiny Recursive Model

Amin Sghaier, Ali Parviz +1

1432
21
61.9%
May 19, 2026
106

Explainable Wastewater Digital Twins: Adaptive Context-Conditioned Structured Simulators with Self-Falsifying Decision Support

Gary Simethy, Daniel Ortiz Arroyo +1

1432
21
57.1%
May 19, 2026
107

SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Puyi Wang, Yuhao Wang +5

1432
24
58.3%
May 19, 2026
108

Responsible Agentic AI Requires Explicit Provenance

Jinwei Hu, Xinmiao Huang +4

1431
22
59.1%
May 16, 2026
109

Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs

Junyu Pan, Yansen Wang +4

1431
18
61.1%
May 18, 2026
110

Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

Zhiyuan Jerry Lin, Benjamin Letham +3

1430
22
50%
May 18, 2026
111

Implicit Safety Alignment from Crowd Preferences

Qian Lin, Daniel S. Brown

1430
14
57.1%
May 20, 2026
112

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Can Hankendi, Rana Shahout +2

1428
21
52.4%
May 20, 2026
113

Latent Heuristic Search: Continuous Optimization for Automated Algorithm Design

Cheikh Ahmed, Mahdi Mostajabdaveh +1

1428
19
57.9%
May 16, 2026
114

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Caixin Kang, Tianyu Yan +6

1427
15
60%
May 21, 2026
115

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

Junpeng Zhang, Lei Cheng +4

1426
23
60.9%
May 18, 2026
116

What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct

Meryl Ye, Lujain Ibrahim +6

1425
18
50%
May 20, 2026
117

DocOS: Towards Proactive Document-Guided Actions in GUI Agents

Jingjing Liu, Ziye Huang +6

1424
22
50%
May 18, 2026
118

Skill Weaving: Efficient LLM Improvement via Modular Skillpacks

Zhuo Li, Guodong Du +6

1423
20
55%
May 21, 2026
119

Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling

Longgang He, Longzhu He +2

1422
25
52%
May 19, 2026
120

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Sadia Asif, Mohammad Mohammadi Amiri +3

1422
14
57.1%
May 21, 2026
121

ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning

Yeqiu Chen, Ziyan Liu +4

1421
18
66.7%
May 21, 2026
122

Deterministic Event-Graph Substrates as World Models for Counterfactual Reasoning

Fabio Rovai

1421
25
52%
May 15, 2026
123

Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs

Pierre Boudart, Pierre Gaillard +1

1419
17
58.8%
May 19, 2026
124

FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation

Xinhang Yuan, Zexi Huang +6

1419
17
47.1%
May 20, 2026
125

Whispers in the Noise: Surrogate-Guided Concept Awakening via a Multi-Agent Framework

Mengyu Sun, Ziyuan Yang +4

1418
17
47.1%
May 18, 2026
126

IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents

Daewon Choi, Kyunghyun Park +5

1418
16
62.5%
May 21, 2026
127

Pseudocode-Guided Structured Reasoning for Automating Reliable Inference in Vision-Language Models

Weicong Ni, Tianbao Jiang +1

1418
18
61.1%
May 19, 2026
128

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

Yuxuan Gao, Megan Wang +3

1417
20
45%
May 18, 2026
129

Evidence-Grounded Frontier Mapping and Agentic Hypothesis Generation in Nanomedicine

Christiaan G. A. Viviers, Koen de Bruin +6

1416
28
64.3%
May 18, 2026
130

\ECUAS{n}: A family of metrics for principled evaluation of uncertainty-augmented systems

Lautaro Estienne, Erik Ernst +3

1415
23
60.9%
May 19, 2026
131

Dynamics of collective creativity in AI art competitions

Mason Youngblood, Jeff Nusz +1

1415
18
55.6%
May 16, 2026
132

ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents

Chinmay Savadikar, Mingyu Zhao +6

1415
26
46.2%
May 15, 2026
133

Prediction of Challenging Behaviors Associated with Profound Autism in a Classroom Setting Using Wearable Sensors

Yadhu Kartha, Conor Anderson +6

1414
20
60%
May 17, 2026
134

Cross-domain benchmarks reveal when coordinated AI agents improve scientific inference from partial evidence

Fiona Y. Wong, Markus J. Buehler

1414
18
44.4%
May 21, 2026
135

AnchorDiff: Topology-Aware Masked Diffusion with Confidence-based Rewriting for Radiology Report Generation

Shiying Yu, Jielei Wang +1

1413
18
44.4%
May 16, 2026
136

Neurosymbolic Learning for Inference-Time Argumentation

Gabriel Freedman, Adam Dejl +5

1413
27
59.3%
May 19, 2026
137

Harnessing AI for Inverse Partial Differential Equation Problems: Past, Present, and Prospects

Zhentao Tan, Yuze Hao +4

1412
19
52.6%
May 16, 2026
138

GraphMind: From Operational Traces to Self-Evolving Workflow Automation

Yiwen Zhu, Joyce Cahoon +6

1410
19
57.9%
May 17, 2026
139

AI for Auto-Research: Roadmap & User Guide

Lingdong Kong, Xian Sun +6

1409
17
47.1%
May 18, 2026
140

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Igor Bogdanov, Chung-Horng Lung +4

1409
26
46.2%
May 15, 2026
141

Scalable Environments Drive Generalizable Agents

Jiayi Zhang, Fanqi Kong +6

1408
19
52.6%
May 18, 2026
142

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

Nithin Somasekharan, Youssef Hassan +6

1407
21
52.4%
May 18, 2026
143

QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI

Marjan Veysi, Pirooz Shamsinejadbabaki +2

1407
23
56.5%
May 17, 2026
144

Skim: Speculative Execution for Fast and Efficient Web Agents

Mike Wong, Kevin Hsieh +2

1407
18
61.1%
May 15, 2026
145

Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression

Wei Luo, Yi Huang +4

1405
17
58.8%
May 21, 2026
146

DARE-EEG: A Foundation Model for Mining Dual-Aligned Representation of EEG

Yang Shao, Peiliang Gong +2

1405
22
63.6%
May 18, 2026
147

OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

Chiara Maria Russo, Simone Carnemolla +4

1405
20
55%
May 18, 2026
148

Causal Intervention-Based Memory Selection for Long-Horizon LLM Agents

Saksham Sahai Srivastava

1405
18
55.6%
May 17, 2026
149

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

Yifan Zhou, Zhentao Zhang +6

1405
18
55.6%
May 18, 2026
150

Self-supervised Hierarchical Visual Reasoning with World Model

Yuanfei Xu, Lin Liu +3

1404
23
43.5%
May 17, 2026
151

Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents

Zijian Du, Nathaniel Pinckney

1404
16
50%
May 20, 2026
152

Unlocking Proactivity in Task-Oriented Dialogue

Hongbin Zhang, Ning Gao +6

1404
18
55.6%
May 21, 2026
153

Memory-Augmented Reinforcement Learning Agent for CAD Generation

Yin Xiaolong, Liu Yu +5

1403
23
39.1%
May 19, 2026
154

ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs

Bingjun Luo, Tony Wang +2

1403
14
50%
May 21, 2026
155

The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning

Shang Wu, Hongyu Yao +4

1403
19
73.7%
May 20, 2026
156

MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

Md Mehrab Tanjim, Jayakumar Subramanian +6

1403
21
47.6%
May 19, 2026
157

Divergence-Suppressing Couplings for Rectified Flow

Yimeng Min, Carla P. Gomes

1402
17
52.9%
May 18, 2026
158

Interaction Locality in Hierarchical Recursive Reasoning

Yosuke Miyanishi, Tetsuro Morimura

1401
23
60.9%
May 20, 2026
159

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Banghao Chi, Yining Xie +6

1401
20
55%
May 21, 2026
160

ScreenSearch: Uncertainty-Aware OS Exploration

Michael Solodko, Justin Wagle

1400
27
44.4%
May 15, 2026
161

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

ZhiYuan Feng, Yu Deng +6

1400
18
38.9%
May 18, 2026
162

Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models

Zheng Lin, Zhenxing Niu +3

1399
19
52.6%
May 19, 2026
163

Interference-Aware Multi-Task Unlearning

Ying-Hua Huang, Rui Fang +2

1398
26
50%
May 18, 2026
164

AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters

Hanjun Luo, Zhimu Huang +6

1397
24
45.8%
May 21, 2026
165

Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use

Changkun Ou

1396
20
50%
May 18, 2026
166

PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning

Qiran Zhang, Yuheng Wang +6

1393
19
42.1%
May 19, 2026
167

AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions

Minghao Chen, Xinyi Hu +2

1393
26
53.8%
May 20, 2026
168

Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

Dmitry Redko, Albert Fazlyev +4

1393
18
55.6%
May 19, 2026
169

TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

Zhaoyang Chu, Jiarui Hu +6

1392
15
60%
May 21, 2026
170

Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning

Michael Aichmüller, Simon Ståhlberg +2

1392
26
53.8%
May 18, 2026
171

CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean

Wentao Long, Yunfei Zhang +4

1391
23
56.5%
May 17, 2026
172

MetaCogAgent: A Metacognitive Multi-Agent LLM Framework with Self-Aware Task Delegation

Chenyu Wang, Yang Shu

1391
23
43.5%
May 17, 2026
173

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Anis Radianis

1390
25
52%
May 18, 2026
174

The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems

Yohei Nakajima

1389
15
46.7%
May 21, 2026
175

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents

Vasundra Srinivasan

1388
23
60.9%
May 19, 2026
176

STRIDE: A Self-Reflective Agent Framework for Reliable Automatic Equation Discovery

Jiarui Su, Songjun Tu +2

1388
21
61.9%
May 18, 2026
177

RAG-based EEG-to-Text Translation Using Deep Learning and LLMs

Enrico Collautti, Xiaopeng Mao +3

1388
18
44.4%
May 17, 2026
178

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

Tanmay Asthana, Aman Saksena +1

1387
24
45.8%
May 17, 2026
179

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

Han Li, Vibhor Malik +6

1387
18
66.7%
May 19, 2026
180

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

S. Bensalem, Y. Dong +6

1385
25
48%
May 18, 2026
181

SGR-Bench: Benchmarking Search Agents on State-Gated Retrieval

Ningyuan Li, Haiyang Shen +5

1384
16
56.2%
May 21, 2026
182

Generative Auto-Bidding with Unified Modeling and Exploration

Mingming Zhang, Feiqing Zhuang +6

1384
27
55.6%
May 19, 2026
183

Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models

Ethan Tang

1383
23
56.5%
May 17, 2026
184

Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

Yajing Zhou, Xiangyu Kong

1381
20
55%
May 18, 2026
185

Planning in the LLM Era: Building for Reliability and Efficiency

Michael Katz, Harsha Kokel +2

1379
15
46.7%
May 21, 2026
186

AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)

Virginia K. Hench, J. Harry Caufield +3

1378
15
60%
May 20, 2026
187

SAPO: Step-Aligned Policy Optimization for Reasoning-Based Generative Recommendation

Zaiyi Zheng, Guanghui Min +5

1375
21
52.4%
May 17, 2026
188

Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks

Haichao Miao, Zhimin Li +5

1373
16
56.2%
May 20, 2026
189

Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law

Parisa Kordjamshidi, Samer Aslan +3

1373
31
41.9%
May 15, 2026
190

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Igor Bogdanov, Chung-Horng Lung +4

1371
36
44.4%
May 15, 2026
191

Can Large Language Models Revolutionize Survey Research? Experiments with Disaster Preparedness Responses

Yan Wang, Ziyi Guo +1

1370
24
45.8%
May 19, 2026
192

LACO: Adaptive Latent Communication for Collaborative Driving

Tianhao Chen, Yuheng Wu +1

1368
18
50%
May 21, 2026
193

CBT-Audio: Evaluating Audio Language Models for Patient-Side Distress Intensity Estimation in CBT Session Recordings

Qixuan Hu, Shuchang Ye +6

1368
19
47.4%
May 17, 2026
194

EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design

Gioele Molinari, Florian Felten +2

1368
22
54.5%
May 19, 2026
195

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving

Qiyu Ruan, Yuxuan Wang +3

1367
23
43.5%
May 20, 2026
196

Scaling Observation-aware Planning in Uncertain Domains

Adrian Zvizdenco, Arthur Conrado Veiga Bosquetti +2

1367
20
50%
May 21, 2026
197

Evaluating the Utility of Personal Health Records in Personalized Health AI

Rory Sayres, Kejia Chen +6

1365
19
42.1%
May 18, 2026
198

Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation

Guining Cao, Jiaxin Peng +4

1362
17
47.1%
May 18, 2026
199

AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

Parsa Mazaheri, Kasra Mazaheri

1362
21
42.9%
May 19, 2026
200

Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Tahreem Yasir, Wenbo Li +4

1361
28
39.3%
May 15, 2026
Win-rate scores from pairwise comparisons with 95% confidence intervals. Papers compared using full-text deep analysis.
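
As a rough illustration of the note above, the sketch below derives a win rate and a 95% confidence interval from a paper's pairwise match record. The page does not say how its intervals are calculated, so the Wilson score interval used here is an assumption, and the function name `win_rate_with_ci` is hypothetical. The example figures are inferred from the first entry: 34 wins is the count that gives 79.1% over 43 matches.

```python
import math


def win_rate_with_ci(wins: int, matches: int, z: float = 1.96):
    """Return (win_rate, lower, upper) for a pairwise match record.

    Uses the Wilson score interval; z = 1.96 corresponds to roughly 95%.
    This is an assumed method, not necessarily the one the ranking site uses.
    """
    if matches <= 0:
        raise ValueError("matches must be positive")
    if not 0 <= wins <= matches:
        raise ValueError("wins must be between 0 and matches")

    p = wins / matches                       # observed win rate
    denom = 1 + z * z / matches              # Wilson normalising term
    centre = (p + z * z / (2 * matches)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / matches + z * z / (4 * matches * matches))
    ) / denom
    return p, centre - half_width, centre + half_width


# Figures inferred from the top-ranked entry: 43 matches at a 79.1% win rate.
rate, low, high = win_rate_with_ci(34, 43)
print(f"win rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With only a few dozen matches per paper the interval is wide, which is worth keeping in mind when comparing entries whose win rates differ by a few points.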