Artificial Intelligence Paper Rankings
AI-estimated scientific impact ranking of the latest arXiv Artificial Intelligence preprints.
AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design
Sahil Rahman, Maxx Richard Rahman
LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks
Po-Nien Kung, Linfeng Song +6
Safety Paradox: How Enhanced Safety Awareness Leaves LLMs Vulnerable to Posterior Attack
Long P. Hoang, Hai V. Le +3
Zero knowledge verification for frontier AI training is possible
Pierre Peigné, Ky Nguyen +1
Beyond One-shot: AI Agents for Learning in Field Experiments
Junjie Luo, Ritu Agarwal +1
Decomposing how prompting steers behavior
Fan L. Cheng, Nikolaus Kriegeskorte
LAP: An Agent-to-Instrument Protocol for Autonomous Science
Linwu Zhu, Liqiang Gao +3
Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models
Simone Caldarella, Davide Talon +3
Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments
Parth Asawa, Christopher M. Glaze +6
Closing the Loop on Latent Reasoning via Test-Time Reconstruction
Xiaopeng Yuan, Haibo Jin +5
Scaling Self-Evolving Agents via Parametric Memory
Tao Ren, Weiyao Luo +6
Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?
Jingheng Ye, Huiqi Zou +2
The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?
Xinyu Lu, Tianshu Wang +6
Towards World Models in Biomedical Research
Guangyu Wang, Jingkun Yue +6
Forget Attention: Importance-Aware Attention Is All You Need
Suhyeong Shin, Yeongwook Yang
Agents' Last Exam (v2)
Yiyou Sun, Xinyang Han +6
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Guhong Chen, Yingcheng Shi +6
Gender-Dependent Diagnostic Substitution in LLM Medical Triage: Same Symptoms, Unequal Urgency
Qi Han Wong
What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents
Victor Ojewale, Suresh Venkatasubramanian
MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models
Zhichao Yang, Yuanze Hu +6
The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection
Wojciech Zarzecki, Jan Dubiński +1
Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems
Marquita Ellis, Paul Castro
Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents
Zhuoming Chen, Xinrui Zhong +6
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Wenbo Pan, Shujie Liu +6
SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification
Xiangyu Zhao, Hengyuan Zhao +6
Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories
Kyungmin Park, Taesup Kim
RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention
Yang Liu, ZhaoKai Luo +6
Can Generalist Agents Automate Data Curation?
Feiyang Kang, Hanze Li +6
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models
Mahtab Bigverdi, Lindsey Li +6
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
Ziyan Liu, Xueda Shen +6
Reasoning Structure of Large Language Models
Frédéric Berdoz, Luca A. Lanzendörfer +2
Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems
Xizi Luo, Changhong He +3
EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management
Zherui Yang, Fan Liu +2
What Makes Interaction Trajectories Effective for Training Terminal Agents?
Sidi Yang, Chaofan Tao +6
AURA: Action-Gated Memory for Robot Policies at Constant VRAM
Josef Chen
MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents
Jia Yu, Zilong Wang +3
AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?
Zhangchen Xu, Junda Chen +6
Beyond Prompt-Based Planning: MCP-Native Graph Planning-based Biomedical Agent System
Zhangtianyi Chen, Florensia Widjaja +4
The Self-Correction Illusion: LLMs Correct Others but Not Themselves
Kuan-Yen Chen, Fang-Yi Su +1
AgentCL: Toward Rigorous Evaluation of Continual Learning in Language Agents
Yiheng Shu, Bernal Jiménez Gutiérrez +4
Beyond Similarity: Trustworthy Memory Search for Personal AI Agents (v2)
Jiawen Zhang, Kejia Chen +6
Benchmark Everything Everywhere All at Once
Shiyun Xiong, Dongming Wu +6
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning
Chirag Parmar, Akshat Mehta +3
LLM Self-Recognition: Steering and Retrieving Activation Signatures
Thibaud Ardoin, Jonas Schäfer +1
Fix the Mind, Not the Move: Interpretable AI Assistance via Knowledge-Gap Localization
Ayano Hiranaka, Ya-Chuan Hsu +3
Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models
Rayyan Abdalla, Amir Hussein +2
DeskCraft: Benchmarking Desktop Agents on Professional Workflows and Human-in-the-Loop Collaboration
Wenkai Wang, Tao Xiong +6
Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers
Gianluca Guidi, Francesca Dominici +6
Where does Absolute Position come from in decoder-only Transformers?
Valeria Ruscio, Umberto Nanni +1
CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection
Jinjie Shen, Yaxiong Wang +6
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
Pengcheng Jiang, Zhiyi Shi +6
COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
Youwei Liu, Jian Wang +2
SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment
Hao Li, Jingkun An +6
Iteris: Agentic Research Loops for Computational Mathematics
Leheng Chen, Zihao Liu +2
POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems
Iñaki Dellibarda Varela, R. Sendra-Arranz +6
Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs
Weiwei Ding, Zixuan Li +6
ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning
Bo-Hong Wang, Baicheng Peng +4
CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
Yuxin Zhang, Yiyao Li +4
Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces
Nicolás Astorga, Nabeel Seedat +1
SAGE: A Quantitative Evaluation of Socialized Evolution in Agent Ecosystems
Linyue Pan, Yaoming Zhu +3
Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making
Yuhan Yang, Ruipu Li +1
TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models
Ziwen Kan, Yishuo Chen +6
ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents
Yuxing Lu, Yushuhong Lin +5
AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning
Qingxu Fu, Boyin Liu +3
Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents
Yaoqi Chen, Haibin Lai +6
Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
Shuo Ji, Yibo Li +1
Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition
Jinnuo Liu, Yue Peng +2
Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads
Yasmine Omri, Ziyu Gan +6
Amortizing Federated Adaptation: Hypernetwork Driven LoRA for Personalized Foundation Models
Sunny Gupta, Shambhavi Shanker +1
Inducing Reasoning Primitives from Agent Traces
Zhihan Lei, Jiarui Yan +2
MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery
Shangheng Du, Xiangchao Yan +6
DELTAMEM: Incremental Experience Memory for LLM Agents via Residual Trees
Haoran Tan, Zeyu Zhang +3
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents
Dongsheng Zhu, Xuchen Ma +6
Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection
Yaoxi Shi, Cathy Mengying Fang +2
Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement
Jui-Hui Chung, Ziyang Cai +6
Multilingual Fine-Tuning via Localized Gradient Conflict Resolution
Long P. Hoang, Yiran Zhao +2
LLM-Evolved Pattern Generators for Optimal Classical Planning
Windy Phung, Dominik Drexler +2
R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search
João Pedro Gandarela, Thiago Rios +2
Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo
Renjith Prasad, Chathurangi Shyalika +2
AIP: A Graph Representation for Learning and Governing Agent Skills
Zachary Blumenfeld, Jim Webber
scTranslation: A Comprehensive Benchmark for Single-Cell Multi-Omics Modality Translation
Jiabei Cheng, Jingbo Zhou +5
A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR
Yuze Gao
When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents
Lingxiang Xu, Jiaoyun Yang +3
Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation
Jingbo Wen, Liang He +1
SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
Zhongyu He, Yuanfan Li +6
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces
Liangwei Yang, Jielin Qiu +6
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
Srimonti Dutta, Akshata Kishore Moharir
Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection
Senjie Jin, Peixin Wang +6
Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation
Fangbo Tu, Junhua Zhao +5
FALSIFYBENCH: Evaluating Inductive Reasoning in LLMs with Rule Discovery Games
Leonardo Bertolazzi, Katya Tentori +1
TAPO: Tool-Aware Policy Optimization via Credit Transfer for Multimodal Search Agents
Chengqi Dong, Chuhuai Yue +6
Learning Admissible Heuristics via Cost Partitioning
Hugo Barral, Quentin Cappart +2
MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation
Deguo Xia, Zihan Li +6
Output Type Before Quality: A Standards-Derived XAI Admissibility Rubric for Autonomous-Driving Safety
Abhinaw Priyadershi, Mandar Pitale +2
ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents
Rahul Suresh Babu, Laxmipriya Ganesh Iyer
FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG
Zhe Yu, Wenpeng Xing +4
Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution
Can Gurkan, Forrest Stonedahl +1
Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation
Yohann Benchetrit, Marlène Careil +4
From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models
Hongyu Guo, Hao Li +3
The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs
Xu Wan, Speed Zhu +5
Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers
Edward Y. Chang
Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI
Amjad Ibrahim, Yong Li
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving
Jianxin Yan, Wangze Ni +6
Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation
Saroj Mishra
SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale
Tong Bai, Zhenglin Wan +5
Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems
Dianxing Shi, Junqi He +3
Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows
Yuhang Fu, Ruishan Fang +5
SentinelBench: A Benchmark for Long-Running Monitoring Agents
Matheus Kunzler Maldaner, Adam Fourney +6
A Framework for Measuring Appropriate Reliance on Set-Valued AI Advice
Ranjan Mishra, Jakob Schoeffer
Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing
Yuxiao Ye, Haoran He +5
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
Dipesh KC, Anjila Budathoki
From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents
Patrick Wilhelm, Odej Kao
Unveiling the Structure of Do-Calculus Reasoning via Derivation Graphs
Clément Yvernes, Emilie Devijver +2
AdaMEM: Test-Time Adaptive Memory for Language Agents
Yunxiang Zhang, Yiheng Li +2
CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model
Zeyang Yue, Chenfei Yan +6
Unsupervised Skill Discovery for Agentic Data Analysis
Zhisong Qiu, Kangqi Song +5
Bridging Auxiliary Constraints to Resolve Instruction Following in Large Reasoning Models
Zhengyi Zhao, Shubo Zhang +6
InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain
Tiancheng Han, Yong Li +3
Coordination Graphs for Constrained Multi-Agent Reinforcement Learning
Santiago Amaya-Corredor, Miguel Calvo-Fullana +1
ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models
Ruihui Hou, Siyi Zhu +5
Learning Visual Spatial Planning from Symbolic State via Modality-Gap-Aware Self-Distillation
Haocheng Luo, Jiahui Liu +6
SkillPyramid: A Hierarchical Skill Consolidation Framework for Self-Evolving Agents
Yuan Xiong, Ziqi Miao +6
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
Kokil Jaidka, Saifuddin Ahmed
PLAN-S: Bridging Planning with Latent Style Dynamics for Autonomous Driving World Models
Xiaoyun Qiu, Jingtao He +5
Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection
Sihang Zeng, Matthew Thompson +2
EvoDrive: Pareto Evolution for Safety-Critical Autonomous Driving via Self-Improving LLM Agents
Tong Nie, Yuewen Mei +5
Knowledge Index of Noah's Ark
Sheng Jin, Minghao Liu +6
StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis (v2)
Prashanth Vijayaraghavan, Apoorva Nitsure +3
From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents
Yuhao Sun, Jiacheng Zhang +4
MOC: Multi-Order Communication in LLM-based Multi-Agent Systems
Yao Guan, Lin Wang +4
Individual Gain, Collective Loss: Metacognitive Adaptation in AI-Assisted Creativity
Anna Mikeda
Brick-Composer: Using MLLMs for Assembly with Diverse Bricks
Jiateng Liu, Bingxuan Li +6
Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models
Steffen Knoblauch, Hao Li +4
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
Chen Huang, Yuhao Wu +1
The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents
Manvendra Modgil
HLL: Can Agents Cross Humanity's Last Line of Verification?
Xinhao Song, Su Su +6
Harnessing Generalist Agents for Contextualized Time Series
Zihao Li, Kaifeng Jin +6
TSQAgent: Rating Time Series Data Quality via Dedicated Agentic Reasoning
Shunyu Wu, Dan Li +6
PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage
Keqi Han, Ryan Young +6
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations
Taewon Yun, Hyeonseong Park +4
PerceptUI: LLM Agents as Human-Aligned Synthetic Users for UI/UX Evaluation
Nicolas Bougie, Xiaotong Ye +2
Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
Zhikai Chen, Jialiang Gu +6
Solipsistic Superintelligence is Unlikely to be Cooperative
Rakshit S Trivedi, Natasha Jaques +3
DMF: A Deterministic Memory Framework for Conversational AI Agents
Matteo Stabile, Enrico Zimuel
When to Re-Plan: Subgoal Persistence in Hierarchical Latent Reasoning
Ayushi Chadha
Tracking the Behavioral Trajectories of Adapting Agents
Jonah Leshin, Manish Shah +1
Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration
Jiaju Chen, Yuxuan Lu +6
Seeing Time: Benchmarking Chronological Reasoning and Shortcut Biases in Vision-Language Models
Haoyu Zhou, Qing Qing +6
Repair Before Veto: Repair-Augmented Constraint Learning for Contextual Decisions
Yifan Wang
Tree-Based Formalization of Multi-Agent Complementarity in Human-AI Interactions
Andrea Ferrario
Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback
Giulia Pucci, Emily Hemendinger +4
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
Joel Sol, Homayoun Najjaran
DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions
Nathan Bout, Maxime Langevin +1
ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents
Anjie Liu, Yan Song +4
Residual Modeling for High-Fidelity Learned Compression of Scientific Data
Liangji Zhu, Sanjay Ranka +1
BigFinanceBench: A Workflow-Grounded Benchmark for Financial-Research Agents
Alex Wang, Georg Meinhardt +5
Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval
Jiaxi Li, Ke Deng +6
DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance
Yansi Li, Zhuosheng Zhang
Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation
Arslan Bisharat, Brian Ortiz +6
BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization
Saket Reddy, Ke Yang +1
Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection
Gautam Gare, John Galeotti +3
AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety
Yanjing Ren, Reza Ebrahimi +1
Hedge-Bench: Benchmarking Agents on Hard, Realistic Tasks Pertaining to Financial Reasoning
Eric Cho, Shawn Huang +2
AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification
Yan Wang, Xuguang Ai +6
StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems
Taiyu Zhu, Yifan Wu +3
The Digital Apprentice: A Framework for Human-Directed Agentic AI Development
Travis Weber, Rohit Taneja
EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts
Yiming Lu, Sihang Zeng +4
Integrating Mechanistic and Data-Driven Models for Neurological Disorders through Differentiable Programming
Shah Pallav Dhanendrakumar, Saikat Pal +1
From Long News to Accurate Forecast: Importance-Aware Fusion and PRM-Guided Reflection for Time Series Forecasting
Mingyang Liu, Qingcan Kang +6
Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation
Kaiqi Yang, Tai-Quan Peng +2
Bridging the Last Mile of Time Series Forecasting with LLM Agents
Yuhua Liao, Zetian Wang +2
An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
Jincheng Yu, Haoyang Li +6
An Infectious Disease Spread Simulation Based on Large Language Model Decision Making
Yonchanok Khaokaew, Ruochen Kong +6
Characterizing initial human-AI proof formalization workflows
Katherine M. Collins, Simon Frieder +6
Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking
Bonan Shen, Youting Wang +2
Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering
Thomas Eiter, Nelson Higuera Ruiz +1
Agentic Molecular Recovery via Molecule-Aware Exploration
Suwan Yoon, Changhee Lee
Multi-ResNets for Subspace Preconditioning in Constrained Optimization
Merve Karakas, Christopher J. Williams +4
WorldFly: A World-Model-Based Vision-Language-Action Model for UAV Navigation
Shengtao Zheng, Kai Li +6
Answer Presence Drives RAG Rewriting Gains
Yuejie Li, Yueying Hua +6
Parthenon Law: A Self-Evolving Legal-Agent Framework
Hejia Geng, Leo Liu
A formal definition and meta-model for a machine theory of mind
Fabio Cuzzolin
Proof-Refactor: Refactoring Generated Formal Proofs into Modular Artifacts
Yiming Fu, Peixuan Liu +2
The DeepSpeak-Agentic Dataset
Sarah Barrington, Maty Bohacek +1
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
Nehal Afifi, Mehdi Khabou +6
Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
Thanh Luong Tuan, Abhijit Sanyal
When AI Says It Feels (v2)
Shin-nosuke Ishikawa, Seiya Ikeda +1
Retry Policy Gradients in Continuous Action Spaces
Soichiro Nishimori, Paavo Parmas
BiNSGPS: Geometry Problem Solving via Bidirectional Neuro-Symbolic Interaction
Qi Wang, Peijie Wang +2
GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory
Noujoud Nader, Ibrahem Aljabea +2
Beyond Vector Similarity: A Structural Analysis of Graph-Augmented Retrieval for Industrial Knowledge Graphs
Grama Chethan
SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents
Wenxuan Wang, Haoyu Sun +5
Uncertainty-Aware Clarification in LLM Agents with Information Gain
Mengyi Deng, Zhiwei Li +5
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation
Wenhao Wang, Peizhi Niu +6
RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering
Yuyang Li, Zihe Yan +1
Evaluating Agentic Configuration Repair for Computer Networks
Rufat Asadli, Benjamin Hoffman +2
Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents
Zhijie Ding, Weinan Hong +6
RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases
Phillip Jiang
TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management
Shweta Mishra
GITCO: Gated Inference-Time Context Optimization in TSFMs
Manya Pandey, Dhruv Kumar +2