Artificial Intelligence Paper Rankings
AI-estimated scientific impact ranking of the latest arXiv Artificial Intelligence preprints.
A Signal-Language Foundation Model for Broad-Spectrum Cardiovascular Assessment from Routine Electrocardiography
Ziqing Yu, Yuhui Tao +6
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Adly Templeton, Tom Conerly +6
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
Rui Meng, Bhavana Dalvi Mishra +6
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation
Shanghua Gao, Ada Fang +1
Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure
Max Lamparth, Daniel Fein +3
Why LLMs Fail at Causal Discovery and How Interventional Agents Escape
Amartya Roy, Sonali Parbhoo
Entropy Distribution as a Fingerprint for Hallucinations in Generative Models
Mattia J. Villani, Pranav Deshpande +3
Calibrating Conservatism for Scalable Oversight
William Overman, Mohsen Bayati
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
Dongyoon Hahm, Dylan Hadfield-Menell +1
Formalizing Mathematics at Scale
Ahmad Rammal, Niket Patel +6
When Context Flips, Safety Breaks: Diagnosing Brittle Safety in Aligned Language Models
Dasol Choi, Alex Kwon
RULER: Representation-Level Verification of Machine Unlearning
Georgina Cosma, Axel Finke
DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
Caijun Xu, Changyi Xiao +2
SIA: Self Improving AI with Harness & Weight Updates
Prannay Hebbar, Yogendra Manawat +5
Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs
Zhe Yu, Wenpeng Xing +5
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
Bowen Wang, Dunjie Lu +6
What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation
Xiang Wang, Wei Wei
Learning to Search and Searching to Learn for Generalization in Planning
Michael Aichmüller, Yannik Hesse +1
LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning
Elliot Gestrin, Jendrik Seipp
The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure
Yubo Li, Ramayya Krishnan +1
The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context
Zhe Yu, Wenpeng Xing +5
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems
Jianing Zhu, Yeonju Ro +6
GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents
Johannes Moll, Jean-Philippe Corbeil +5
Human-like in-group bias in instruction-tuned language model agents
Messi H. J. Lee
Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory
Abdelghny Orogat, Essam Mansour
The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF
Zeli Su, Zhankai Xu +5
Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion
Yizhuo Lu, Changde Du +6
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
MiniMax +6
Voluntary Collusion with Secret Tools in Competing LLM Agents
Xijie Zeng, Frank Rudzicz
PassNet: Scaling Large Language Models for Graph Compiler Pass Generation
Yiqun Liu, Yingsheng Wu +6
CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning
Linas Nasvytis, Simon Jerome Han +4
AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models
Ruiyi Zhang, Peijia Qin +3
Credit Assignment with Resets in Language Model Reasoning
Ankur Samanta, Akshayaa Magesh +6
LACUNA: Safe Agents as Recursive Program Holes
Yaoyu Zhao, Yichen Xu +4
When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop
Yang Zhang, Xiukun Wei +1
Quantifying and Optimizing Simplicity via Polynomial Representations
Tianren Zhang, Xiangxin Li +3
Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation
Jiawei Chen, Xiaofan Gui +5
PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design
Manpreet Kaur, Xingying Zhang +1
UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems
Yiqun Chen, Wei Yang +6
Better Accuracies, Worse Reasoning: A Step-Level Audit of Medical Chain-of-Thought Distillation
Zhaoyang Jiang, Xuanqi Peng +6
Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training
Kohsei Matsutani, Gouki Minegishi +3
Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents
Yongxiang Li, Moxin Li +5
Robust and Efficient Guardrails with Latent Reasoning
Siddharth Sai, Xiaofei Wen +1
CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models
Abhilash Durgam, Nyle Siddiqui +4
OpenURMA: A Clean-Room Open Implementation of the Unified Bus Protocol
Bojie Li
MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models
Tianzhuo Yang, Zihan Shen +6
Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems
Aman Priyanshu, Supriti Vijay +1
LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation
Gabriele Cesa, Thomas Hehn +5
AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems
Michael Hardy, Anka Reuel +6
Demystifying Data Organization for Enhanced LLM Training
Yalun Dai, Yangyu Huang +6
Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents
Ziyan Liu, Zhezheng Hao +6
Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning
Zhe Yu, Wenpeng Xing +5
Orthogonal Concept Erasure for Diffusion Models
Yuhao Sun, Lingyun Yu +4
DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning
Yang He, Xiao Ding +6
Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning
Zhikai Pan, Chih-Ting Liao +6
Behavioural Analysis of Alignment Faking
Nathaniel Mitrani Hadida, Rhea Karty +2
Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor
Guoxin Ma, Yibing Liu +6
LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?
HuiMing Fan, Xiao Wang +6
CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists
Junlin Yang, Dylan Zhang +6
From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator
Xiaohua Wang, Jiakang Yuan +6
Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows
Yilun Yao, Xinyu Tan +6
Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information
Renjie Gu, Jiaxu Li +6
Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal
Kia-Jüng Yang, Dominik Meier +3
Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent
Yiming Liu, Bin Lu +6
ReasonOps: Operator Segmentation for LLM Reasoning Traces
Daniel Lee, Owen Queen +1
SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models
Chao Ding, Mouxiao Bian +6
A Unified Framework for the Evaluation of LLM Agentic Capabilities
Pengyu Zhu, Lijun Li +6
MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents
Zihan Li, Xingyu Fan +2
Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs
Yue Cheng, Jiajun Zhang +4
StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs
Yang Luo, Xinran Liu +4
Provably Secure Agent Guardrail
Benlong Wu, Weiming Zhang +3
MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation
Xiaoyu Dong, Zhi Li +1
Advancing Creative Physical Intelligence in Large Multimodal Models
Cheng Qian, Hyeonjeong Ha +6
ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay
Zhexin Hu, Li Wang +5
On the Origin of Synthetic Information by Means of Steganographic Inheritance
Ching-Chun Chang, Isao Echizen
Continual Model Routing in Evolving Model Hubs
Jack Bell, Giacomo Carfì +2
PRO-CUA: Process-Reward Optimization for Computer Use Agents
Yifei He, Rui Yang +3
Position: AI Safety Requires Effective Controllability
Yige Li, Yunhao Feng +1
Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning
Chen Linze, Cai Yufan +2
MolLingo: Molecule-Native Representations for LLM-Powered Scientific Agents
Thao Nguyen, Heng Ji
Can LLMs Introspect? A Reality Check
Shashwat Singh, Tal Linzen +1
Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy
Xu Shen, Zhen Tan +5
Do Clinical Models Change Treatment Decisions?
Dongkyu Cho, Miao Zhang +1
Localizing Input Uncertainty Quantification for Large Language Models via Shapley Values
Seongjun Lee, Suwan Yoon +1
From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation
Shuaike Li, Kai Zhang +3
EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics
Zhichen Tang, Zhengzheng Dang +4
Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations
Omar Benjelloun, Leonardo Martins Bianco +6
Planning with the Views via Scene Self-Exploration
Kangrui Wang, Linjie Li +6
SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations
Qinpei Luo, Ruichun Ma +2
Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations
Simardeep Singh, Paras Chopra
From Accounting to Coordination: A Virtual Water-Aware Electricity-Computation-Water Nexus Framework for Data Center Dispatch
Haiyang You, Chengwei Lou +4
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
Dingbang Wu, Rui Hao +6
Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization
Zhihao Liu, Yifan Wu +4
Conformal Certification of Reasoning Trace Prefixes
Matt Y. Cheung, Ashok Veeraraghavan +2
AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents
Haoran Zhang, Zhaohua Sun
Beyond Consensus: Trace-Level Synthesis in Mixture of Agents
Shreyas Fadnavis, Praitayini Kanakaraj +1
Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning
Tong Ye, Hang Yu +6
PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft
Yuchen Guo, Junli Gong +3
Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization
Jiawei Kong, Hao Fang +6
Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages
Shuoming Zhang, Qiuchu Yu +6
Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents
Anany Kotawala
TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning
Chusen Li, Zhou Liu +2
JobBench: Aligning Agent Work With Human Will
Yuetai Li, Yichen Feng +6
CODESKILL: Learning Self-Evolving Skills for Coding Agents
Yanzhou Li, Yiran Zhang +3
Insuring Every Action: An Authority Frontier Framework for Runtime Actuarial Control of Autonomous AI Agents
Hao-Hsuan Chen
Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale
Canran Wang, Yuwen Yang +6
Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation
Zixuan Jiang, Yanqiao Zhu +6
KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning
Kun Feng, Ziwei Shan +6
Verifiable Benchmarking of Long-Horizon Spatial Biology
Ian Diks, Harihara Muralidharan +2
Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning
Yang Zhang, Xiaoshuai Sun +6
NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs
Shuaidi Wang, Zhan Zhuang +2
Anchor: Mitigating Artifact Drift in Agent Benchmark Generation
Maksim Ivanov, Abhijay Rana
Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement
Jyotirmoy Nath, Neeraj Kumar +1
Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk
Tim Woydt, Paul-David Zuercher
Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions
Jeongeun Lee, Chanyoung Park +1
Retrying vs Resampling in AI Control
James Lucassen, Adam Kaufman
A Policy-Driven Runtime Layer for Agentic LLM Serving
Rui Zhang, Chaeeun Kim +1
Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling
Shijie Cao, Yuan Yuan +1
Plan Before Search: Search Agents Need Plan
Zhipeng Qian, Zihan Liang +6
Measuring Progress Toward AGI: A Cognitive Framework
Ryan Burnell, Yumeya Yamamori +6
From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
Shangding Gu
Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures
Junyoung Park, Sunghwan Park +2
OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories
Yibing Liu, Yangze Liu +5
MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains
Ashutosh Ojha, Vinay Aggarwal +4
ParaTool: Shifting Tool Representations from Context to Parameters
Zekai Yu, Qi Meng +4
Multi-Adapter Representation Interventions via Energy Calibration
Manjiang Yu, Hongji Li +5
Accelerating Constrained Decoding with Token Space Compression
Michael Sullivan, Alexander Koller
Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback
Bowen Wei, Nan Wang +3
Automatic Layer Selection for Hallucination Detection
Xinpeng Wang, William Cao +2
Can Broad Biomedical Knowledge be Contextualized into Scenario-Grounded Propositions?
Qingyuan Zeng, Ziyang Chen +6
MemFail: Stress-Testing Failure Modes of LLM Memory Systems
Ishir Garg, Neel Kolhe +2
Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns
Guni Sharon
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
Yuxin Chen, Yi Zhang +6
Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning
Yi Wang, Haojie Lu +3
Back to Parsimonious Latents: Learning Task-Centric World Models from Visual Foundations
Minghao Fu, Fan Feng +2
Towards Faithful Agentic XAI: A Verification Method and an Open-World Benchmark for Better Model Faithfulness
Jaechang Kim, Sunung Mun +3
The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models
Dueun Kim, Albert No
MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs
Kevin Wang, Anna Thöni +6
Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
Soeun Kim, Albert No
SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
Yunbo Tang, Chengyi Yang +5
GTA: Generating Long-Horizon Tasks for Web Agents at Scale
Tenghao Huang, Kung-Hsiang Huang +5
EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents
Yunqi Liu, Tong Niu +5
Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning
Zhenyu Cui, Xiangzhong Luo
MuCRASP: Multimodal Chain-of-thought Reasoning aware Structured Pruning
Aritra Dutta, Somak Aditya
Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
Leizhen Zhang, Shuhan Chen +1
Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents
Yunhui Gan, Tan Pan +6
ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure
A. J. Lew, Y. Cao +1
On the Geometry of Games and their Solvers
Yaqi Sun, Julian Ma +1
BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents
Jiahao Huang, Fei Cheng +3
Revealing Algorithmic Deductive Circuits for Logical Reasoning
Phuong Minh Nguyen, Tien Huu Dang +1
Uncertainty Reasoning with Large Language Models for Explainable Disease Diagnosis
Xiaoyang Fan, Yufan Cai +2
HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs
Yansong Ning, Mianpeng Liu +3
OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields
Wanhao Liu, Jiaqing Xie +6
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
Chen He, Yuhao Wu +3
A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks
Tomer Keren, Nitay Calderon +4
Harnessing non-adversarial robustness in large language models
Qinghua Zhou, Ellina Aleshina +6
MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning
Yuhao Shen, Lang Cao +5
Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG
Pin Qian, Su Wang +6
Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation
Heng Qu, Yike Liu +5
Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration
Hankyeol Kim, Pilsung Kang
Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations
Matteo Gioele Collu, Riccardo Conte +5
When Should Models Change Their Minds? Contextual Belief Management in Large Language Models
Haoming Xu, Weihong Xu +6
Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies
Wei Zheng, Yang Yan +6
Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering
Mateusz Czyżnikiewicz, Ryszard Tuora +6
DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution
Yunhai Hu, Zining Liu +6
Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability
Pedro Orvalho, Marta Kwiatkowska +2
From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets
Taojie Zhu, Wentao Zhao +6
Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation
Yexing Du, Kaiyuan Liu +5
Teaching Values to Machines: Simulating Human-Like Behavior in LLMs
Asaf Yehudai, Naama Rozen +1
LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
Jung Hyun Lee, June Yong Yang +2
RAISE: RAG Design as an Architecture Search Problem
Zhen Chen, Yibing Liu +4
VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora
Yuting Xu, Jiayi Tian +5
You Live More Than Once: Towards Hierarchical Skill Meta-Evolving
Xujun Li, Kehan Zheng +6
Behind EvoMap: Characterizing a Self-Evolving Agent-to-Agent Collaboration Network
Qiming Ye, Peixain Zhang +3
Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments
Yuxin Chen, Xiaodong Cai +6
Towards end-to-end LLM-based censoring-aware survival analysis
Yishu Wei, Hexin Dong +4
OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling
Adam Bawatneh, Sagar Sapkota +3
-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
Aoxi Liu, Yupeng Chen +6
Generating Robust Portfolios of Optimization Models using Large Language Models
Eleni Straitouri, Cheol Woo Kim +1
Whose Alignment? Comparing LLM Process Alignment Across Diverse Organizational Decision Contexts
Niklas Weller, Emilio Barkett
From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks
Du Yin, Hao Xue +3
VitalAgent: A Tool-Augmented Agent for Reactive and Proactive Physiological Monitoring over Wearable Health Data
Di Zhu, Yu Yvonne Wu +4
Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World
Yusong Lin, Xinyuan Liang +6
A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test
Camilo Chacón Sartori, José H. García
BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models
Fei Deng, Yanwu Xu +5
Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems
Srini Ramaswamy
Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation
Soumyadeep Jana, Sagar Nishad +1
Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models
Qi Liu, Mingdi Sun +6
AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions
Jingwei Sun, Jianing Zhu +4
Natural Language Query to Configuration for Retrieval Agents
Melissa Z. Pan, Negar Arabzadeh +4
AlphaTransit: Learning to Design City-scale Transit Routes
Bibek Poudel, Sai Swaminathan +1
CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models
Fengze Yang, Bo Yu +3
StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning
Yanfei Zhang, Xu Lin +1
Xetrieval: Mechanistically Explaining Dense Retrieval
Zhixin Cai, Jun Bai +6
Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence
Yanan Wang, Shuaicong Hu +4
SkillGrad: Optimizing Agent Skills Like Gradient Descent
Hanyu Wang, Yifan Lan +3
ProvMind: Provenance-grounded reasoning for materials synthesis
Yiming Zhang, Ryo Tamura +1
Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection
Xiaona Zhou, Muntasir Wahed +3
BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting
Ruifeng Tan, Jintao Dong +4
VeriTrace: Evolving Mental Models for Deep Research Agents
Haolang Zhao, Yunbo Long +2