Artificial Intelligence Paper Rankings
AI-estimated scientific impact ranking of the latest arXiv Artificial Intelligence preprints.
Simulating clinical interventions with a generative multimodal model of human physiology
Guy Lutsker, Gal Sapir +6
End-to-end autonomous scientific discovery on a real optical platform
Shuxing Yang, Fujia Chen +6
MIMIC: A Generative Multimodal Foundation Model for Biomolecules
Siavash Golkar, Jake Kovalic +6
Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims
Fan Ma, Yuntian Liu +6
Machine Collective Intelligence for Explainable Scientific Discovery
Gyoung S. Na, Chanyoung Park
AI scientists produce results without reasoning scientifically
Martiño Ríos-García, Nawaf Alampara +6
Towards a General Intelligence and Interface for Wearable Health Data
Girish Narayanswamy, Maxwell A. Xu +6
Generative structure search for efficient and diverse discovery of molecular and crystal structures
Yifang Qin, Yu Shi +4
A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics
Djamel Bouchaffra, Faycal Ykhlef +2
IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
David Gringras
AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot
Joydeep Biswas, Sheila Schoepp +6
Hodoscope: Unsupervised Monitoring for AI Misbehaviors
Ziqian Zhong, Shashwat Saxena +1
Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models
Inderjeet Nair, Jie Ruan +1
Emotion Concepts and their Function in a Large Language Model (v2)
Nicholas Sofroniew, Isaac Kauvar +6
SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment
Joseph Breda, Fadi Yousif +6
Unbiased Prevalence Estimation with Multicalibrated LLMs
Fridolin Linder, Thomas Leeper +4
Heterogeneous Scientific Foundation Model Collaboration
Zihao Li, Jiaru Zou +6
Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
Moritz Firsching, Paul Lezeau +6
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
Michael Cuccarese
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
Mohamed Elfeki, Tu Trinh +6
Containment Verification: AI Safety Guarantees Independent of Alignment
Royce Moon, Lav R. Varshney
Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
Zhiqin Yang, Yonggang Zhang +4
Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation
Jacob Dang, Brian Y. Xie +1
Brief chatbot interactions produce lasting changes in human moral values
Yue Teng, Qianer Zhong +3
Bounding the Black Box: A Statistical Certification Framework for AI Risk Regulation
Natan Levy, Gadi Perl
The Power of Power Law: Asymmetry Enables Compositional Reasoning
Zixuan Wang, Xingyu Dang +2
Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs
Charles Ye, Bo Yuan +1
BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature
Jiaxian Yan, Jintao Zhu +6
Auditable Agents
Yi Nian, Aojie Yuan +3
KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science
Ziwei Li, Liujun Zhu +6
Using large language models for embodied planning introduces systematic safety risks
Tao Zhang, Kaixian Qu +5
Context Over Content: Exposing Evaluation Faking in Automated Judges (v2)
Manan Gupta, Inderjeet Nair +2
Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models
Chashi Mahiul Islam, Alan Villarreal +3
Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search
Sarah Martinson, Michael P. Brenner +4
Discovering Novel LLM Experts via Task-Capability Coevolution
Andrew Dai, Boris Meinardus +3
Towards Understanding Specification Gaming in Reasoning Models
Kei Nishimura-Gasparian, Robert McCarthy +1
The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives
Haileleol Tibebu
How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles
Chenchen Kuai, Jiwan Jiang +6
RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
Jagadeesh Rachapudi, Pranav Singh +3
How LLMs Are Persuaded: A Few Attention Heads, Rerouted
Xiangkun Sun, Lingkai Kong +3
Process Reward Agents for Steering Knowledge-Intensive Reasoning
Jiwoong Sohn, Tomasz Sternal +3
Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching
Aihua Li
Participatory provenance as representational auditing for AI-mediated public consultation
Sachit Mahajan
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
Meng Chu, Xuan Billy Zhang +6
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Haozhe Wang, Cong Wei +4
Conditional Attribute Estimation with Autoregressive Sequence Models
Erica Stutz, Giacomo Marino +3
Detecting Safety Violations Across Many Agent Traces
Adam Stein, Davis Brown +3
OLLM: Options-based Large Language Models
Shashank Sharma, Janina Hoffmann +1
Model Spec Midtraining: Improving How Alignment Training Generalizes
Chloe Li, Sara Price +2
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Guanting Dong, Junting Lu +6
How Adversarial Environments Mislead Agentic AI?
Zhonghao Zhan, Huichi Zhou +4
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Shaden Alshammari, Kevin Wen +6
The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure
Qiqi Liu, Thorsten Holz +2
Advancing Mathematics Research with AI-Driven Formal Proof Search
George Tsoukalas, Anton Kovsharov +6
Hidden Biases in Conditioning Autoregressive Models
Francois Pachet, Pierre Roy
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
Zhaorun Chen, Xun Liu +6
When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel
Wenkai Li, Fan Yang +3
ResearchEVO: An End-to-End Framework for Automated Scientific Discovery and Documentation
Zhe Zhao, Haibin Wen +5
Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine
Peisong Zhang, Manqiang Peng +6
Characterizing Model-Native Skills
Feiyang Kang, Mahavir Dabas +2
EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
Shuyue Stella Li, Rui Xin +6
Introspection Adapters: Training LLMs to Report Their Learned Behaviors
Keshav Shenoy, Li Yang +5
Hallucination as Exploit: Evidence-Carrying Multimodal Agents (v2)
Guijia Zhang, Hao Zheng +1
CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators
Nicolás Astorga, Anita Kriz +1
Causal Bias Detection in Generative Artificial Intelligence
Drago Plecko
Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination
Qiyao Liang, Risto Miikkulainen +1
The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations
Rania Elbadry, Ahmed Heakl +5
Recursive Multi-Agent Systems
Xiyuan Yang, Jiaru Zou +6
Fusion-fission forecasts when AI will shift to undesirable behavior
Neil F. Johnson, Frank Yingjie Huo
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
Xinzhe Yuan, Zhuo Chen +5
Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies
Chris Schneider, Philipp Schoenegger +1
SWE-chat: Coding Agent Interactions From Real Users in the Wild
Joachim Baumann, Vishakh Padmakumar +4
Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning
Sixing Chen, Ji-An Li +4
SELFDOUBT: Uncertainty Quantification for Reasoning LLMs via the Hedge-to-Verify Ratio
Satwik Pandey, Suresh Raghu +1
LACE: Lattice Attention for Cross-thread Exploration
Yang Li, Zirui Zhang +2
Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment
Tanav Singh Bajaj, Nikhil Singh +2
Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization
Yu Li, Sizhe Tang +1
A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization
Tianyu Liu, Wangjie Zheng +6
Agentic Discovery of Exchange-Correlation Density Functionals
Titouan Duston, Jiashu Liang +6
JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR
Xinjie Chen, Biao Fu +5
Transferable Human Mobility Network Reconstruction with neuroGravity
Jinming Yang, Shaoyu Huang +5
From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models
Ling Shi, Xinwei Wu +6
CIVeX: Causal Intervention Verification for Language Agents
Fabio Rovai
Bias by Necessity: Impossibility Theorems for Sequential Processing with Convergent AI and Human Validation
Jikun Wu, Dongxin Guo +1
PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
Yang Zhang, Jiangyuan Zhao +6
QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems
Chenyang An, Qihao Ye +2
TRACE: Trajectory Correction from Cross-layer Evidence for Hallucination Reduction (v2)
Tej Sanibh Ranade
SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution
Jiachen Jiang, Huminhao Zhu +1
Data Language Models: A New Foundation Model Class for Tabular Data
Eda Erol, Giuliano Pezzoli +1
LLM Reasoning Is Latent, Not the Chain of Thought
Wenshuo Wang
ASH: Agents that Self-Hone via Embodied Learning
Benjamin Schneider, Xavier Schneider +2
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
Yucheng Shi, Zhenwen Liang +4
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning
Maria Nesterova, Mikhail Kolosov +6
Process Matters more than Output for Distinguishing Humans from Machines
Milena Rmus, Mathew D. Hardy +2
Attributing Emergence in Million-Agent Systems
Ling Tang, Jilin Mei +6
PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments
Ruoqi Liu, Imran Q. Mohiuddin +6
Rollout Cards: A Reproducibility Standard for Agent Research
Charlie Masters, Ziyuan Liu +1
Alignment Imprint: Zero-Shot AI-Generated Text Detection via Provable Preference Discrepancy
Junxi Wu, Kailin Huang +5
Verifiable Process Rewards for Agentic Reasoning
Huining Yuan, Zelai Xu +6
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
Ahmad Al-Tawaha, Shangding Gu +3
Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation
Jinkun Liu, Haohan Chi +6
Orchard: An Open-Source Agentic Modeling Framework
Baolin Peng, Wenlin Yao +6
State-Centric Decision Process
Sungheon Jeong, Ryozo Masukawa +3
Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results
Benjamin Kohler, David Zollikofer +3
Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
Jui-Hui Chung, Hongzhou Lin +3
Distribution-Aware Algorithm Design with LLM Agents
Saharsh Koganti, Priyadarsi Mishra +2
Unlocking LLM Creativity in Science through Analogical Reasoning
Andrew Shen, Shaul Druckmann +1
What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code
Yuze Zhao, Junpeng Fang +6
Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners
Botos Csaba, Sreejan Kumar +6
Geometric Metrics for MoE Specialization: From Fisher Information to Early Failure Detection
Dongxin Guo, Jikun Wu +1
From Prompts to Protocols: An AI Agent for Laboratory Automation
Angelos Angelopoulos, James F. Cahoon +1
Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMs
Xiaozhe Li, Xinyu Fang +6
Self-Correction as Feedback Control: Error Dynamics, Stability Thresholds, and Prompt Interventions in LLMs
Aofan Liu, Jingxiang Meng
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI (v2)
Daniel Zheng, Ingrid von Glehn +6
SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules
Yuxuan Chen, Changwei Lv +6
To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning
Nevena Lazić, Liam Fowl +2
D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery
Hanane Nour Moussa, Yifei Li +6
Reasoning Can Be Restored by Correcting a Few Decision Tokens (v2)
Changshuo Shen, Leheng Sheng +3
The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?
Zhaoyang Zhang, Run Shao +5
Policy-Invisible Violations in LLM-Based Agents
Jie Wu, Ming Gong
Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning
Chi Zhang, Haibo Qiu +4
When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction
Vardhan Dongre, Joseph Hsieh +4
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
Tianle Wang, Zhaoyang Wang +5
SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval
Xin Xie, Dongyun Xue +6
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
Mingkai Deng, Jinyu Hou +5
FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean
Jordan Meadows, Lan Zhang +1
Reasoning Fails Where Step Flow Breaks
Xiaoyu Xu, Yulan Pan +5
Reasoning Structure Matters for Safety Alignment of Reasoning Models
Yeonjun In, Wonjoong Kim +2
SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning
Tianshi Zheng, Rui Wang +3
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
Alberto G. Rodríguez Salgado
To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands
Fangyi Yu, Nabeel Seedat +2
MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models
Jason Z Wang
PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models
Sridhar Mahadevan
CT Open: An Open-Access, Uncontaminated, Live Platform for the Open Challenge of Clinical Trial Outcome Prediction
Jianyou Wang, Youze Zheng +6
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
Bowen Ye, Rang Li +6
SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
Jiahao Li, Jiayi Dong +4
Agentick: A Unified Benchmark for General Sequential Decision-Making Agents
Roger Creus Castanyer, Pablo Samuel Castro +1
Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
Sharan Ramjee
The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested
Varad Vishwarupe, Nigel Shadbolt +2
Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs
Joshua Wendland, Markel Zubia +6
CLEF: EEG Foundation Model for Learning Clinical Semantics
Peng Cao, Ali Mirzazadeh +3
A Foundation Model for Zero-Shot Logical Rule Induction
Yin Jun Phua
Seeing Through Experts Eyes A Foundational Vision Language Model Trained on Radiologists Gaze and Reasoning
Kinhei Lee, Peiyuan Jing +6
TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints
Zabir Al Nazi, Shubhashis Roy Dipta
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
Hao Wang, Hanchen Li +4
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
Yujun Wu, Dongxu Zhang +6
FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression (v2)
Namyoon Lee, Yongjune Kim
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Qihan Ren, Peng Wang +6
Discovering Agentic Safety Specifications from 1-Bit Danger Signals
Víctor Gallego
MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents
Haonan Li, Tianjun Sun +2
Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling
Yujie Chen, Tailai Chen +5
Fully Open Meditron: An Auditable Pipeline for Clinical LLMs
Xavier Theimer-Lienhard, Mushtaha El-Amin +6
Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints
Jiayu Li, Enpei Zhang +3
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
Ye Yu, Heming Liu +4
Poly-EPO: Training Exploratory Reasoning Models
Ifdita Hasan Orney, Jubayer Ibn Hamid +6
Geometry over Density: Few-Shot Cross-Domain OOD Detection (v2)
Shawn Li, You Qin +5
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions (v2)
Ashima Suvarna, Kendrick Phan +3
Multi-Agent Orchestration for High-Throughput Materials Screening on a Leadership-Class System
Thang Duc Pham, Harikrishna Tummalapalli +6
Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense
Kerri Prinos, Lilianne Brush +6
From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation
Mengdie Flora Wang, Haochen Xie +6
When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees
Dongxin Guo, Jikun Wu +1
Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents
Xing Zhang, Guanghui Wang +5
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
Mingxi Zou, Zhihan Guo +6
Can Large Language Models Reinvent Foundational Algorithms?
Jian Zhao, Haoren Luo +4
The Compliance Trap: How Structural Constraints Degrade Frontier AI Metacognition Under Adversarial Pressure
Rahul Kumar
CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors
Yubin Kim, Salman Rahman +6
XDecomposer: Learning Prior-Free Set Decomposition for Multiphase X-ray Diffraction
Hanyu Gao, Bin Cao +3
AIBuildAI: An AI Agent for Automatically Building AI Models
Ruiyi Zhang, Peijia Qin +3
State Contamination in Memory-Augmented LLM Agents
Yian Wang, Agam Goyal +2
The Query Channel: Information-Theoretic Limits of Masking-Based Explanations
Erciyes Karakaya, Ozgur Ercetin
Imperfect World Models are Exploitable
Logan Mondal Bhamidipaty, Esmeralda S. Whitammer +3
IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling (v2)
Zhaomeng Zhou, Lan Zhang +4
Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
Tianyang Han, Hengyu Shi +4
Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories (v2)
Peiyang Liu, Zhirui Chen +5
Quantifying and Understanding Uncertainty in Large Reasoning Models
Yangyi Li, Chenxu Zhao +1
Generative Recursive Reasoning
Junyeob Baek, Mingyu Jo +4
Geometric Routing Enables Causal Expert Control in Mixture of Experts (v2)
Ivan Ternovtsii, Yurii Bilak
Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems
Xi-Wei Pan, Shi-Wen An +1
Self-Programmed Execution for Language-Model Agents
Luke J. O'Connor
FVD: Inference-Time Alignment of Diffusion Models via Fleming-Viot Resampling
Shivanshu Shekhar, Sagnik Mukherjee +2
OptimusKG: Unifying biomedical knowledge in a modern multimodal graph
Lucas Vittor, Ayush Noori +6
The Two Boundaries: Why Behavioral AI Governance Fails Structurally
Alan L. McCann
From Holo Pockets to Electron Density: GPT-style Drug Design with Density
Jiahao Chen, Letian Gao +5
Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization
Yizhe Chi, Deyao Hong +6
PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play
Roger Creus Castanyer, Geoffrey Bradway +4
Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation
Ye Yu, Xiaopeng Yuan +4
On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
Bo Yin, Qi Li +1
-mem: Efficient Online Memory for Large Language Models
Jingdi Lei, Di Zhang +6
ECG-WM: A Physiology-Informed ECG World Model for Clinical Intervention Simulation
Zhikang Chen, Yue Wang +5
Von Neumann Networks
Shekhar S. Chandra
Ex Ante Evaluation of AI-Induced Idea Diversity Collapse
Nafis Saami Azad, Raiyan Abdul Baten
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
Ali Hatamizadeh, Yejin Choi +1
NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning
Haoran Lu, Luyang Fang +2
Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction
Jiahe Guo, Xiangran Guo +6
Quantifying the human visual exposome with vision language models
Christian Rominger, Andreas R. Schwerdtfeger +6
Contextual Agentic Memory is a Memo, Not True Memory
Binyan Xu, Xilin Dai +1
Reinforcing VLAs in Task-Agnostic World Models
Yucen Wang, Rui Yu +6
Breaking : Cooperative Policy Optimization Improves Diverse LLM Reasoning
Haoxuan Chen, Tianming Liang +2
MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs
Junwei Liao, Haoting Shi +6
Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models
Haoxiang Wang, Da Yu +1