Marija Slavkovik, Marie Farrell, Louise Dennis, Michael Fisher, Simon Kolker, Emily C. Collins
We consider the problem of engineering autonomous intelligent agents that are capable to responsibly not comply with user requests. We argue that machine non-compliance comes in many different forms, and sketch the issues we should pursue on the road of accomplishing responsibly non-compliant intelligent machines. We anchor responsible non-compliance in justifications for task refusal, pathways to override the non-compliance, as well as careful tracking of security risks and liability transfers.
This is a position paper that frames the problem of autonomous agent non-compliance as a first-class design concern, distinct from the broader literature on norm obedience, AI alignment, and machine ethics. The main contribution is a conceptual framework that decomposes responsible non-compliance into three pillars: (1) justification for refusal, (2) pathways for human override, and (3) tracking of security risks and liability transfers. The authors propose a "request compliance life-cycle" (Figure 1), a taxonomy of non-compliance reasons (feasibility, safety, normative, efficiency), and three engineering approaches (deliberate, predictable, and learnt non-compliance).
The paper distinguishes itself from prior work on "rebel agents" (Coman & Aha, 2018) and goal rejection mechanisms (Briggs & Scheutz, 2015) by explicitly centering the concept of *compliance* rather than *disobedience*, emphasizing that the agent is responding to an explicit command rather than deviating from goals more broadly. The paper also adds the critical dimension of *refutability* — whether a user should be able to override the machine's refusal — which is a genuinely useful framing.
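To make the three pillars and the refutability dimension more tangible, the following is a minimal sketch of how a single refusal might be recorded. It is not taken from the paper: the class names `Reason` and `Refusal`, the `override` method, and the log format are all hypothetical, chosen only to mirror the taxonomy and pillars summarized above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Reason(Enum):
    """The four non-compliance reasons listed in the summary above."""
    FEASIBILITY = "feasibility"
    SAFETY = "safety"
    NORMATIVE = "normative"
    EFFICIENCY = "efficiency"


@dataclass
class Refusal:
    """Hypothetical record of one refused request (illustrative only)."""
    request: str
    reason: Reason
    justification: str   # pillar 1: why the agent refuses
    refutable: bool      # pillar 2: may a user override the refusal?
    liability_log: List[str] = field(default_factory=list)  # pillar 3

    def override(self, user: str) -> bool:
        """Try to override the refusal and log the outcome either way."""
        if not self.refutable:
            self.liability_log.append(f"override by {user} denied")
            return False
        # Assumption: an accepted override shifts responsibility to the user.
        self.liability_log.append(f"liability transferred to {user}")
        return True


if __name__ == "__main__":
    refusal = Refusal(
        request="drive through the flooded road",
        reason=Reason.SAFETY,
        justification="water depth exceeds the vehicle's safe limit",
        refutable=True,
    )
    print(refusal.override("alice"), refusal.liability_log)
```

Keeping the justification, the refutability flag, and the liability log in one record is only one possible design; a formal treatment would presumably separate them and attach them to stages of the request compliance life-cycle.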
As a position paper, this work is inherently conceptual rather than empirical or formally rigorous. The methodology consists of example-driven analysis (10 scenarios) followed by informal categorization. This approach is appropriate for the venue and paper type, but it limits the depth of contribution.
Several weaknesses stand out:
The paper addresses a genuinely important and underexplored problem. As autonomous systems become more prevalent — in healthcare, transportation, domestic assistance, and industry — the question of when and how machines should refuse commands is practically urgent. The paper's framing could influence:
However, the impact is limited by the paper's preliminary nature. Without formal models, implemented systems, or empirical validation, the contribution remains at the "agenda-setting" level.
The paper is highly timely. The AI safety and alignment communities are intensely focused on when AI systems should and should not comply with user instructions — the "refusal" behavior of large language models being a prominent contemporary example (though the paper focuses on embodied agents rather than LLMs). The intersection of autonomy, safety, and human authority is a live concern in robotics (surgical robots, autonomous vehicles, industrial cobots) and in AI policy discussions (EU AI Act, NIST AI Risk Management Framework).
The paper connects to but does not deeply engage with the LLM refusal literature, which is a missed opportunity given how central "overrefusal" and "underrefusal" have become in that community. Drawing parallels or contrasts could have significantly broadened the paper's audience and impact.
The paper would benefit significantly from grounding in formal frameworks — BDI architectures, deontic logic, or argumentation frameworks — that could make the life-cycle and justification structures precise and amenable to verification. The authors' collective expertise (several are prominent in formal verification and agent architectures) suggests this is planned for future work, but its absence weakens the current contribution.
The paper also leaves the boundary between "command refusal" and "norm violation" unclear: the authors acknowledge the overlap but do not resolve it satisfactorily.
Overall, this is a well-motivated position paper that identifies an important and timely problem and offers useful conceptual distinctions, but it remains at a preliminary stage, which limits its immediate scientific impact.
Generated Jun 11, 2026
Paper 2 has higher likely impact due to a more concrete, timely contribution: a formal framework (grounded in statistical learning theory) to operationalize “capability to infer” in the EU AI Act, illustrated with realistic credit-scoring workflows and accompanied by code. This offers actionable guidance for regulators and practitioners, with immediate real-world application in compliance and risk assessment across many deployed data-driven systems. Paper 1 raises important conceptual issues around responsible non-compliance, but appears more agenda-setting and less methodologically specified, making near-term uptake and measurable impact less certain.
Paper 1 presents a rigorous empirical and theoretical comparison of agent orchestration paradigms with formal POMDP analysis, experiments across multiple models and retrieval regimes, and actionable findings about declarative vs. imperative agent design. It offers concrete, reproducible methodology and practical insights for building tool-using AI agents. Paper 2, while addressing an important topic (responsible non-compliance), is a position/sketch paper that outlines issues without providing concrete methods, experiments, or evaluations, limiting its immediate scientific impact despite its conceptual relevance.
Paper 1 is more likely to have higher scientific impact: it targets a broadly relevant, timely problem in AI safety/governance—how autonomous systems should refuse, justify, and allow override of requests—connecting technical design with accountability, security risk, and liability. This scope can influence multiple fields (AI alignment, HCI, policy, security, robotics) and has direct real-world applicability as LLM/agent deployment accelerates. Paper 2 appears methodologically stronger but is a narrower incremental contribution within audio-visual event localization, with more limited cross-domain reach.
Paper 1 presents a rigorously tested, open-source framework with empirical results demonstrating high accuracy (98%) and an important finding regarding model scale in constrained tasks. Its methodology and immediate real-world applicability in safety-critical engineering give it a higher tangible scientific impact compared to Paper 2, which presents a conceptual, albeit timely, discussion on AI ethics without empirical validation.
Paper 1 presents a highly rigorous, empirically validated foundation model trained on massive datasets (over 2.8 million ECGs) with immediate, life-saving real-world applications in cardiovascular care. Its ability to detect both common and rare cardiac conditions demonstrates profound clinical impact and methodological excellence. In contrast, Paper 2 is a conceptual piece on AI safety; while timely and relevant to alignment, it lacks the empirical rigor, concrete technical contributions, and immediate broad-scale real-world utility demonstrated by Paper 1's extensive multi-cohort validation.
Paper 2 addresses a fundamental and highly timely issue in AI safety and alignment: responsible non-compliance. Its framework for task refusal, security, and liability has broad implications across AI engineering, ethics, and policy, offering wider real-world impact and broader relevance across fields than Paper 1's specific HCI experiment on creative writing.
Paper 1 targets a timely, broadly relevant alignment problem—responsible refusal/non-compliance—directly connected to real-world deployment, safety, policy, and liability. Its framing (justifications, override pathways, risk tracking, liability transfer) maps to concrete governance and engineering requirements likely to influence multiple fields (AI safety, HCI, law, security). Paper 2 is more speculative and complex, with many loosely specified components (entropy control, ToM, verifiable kernel) and an evaluation plan centered on simulations; methodological rigor and feasibility are less clear, potentially limiting near-term impact despite interesting ideas.
Paper 1 addresses a critical technical bottleneck in LLMs (long-context efficiency) with a concrete, empirically validated methodology. Its practical applications and methodological rigor offer immediate utility. In contrast, Paper 2 is a conceptual position paper; while discussing an important safety topic, it lacks the concrete technical contributions likely to drive immediate and measurable scientific impact.
Paper 1 has higher likely scientific impact: it proposes a concrete, novel multi-agent system that automates and continuously updates embodied AI benchmark construction, with an extensible skill library, QC mechanisms, and instantiations across multiple embodied domains. It includes experimental validation (human/judge assessment, consistency checks, cost and ablations), supporting methodological rigor and immediate usability by the community. Its outputs can broadly accelerate evaluation and progress in embodied AI/robotics. Paper 2 is timely and important conceptually, but reads more like a position/agenda with less technical novelty and empirical grounding, so near-term measurable impact is likely lower.
Paper 2 addresses a fundamental and broadly applicable challenge in AI safety and ethics—how autonomous agents should handle non-compliance with user requests. This touches core issues in AI alignment, safety, and governance that affect the entire field of AI development. Its conceptual framework (justifications, overrides, liability transfers) has potential to influence policy, regulations, and system design across many domains. Paper 1, while practical and useful, is a narrowly scoped engineering tool for AI coding assistants with limited evaluation (single-user self-study), reducing its broader scientific impact.