Towards Responsibly Non-Compliant Machines

Marija Slavkovik, Marie Farrell, Louise Dennis, Michael Fisher, Simon Kolker, Emily C. Collins

Jun 10, 2026 · arXiv:2606.12147v1

cs.AI

#3233 of 3489 · Artificial Intelligence

Tournament Score: 1244 ± 49 (scale 1050 to 1800)

Win Rate: 22%

Rating: 4.5/10

Significance 6 · Rigor 3.5 · Novelty 5 · Clarity 5.5

Abstract

We consider the problem of engineering autonomous intelligent agents that are capable to responsibly not comply with user requests. We argue that machine non-compliance comes in many different forms, and sketch the issues we should pursue on the road of accomplishing responsibly non-compliant intelligent machines. We anchor responsible non-compliance in justifications for task refusal, pathways to override the non-compliance, as well as careful tracking of security risks and liability transfers.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: "Towards Responsibly Non-Compliant Machines"

1. Core Contribution

This is a position paper that frames the problem of autonomous agent non-compliance as a first-class design concern, distinct from the broader literature on norm obedience, AI alignment, and machine ethics. The main contribution is a conceptual framework that decomposes responsible non-compliance into three pillars: (1) justification for refusal, (2) pathways for human override, and (3) tracking of security risks and liability transfers. The authors propose a "request compliance life-cycle" (Figure 1), a taxonomy of non-compliance reasons (feasibility, safety, normative, efficiency), and three engineering approaches (deliberately, predictably, and learnt non-compliance).

The paper distinguishes itself from prior work on "rebel agents" (Coman & Aha, 2018) and goal rejection mechanisms (Briggs & Scheutz, 2015) by explicitly centering the concept of *compliance* rather than *disobedience*, emphasizing that the agent is responding to an explicit command rather than deviating from goals more broadly. The paper also adds the critical dimension of *refutability* — whether a user should be able to override the machine's refusal — which is a genuinely useful framing.

2. Methodological Rigor

As a position paper, this work is inherently conceptual rather than empirical or formally rigorous. The methodology consists of example-driven analysis (10 scenarios) followed by informal categorization. This approach is appropriate for the venue and paper type, but it limits the depth of contribution.

Several weaknesses stand out:

The taxonomy lacks formal grounding. The categorization into feasibility/safety/normative/efficiency is intuitive but not derived from any principled ontological framework. The distinction between some categories is blurry (e.g., Example 2.5 mixes safety, normativity, and emergency in complex ways).

The life-cycle in Figure 1 is underspecified. It is described as "simplistic" by the authors themselves, and indeed it lacks formal semantics. How deliberation proceeds, what constitutes "sufficient justification," and how override dialogues terminate are all left vague.

Table 1's refutability assignments are asserted rather than argued. Why should "unsafe to user" be refutable but "unsafe to environment" not? The paper gestures at reasons (affected parties cannot consent) but doesn't develop this rigorously. The "upon emergency" qualifier for machine safety and efficiency is especially underspecified.

The three engineering approaches (deliberately, predictably, learnt non-compliance) are sketched at a very high level without concrete architectural proposals, algorithms, or evaluation criteria.
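The refutability question raised above about Table 1 can be made concrete with a small lookup. The sketch below is illustrative only: it encodes just the assignments this assessment mentions (user safety and normative refusals refutable, environment safety not refutable, machine safety and efficiency refutable "upon emergency"). The names Reason, Refutability, REFUTABILITY, and may_override are hypothetical and do not come from the paper, and the paper's full Table 1 may contain rows not shown here.

```python
from enum import Enum


class Reason(Enum):
    """Refusal reasons named in this assessment (hypothetical identifiers)."""
    UNSAFE_TO_USER = "unsafe to user"
    UNSAFE_TO_ENVIRONMENT = "unsafe to environment"
    UNSAFE_TO_MACHINE = "unsafe to machine"
    NORMATIVE = "normative"
    EFFICIENCY = "efficiency"


class Refutability(Enum):
    ALWAYS = "refutable"
    NEVER = "not refutable"
    UPON_EMERGENCY = "refutable upon emergency"


# Assumption: only the assignments described in the assessment are encoded.
REFUTABILITY = {
    Reason.UNSAFE_TO_USER: Refutability.ALWAYS,
    Reason.UNSAFE_TO_ENVIRONMENT: Refutability.NEVER,
    Reason.UNSAFE_TO_MACHINE: Refutability.UPON_EMERGENCY,
    Reason.NORMATIVE: Refutability.ALWAYS,
    Reason.EFFICIENCY: Refutability.UPON_EMERGENCY,
}


def may_override(reason: Reason, emergency: bool = False) -> bool:
    """Return True if a user may override a refusal given for this reason."""
    rule = REFUTABILITY[reason]
    if rule is Refutability.ALWAYS:
        return True
    if rule is Refutability.UPON_EMERGENCY:
        return emergency
    return False
```

Written this way, the gap the assessment points to becomes visible: the emergency flag is a bare boolean, and nothing in the table says who decides that an emergency holds or why the ALWAYS and NEVER rows differ.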

3. Potential Impact

The paper addresses a genuinely important and underexplored problem. As autonomous systems become more prevalent — in healthcare, transportation, domestic assistance, and industry — the question of when and how machines should refuse commands is practically urgent. The paper's framing could influence:

Robot safety standards: The distinction between refutable and non-refutable non-compliance could inform regulatory frameworks (e.g., ISO safety standards for collaborative robots).

Human-robot interaction (HRI): The emphasis on justification and dialogue for overrides connects to active HRI research on trust, transparency, and explanation.

Multi-agent systems: The efficiency-based non-compliance (task priority, delegation to other agents) adds a coordination dimension that is relevant to MAS research.

AI governance and liability: The explicit treatment of liability transfer when a human overrides machine refusal is practically relevant for legal and insurance frameworks around autonomous systems.

However, the impact is limited by the paper's preliminary nature. Without formal models, implemented systems, or empirical validation, the contribution remains at the "agenda-setting" level.

4. Timeliness & Relevance

The paper is highly timely. The AI safety and alignment communities are intensely focused on when AI systems should and should not comply with user instructions — the "refusal" behavior of large language models being a prominent contemporary example (though the paper focuses on embodied agents rather than LLMs). The intersection of autonomy, safety, and human authority is a live concern in robotics (surgical robots, autonomous vehicles, industrial cobots) and in AI policy discussions (EU AI Act, NIST AI Risk Management Framework).

The paper connects to but does not deeply engage with the LLM refusal literature, which is a missed opportunity given how central "overrefusal" and "underrefusal" have become in that community. Drawing parallels or contrasts could have significantly broadened the paper's audience and impact.

5. Strengths & Limitations

Strengths:

Clear problem framing that is distinct from but complementary to AI alignment and machine ethics

The refutability dimension (Table 1) is a genuinely useful conceptual contribution that bridges agent autonomy with human authority

Well-chosen examples that span a wide range of real-world scenarios

The paper correctly identifies that non-compliance without justification is meaningless — this is a strong and well-argued position

Connects technical design to liability and governance questions

Limitations:

Lacks formal models, algorithms, or implementations — all contributions are at the sketch level

The taxonomy, while intuitive, is not exhaustive or formally validated. Edge cases and category overlaps are not addressed

No evaluation framework is proposed for assessing whether a system is "responsibly" non-compliant

Limited engagement with the extensive LLM alignment/refusal literature

The three engineering approaches (Section 8) are underdeveloped — "learnt non-compliance" in particular raises significant safety concerns (randomly selecting commands to evaluate?) that are not discussed

Some claims are underargued (e.g., why exactly should normative non-compliance always be refutable by users?)

The writing contains several typos and grammatical issues ("non-compliment" for "non-compliant," "refute" used where "override" is meant)

6. Additional Observations

The paper would benefit significantly from grounding in formal frameworks — BDI architectures, deontic logic, or argumentation frameworks — that could make the life-cycle and justification structures precise and amenable to verification. The authors' collective expertise (several are prominent in formal verification and agent architectures) suggests this is planned for future work, but its absence weakens the current contribution.

The paper's scope is also somewhat unclear regarding the boundary between "command refusal" and "norm violation" — the authors acknowledge overlap but don't resolve it satisfactorily.

Overall, this is a well-motivated position paper that identifies an important and timely problem, offers useful conceptual distinctions, but remains at a preliminary stage that limits its immediate scientific impact.

Rating: 4.5/10

Significance 6 · Rigor 3.5 · Novelty 5 · Clarity 5.5

Generated Jun 11, 2026

Comparison History (18)

Lost vs. When Do Data-Driven Systems Exhibit the Capability to Infer?

Paper 2 has higher likely impact due to a more concrete, timely contribution: a formal framework (grounded in statistical learning theory) to operationalize “capability to infer” in the EU AI Act, illustrated with realistic credit-scoring workflows and accompanied by code. This offers actionable guidance for regulators and practitioners, with immediate real-world application in compliance and risk assessment across many deployed data-driven systems. Paper 1 raises important conceptual issues around responsible non-compliance, but appears more agenda-setting and less methodologically specified, making near-term uptake and measurable impact less certain.

gpt-5.2 · Jun 11, 2026

Lost vs. Declarative Skills for AI Agents in Knowledge-Grounded Tool-Use Workflows

Paper 1 presents a rigorous empirical and theoretical comparison of agent orchestration paradigms with formal POMDP analysis, experiments across multiple models and retrieval regimes, and actionable findings about declarative vs. imperative agent design. It offers concrete, reproducible methodology and practical insights for building tool-using AI agents. Paper 2, while addressing an important topic (responsible non-compliance), is a position/sketch paper that outlines issues without providing concrete methods, experiments, or evaluations, limiting its immediate scientific impact despite its conceptual relevance.

claude-opus-4-6 · Jun 11, 2026

Won vs. Hierarchical Semantic-Constrained Heterogeneous Graph for Audio-Visual Event Localization

Paper 1 is more likely to have higher scientific impact: it targets a broadly relevant, timely problem in AI safety/governance—how autonomous systems should refuse, justify, and allow override of requests—connecting technical design with accountability, security risk, and liability. This scope can influence multiple fields (AI alignment, HCI, policy, security, robotics) and has direct real-world applicability as LLM/agent deployment accelerates. Paper 2 appears methodologically stronger but is a narrower incremental contribution within audio-visual event localization, with more limited cross-domain reach.

gpt-5.2 · Jun 11, 2026

Lost vs. A Lightweight Multi-Agent Framework for Automated Concrete Barrier Design

Paper 1 presents a rigorously tested, open-source framework with empirical results demonstrating high accuracy (98%) and an important finding regarding model scale in constrained tasks. Its methodology and immediate real-world applicability in safety-critical engineering give it a higher tangible scientific impact compared to Paper 2, which presents a conceptual, albeit timely, discussion on AI ethics without empirical validation.

gemini-3.1-pro-preview · Jun 11, 2026

Lost vs. A Signal-Language Foundation Model for Broad-Spectrum Cardiovascular Assessment from Routine Electrocardiography

Paper 1 presents a highly rigorous, empirically validated foundation model trained on massive datasets (over 2.8 million ECGs) with immediate, life-saving real-world applications in cardiovascular care. Its ability to detect both common and rare cardiac conditions demonstrates profound clinical impact and methodological excellence. In contrast, Paper 2 is a conceptual piece on AI safety; while timely and relevant to alignment, it lacks the empirical rigor, concrete technical contributions, and immediate broad-scale real-world utility demonstrated by Paper 1's extensive multi-cohort validation.

gemini-3.1-pro-preview · Jun 11, 2026

Won vs. Nonslop: A Gamified Experiment in Human-AI Collaborative Writing

Paper 2 addresses a fundamental and highly timely issue in AI safety and alignment: responsible non-compliance. Its framework for task refusal, security, and liability has broad implications across AI engineering, ethics, and policy, offering wider real-world impact and broader relevance across fields than Paper 1's specific HCI experiment on creative writing.

gemini-3.1-pro-preview · Jun 11, 2026

Won vs. Agent Economics: An Entropy-Controlled Pluralistic Alignment Framework for Preventing Artificial Hivemind in Autonomous Agents

Paper 1 targets a timely, broadly relevant alignment problem—responsible refusal/non-compliance—directly connected to real-world deployment, safety, policy, and liability. Its framing (justifications, override pathways, risk tracking, liability transfer) maps to concrete governance and engineering requirements likely to influence multiple fields (AI safety, HCI, law, security). Paper 2 is more speculative and complex, with many loosely specified components (entropy control, ToM, verifiable kernel) and an evaluation plan centered on simulations; methodological rigor and feasibility are less clear, potentially limiting near-term impact despite interesting ideas.

gpt-5.2 · Jun 11, 2026

Lost vs. Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

Paper 1 addresses a critical technical bottleneck in LLMs (long-context efficiency) with a concrete, empirically validated methodology. Its practical applications and methodological rigor offer immediate utility. In contrast, Paper 2 is a conceptual position paper; while discussing an important safety topic, it lacks the concrete technical contributions likely to drive immediate and measurable scientific impact.

gemini-3.1-pro-preview · Jun 11, 2026

Lost vs. Embodied-BenchClaw: An Autonomous Multi-Agent System for Embodied Spatial Intelligence Benchmark Construction

Paper 1 has higher likely scientific impact: it proposes a concrete, novel multi-agent system that automates and continuously updates embodied AI benchmark construction, with an extensible skill library, QC mechanisms, and instantiations across multiple embodied domains. It includes experimental validation (human/judge assessment, consistency checks, cost and ablations), supporting methodological rigor and immediate usability by the community. Its outputs can broadly accelerate evaluation and progress in embodied AI/robotics. Paper 2 is timely and important conceptually, but reads more like a position/agenda with less technical novelty and empirical grounding, so near-term measurable impact is likely lower.

gpt-5.2 · Jun 11, 2026

Won vs. PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents

Paper 2 addresses a fundamental and broadly applicable challenge in AI safety and ethics—how autonomous agents should handle non-compliance with user requests. This touches core issues in AI alignment, safety, and governance that affect the entire field of AI development. Its conceptual framework (justifications, overrides, liability transfers) has potential to influence policy, regulations, and system design across many domains. Paper 1, while practical and useful, is a narrowly scoped engineering tool for AI coding assistants with limited evaluation (single-user self-study), reducing its broader scientific impact.

claude-opus-4-6 · Jun 11, 2026
