OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

Chiara Maria Russo, Simone Carnemolla, Simone Palazzo, Daniela Giordano, Concetto Spampinato, Matteo Pennisi

May 18, 2026

arXiv:2605.18481v1

cs.AI (primary)

#1216 of 2292 · Artificial Intelligence

Tournament Score: 1405 ± 43 (scale: 1050 to 1800)

Win Rate: 55%

Rating: 5.5 / 10

Significance 5.5 · Rigor 4.5 · Novelty 6 · Clarity 6.5

Abstract

Interpreting the decisions of deep image classifiers remains challenging, particularly in black-box settings where model internals are inaccessible. We introduce OCCAM, a framework for open-set causal concept explanation and ontology induction in vision models. OCCAM discovers visual concepts in an open-set manner, localizes them via text-guided segmentation, and performs object-level interventions by removing concepts to measure changes in class confidence, estimating each concept's causal contribution. Beyond local explanations, OCCAM aggregates interventional evidence across a dataset to induce a structured concept ontology that captures how classifiers globally organize visual concepts. Reasoning over this ontology reveals consistent dependencies between concepts, exposes latent causal relations, and uncovers systematic model biases. Experiments on Broden and ImageNet-S across multiple classifiers show that OCCAM improves explanation quality in open-set black-box settings while providing richer global insight than per-image attribution methods.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: OCCAM

1. Core Contribution

OCCAM introduces a framework for explaining black-box vision classifiers through three interconnected mechanisms: (1) open-set concept discovery via multimodal LLMs, (2) causal contribution estimation through input-level interventions (concept removal via inpainting), and (3) ontology induction by aggregating interventional evidence across a dataset into a structured knowledge graph. The key novelty lies in combining these three elements — open-set discovery, interventional causality, and global ontology construction — into a unified pipeline that operates under strict black-box constraints.

The problem addressed is genuine: most concept-based explanation methods require either white-box access, predefined concept vocabularies, or both. OCCAM removes both constraints simultaneously, which is a meaningful practical advance for deployed systems and API-accessible models.

2. Methodological Rigor

Strengths in methodology:

The formalization is clean, with well-defined operators (Φ, Ψ, R) for concept proposal, grounding, and removal.

The causal pruning approach is intuitive and grounded in interventionist causality — removing a concept and measuring confidence change is a principled way to estimate necessity.

FID scores (0.76 and 1.20) are reported to argue that inpainting artifacts don't confound results, which is a thoughtful control.

Multiple architectures (CNNs, ViTs, CLIP variants, SigLIP) are evaluated, showing generality.
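The removal-based estimate described in the strengths above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callables `propose`, `ground`, and `remove` are hypothetical stand-ins for the operators Φ, Ψ, and R, the `Image` and `Mask` types are placeholders, and the pruning threshold is an assumed parameter.

```python
from typing import Callable, Dict, List, Sequence

Image = Sequence[float]   # placeholder image type (assumption for this sketch)
Mask = Sequence[bool]     # placeholder binary mask, same length as Image


def concept_effects(
    image: Image,
    target_class: int,
    classify: Callable[[Image], List[float]],  # black box: image -> class probabilities
    propose: Callable[[Image], List[str]],     # hypothetical stand-in for Phi
    ground: Callable[[Image, str], Mask],      # hypothetical stand-in for Psi
    remove: Callable[[Image, Mask], Image],    # hypothetical stand-in for R
) -> Dict[str, float]:
    """Return, per proposed concept, the drop in target-class confidence after removal."""
    baseline = classify(image)[target_class]
    effects: Dict[str, float] = {}
    for concept in propose(image):
        mask = ground(image, concept)
        if not any(mask):
            continue  # concept could not be localized, so no intervention is possible
        edited = remove(image, mask)
        effects[concept] = baseline - classify(edited)[target_class]
    return effects


def prune(effects: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only concepts whose removal lowers confidence by at least `threshold`."""
    return {c: d for c, d in effects.items() if d >= threshold}
```

Only the classifier's output probabilities are queried, which is what keeps the procedure black-box; everything else runs on the input image.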

Weaknesses in methodology:

The "causal" framing is somewhat oversold. Single-concept removal measures marginal necessity but doesn't account for concept interactions, confounding between spatially overlapping concepts, or the fact that inpainting introduces new visual content (not a true "do-operator" intervention). The authors acknowledge this implicitly but don't discuss limitations of the causal interpretation rigorously.

The concept grounding evaluation (Table 2) requires Grad-CAM modulation to convert binary masks into continuous maps for fair comparison — this undermines the pure black-box claim, even if used only for evaluation. The results are also mixed: OCCAM outperforms on ResNet18 but underperforms CE-FAM on ViT-B/16 for all three metrics.

The ontology evaluation (Table 3) relies on LLM-as-judge and 30 non-expert MTurk workers rating on a 1-5 scale across only 5 classes. The differences between conditions are small (4.23 vs 4.78), sample sizes are minimal, and no statistical significance tests are reported for this evaluation. The authors themselves note these "should not be interpreted as a formal user study."

The causal pruning comparison (Table 1) is limited to CE-FAM only, as other methods lack per-concept spatial grounding. This narrows the competitive landscape.
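The concern about concept interactions can be made concrete with a joint-removal check. The sketch below is an illustration of the reviewer's point, not something the paper is reported to do; `confidence_without` is a hypothetical callable that returns the class confidence after removing a given set of concepts.

```python
from typing import Callable, FrozenSet


def interaction(
    confidence_without: Callable[[FrozenSet[str]], float],
    a: str,
    b: str,
) -> float:
    """Joint-removal confidence drop minus the sum of the two single-removal drops."""
    base = confidence_without(frozenset())
    drop_a = base - confidence_without(frozenset({a}))
    drop_b = base - confidence_without(frozenset({b}))
    drop_ab = base - confidence_without(frozenset({a, b}))
    return drop_ab - (drop_a + drop_b)
```

A value near zero suggests the two concepts contribute additively. A clearly positive value suggests redundancy: either concept alone sustains the prediction, so single-concept removal would understate both. A clearly negative value suggests the two removals overlap in what they take away.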

3. Potential Impact

The framework addresses a real need in deployed AI systems where model internals are inaccessible. The ontology induction component is conceptually appealing — moving from per-image explanations to a global, queryable knowledge structure could be valuable for model auditing, bias detection, and regulatory compliance.
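As a rough picture of what aggregating interventional evidence across a dataset could look like, the sketch below collapses per-image confidence drops into class-to-concept edges. The ontology schema itself is deferred to the paper's supplementary material, so the flat `Edge` record, the `min_support` filter, and the ranking by mean effect are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Edge(NamedTuple):
    class_name: str
    concept: str
    mean_effect: float  # average confidence drop when the concept is removed
    support: int        # number of images contributing to the average


def aggregate(
    records: Iterable[Tuple[str, str, float]],  # (class, concept, confidence drop) per image
    min_support: int = 2,
) -> List[Edge]:
    """Group per-image effects by (class, concept) and rank edges by mean effect."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for class_name, concept, drop in records:
        grouped[(class_name, concept)].append(drop)
    edges = [
        Edge(cls, concept, mean(drops), len(drops))
        for (cls, concept), drops in grouped.items()
        if len(drops) >= min_support  # ignore concepts seen in too few images
    ]
    return sorted(edges, key=lambda e: e.mean_effect, reverse=True)
```

An auditor could then query such a structure for concepts with high mean effect that are semantically unrelated to the class, which is the kind of bias signal the assessment describes.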

The extension to multimodal classifiers (Table 4) demonstrates breadth, and the finding that SigLIP distributes reliance more broadly than CLIP variants is an interesting architectural insight.

However, practical impact may be limited by computational cost (running an MLLM + SAM3 + LaMa inpainting per concept per image is expensive) and by the reliance on the quality of foundation model components. The explanation quality is fundamentally bounded by the MLLM's ability to propose relevant concepts and SAM3's segmentation accuracy.

4. Timeliness & Relevance

The paper is timely in leveraging the current ecosystem of foundation models (LLMs, SAM, inpainting models) for explainability. The black-box constraint is increasingly relevant as models are deployed behind APIs. The ontology induction angle connects XAI to knowledge representation, which is an underexplored direction.

However, the dependence on multiple large foundation models (Gemma 27B, SAM3, LaMa) means the "explainer" is arguably more complex than many models being explained, raising questions about the explanation's own transparency and reliability.

5. Strengths & Limitations

Key Strengths:

Principled combination of open-set discovery, causal intervention, and global ontology construction under black-box constraints

Clean formalization and modular pipeline design

Evaluation across diverse architectures including VLMs

The ontology concept genuinely adds value beyond per-image methods

FID-based artifact control is a thoughtful addition

Notable Limitations:

Quantitative improvements are modest ("moderate improvements" as the authors themselves state)

The ontology evaluation is underpowered: 5 classes, small score differences, no significance tests

The causal claims lack discussion of confounders (spatial overlap between concepts, inpainting artifacts introducing new features, interaction effects)

Heavy reliance on foundation model quality without ablation of component choices

No computational cost analysis despite the pipeline involving multiple large models

The ontology schema and its formal properties are deferred to supplementary material, making the contribution harder to evaluate

Reproducibility concerns: specific prompt engineering for the MLLM, threshold choices, and semantic alignment procedures could significantly affect results
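One way the missing significance test for the ontology ratings could be supplied is a paired sign-flip permutation test. The sketch assumes the ratings are paired (the same rater or item scored under both conditions), which the assessment does not confirm; the function name and defaults are illustrative.

```python
import random
from typing import Sequence


def paired_permutation_pvalue(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided sign-flip permutation test on the mean paired difference."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis each paired difference is equally likely to be +d or -d.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

With ordinal 1-5 ratings and few classes, a rank-based alternative such as the Wilcoxon signed-rank test would be an equally reasonable choice.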

Additional Observations

The volleyball example (Fig. 2) where classifiers identify "hair" and "smiling faces" as important for volleyball classification is presented as revealing model biases — but this could also reflect limitations of the MLLM's concept proposals rather than genuine classifier behavior. Disentangling the two is not addressed.

The paper is from a single institution (University of Catania) with six authors, suggesting a focused effort. The writing is generally clear but occasionally conflates evaluation-only use of white-box methods with the black-box claims.

Rating: 5.5 / 10

Significance 5.5 · Rigor 4.5 · Novelty 6 · Clarity 6.5

Generated May 19, 2026

Comparison History (20)

vs. Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use

claude-opus-4.6 · 5/20/2026

OCCAM addresses the broadly important problem of explainability in deep learning with a novel combination of open-set concept discovery, causal interventions, and ontology induction for black-box vision models. It has wider applicability across computer vision, XAI, and model auditing, with empirical validation on standard benchmarks. Paper 2 offers an elegant formalization of trust calibration as preference learning but is more niche, primarily theoretical, and lacks empirical validation. OCCAM's methodological contributions and breadth of impact give it higher potential scientific influence.

vs. Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models

gemini-3.1 · 5/20/2026

Paper 1 addresses the urgent and highly relevant issue of safety and vulnerabilities in emerging Large Reasoning Models (LRMs). Given the rapid deployment of reasoning LLMs, understanding and mitigating their jailbreak vulnerabilities is critical. The novel use of attention-guided reinforcement learning provides a rigorous approach to AI safety, likely leading to broader immediate impact and real-world application in securing foundational models compared to the more mature field of vision model explainability addressed in Paper 2.

vs. Evaluating the Utility of Personal Health Records in Personalized Health AI

claude-opus-4.6 · 5/20/2026

Paper 1 addresses the highly timely intersection of LLMs and personalized healthcare, with a large-scale evaluation (2,257 queries, 1,945 PHRs) and both automated and clinician ratings. It introduces a novel evaluation framework for PHR-specific error modes, has direct real-world healthcare applications affecting millions of patients, and tackles the critical challenge of making complex health records actionable. Paper 2 contributes meaningfully to XAI with causal concept explanations, but explainability methods face a more crowded field. Paper 1's broader societal impact potential and timeliness in the LLM-health space give it the edge.

vs. AgentNLQ: A General-Purpose Agent for Natural Language to SQL

gemini-3.1 · 5/20/2026

Paper 2 tackles a fundamental and pressing challenge in AI—interpreting black-box vision models via causal concept explanation and ontology induction. Its focus on open-set discovery and causal intervention offers broad implications for AI safety, trust, and explainability across multiple domains. In contrast, while Paper 1 presents a highly practical and commercially valuable application (NL2SQL), its multi-agent orchestration approach represents a more incremental methodological advancement with a narrower scientific scope compared to the foundational XAI contributions of Paper 2.

vs. Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

gpt-5.2 · 5/20/2026

Paper 2 presents a concrete, technically novel framework with demonstrated empirical results: open-set concept discovery, causal interventions in black-box vision models, and dataset-level ontology induction. This combination is methodologically substantive and likely to be reusable across interpretability, robustness, and responsible AI, with clear real-world applicability to auditing deployed vision systems. Paper 1 is a vision/conceptual piece about trust in agent-to-agent networks; timely and potentially broad, but it appears less methodologically rigorous and offers fewer immediately testable contributions, which typically lowers near-term scientific impact.

vs. Prediction of Challenging Behaviors Associated with Profound Autism in a Classroom Setting Using Wearable Sensors

claude-opus-4.6 · 5/19/2026

Paper 2 addresses a critical real-world problem—predicting challenging behaviors in children with profound autism in actual classroom settings—with clear translational potential for safety and education. It bridges wearable sensing, foundation models, and special education in a novel real-world deployment context, moving beyond controlled lab settings. While Paper 1 makes solid contributions to XAI with causal concept explanations, the field is increasingly crowded. Paper 2's direct humanitarian impact, interdisciplinary reach (healthcare, education, ML, sensor technology), and practical applicability to an underserved population give it higher potential impact.

vs. GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction

gemini-3.1 · 5/19/2026

Paper 1 addresses a fundamental challenge in AI explainability (XAI) for black-box vision models using a novel causal framework and ontology induction. Its methodology for exposing systematic biases and latent relations has broad implications across multiple domains utilizing computer vision, such as healthcare and autonomous driving. In contrast, Paper 2 focuses on a more specialized, domain-specific application (cyber threat intelligence knowledge graphs), limiting its broader scientific impact compared to the fundamental AI transparency advances proposed in Paper 1.

vs. Skim: Speculative Execution for Fast and Efficient Web Agents

gemini-3.1 · 5/19/2026

Paper 1 addresses the critical, foundational problem of black-box model interpretability using causal concept explanation and ontology induction. Its theoretical depth and focus on uncovering causal relations and systematic biases offer broader implications for AI safety and trustworthy machine learning. While Paper 2 presents a highly practical systems-level optimization for web agents, Paper 1's contributions to explainable AI and causal inference are likely to spur a wider range of foundational follow-up research and have a more profound scientific impact.

vs. Harnessing LLM Agents with Skill Programs

gpt-5.2 · 5/19/2026

Paper 2 (HASP) likely has higher scientific impact due to its broad, timely relevance to LLM agent reliability and long-horizon task performance, with clear, reusable mechanism design (executable skill programs) and multiple integration pathways (inference-time, post-training, self-improvement). It reports substantial empirical gains across diverse benchmarks (web-search, math, coding), suggesting strong real-world applicability and cross-domain impact. Paper 1 is novel for black-box vision interpretability and ontology induction, but its applicability is more specialized to vision/XAI and may see narrower uptake compared to general agent frameworks.

vs. CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean

claude-opus-4.6 · 5/19/2026

OCCAM addresses the widely important problem of explainability in black-box vision models with a novel framework combining open-set concept discovery, causal interventions, and ontology induction. This has broad applicability across AI safety, trustworthy AI, and any domain using deep vision classifiers. While CAM-Bench fills a gap in formal theorem-proving benchmarks for applied mathematics, benchmarks tend to have more incremental impact unless they catalyze entirely new research directions. OCCAM's methodological contributions—causal concept explanations and structured ontology induction—offer richer potential for follow-on research and real-world deployment in high-stakes applications.

vs. ADR: An Agentic Detection System for Enterprise Agentic AI Security

gemini-3.1 · 5/19/2026

Paper 2 addresses the highly timely and critical challenge of Agentic AI security. Its scientific impact is amplified by massive, production-proven real-world validation at Uber, the introduction of a novel benchmark (ADR-Bench), and superior performance over state-of-the-art baselines. While Paper 1 offers valuable advances in vision model interpretability, Paper 2's intersection of cybersecurity, LLM agents, and systems engineering combined with its unprecedented empirical scale promises broader, more immediate adoption and influence across both academia and industry.

vs. Heterogeneous Information-Bottleneck Coordination Graphs for Multi-Agent Reinforcement Learning

gpt-5.2 · 5/19/2026

Paper 2 (OCCAM) likely has higher scientific impact due to broad, timely relevance: black-box interpretability, causal concept attribution, and ontology induction apply across many vision systems and safety/audit settings. Its open-set concept discovery plus intervention-based causal testing can be used by practitioners without model access, increasing real-world applicability. The induced global ontology extends beyond per-instance explanations, potentially influencing XAI, robustness, fairness, and model governance. Paper 1 is methodologically rigorous and novel for MARL coordination graphs, but its impact is narrower to cooperative MARL and may see slower adoption outside that community.

vs. Learning Developmental Scaffoldings to Guide Self-Organisation

gpt-5.2 · 5/19/2026

Paper 2 is more novel and broadly impactful: it formalizes and experimentally studies a fundamental developmental principle (information offloading to initial conditions) with a jointly learned NCA+SIREN model and information-theoretic analysis. The approach is timely for self-organising systems, ALife, and morphogenesis, and can transfer to robust generative modeling, distributed computation, and synthetic biology-inspired design. Paper 1 is valuable for XAI in vision, but concept discovery/interventions/ontology induction builds on a crowded XAI literature and is more domain-specific, likely yielding narrower cross-field impact.

vs. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

gemini-3.1 · 5/19/2026

Paper 2 addresses a critical bottleneck in autonomous LLM agents by unifying skill selection, utilization, and distillation through reinforcement learning. This integrated approach to building persistent skill libraries offers immense potential for real-world automation and scalable agentic AI. Its broad applicability across interactive tasks and alignment with current trends in general-purpose AI give it a higher potential for widespread scientific and practical impact compared to the narrower focus on vision model interpretability in Paper 1.

vs. Enhancing Metacognitive AI: Knowledge-Graph Population with Graph-Theoretic LLM Enrichment

gpt-5.2 · 5/19/2026

Paper 1 likely has higher scientific impact: it introduces a novel open-set, black-box causal concept intervention framework plus dataset-level ontology induction for vision models, advancing both interpretability methodology and global model diagnostics (biases, dependencies). This is methodologically more rigorous and broadly useful for safety, debugging, and trustworthy deployment across many vision systems. Paper 2 is timely and application-driven, but largely integrates existing components (LLMs, web retrieval, Neo4j/GraphRAG, graph metrics) into a pipeline; evaluation is limited in scale and may be sensitive to tooling choices, reducing generalizable scientific novelty.

vs. Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

gemini-3.1 · 5/19/2026

Paper 1 tackles a foundational and highly critical problem in AI: the causal interpretability of black-box models. By moving beyond local, per-image attributions to induce a global, causal concept ontology, OCCAM offers profound implications for AI safety, bias detection, and transparency. While Paper 2 provides a highly practical and effective engineering optimization for LLM coding agents (context pruning), Paper 1's contributions to mechanistic understanding and open-set causal reasoning represent a broader and more significant theoretical advancement with potential impact across multiple domains in deep learning.

vs. $π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

gpt-5.2 · 5/19/2026

Paper 2 (OCCAM) likely has higher impact due to broader cross-field relevance (vision, XAI, causality, ontology learning) and clearer real-world applicability for auditing black-box models for safety, bias, and accountability. Its methodology includes concept discovery, localization, and interventional testing to estimate causal contributions, plus dataset-level ontology induction—offering both local and global interpretability, which is timely amid regulatory and deployment pressures. Paper 1 is novel as a benchmark for proactive long-horizon assistants, but benchmarks with 100 tasks/personas may have narrower adoption and influence compared to a general-purpose interpretability framework for widely used vision classifiers.

vs. ECG-WM: A Physiology-Informed ECG World Model for Clinical Intervention Simulation

gemini-3.1 · 5/19/2026

While Paper 1 provides strong advances in explainable AI for computer vision, Paper 2 integrates physiological ODE priors with latent diffusion to create a predictive world model for ECGs. This has profound potential for real-world application in healthcare, enabling safe, action-conditioned clinical intervention simulations that can directly improve patient safety and personalized medicine.

vs. AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

gpt-5.2 · 5/19/2026

Paper 2 likely has higher impact due to strong timeliness and broad applicability to rapidly evolving T2I alignment. AutoRubric-T2I introduces an interpretable, data-efficient alternative to large BT-trained reward models, with clear real-world utility (cheaper alignment, easier adaptation) and demonstrated downstream RL gains on multiple benchmarks. Its rule-learning + sparse selection approach is methodologically straightforward yet scalable and transferable across models and prompts. Paper 1 is novel for black-box causal concept explanation and ontology induction, but its impact may be narrower (interpretability niche, heavier assumptions around interventions/segmentation) and less immediately deployable in production pipelines.

vs. Learning Quantifiable Visual Explanations Without Ground-Truth

gpt-5.2 · 5/19/2026

Paper 1 (OCCAM) is more novel by combining open-set concept discovery, text-guided segmentation, causal interventions, and dataset-level ontology induction, moving beyond per-image attributions to global, structured model understanding. This offers broader real-world utility (auditing, bias discovery, model debugging) and wider cross-field impact (vision, interpretability, causal analysis, knowledge/ontology learning). Paper 2 advances evaluation via a perturbation-based metric and a trainable explanation adapter, but metrics for XAI are a crowded area and the approach may be more incremental and narrower in scope.