OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models
Chiara Maria Russo, Simone Carnemolla, Simone Palazzo, Daniela Giordano, Concetto Spampinato, Matteo Pennisi
Abstract
Interpreting the decisions of deep image classifiers remains challenging, particularly in black-box settings where model internals are inaccessible. We introduce OCCAM, a framework for open-set causal concept explanation and ontology induction in vision models. OCCAM discovers visual concepts in an open-set manner, localizes them via text-guided segmentation, and performs object-level interventions by removing concepts to measure changes in class confidence, estimating each concept's causal contribution. Beyond local explanations, OCCAM aggregates interventional evidence across a dataset to induce a structured concept ontology that captures how classifiers globally organize visual concepts. Reasoning over this ontology reveals consistent dependencies between concepts, exposes latent causal relations, and uncovers systematic model biases. Experiments on Broden and ImageNet-S across multiple classifiers show that OCCAM improves explanation quality in open-set black-box settings while providing richer global insight than per-image attribution methods.
AI Impact Assessments
Scientific Impact Assessment: OCCAM
1. Core Contribution
OCCAM introduces a framework for explaining black-box vision classifiers through three interconnected mechanisms: (1) open-set concept discovery via multimodal LLMs, (2) causal contribution estimation through input-level interventions (concept removal via inpainting), and (3) ontology induction by aggregating interventional evidence across a dataset into a structured knowledge graph. The key novelty lies in combining these three elements — open-set discovery, interventional causality, and global ontology construction — into a unified pipeline that operates under strict black-box constraints.
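The interventional estimate in mechanism (2) can be illustrated with a minimal Python sketch. It assumes that a concept's contribution is scored as the drop in target-class confidence after the concept is removed. `classify` and `remove_concept` are hypothetical stand-ins for the black-box classifier and the segmentation-plus-inpainting step, and the toy classifier is invented purely for illustration; none of this is taken from the paper's implementation.

```python
from typing import Callable, Dict, FrozenSet

# Toy stand-in: an "image" is just the set of concept names it contains.
Image = FrozenSet[str]
Classifier = Callable[[Image], Dict[str, float]]
Remover = Callable[[Image, str], Image]


def concept_effect(classify: Classifier, remove_concept: Remover,
                   image: Image, target_class: str, concept: str) -> float:
    """Drop in target-class confidence when `concept` is removed (assumed scoring rule)."""
    p_original = classify(image)[target_class]
    p_removed = classify(remove_concept(image, concept))[target_class]
    return p_original - p_removed


def toy_classify(image: Image) -> Dict[str, float]:
    """Hypothetical black box: the explainer only ever sees the output confidences."""
    score = 0.25 + 0.5 * ("ball" in image) + 0.125 * ("net" in image)
    return {"volleyball": score}


def toy_remove(image: Image, concept: str) -> Image:
    """Stands in for segmentation plus inpainting: the concept simply disappears."""
    return image - {concept}


image = frozenset({"ball", "net", "hair"})
effects = {c: concept_effect(toy_classify, toy_remove, image, "volleyball", c)
           for c in sorted(image)}
print(effects)  # {'ball': 0.5, 'hair': 0.0, 'net': 0.125}
```

In this toy setting "hair" has zero effect because the invented classifier ignores it; a nonzero effect for such a concept on a real classifier is the kind of signal the review discusses as a possible bias.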
The problem addressed is genuine: most concept-based explanation methods require either white-box access, predefined concept vocabularies, or both. OCCAM removes both constraints simultaneously, which is a meaningful practical advance for deployed systems and API-accessible models.
2. Methodological Rigor
Strengths in methodology:
Weaknesses in methodology:
3. Potential Impact
The framework addresses a real need in deployed AI systems where model internals are inaccessible. The ontology induction component is conceptually appealing — moving from per-image explanations to a global, queryable knowledge structure could be valuable for model auditing, bias detection, and regulatory compliance.
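One way to picture the move from per-image explanations to a global, queryable structure is the sketch below. It assumes that per-image effects are simply averaged per (class, concept) pair and that weak edges are dropped by a threshold. The aggregation rule, the threshold, the function names, and the records are all illustrative assumptions, not the paper's ontology-induction procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float]  # (class name, concept name, per-image effect)
Graph = Dict[str, Dict[str, float]]


def build_concept_graph(records: Iterable[Record], min_effect: float = 0.1) -> Graph:
    """Average effects per (class, concept) and keep edges at or above `min_effect`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cls, concept, effect in records:
        sums[(cls, concept)] += effect
        counts[(cls, concept)] += 1

    graph = defaultdict(dict)
    for (cls, concept), total in sums.items():
        mean = total / counts[(cls, concept)]
        if mean >= min_effect:
            graph[cls][concept] = mean
    return dict(graph)


def classes_relying_on(graph: Graph, concept: str) -> List[str]:
    """Audit-style query: which classes have a retained edge to `concept`?"""
    return sorted(cls for cls, edges in graph.items() if concept in edges)


# Invented per-image effects, for illustration only.
records = [
    ("volleyball", "ball", 0.5), ("volleyball", "ball", 0.25),
    ("volleyball", "hair", 0.25), ("volleyball", "net", 0.0),
    ("basketball", "ball", 0.75), ("basketball", "hair", 0.0),
]
graph = build_concept_graph(records)
print(graph)
print(classes_relying_on(graph, "hair"))  # ['volleyball']
```

A query like `classes_relying_on(graph, "hair")` is the sort of dataset-level question that per-image attribution maps cannot answer directly.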
The extension to multimodal classifiers (Table 4) demonstrates breadth, and the finding that SigLIP distributes reliance more broadly than CLIP variants is an interesting architectural insight.
However, practical impact may be limited by computational cost (running an MLLM + SAM3 + LaMa inpainting per concept per image is expensive) and by the reliance on the quality of foundation model components. The explanation quality is fundamentally bounded by the MLLM's ability to propose relevant concepts and SAM3's segmentation accuracy.
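To give a rough sense of how this cost scales, the following back-of-the-envelope sketch counts component calls. The call pattern is an assumption made here for illustration: one concept-proposal call and one baseline classifier query per image, then one segmentation, one inpainting, and one classifier query per concept. The actual pipeline may batch or cache differently, and the example sizes are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    mllm: int
    segmentation: int
    inpainting: int
    classifier: int

    @property
    def total(self) -> int:
        return self.mllm + self.segmentation + self.inpainting + self.classifier


def estimate_calls(num_images: int, concepts_per_image: int) -> CallCounts:
    """Count calls under the assumed one-intervention-per-concept pattern."""
    interventions = num_images * concepts_per_image
    return CallCounts(
        mllm=num_images,                          # one concept proposal per image
        segmentation=interventions,               # one mask per concept
        inpainting=interventions,                 # one removal per concept
        classifier=num_images + interventions,    # baseline plus one query per removal
    )


counts = estimate_calls(num_images=1000, concepts_per_image=5)
print(counts, counts.total)
```

Under these assumptions the number of heavy segmentation and inpainting calls grows linearly with both dataset size and concepts per image, which is why the per-concept cost dominates.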
4. Timeliness & Relevance
The paper is timely in leveraging the current ecosystem of foundation models (multimodal LLMs, SAM, inpainting models) for explainability. The black-box constraint is increasingly relevant as models are deployed behind APIs. The ontology induction angle connects XAI to knowledge representation, an underexplored direction.
However, the dependence on multiple large foundation models (Gemma 27B, SAM3, LaMa) means the "explainer" is arguably more complex than many of the models it explains, which raises questions about the transparency and reliability of the explanations themselves.
5. Strengths & Limitations
Key Strengths:
Notable Limitations:
Additional Observations
The volleyball example (Fig. 2) where classifiers identify "hair" and "smiling faces" as important for volleyball classification is presented as revealing model biases — but this could also reflect limitations of the MLLM's concept proposals rather than genuine classifier behavior. Disentangling the two is not addressed.
The paper comes from a single institution (University of Catania) and has six authors, suggesting a focused effort. The writing is generally clear but occasionally blurs the line between white-box methods used only for evaluation and the method's own black-box claims.
Generated May 19, 2026
Comparison History (20)
OCCAM addresses the broadly important problem of explainability in deep learning with a novel combination of open-set concept discovery, causal interventions, and ontology induction for black-box vision models. It has wider applicability across computer vision, XAI, and model auditing, with empirical validation on standard benchmarks. Paper 2 offers an elegant formalization of trust calibration as preference learning but is more niche, primarily theoretical, and lacks empirical validation. OCCAM's methodological contributions and breadth of impact give it higher potential scientific influence.
Paper 1 addresses the urgent and highly relevant issue of safety and vulnerabilities in emerging Large Reasoning Models (LRMs). Given the rapid deployment of reasoning LLMs, understanding and mitigating their jailbreak vulnerabilities is critical. The novel use of attention-guided reinforcement learning provides a rigorous approach to AI safety, likely leading to broader immediate impact and real-world application in securing foundational models compared to the more mature field of vision model explainability addressed in Paper 2.
Paper 1 addresses the highly timely intersection of LLMs and personalized healthcare, with a large-scale evaluation (2,257 queries, 1,945 PHRs) and both automated and clinician ratings. It introduces a novel evaluation framework for PHR-specific error modes, has direct real-world healthcare applications affecting millions of patients, and tackles the critical challenge of making complex health records actionable. Paper 2 contributes meaningfully to XAI with causal concept explanations, but explainability methods face a more crowded field. Paper 1's broader societal impact potential and timeliness in the LLM-health space give it the edge.
Paper 2 tackles a fundamental and pressing challenge in AI—interpreting black-box vision models via causal concept explanation and ontology induction. Its focus on open-set discovery and causal intervention offers broad implications for AI safety, trust, and explainability across multiple domains. In contrast, while Paper 1 presents a highly practical and commercially valuable application (NL2SQL), its multi-agent orchestration approach represents a more incremental methodological advancement with a narrower scientific scope compared to the foundational XAI contributions of Paper 2.
Paper 2 presents a concrete, technically novel framework with demonstrated empirical results: open-set concept discovery, causal interventions in black-box vision models, and dataset-level ontology induction. This combination is methodologically substantive and likely to be reusable across interpretability, robustness, and responsible AI, with clear real-world applicability to auditing deployed vision systems. Paper 1 is a vision/conceptual piece about trust in agent-to-agent networks; timely and potentially broad, but it appears less methodologically rigorous and offers fewer immediately testable contributions, which typically lowers near-term scientific impact.
Paper 2 addresses a critical real-world problem—predicting challenging behaviors in children with profound autism in actual classroom settings—with clear translational potential for safety and education. It bridges wearable sensing, foundation models, and special education in a novel real-world deployment context, moving beyond controlled lab settings. While Paper 1 makes solid contributions to XAI with causal concept explanations, the field is increasingly crowded. Paper 2's direct humanitarian impact, interdisciplinary reach (healthcare, education, ML, sensor technology), and practical applicability to an underserved population give it higher potential impact.
Paper 1 addresses a fundamental challenge in AI explainability (XAI) for black-box vision models using a novel causal framework and ontology induction. Its methodology for exposing systematic biases and latent relations has broad implications across multiple domains utilizing computer vision, such as healthcare and autonomous driving. In contrast, Paper 2 focuses on a more specialized, domain-specific application (cyber threat intelligence knowledge graphs), limiting its broader scientific impact compared to the fundamental AI transparency advances proposed in Paper 1.
Paper 1 addresses the critical, foundational problem of black-box model interpretability using causal concept explanation and ontology induction. Its theoretical depth and focus on uncovering causal relations and systematic biases offer broader implications for AI safety and trustworthy machine learning. While Paper 2 presents a highly practical systems-level optimization for web agents, Paper 1's contributions to explainable AI and causal inference are likely to spur a wider range of foundational follow-up research and have a more profound scientific impact.
Paper 2 (HASP) likely has higher scientific impact due to its broad, timely relevance to LLM agent reliability and long-horizon task performance, with clear, reusable mechanism design (executable skill programs) and multiple integration pathways (inference-time, post-training, self-improvement). It reports substantial empirical gains across diverse benchmarks (web-search, math, coding), suggesting strong real-world applicability and cross-domain impact. Paper 1 is novel for black-box vision interpretability and ontology induction, but its applicability is more specialized to vision/XAI and may see narrower uptake compared to general agent frameworks.
OCCAM addresses the widely important problem of explainability in black-box vision models with a novel framework combining open-set concept discovery, causal interventions, and ontology induction. This has broad applicability across AI safety, trustworthy AI, and any domain using deep vision classifiers. While CAM-Bench fills a gap in formal theorem-proving benchmarks for applied mathematics, benchmarks tend to have more incremental impact unless they catalyze entirely new research directions. OCCAM's methodological contributions—causal concept explanations and structured ontology induction—offer richer potential for follow-on research and real-world deployment in high-stakes applications.
Paper 2 addresses the highly timely and critical challenge of Agentic AI security. Its scientific impact is amplified by massive, production-proven real-world validation at Uber, the introduction of a novel benchmark (ADR-Bench), and superior performance over state-of-the-art baselines. While Paper 1 offers valuable advances in vision model interpretability, Paper 2's intersection of cybersecurity, LLM agents, and systems engineering combined with its unprecedented empirical scale promises broader, more immediate adoption and influence across both academia and industry.
Paper 2 (OCCAM) likely has higher scientific impact due to broad, timely relevance: black-box interpretability, causal concept attribution, and ontology induction apply across many vision systems and safety/audit settings. Its open-set concept discovery plus intervention-based causal testing can be used by practitioners without model access, increasing real-world applicability. The induced global ontology extends beyond per-instance explanations, potentially influencing XAI, robustness, fairness, and model governance. Paper 1 is methodologically rigorous and novel for MARL coordination graphs, but its impact is narrower to cooperative MARL and may see slower adoption outside that community.
Paper 2 is more novel and broadly impactful: it formalizes and experimentally studies a fundamental developmental principle (information offloading to initial conditions) with a jointly learned NCA+SIREN model and information-theoretic analysis. The approach is timely for self-organising systems, ALife, and morphogenesis, and can transfer to robust generative modeling, distributed computation, and synthetic biology-inspired design. Paper 1 is valuable for XAI in vision, but concept discovery/interventions/ontology induction builds on a crowded XAI literature and is more domain-specific, likely yielding narrower cross-field impact.
Paper 2 addresses a critical bottleneck in autonomous LLM agents by unifying skill selection, utilization, and distillation through reinforcement learning. This integrated approach to building persistent skill libraries offers immense potential for real-world automation and scalable agentic AI. Its broad applicability across interactive tasks and alignment with current trends in general-purpose AI give it a higher potential for widespread scientific and practical impact compared to the narrower focus on vision model interpretability in Paper 1.
Paper 1 likely has higher scientific impact: it introduces a novel open-set, black-box causal concept intervention framework plus dataset-level ontology induction for vision models, advancing both interpretability methodology and global model diagnostics (biases, dependencies). This is methodologically more rigorous and broadly useful for safety, debugging, and trustworthy deployment across many vision systems. Paper 2 is timely and application-driven, but largely integrates existing components (LLMs, web retrieval, Neo4j/GraphRAG, graph metrics) into a pipeline; evaluation is limited in scale and may be sensitive to tooling choices, reducing generalizable scientific novelty.
Paper 1 tackles a foundational and highly critical problem in AI: the causal interpretability of black-box models. By moving beyond local, per-image attributions to induce a global, causal concept ontology, OCCAM offers profound implications for AI safety, bias detection, and transparency. While Paper 2 provides a highly practical and effective engineering optimization for LLM coding agents (context pruning), Paper 1's contributions to mechanistic understanding and open-set causal reasoning represent a broader and more significant theoretical advancement with potential impact across multiple domains in deep learning.
Paper 2 (OCCAM) likely has higher impact due to broader cross-field relevance (vision, XAI, causality, ontology learning) and clearer real-world applicability for auditing black-box models for safety, bias, and accountability. Its methodology includes concept discovery, localization, and interventional testing to estimate causal contributions, plus dataset-level ontology induction—offering both local and global interpretability, which is timely amid regulatory and deployment pressures. Paper 1 is novel as a benchmark for proactive long-horizon assistants, but benchmarks with 100 tasks/personas may have narrower adoption and influence compared to a general-purpose interpretability framework for widely used vision classifiers.
While Paper 1 provides strong advances in explainable AI for computer vision, Paper 2 integrates physiological ODE priors with latent diffusion to create a predictive world model for ECGs. This has profound potential for real-world application in healthcare, enabling safe, action-conditioned clinical intervention simulations that can directly improve patient safety and personalized medicine.
Paper 2 likely has higher impact due to strong timeliness and broad applicability to rapidly evolving T2I alignment. AutoRubric-T2I introduces an interpretable, data-efficient alternative to large BT-trained reward models, with clear real-world utility (cheaper alignment, easier adaptation) and demonstrated downstream RL gains on multiple benchmarks. Its rule-learning + sparse selection approach is methodologically straightforward yet scalable and transferable across models and prompts. Paper 1 is novel for black-box causal concept explanation and ontology induction, but its impact may be narrower (interpretability niche, heavier assumptions around interventions/segmentation) and less immediately deployable in production pipelines.
Paper 1 (OCCAM) is more novel by combining open-set concept discovery, text-guided segmentation, causal interventions, and dataset-level ontology induction, moving beyond per-image attributions to global, structured model understanding. This offers broader real-world utility (auditing, bias discovery, model debugging) and wider cross-field impact (vision, interpretability, causal analysis, knowledge/ontology learning). Paper 2 advances evaluation via a perturbation-based metric and a trainable explanation adapter, but metrics for XAI are a crowded area and the approach may be more incremental and narrower in scope.