Participatory provenance as representational auditing for AI-mediated public consultation
Sachit Mahajan
Abstract
Artificial intelligence is increasingly deployed to synthesize large-scale public input in policy consultations and participatory processes. Yet no formal framework exists for auditing whether these summaries faithfully represent the source population, an accountability gap that existing approaches to AI explainability, grounding and hallucination detection do not address because they focus on output quality rather than input fidelity. Here, participatory provenance is introduced: a measurement framework grounded in optimal transport theory, causal inference and semantic analysis that tracks how individual public submissions are transformed, filtered or lost through AI-mediated summarization. Applied to Canada's 2025-2026 national AI Strategy consultation (n = 5,253 respondents across two independent policy topics), the framework reveals that both official government summaries underperform a random-participant baseline ([…] and […] coverage degradation), with […] and […] of participants effectively excluded. Exclusion concentrates in clusters expressing dissent, scepticism and critique of AI (33-88% exclusion rates). Brevity, semantic isolation and rhetorical register independently predict representational outcome. An accompanying open-source interactive tool, the Co-creation Provenance Lab, enables policymakers to audit and iteratively improve summaries, establishing genuine human-in-the-loop oversight at scale.
AI Impact Assessments (3 models)
Scientific Impact Assessment: Participatory Provenance as Representational Auditing for AI-Mediated Public Consultation
1. Core Contribution
This paper introduces "participatory provenance," a formal measurement framework for auditing whether AI-generated summaries of public consultation data faithfully represent the distribution of input voices. The core insight is that existing responsible AI frameworks (XAI, hallucination detection, grounding) are output-oriented, whereas the democratic concern in public consultation is input fidelity — whether the transformation from citizen submissions to official summary preserves the diversity and dissent present in the population. The framework combines four interconnected measurements: individual coverage scores (cosine similarity between participant embeddings and summary sentences), Wasserstein-2 distributional distance, doubly-robust causal estimates of exclusion predictors, and bidirectional concept fidelity analysis. Applied to Canada's 2025-2026 AI Strategy consultation (n=5,253), the framework reveals that official summaries underperform random baselines, with exclusion concentrated in dissenting clusters (33-88% exclusion rates).
The conceptual reframing — from output quality to input fidelity — is genuinely novel and fills an identifiable gap. The "manufactured consensus" framing is compelling: a summary can pass every output-oriented quality check while systematically silencing dissent.
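The individual coverage score described above (maximum cosine similarity between a participant embedding and any summary sentence) can be illustrated with a minimal sketch. The function names and the toy 2-D vectors are hypothetical; in practice the vectors would come from a sentence-embedding model, which is not shown here, and the paper's exact exclusion threshold is not reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors (an assumption of this sketch)
    return dot / (norm_u * norm_v)

def coverage_scores(participants, summary_sentences):
    """Per participant: max cosine similarity to any summary sentence."""
    return [max(cosine(p, s) for s in summary_sentences) for p in participants]

# Hypothetical toy embeddings: three participants, two summary sentences.
participants = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
summary_sentences = [[1.0, 0.0], [1.0, 1.0]]

scores = coverage_scores(participants, summary_sentences)
print([round(s, 3) for s in scores])
```

In this toy case the third participant points away from every summary sentence and receives the lowest score, which is the kind of submission a threshold on coverage would flag as excluded.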
2. Methodological Rigor
The methodological approach is multi-layered and generally sound, though several aspects warrant scrutiny.
Strengths: The paper employs a rigorous multi-method design. The random-participant baseline is a clever diagnostic that contextualizes summary performance without requiring ground truth. The use of doubly-robust AIPW (augmented inverse probability weighting) estimators for causal attribution is appropriate for observational data, and the author correctly caveats the causal claims by noting unmeasured confounding. Cross-topic replication (n=2,392 overlapping participants) strengthens internal validity. Multi-model robustness checks across three embedding models, together with parameter sensitivity sweeps across PCA dimensions, threshold multipliers, and cluster counts, demonstrate admirable thoroughness.
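For readers unfamiliar with the doubly-robust estimator mentioned above, the following is a minimal sketch of the standard AIPW formula for an average treatment effect. It assumes a binary "treatment" (hypothetically, whether a submission is brief) and a binary outcome (hypothetically, whether the participant is excluded), with propensity scores and outcome-model predictions supplied by the caller. The nuisance models used in the paper are not described in this assessment, so none are implied here.

```python
def aipw_ate(y, t, e, mu1, mu0):
    """AIPW estimate of the average treatment effect.

    y   : observed outcomes
    t   : binary treatment indicators (0 or 1)
    e   : estimated propensity scores P(T=1 | X), strictly between 0 and 1
    mu1 : outcome-model predictions under treatment
    mu0 : outcome-model predictions under control
    """
    if any(not (0.0 < ei < 1.0) for ei in e):
        raise ValueError("propensity scores must lie strictly in (0, 1)")
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0):
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        total += (m1 - m0
                  + ti * (yi - m1) / ei
                  - (1 - ti) * (yi - m0) / (1.0 - ei))
    return total / len(y)

# Hypothetical toy data: treated units all have outcome 1, controls all 0,
# with deliberately uninformative nuisance estimates.
y = [1.0, 1.0, 0.0, 0.0]
t = [1, 1, 0, 0]
e = [0.5, 0.5, 0.5, 0.5]
mu1 = [0.5, 0.5, 0.5, 0.5]
mu0 = [0.5, 0.5, 0.5, 0.5]

estimate = aipw_ate(y, t, e, mu1, mu0)
print(estimate)
```

The estimator is called doubly robust because it remains consistent if either the propensity model or the outcome model is correctly specified; it does not remove bias from unmeasured confounders, which is the caveat noted above.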
Concerns: The coverage score metric (max cosine similarity to any summary sentence) is a proxy for representational inclusion, not a direct measure. The author acknowledges this, but the conflation of "low semantic similarity" with "exclusion" remains a fundamental limitation. A summary that abstracts "AI will destroy teaching" into "concerns about workforce disruption" may score low on cosine similarity while arguably representing the concern. The concept fidelity analysis partially addresses this, but the 13-18% forward recall figures may overstate the problem given legitimate compression. The reliance on GPT-4o-mini for multiple classification tasks (topic relevance, concept transformation, epistemic grounding, stance alignment) introduces model-dependent judgments throughout the pipeline, though inter-run reliability metrics (Fleiss' κ) are reported transparently, and unreliable analyses (Trust stance alignment, κ=0.554) are appropriately excluded.
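The inter-run reliability statistic cited above can be sketched as follows. This is the textbook Fleiss' κ, treating repeated classifier runs as "raters"; the count matrices are hypothetical and the paper's number of runs and categories is not assumed.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from an items x categories matrix of rating counts.

    counts[i][j] is the number of raters (here: repeated model runs)
    that assigned item i to category j. Every row must sum to the same
    number of raters, which must be at least 2.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])

    # Mean observed agreement across items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Agreement expected by chance from the marginal category shares.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(p * p for p in shares)
    if p_e == 1.0:
        return 1.0  # all ratings in one category; convention used in this sketch
    return (p_bar - p_e) / (1.0 - p_e)

# Hypothetical example: 4 items, 3 runs, 2 categories.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]
kappa_perfect = fleiss_kappa(perfect)
kappa_split = fleiss_kappa(split)
print(kappa_perfect, kappa_split)
```

Values near 1 indicate that repeated runs agree far more often than chance would predict, while values near or below 0 indicate agreement no better than chance.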
The clustering approach (k-Means on PCA-reduced embeddings) imposes spherical cluster assumptions on what is likely a complex semantic manifold. The stability-first override for Trust (from k=15 to k=7) is methodologically defensible but introduces analyst degrees of freedom. NPMI (normalized pointwise mutual information) coherence scores are notably poor (-0.807 and -0.619), which the author acknowledges but somewhat dismisses.
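To show why coherence scores close to -1 are read as poor, here is a minimal sketch of NPMI coherence using document-level co-occurrence. The helper names and the toy corpus are hypothetical, and the paper may compute co-occurrence differently (for example over sliding windows).

```python
import math
from itertools import combinations

def npmi(p_i, p_j, p_ij):
    """Normalized pointwise mutual information, in [-1, 1]."""
    if p_ij == 0.0:
        return -1.0  # limiting value when the two words never co-occur
    if p_ij == 1.0:
        return 1.0   # both words in every document; avoids 0/0 (sketch convention)
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)

def topic_npmi(top_words, documents):
    """Mean NPMI over all pairs of a cluster's top words."""
    docs = [set(d) for d in documents]
    n = len(docs)

    def prob(*words):
        # Share of documents containing all the given words.
        return sum(all(w in d for w in words) for d in docs) / n

    pairs = list(combinations(top_words, 2))
    return sum(npmi(prob(a), prob(b), prob(a, b)) for a, b in pairs) / len(pairs)

# Hypothetical toy corpus of tokenized submissions.
documents = [
    ["ai", "jobs"],
    ["ai", "jobs"],
    ["privacy", "data"],
    ["privacy", "data"],
]

coherent = topic_npmi(["ai", "jobs"], documents)
incoherent = topic_npmi(["ai", "privacy"], documents)
print(coherent, incoherent)
```

A score of 1 means the top words always appear together, 0 means they co-occur at the rate expected under independence, and -1 means they never co-occur, so values such as -0.807 suggest that a cluster's top words rarely share a document.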
3. Potential Impact
The practical impact potential is substantial. Governments worldwide are increasingly using AI for citizen engagement, and this paper provides the first formal toolkit for auditing representational fidelity. The open-source Co-creation Provenance Lab could see adoption by civil society organizations, ombudsmen, and oversight bodies. The framework is directly relevant to ongoing regulatory developments (EU AI Act, Canada's AIDA), neither of which currently requires representational auditing of consultation processes.
The finding that dissenting voices are systematically excluded (up to 88% exclusion rates) is politically consequential and could influence how governments deploy AI in democratic processes. The concept of "manufactured consensus" — where AI summaries appear participatory while filtering dissent — provides a powerful frame for policy debate.
Beyond government consultations, the framework could extend to corporate stakeholder engagement, treaty negotiations, environmental impact assessments, and any context where AI mediates between populations and decision-makers.
4. Timeliness & Relevance
The paper is exceptionally timely. AI-mediated public consultation is rapidly expanding globally, and the gap between deployment and accountability infrastructure is widening. The OECD's "deliberative wave" report, the EU AI Act, and numerous national AI strategies all emphasize citizen participation — but none provide tools for verifying whether AI faithfully represents citizen input. The paper addresses this gap at precisely the moment it is becoming operationally critical.
5. Strengths & Limitations
Key Strengths:
Notable Limitations:
Additional observations: The paper is well-written and clearly structured, though it at times veers toward advocacy rather than dispassionate analysis (e.g., "manufactured consensus," "procedural disenfranchisement"). The framing of results could be more balanced: the finding that summaries underperform random baselines is striking, but random participant quotes would not constitute usable policy documents, and the paper somewhat undersells this important caveat. The mathematical framework, while drawing on optimal transport theory, uses it in a relatively straightforward way (empirical Wasserstein distance computation rather than novel theoretical results).
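As an illustration of what an empirical Wasserstein-2 computation involves, the sketch below evaluates the exact distance between two small, equal-size point clouds with uniform weights. In that special case the optimal transport plan is a one-to-one matching, so brute force over permutations suffices. This is a toy under stated assumptions, not the paper's implementation: real embedding sets would need a proper optimal transport solver, and the function names are hypothetical.

```python
import math
from itertools import permutations

def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def w2_exact(xs, ys):
    """Exact Wasserstein-2 distance between two equal-size point clouds
    with uniform weights, found by brute force over all matchings.
    Only feasible for a handful of points (cost grows as n!)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    n = len(xs)
    best_cost = min(
        sum(sq_dist(xs[i], ys[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return math.sqrt(best_cost / n)

# Hypothetical 2-D stand-ins for participant and summary embeddings.
cloud_a = [(0.0, 0.0), (1.0, 0.0)]
cloud_b = [(0.0, 1.0), (1.0, 1.0)]

distance = w2_exact(cloud_a, cloud_b)
print(distance)
```

Here every point in the first cloud moves one unit to reach its match in the second, so the distance reflects a uniform shift of the whole distribution rather than the loss of any single point.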
Generated Apr 23, 2026
Comparison History (42)
Paper 2 addresses a fundamental challenge in materials science and chemistry—efficient structure prediction across high-dimensional energy landscapes—with broad applicability to molecular and materials discovery. Its unified framework bridging generative models and random structure search offers >10x efficiency gains and works beyond training distributions, making it highly impactful for computational chemistry, drug discovery, and materials design. While Paper 1 introduces a novel and timely auditing framework for AI-mediated democracy with strong methodological rigor, its impact is more domain-specific (AI governance/policy). Paper 2's potential to accelerate discovery across multiple scientific fields gives it broader and deeper impact.
Paper 1 addresses a critical socio-technical issue with a highly novel, rigorously grounded framework for auditing AI in democratic processes. Its real-world application reveals alarming biases in policy consultation summaries, offering profound implications for AI governance, ethics, and public policy. While Paper 2 presents a solid technical improvement for LLM training, Paper 1's timely intervention in AI-mediated governance promises much broader interdisciplinary and real-world societal impact.
Paper 2 likely has higher impact: it introduces a new auditing framework (“participatory provenance”) targeting a rapidly growing, high-stakes application—AI-mediated public consultation—where accountability gaps are widely recognized. Its methodological blend (optimal transport, causal inference, semantic analysis) and validation on a large real national consultation dataset strengthens rigor and credibility. The work has clear real-world uptake potential (policy workflows, governance) and broad cross-field relevance spanning ML, HCI, public policy, and AI ethics. Paper 1 is timely and useful for RL training stability, but is more incremental within a crowded optimization/exploration literature and narrower in downstream societal breadth.
Paper 1 introduces a novel formal framework (participatory provenance) addressing a critical accountability gap in AI-mediated democratic processes—a timely and underexplored problem with broad societal implications. It combines optimal transport theory, causal inference, and semantic analysis in a methodologically rigorous way, applies it to real government data revealing systematic exclusion of dissenting voices, and provides an open-source tool for practitioners. Paper 2, while technically sound, represents an incremental improvement in LLM-based prediction benchmarks. Paper 1's interdisciplinary reach (AI governance, democratic theory, public policy) and real-world policy relevance give it higher potential impact.
Paper 1 introduces a novel, auditable framework (participatory provenance) targeting a major and under-addressed accountability gap: input fidelity/representational harms in AI-mediated public consultation. It combines optimal transport, causal inference, and semantic analysis, is validated on a real national consultation dataset, and ships an open-source tool—supporting methodological rigor and immediate policy deployment. Its impact spans ML auditing, HCI, public policy, and governance, and is timely given rapid adoption of LLM summarization in civic processes. Paper 2 is technically solid but more incremental within a crowded efficiency/routing literature.
Paper 1 addresses a critical, highly timely societal issue—AI accountability in democratic processes. Its interdisciplinary approach, bridging optimal transport theory, causal inference, and public policy, offers a novel auditing framework with profound real-world applications for governance. In contrast, while Paper 2 provides a useful algorithmic improvement for LLM reasoning on puzzle benchmarks, its scope and potential societal impact are significantly narrower. The breadth of Paper 1's impact across AI ethics, political science, and human-computer interaction makes it substantially more impactful.
Paper 2 is more novel and broadly impactful: it introduces a formal, auditable framework (participatory provenance) for representational fidelity in AI-mediated public consultation, combining optimal transport, causal inference, and semantic analysis, validated on a large real-world national dataset with actionable findings and an open-source tool. Its applications span AI governance, public policy, HCI, NLP evaluation, and fairness/accountability, making it timely and likely to influence practice and regulation. Paper 1 improves LLM puzzle reasoning, but is narrower in scope and resembles existing self-reflection/query-based reasoning frameworks.
Paper 2 addresses a critical, highly relevant issue (AI accountability in democratic processes) with strong methodological rigor, combining optimal transport and causal inference. Its real-world application to national policy consultation reveals systemic biases, demonstrating significant societal and interdisciplinary impact. Paper 1, while useful, offers a narrower technical tool for academic drafting with more limited broader impact.
Paper 2 offers a novel, theory-grounded auditing framework (optimal transport + causal inference + semantics) targeting a timely, high-stakes accountability gap in AI-mediated governance. It demonstrates methodological rigor with a large real-world dataset and clear quantitative findings, and provides an open-source tool enabling practical deployment by policymakers. Its impact spans AI/ML, HCI, public policy, computational social science, and ethics. Paper 1 is useful infrastructure but appears more incremental and narrower in scope, with less demonstrated empirical validation and broader societal leverage.
Paper 2 has higher potential impact due to its novelty in defining a formal, auditable framework for input-fidelity (representational) accountability in AI-mediated public consultation, with immediate policy and governance applications. It demonstrates methodological rigor (optimal transport + causal inference + semantic analysis) on a real national-scale dataset, surfaces actionable findings about systematic exclusion of dissenting voices, and provides an open-source tool that can be adopted broadly by governments and civic platforms. Its relevance and cross-field reach (AI, HCI, political science, public administration, auditing) exceed the narrower domain focus of Paper 1.
Paper 1 addresses a critical and highly timely societal issue—democratic accountability in AI-mediated policy consultations. By bridging optimal transport theory with socio-technical auditing, it introduces a novel, mathematically rigorous framework for a previously unaddressed problem (input fidelity). Its real-world application to national policy data demonstrates immediate, high-impact utility across AI governance and public policy. While Paper 2 offers solid technical improvements for LLM scientific reasoning, Paper 1's profound societal implications, interdisciplinary methodological rigor, and establishment of a new accountability paradigm give it a higher potential for broad scientific and real-world impact.
Paper 1 introduces a novel framework ('participatory provenance') addressing a critical and timely accountability gap in AI-mediated democratic processes. It combines optimal transport theory with causal inference in a new application domain, demonstrates real-world impact through analysis of an actual national policy consultation, and provides an open-source tool for policymakers. Its breadth of impact spans AI governance, democratic theory, public policy, and fairness—fields with enormous societal relevance. Paper 2, while technically rigorous, addresses a narrower problem in heuristic search optimization with incremental improvements over existing baselines and limited cross-disciplinary reach.
Paper 1 pioneers a novel, interdisciplinary framework addressing a critical sociotechnical gap: representational accountability in AI-mediated policy-making. By combining optimal transport theory with a high-stakes real-world application (Canada's AI strategy consultation) and providing an open-source auditing tool, it offers profound multidisciplinary impact across AI, political science, and HCI. Paper 2 presents a solid, but more incremental, algorithmic improvement to LLM alignment in a crowded research area.
Paper 1 addresses a critical and immediate socio-technical challenge: the use of AI in democratic processes and public policy. Its framework combines rigorous methods (optimal transport, causal inference) to expose significant biases in real-world government AI summaries, specifically the systemic exclusion of dissenting voices. This offers profound implications for AI governance, ethics, and computational social science, arguably presenting a higher societal and cross-disciplinary impact than the technical benchmarking improvements in Paper 2.
Paper 2 investigates a fundamental question about LLM internals—whether emotion-like representations exist and causally influence behavior including misalignment (reward hacking, sycophancy, blackmail). This has broad implications for AI safety/alignment, mechanistic interpretability, and cognitive science. The finding that abstract emotion representations causally drive misaligned behaviors is highly novel and actionable for the entire AI safety community. Paper 1, while rigorous and policy-relevant, addresses a narrower niche (AI-mediated public consultation auditing) with more limited cross-field impact. Paper 2's relevance to the urgent alignment problem gives it greater estimated impact.
Paper 2 addresses a foundational issue in the rapidly growing field of AI-driven science. By demonstrating that LLM agents lack true scientific reasoning, it challenges the validity of autonomous AI research tools across all scientific domains. Its massive experimental scale (25,000+ runs) and deep epistemological critique offer broader implications for core AI development, model evaluation, and the philosophy of science compared to Paper 1's narrower, albeit important, focus on AI in public policy consultations.
Paper 1 introduces a novel theoretical framework (participatory provenance) addressing a critical emerging gap at the intersection of AI, democratic governance, and accountability. It combines optimal transport theory, causal inference, and semantic analysis in a methodologically rigorous way, applies it to real government data revealing systematic exclusion of dissenting voices, and provides an open-source tool. This addresses a timely, high-stakes problem with broad societal implications as AI-mediated public consultation scales globally. Paper 2, while valuable, addresses a narrower technical benchmarking problem in coding agents with more incremental contributions to the existing agent evaluation literature.
Paper 2 has higher likely scientific impact: it introduces a formal, auditable framework (participatory provenance) with clear methodological grounding (optimal transport + causal inference), validates it on a large real-world national consultation dataset, and ships an open-source tool enabling immediate adoption. Its applications span AI governance, public policy, HCI, NLP evaluation, and accountability research, making breadth and societal relevance high and timely amid regulatory pressure. Paper 1 is novel and rigorous for MoE interpretability, but its primary impact is within ML interpretability and model analysis with less direct near-term cross-sector deployment.
Paper 1 introduces a novel, formally grounded measurement framework (participatory provenance) addressing a critical and timely accountability gap in AI-mediated democratic processes. It combines optimal transport theory, causal inference, and semantic analysis with a compelling empirical demonstration on real government data, revealing systematic exclusion of dissenting voices. Its breadth of impact spans AI governance, democratic theory, and public policy. Paper 2 addresses an important but more domain-specific problem (prior contamination in LLM reasoning) with a pragmatic protocol. While valuable, epistemic blinding is conceptually simpler and narrower in scope, primarily serving as a diagnostic rather than establishing a new theoretical framework with broad societal implications.
Paper 2 offers a novel, generalizable auditing framework (“participatory provenance”) that targets an unmet accountability gap—input fidelity and representational harm in AI-mediated public consultation—grounded in optimal transport, causal inference, and semantic analysis, with an open-source tool for deployment. Its applications extend across policy, HCI, AI governance, and civic tech, and it is timely given growing governmental use of AI summarization. Paper 1 is impactful operationally but is more deployment/reporting-focused and may face faster obsolescence as models change and strong normative/ethics constraints limit adoption.