Leihan Zhang, Wecheng Ye, Xianlong Ma, Haochuan Liu, Yang Li, Qianyu Zhang, Jinliang Chen, Qiang Yan
As artificial intelligence (AI) systems are increasingly deployed across socially consequential domains, reports of AI-related harms and failures have grown in frequency and diversity. Although existing governance frameworks articulate high-level principles for responsible AI, large-scale empirical resources for tracking and analyzing real-world AI risk incidents remain limited. Existing incident collections are often manually curated, relatively small in scale, and insufficient for continuous, data-driven monitoring and downstream computational analysis. To address this need, we present RiskNet, a large-scale dataset of AI risk incidents constructed from large-scale multilingual news sources. RiskNet applies a structured pipeline for AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification. The resulting resource organizes dispersed news reports into incident-centered records and provides benchmark datasets for event classification, incident alignment, and incident-level risk labeling. In its current release, RiskNet covers hundreds of millions of source records and yields a large-scale collection of AI risk-related reports, including aligned incident clusters and annotated benchmark subsets. The dataset is also accessible through an online platform for browsing and exploration. We describe the data sources, processing workflow, taxonomy design, and technical validation of the resource. RiskNet is intended to support downstream research on AI safety, governance, risk analysis, and benchmarking, as well as longitudinal and cross-source analyses of AI-related harms. By providing a structured and reusable empirical resource, RiskNet helps bridge the gap between high-level governance principles and the documented realities of AI risk incidents.
RiskNet presents a large-scale, multilingual dataset of AI risk incidents constructed from news sources, accompanied by a structured pipeline for identification, alignment, and multi-dimensional classification of incidents. The core novelty lies in three areas: (1) scale, with hundreds of millions of source records processed to yield ~777K AI risk-related reports and ~265K event-level reports organized into ~54K incident clusters; (2) an incident alignment methodology that aggregates multiple news reports about the same real-world event into unified incident records, using dual-view retrieval followed by DeepWide pairwise classification; and (3) a multi-dimensional classification framework combining EU AI Act risk levels with MIT-derived domain taxonomies and causal tags. The paper addresses a genuine gap: existing AI incident repositories (AIID, AIAAIC) are manually curated, relatively small (~5K reports each), and lack automated cross-document incident linking.
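To make the incident alignment step concrete, the sketch below shows one generic way pairwise "same incident" decisions can be merged into incident clusters, using union-find. It is an illustration under stated assumptions, not the paper's implementation: `same_incident` is a hypothetical token-overlap stand-in for the DeepWide pairwise classifier, the retrieval stage is omitted so every pair is compared, and all function names and the threshold are invented for this example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two report texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def same_incident(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for a learned pairwise classifier (assumption)."""
    return jaccard(a, b) >= threshold


def cluster_reports(reports, is_match=same_incident):
    """Group report indices into clusters by merging every matched pair."""
    parent = list(range(len(reports)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: all pairs are compared; a retrieval step would prune this.
    for i, j in combinations(range(len(reports)), 2):
        if is_match(reports[i], reports[j]):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    clusters = {}
    for i in range(len(reports)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    reports = [
        "chatbot leaks user data in breach",
        "chatbot leaks user data in major breach",
        "deepfake video used in bank fraud",
    ]
    print(cluster_reports(reports))
```

Because matches are merged transitively, a single false-positive pair can fuse two unrelated incidents, which is one reason the precision of the pairwise classifier matters for cluster quality.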
The pipeline is well-structured and technically reasonable, though several concerns arise:
AI Safety and Governance: RiskNet could serve as a valuable empirical complement to high-level governance frameworks. The ability to systematically track incident trends, identify underrepresented risk domains, and conduct cross-lingual comparisons addresses real needs in the AI governance community.
NLP and Information Extraction: The benchmark subsets for event classification, incident alignment, and multi-label classification provide useful evaluation resources, though the benchmark sizes are relatively modest (2,000 for event classification, ~1,752 reference incidents for alignment, 2,285 for classification).
Practical limitations on impact: The full dataset is not publicly released due to licensing constraints; only benchmarks, code, and sample data are available, and full access requires an application through an online platform. This significantly limits reproducibility and community adoption. The paper's claim of being an "open dataset" is partially undermined by this access model.
The paper is highly timely. AI incident reporting has become a policy priority (OECD frameworks, EU AI Act reporting requirements), and the gap between governance principles and empirical tracking is widely recognized. The inclusion of multilingual sources, particularly Chinese-language news, addresses an important gap given the language bias documented in existing AI safety resources. The 2025-era coverage including deepfake fraud and agentic system incidents captures emerging risk categories.
RiskNet addresses a real and growing need for large-scale, structured AI incident data. The scale and multilingual coverage are impressive, and the incident alignment framework is a meaningful technical contribution. However, the restricted data availability, modest classification accuracy for dataset-wide labeling, and absence of demonstrated downstream utility temper the potential impact. The work is more of an infrastructure contribution than a scientific breakthrough, and its ultimate value will depend on community adoption and the quality of research it enables.
Generated Jun 9, 2026
Paper 1 offers a foundational, large-scale dataset addressing the critical and timely issue of AI safety and governance. While Paper 2 provides a highly valuable technical optimization for LLM inference, datasets like RiskNet typically generate broader, long-lasting cross-disciplinary impact by establishing new benchmarks and enabling extensive downstream research across AI development, policy-making, and societal studies.
RiskNet addresses a critical infrastructure gap in AI governance by providing a large-scale, structured empirical dataset for tracking real-world AI risk incidents. Its breadth of impact spans AI safety, governance, policy, and multiple research communities. The resource nature of the contribution means it enables numerous downstream studies. Paper 2, while technically sound with a novel framework for federated graph learning with missing modalities, addresses a narrower technical problem. RiskNet's timeliness amid growing AI deployment concerns and its potential to inform policy decisions give it broader and higher estimated impact.
Paper 1 exposes a critical gap between theoretical differential privacy guarantees and empirical vulnerabilities in LLM adaptation, specifically regarding distribution shifts between pretraining and fine-tuning data. This provides highly actionable, technical insights for deploying LLMs in sensitive domains. While Paper 2 offers a valuable dataset for AI governance, Paper 1 directly impacts core machine learning methodologies and security practices, likely driving immediate changes in how practitioners and researchers approach privacy-preserving LLM fine-tuning.
Paper 1 introduces a novel architectural contribution (GraMO) that combines state-space models with graph-based learning in a principled way, addressing fundamental challenges in modeling interacting dynamical systems. It demonstrates strong empirical results across multiple benchmarks. While Paper 2 (RiskNet) provides a valuable dataset for AI governance research, dataset papers typically have narrower methodological impact. Paper 1's technical innovation in coupling spatial-temporal dynamics within a single recurrence has broader applicability across physics simulation, robotics, and scientific computing, and advances the methodological frontier more significantly.
Paper 1 introduces a novel, broadly applicable training framework that addresses a key limitation of RL with verifiable rewards (zero group-level advantage) via trace tournaments and efficient Bradley–Terry ranking, yielding demonstrated gains on reasoning benchmarks along with compute savings; it is likely to influence future LLM training methods across domains. Paper 2 (RiskNet) provides a valuable dataset for AI governance and risk analysis, but its scientific impact hinges more on adoption and maintenance and may be narrower methodologically. Overall, Paper 1 is more technically innovative, timely for LLM training, and likely to propagate across multiple research areas.
Paper 1 likely has higher scientific impact due to broader cross-field relevance (AI safety, governance, policy, NLP/IR, incident analysis), strong timeliness, and high leverage as a large-scale, reusable dataset/platform that can enable many downstream studies and benchmarks. Its real-world applicability is immediate for monitoring and evaluating AI harms. Paper 2 is innovative and rigorous for aerial manipulation and sim-to-real meta-RL, but its impact is narrower (robotics/UAVs) and depends more on adoption and reproducibility in specific hardware settings.
Paper 2 offers higher scientific impact due to its novel contribution to mechanistic interpretability of LLMs, specifically causally localizing temporal preference representations, a previously unexplored area. It combines multiple rigorous methods (gradient attribution, activation patching, steering vectors) to provide actionable insights for AI alignment and control. The finding that LLMs discount the future differently than humans has broad implications for AI safety and deployment in decision-making. Paper 1 (RiskNet), while useful as a dataset resource, is primarily an infrastructure contribution with incremental novelty over existing incident databases, and its impact depends on downstream adoption.
Paper 1 introduces a fundamental algorithmic breakthrough in reinforcement learning, unifying model-free and model-based methods. Its rigorous theoretical proofs, error bounds, and extensive empirical validation across 80 diverse environments demonstrate exceptional methodological rigor. While Paper 2 provides a timely and valuable dataset for AI governance, Paper 1's potential to advance core AI capabilities and inspire follow-up algorithmic research gives it a higher estimated scientific impact.
Paper 2 has higher potential scientific impact due to a novel methodological contribution (amortizing entropy-search acquisition via PFNs/in-context learning) that directly improves a widely used core tool (Bayesian optimization) with large practical gains (50x speedups) and open-source code. Its approach is broadly applicable across ML, AutoML, experimental design, robotics, and hyperparameter tuning, making cross-field uptake likely. Paper 1 provides a valuable dataset/platform for AI risk analysis, but its impact depends on sustained curation, coverage, and adoption, and is more domain-specific and sensitive to data/annotation biases.
GraphDETR introduces a novel deep learning framework that reformulates a fundamental NP-complete graph theory problem (subgraph isomorphism) as a set prediction task, drawing an innovative analogy to object detection (DETR). It offers both exact and approximate matching capabilities with strong empirical results on molecular functional group detection. This methodological innovation has broad applicability across scientific domains (chemistry, biology, network analysis). While RiskNet is a valuable dataset contribution for AI governance, it is primarily a curation/annotation effort with narrower scope; GraphDETR's algorithmic novelty and cross-domain applicability suggest higher long-term scientific impact.