Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers

Gianluca Guidi, Francesca Dominici, Tiziano Squartini, Callaway Sprinkle, Jonathan Gilmour, Kevin Butler, Eric Bell, Scott Delaney

Jun 3, 2026

arXiv:2606.05420v1

cs.AI (primary), stat.AP

#821 of 3355 · Artificial Intelligence

Tournament Score: 1457 ± 48 (chart axis 1050 to 1800)

Win Rate: 75%

Rating: 6.8 / 10

Significance 7.5 · Rigor 6.5 · Novelty 5.5 · Clarity 7.5

Abstract

The rapid proliferation of hyperscale data centers (HDCs) in the US, mainly driven by the adoption of artificial intelligence, has raised concerns about this industry's environmental footprint. We compiled facility-level information on 403 US hyperscale data centers operating between May 2024 and April 2025 and estimated their electricity consumption, electricity sources, and attributable CO2 emissions. Across different facility-load scenarios, these HDCs consumed approximately 68-99 TWh of electricity and were associated with about 37-54 million metric tons of CO2. Under the central scenario, HDC electricity demand corresponded to approximately 1.8% of total US electricity consumption, with roughly 54% of attributed generation supplied by fossil-fuel sources. The HDC electricity-weighted average carbon intensity was approximately 545 gCO2/kWh, about 48% above the contemporaneous US national grid-average carbon intensity of 370 gCO2/kWh. Our approach provides an attributional tool for assessing the environmental footprint of hyperscale data centers using the most recent EPA eGRID plant-level data.
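The headline figures in the abstract can be cross-checked with simple unit arithmetic. The sketch below is not the authors' code; it assumes the single average intensity of 545 gCO2/kWh applies across all load scenarios and uses only the rounded values quoted above.

```python
# Figures quoted in the abstract (rounded as published).
HDC_INTENSITY_G_PER_KWH = 545.0   # HDC electricity-weighted average
GRID_INTENSITY_G_PER_KWH = 370.0  # US national grid average
ENERGY_RANGE_TWH = (68.0, 99.0)   # low and high facility-load scenarios


def emissions_mt(energy_twh: float, intensity_g_per_kwh: float) -> float:
    """Convert TWh and gCO2/kWh to million metric tons of CO2.

    1 TWh = 1e9 kWh and 1 Mt = 1e12 g, so Mt = TWh * (g/kWh) / 1000.
    """
    return energy_twh * intensity_g_per_kwh / 1000.0


# Assumption: the same average intensity holds in every scenario.
low_mt, high_mt = (emissions_mt(e, HDC_INTENSITY_G_PER_KWH) for e in ENERGY_RANGE_TWH)
premium = HDC_INTENSITY_G_PER_KWH / GRID_INTENSITY_G_PER_KWH - 1.0

print(f"Emissions range: {low_mt:.1f} to {high_mt:.1f} Mt CO2")
print(f"Premium over grid average: {premium:.1%}")
```

The emissions range reproduces the reported 37-54 Mt. The rounded intensities give a premium of roughly 47%, so the quoted "about 48%" presumably comes from unrounded inputs.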

AI Impact Assessments

(1 model)

Scientific Impact Assessment

1. Core Contribution

This paper provides a bottom-up, facility-level assessment of the electricity consumption and attributable CO₂ emissions of 403 US hyperscale data centers (HDCs) during May 2024–April 2025. The central finding is that these HDCs consumed approximately 82 TWh of electricity (central scenario) and were responsible for roughly 45 Mt CO₂. That corresponds to ~1.8% of total US electricity consumption, with a weighted carbon intensity of ~545 gCO₂/kWh, approximately 48% above the national grid average. The main novelty lies in the integration pipeline, which harmonizes commercial facility data (Baxtel), OpenStreetMap footprint matching, satellite imagery validation, EPA eGRID2023 plant-level emissions data, and balancing-authority-level attribution, and applies it to a substantially larger and more current dataset than prior work (e.g., Siddik et al. 2021, which covered 2018 and found 10.5 Mt CO₂).

2. Methodological Rigor

Strengths in methodology:

The multi-source data pipeline (commercial data → geocoding → OSM footprint matching → satellite imagery validation) is well-documented and represents a credible approach to the fundamental challenge of HDC opacity.

The scenario-based approach (u ∈ {0.48, 0.58, 0.663, 0.70}) transparently brackets uncertainty in converting nameplate capacity to operational load, rather than presenting a single point estimate.

The central scenario is independently calibrated against both Newkirk et al.'s power-flow modeling and a Lei-Masanet PUE-implied check, yielding consistent values (~0.58-0.59).

Sensitivity analysis to eGRID vintage (2022 vs 2023) shows only ~3% shift, adding confidence.
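To see how the facility-load coefficient drives the reported range, the following sketch assumes annual energy is linear in u, i.e. E = u × C × 8760 hours for total nameplate capacity C. Both that functional form and the choice of u = 0.58 as the central scenario are assumptions made for illustration; the implied capacity is back-calculated from the 82 TWh central estimate and is not a figure reported in the paper.

```python
HOURS_PER_YEAR = 8760
SCENARIOS = (0.48, 0.58, 0.663, 0.70)   # facility-load coefficients u from the review
CENTRAL_U = 0.58                         # assumed to be the central scenario
CENTRAL_ENERGY_TWH = 82.0                # central estimate quoted in the review


def implied_capacity_gw(energy_twh, u):
    """Total nameplate capacity implied by E = u * C * hours (an assumed form)."""
    return energy_twh * 1000.0 / (u * HOURS_PER_YEAR)   # TWh -> GWh, then / hours


def annual_energy_twh(capacity_gw, u):
    """Annual energy in TWh for a given capacity (GW) and load coefficient."""
    return u * capacity_gw * HOURS_PER_YEAR / 1000.0


capacity_gw = implied_capacity_gw(CENTRAL_ENERGY_TWH, CENTRAL_U)
energy_by_u = {u: annual_energy_twh(capacity_gw, u) for u in SCENARIOS}

print(f"Implied nameplate capacity: {capacity_gw:.1f} GW")
for u, twh in energy_by_u.items():
    print(f"u = {u:<5} -> {twh:5.1f} TWh")
```

Under these assumptions the lowest and highest coefficients give about 68 and 99 TWh, matching the range in the abstract, which suggests the scenarios differ only by this scaling factor.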

Methodological limitations:

The attributional framework assumes proportional allocation of generation within balancing authorities — a standard but simplistic approach that ignores temporal load-generation correlation, merit-order dispatch effects, and transmission constraints. The authors acknowledge this but do not quantify the resulting bias.

The GBRT model for imputing 6 missing power capacities is disproportionately documented relative to its influence (only 6/403 facilities). The grouped cross-validation results (R² = 0.279 for leave-one-climate-out) reveal limited generalizability, though the practical impact is negligible given the small imputation set.

The one-year lag between the facility observation window (May 2024–April 2025) and the eGRID2023 grid data (calendar year 2023) introduces temporal mismatch. Given rapid grid evolution, this is a real concern.

Behind-the-meter generation and PPAs are excluded, which could meaningfully affect attribution for operators like Google, Microsoft, and Meta who are major renewable energy purchasers.

Coverage is incomplete: only 403 of 675 initially identified HDCs could be validated, and the relationship to the true national HDC population is uncertain.
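The proportional allocation criticized in the first limitation above can be illustrated with a minimal sketch. It assumes each facility inherits the generation-weighted average emissions rate and fuel mix of its balancing authority (BA); the `Plant` class, the fuel labels, and all numbers are hypothetical and do not come from eGRID or the paper.

```python
from dataclasses import dataclass


@dataclass
class Plant:
    fuel: str
    generation_mwh: float
    co2_tonnes: float


FOSSIL_FUELS = {"coal", "gas", "oil"}  # illustrative labels, not eGRID codes


def ba_profile(plants):
    """Return (carbon intensity in gCO2/kWh, fossil share) for one balancing authority."""
    total_mwh = sum(p.generation_mwh for p in plants)
    total_co2 = sum(p.co2_tonnes for p in plants)
    fossil_mwh = sum(p.generation_mwh for p in plants if p.fuel in FOSSIL_FUELS)
    # tonnes/MWh equals kg/kWh, so multiply by 1000 to get g/kWh.
    return 1000.0 * total_co2 / total_mwh, fossil_mwh / total_mwh


def attribute(load_mwh, plants):
    """Attribute emissions to a facility using the BA-average rate."""
    intensity, fossil_share = ba_profile(plants)
    # MWh * g/kWh = 1000 g, and 1 tonne = 1e6 g, hence the division by 1000.
    return load_mwh * intensity / 1000.0, fossil_share


# Hypothetical plants in a single BA.
plants = [
    Plant("coal", 4_000_000, 4_000_000),
    Plant("gas", 4_000_000, 1_600_000),
    Plant("wind", 2_000_000, 0),
]
tonnes, fossil = attribute(500_000, plants)
print(f"{tonnes:,.0f} t CO2 attributed, {fossil:.0%} fossil share")
```

Because the rate is an annual average, the facility's attributed emissions do not change with when it draws power or which plant is on the margin, which is exactly the simplification the review flags.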

3. Potential Impact

The paper addresses a topic of high public and policy relevance. Key impact pathways include:

Informing policy: The finding that HDC-weighted carbon intensity exceeds the national average by 48% directly challenges narratives about tech companies' clean energy commitments and could inform state-level permitting and reporting requirements.

Benchmarking: The facility-level dataset and methodology provide a replicable framework for ongoing monitoring, particularly valuable as AI-driven electricity demand accelerates.

Public accountability: The web-based visualization tool (ArcGIS dashboard) enables stakeholders to explore regional HDC footprints, though the underlying facility data remain restricted by DUA.

Cross-sectoral relevance: The geographic concentration finding (>50% of HDC electricity in just four states) has implications for grid planning, renewable energy siting, and environmental justice.

However, the practical impact is somewhat constrained by the attributional (vs. consequential) framework, which limits the paper's utility for informing siting decisions or policy interventions targeting marginal emissions.

4. Timeliness & Relevance

This paper is exceptionally timely. The explosive growth of AI workloads has made data center energy consumption a first-order policy concern, yet peer-reviewed evidence has lagged far behind industry reports and journalistic coverage. The most-cited academic estimate (Siddik et al. 2021) used 2018 data — a lifetime ago in AI infrastructure terms. By providing 2024-2025 estimates using a validated methodology, this paper fills a critical gap. The 3.5–5x increase in HDC emissions relative to 2018 quantifies the scale of the problem in a way that prior projections (e.g., Xiao et al.'s forward-looking estimates) could not.
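The 3.5–5x figure appears to follow from dividing the 2024-2025 scenario range by the 2018 estimate. The review does not say how the ratio was computed, so the sketch below is an assumption about its origin; the two studies also cover different facility sets and methods, so the ratio is indicative rather than like-for-like.

```python
BASELINE_2018_MT = 10.5          # Siddik et al. 2021 estimate, as quoted in the review
CURRENT_RANGE_MT = (37.0, 54.0)  # scenario range from the abstract

# Ratio of each end of the current range to the 2018 baseline.
low_ratio, high_ratio = (mt / BASELINE_2018_MT for mt in CURRENT_RANGE_MT)
print(f"{low_ratio:.1f}x to {high_ratio:.1f}x")
```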

5. Strengths & Limitations

Key strengths:

Largest validated facility-level HDC dataset in the peer-reviewed literature

Transparent scenario framework with independent calibration

Reproducible pipeline with open-source code (though data remain restricted)

Timely contribution to an urgent policy debate

Geographic granularity (balancing authority and state level) enables actionable insights

Notable weaknesses:

Data access restrictions limit full reproducibility despite code availability

Attributional framework ignores temporal dynamics, PPA effects, and marginal emissions — all critical for policy decisions

Cannot differentiate AI vs. general-purpose workloads, limiting specificity of AI-related conclusions

The paper's framing occasionally conflates the study's attributional scope with causal claims about HDC environmental impact

Comparison to IEA benchmarks suggests 56–82% coverage, but this comparison itself carries compounded uncertainty

The facility-load coefficient, while carefully bounded, remains the dominant source of uncertainty and is essentially unknowable without operator disclosure

Additional Observations

The paper is well-written and clearly structured, with extensive supplementary materials. The interdisciplinary team (biostatistics, environmental health, computer science, GIS) brings appropriate expertise. However, the paper reads more as a carefully constructed accounting exercise than a methodological advance — the attribution framework is standard, and the ML component is minor. The primary contribution is the data compilation and validation effort, which, while valuable, is inherently perishable given the rapid pace of HDC construction.

The restriction on facility-level data release is understandable but significantly limits the paper's scientific utility for downstream research. The synthetic datasets provided are a reasonable compromise but cannot substitute for real facility-level analysis.

Rating: 6.8 / 10

Significance 7.5 · Rigor 6.5 · Novelty 5.5 · Clarity 7.5

Generated Jun 5, 2026

Comparison History (20)

vs. Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline

gpt-5.2 · 6/6/2026

Paper 2 has higher likely scientific impact due to strong real-world relevance and timeliness (AI-driven data-center growth), clear policy and industry applications (energy planning, emissions accounting, regulation), and broad cross-field reach (energy systems, climate science, CS/AI infrastructure, economics). Its facility-level dataset and transparent attributional methodology using current EPA eGRID data support rigor and reuse. Paper 1 is novel within LLM-agent memory evaluation and may influence agent design, but its impact is narrower and more contingent on rapid shifts in LLM tooling and benchmarks.

vs. EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

claude-opus-4.6 · 6/6/2026

Paper 2 addresses a critically timely and broadly relevant topic—the environmental footprint of hyperscale data centers driven by AI growth—with novel facility-level empirical data (403 HDCs) that fills a major knowledge gap. Its findings (HDC carbon intensity 48% above grid average) have immediate policy implications and broad interdisciplinary impact across energy, environmental science, computer science, and policy. Paper 1, while technically sound, addresses a narrower ML/epidemiology niche with incremental improvements to LLM-based forecasting. Paper 2's empirical contribution and societal relevance give it substantially broader citation potential and impact.

vs. MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction

gpt-5.2 · 6/6/2026

Paper 2 has higher potential impact: it provides a timely, facility-level, policy-relevant quantification of U.S. hyperscale data center electricity use and CO2 emissions using recent eGRID data, enabling benchmarking and informing regulation, grid planning, and corporate decarbonization across energy, climate, and computing. Its dataset and attributional tool can be widely reused and updated. Paper 1 is technically innovative for real-time trajectory prediction and valuable for autonomy, but its impact is more domain-specific and depends on reproducibility and adoption in deployed stacks.

vs. GITCO: Gated Inference-Time Context Optimization in TSFMs

claude-opus-4.6 · 6/6/2026

Paper 1 addresses a critically timely and high-visibility topic—the environmental footprint of AI-driven hyperscale data centers—with novel facility-level empirical data covering 403 US data centers. Its finding that HDC carbon intensity is 48% above the national grid average has immediate policy relevance and broad societal impact across energy, environmental, and technology domains. Paper 2, while technically sound, presents an incremental improvement (~1.95% MASE reduction) to a specific time series forecasting framework, limiting its breadth of impact to a narrower ML audience.

vs. Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

gemini-3.1 · 6/5/2026

While Paper 1 provides a highly relevant empirical analysis of AI's environmental impact, Paper 2 offers a concrete, novel methodological solution to this challenge by drastically reducing the computational and memory requirements of LLM deployment. The proposed quantization framework achieves remarkable performance gains at ultra-low bitrates, which will directly drive widespread adoption in AI research and industry, leading to higher technological impact and citations.

vs. Risk Assessment of Autonomous Driving: Integrating Technical Failures, Ethical Dilemmas, and Policy Frameworks

gpt-5.2 · 6/5/2026

Paper 2 has higher likely scientific impact due to its timely, quantitative, facility-level assessment of AI-driven hyperscale data center electricity use and CO2 emissions, producing actionable national-scale estimates and a replicable attributional methodology using current eGRID data. Its results directly inform energy policy, grid planning, corporate reporting, and climate research, with broad cross-field relevance (computing, power systems, environmental science, economics). Paper 1 is integrative and relevant but relies largely on secondary datasets and yields more qualitative, less methodologically novel conclusions, limiting incremental impact.

vs. TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models

claude-opus-4.6 · 6/5/2026

Paper 2 addresses a highly timely and socially significant topic—the environmental impact of hyperscale data centers driven by AI growth. It provides the first comprehensive facility-level assessment of 403 US hyperscale data centers, offering concrete empirical data (68-99 TWh consumption, carbon intensity 48% above grid average) that will be widely cited in policy, sustainability, and computing research. Its broad interdisciplinary relevance spanning energy policy, environmental science, and computer science, combined with immediate real-world policy implications, gives it higher potential impact than Paper 1's more incremental technical contribution to multimodal time series modeling.

vs. Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

gemini-3.1 · 6/5/2026

Paper 2 addresses a highly urgent and globally relevant issue: the environmental impact and energy consumption of hyperscale data centers driven by the AI boom. Its findings have broad implications for climate policy, energy grid planning, and the tech industry. While Paper 1 presents an interesting AI/robotics framework for spatial reasoning, Paper 2's potential to influence real-world sustainability efforts, policy decisions, and cross-disciplinary research gives it a higher scientific and societal impact.

vs. GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection

gemini-3.1 · 6/5/2026

Paper 2 addresses a critical, globally relevant issue—the environmental footprint of AI infrastructure. Its empirical assessment of hyperscale data centers has broad implications across environmental science, energy policy, and the tech industry, offering foundational data for future sustainability efforts. Paper 1 presents a practical but relatively incremental solution for LLM security, which, while useful, is narrower in scope and cross-disciplinary impact compared to the macro-level climate and energy insights provided by Paper 2.

vs. Entropy-Based Evaluation of AI Agents: A Lightweight Framework for Measuring Behavioral Patterns

claude-opus-4.6 · 6/5/2026

Paper 2 addresses a timely, high-stakes environmental question with rigorous empirical methodology—compiling facility-level data on 403 US hyperscale data centers and quantifying their carbon footprint. Its finding that HDC carbon intensity is 48% above the national grid average is a striking, policy-relevant result with broad implications for AI sustainability, energy policy, and climate discussions. Paper 1 proposes a conceptual framework for entropy-based agent evaluation that, while interesting, lacks empirical validation and offers incremental contributions to an already crowded evaluation-metrics space. Paper 2's cross-disciplinary relevance (CS, environmental science, policy) gives it substantially broader impact potential.

vs. Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation

gpt-5.2 · 6/5/2026

Paper 1 likely has higher scientific impact due to its large-scale, facility-level empirical dataset (403 hyperscale data centers) and direct policy/industry relevance to energy, climate, and infrastructure planning. Its attributional emissions accounting using recent EPA eGRID data is methodologically grounded and broadly useful across environmental science, power systems, and tech policy, with immediate real-world applicability. Paper 2 is timely for AI reliability, but CHARM appears as an engineering framework evaluated on standard QA benchmarks; its novelty and generalizability beyond specific agentic RAG setups may be narrower and faster-moving/shorter-lived.

vs. ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

gpt-5.2 · 6/5/2026

Paper 2 has higher likely scientific impact due to its broad, timely relevance to AI-driven infrastructure growth, energy policy, and climate mitigation, with immediate real-world applicability (facility-level accounting across 403 data centers) and cross-field utility (energy systems, environmental science, policy, computing). Its quantified national-scale estimates and reusable attributional methodology can inform regulation, siting, procurement, and lifecycle assessments. Paper 1 is novel for LLM-agent reliability and efficiency, but its impact is narrower to tool-using agents and depends on adoption of its contract-based framework; methodological rigor seems solid but less societally expansive.

vs. QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

gemini-3.1 · 6/5/2026

While Paper 1 offers a valuable technical optimization for LLM serving, Paper 2 addresses a highly urgent, cross-disciplinary global issue: the environmental footprint of AI. Its findings on energy consumption and carbon emissions of hyperscale data centers have broad implications for policy, sustainability, and the tech industry, giving it a much wider potential impact across multiple fields compared to the narrowly focused systems optimization in Paper 1.

vs. Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains

gemini-3.1 · 6/5/2026

Paper 2 has higher potential scientific impact due to its extreme timeliness and broad interdisciplinary relevance. The environmental footprint of AI and hyperscale data centers is a critical global issue affecting tech policy, energy grid planning, and climate science. While Paper 1 offers a rigorous methodological advancement in pharmaceutical supply chain optimization, Paper 2 provides crucial empirical data on a rapidly scaling industry with massive societal implications. Paper 2's findings are highly likely to inform policy regulations, guide sustainable AI development, and spark widespread research across multiple disciplines.

vs. From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

claude-opus-4.6 · 6/5/2026

Paper 1 addresses a fundamental challenge in AI safety—hallucination detection in LLMs—by establishing a novel theoretical connection between OOD detection and hallucination detection. This geometric reframing offers a principled, training-free methodology with broad applicability across reasoning tasks and potential to spawn new research directions. Paper 2 provides valuable empirical data on data center emissions but is more descriptive and geographically/temporally bounded. Paper 1's methodological innovation and relevance to the rapidly growing LLM safety field give it higher potential for lasting scientific impact and citations.

vs. Agents' Last Exam

claude-opus-4.6 · 6/5/2026

Paper 1 provides novel, granular facility-level data on a critically important and timely topic—the environmental footprint of US hyperscale data centers driven by AI growth. Its methodology linking 403 specific facilities to EPA eGRID data yields a surprising finding (48% higher carbon intensity than grid average) with immediate policy relevance for energy planning, climate policy, and corporate sustainability. Paper 2 introduces a useful AI benchmark but enters a crowded benchmark landscape, and its impact depends on adoption. Paper 1 addresses a concrete, urgent societal concern with quantitative evidence that will be widely cited across energy, environmental, and policy fields.

vs. Harnessing Generalist Agents for Contextualized Time Series

claude-opus-4.6 · 6/5/2026

Paper 2 addresses a critically timely and societally important question—the environmental footprint of AI-driven hyperscale data centers—with novel facility-level empirical data (403 US HDCs). Its finding that HDC carbon intensity is 48% above the national grid average is striking and policy-relevant. The methodology provides a reusable attributional framework using public EPA data, enabling broad adoption. Its interdisciplinary impact spans energy policy, environmental science, and computer science. Paper 1, while technically solid, is more incremental—another LLM-agent framework for time series—in an increasingly crowded space with less distinctive empirical contribution.

vs. Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

gemini-3.1 · 6/5/2026

While Paper 1 provides highly timely empirical data on a critical environmental issue, Paper 2 introduces a foundational benchmark for continual learning in AI. In the rapidly advancing AI field, comprehensive, expert-validated benchmarks typically drive extensive future research, establishing the standard for evaluating new models. Consequently, Paper 2 is likely to generate a massive citation volume and broadly shape future AI development methodologies.

vs. FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG

gpt-5.2 · 6/5/2026

Paper 2 is more novel methodologically, proposing a new training-free, token-selective decoding approach (FIDES) using internal model signals to resolve retrieval–memory conflict in RAG, with broad applicability to many LLM systems and tasks. It reports extensive benchmarking across multiple datasets and model scales up to 70B, suggesting strong rigor and immediate relevance to a fast-moving area with wide cross-field impact (NLP, IR, trustworthy AI). Paper 1 is timely and societally important but is primarily an attributional measurement study with narrower methodological innovation and more limited transferability.

vs. Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

gpt-5.2 · 6/5/2026

Paper 2 likely has higher scientific impact due to greater novelty (a general framework for distilling transferable agent skills from trajectories), broad applicability across many LLM-agent domains, and strong evidence of cross-model and out-of-distribution transfer—suggesting a reusable method that can influence multiple subfields (agentic AI, tool use, instruction/skill learning). Paper 1 is timely and societally relevant with clear real-world applications, but is primarily an attributional assessment using existing data sources, with narrower methodological innovation and field breadth.