Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers
Gianluca Guidi, Francesca Dominici, Tiziano Squartini, Callaway Sprinkle, Jonathan Gilmour, Kevin Butler, Eric Bell, Scott Delaney
Abstract
The rapid proliferation of hyperscale data centers (HDCs) in the US, mainly driven by the adoption of artificial intelligence, has raised concerns about this industry's environmental footprint. We compiled facility-level information on 403 US hyperscale data centers operating between May 2024 and April 2025 and estimated their electricity consumption, electricity sources, and attributable CO2 emissions. Across different facility-load scenarios, these HDCs consumed approximately 68-99 TWh of electricity and were associated with about 37-54 million metric tons of CO2. Under the central scenario, HDC electricity demand corresponded to approximately 1.8% of total US electricity consumption, with roughly 54% of attributed generation supplied by fossil-fuel sources. The HDC electricity-weighted average carbon intensity was approximately 545 gCO2/kWh, about 48% above the contemporaneous US national grid-average carbon intensity of 370 gCO2/kWh. Our approach provides an attributional tool for assessing the environmental footprint of hyperscale data centers using the most recent EPA eGRID plant-level data.
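The abstract's headline figures are linked by a simple conversion: emissions (Mt CO2) = consumption (TWh) × carbon intensity (gCO2/kWh) / 1000. The sketch below is only a consistency check on the rounded values quoted on this page. The 82 TWh central value comes from the assessment summary. Applying a single 545 gCO2/kWh to the 68, 82, and 99 TWh scenarios reproduces the reported 37–54 Mt range, which suggests (the text does not say so explicitly) that intensity is held fixed across load scenarios. With rounded inputs, the excess over the 370 gCO2/kWh grid average comes out near 47%. The reported ~48% presumably reflects unrounded values.

```python
# Consistency check on the abstract's headline figures (rounded values only).
SCENARIO_TWH = {"low": 68.0, "central": 82.0, "high": 99.0}
HDC_INTENSITY_G_PER_KWH = 545.0      # HDC electricity-weighted average
US_GRID_INTENSITY_G_PER_KWH = 370.0  # contemporaneous national grid average


def emissions_mt(consumption_twh: float, intensity_g_per_kwh: float) -> float:
    """Convert TWh x gCO2/kWh to million metric tons (Mt) of CO2.

    1 TWh = 1e9 kWh and 1 Mt = 1e12 g, so Mt = TWh * (g/kWh) / 1000.
    """
    return consumption_twh * intensity_g_per_kwh / 1000.0


# Relative excess of the HDC intensity over the national grid average.
EXCESS = HDC_INTENSITY_G_PER_KWH / US_GRID_INTENSITY_G_PER_KWH - 1.0

for name, twh in SCENARIO_TWH.items():
    mt = emissions_mt(twh, HDC_INTENSITY_G_PER_KWH)
    print(f"{name:>7}: {twh:.0f} TWh -> {mt:.1f} Mt CO2")
print(f"HDC intensity vs. grid average: +{EXCESS:.1%}")
```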
AI Impact Assessments
Scientific Impact Assessment
1. Core Contribution
This paper provides a bottom-up, facility-level assessment of the electricity consumption and attributable CO₂ emissions of 403 US hyperscale data centers (HDCs) during May 2024–April 2025. The central finding is that these HDCs consumed approximately 82 TWh of electricity in the central scenario, about 1.8% of total US electricity consumption, and were attributed roughly 45 Mt CO₂, with an electricity-weighted carbon intensity of ~545 gCO₂/kWh, approximately 48% above the national grid average. The main novelty lies in the integration pipeline: harmonizing commercial facility data (Baxtel), satellite imagery validation via OpenStreetMap, EPA eGRID2023 plant-level emissions data, and balancing-authority-level attribution, applied to a substantially larger and more current dataset than prior work (e.g., Siddik et al. 2021, which covered 2018 and found 10.5 Mt CO₂).
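The balancing-authority attribution step can be illustrated with a minimal sketch. It is not the authors' pipeline. The plant records, the BA codes `BA_A` and `BA_B`, and every number are hypothetical. The sketch also assumes that each BA's factor is simply its plants' total CO2 divided by their total generation, ignoring imports, transmission losses, and non-CO2 gases. Each facility inherits its BA's average factor, and the electricity-weighted intensity is total attributed CO2 divided by total facility consumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Plant:
    ba: str                # balancing authority code (hypothetical)
    generation_mwh: float  # annual net generation
    co2_t: float           # annual CO2, metric tons


@dataclass
class Facility:
    name: str
    ba: str                # BA the data center is assumed to draw from
    consumption_mwh: float


def ba_emission_factors(plants: list[Plant]) -> dict[str, float]:
    """Generation-weighted average emission factor per BA, in tCO2/MWh."""
    gen: dict[str, float] = defaultdict(float)
    co2: dict[str, float] = defaultdict(float)
    for p in plants:
        gen[p.ba] += p.generation_mwh
        co2[p.ba] += p.co2_t
    return {ba: co2[ba] / gen[ba] for ba in gen if gen[ba] > 0}


def attribute(facilities: list[Facility], factors: dict[str, float]) -> dict[str, float]:
    """Attributed CO2 (t) per facility = consumption x its BA's average factor."""
    return {f.name: f.consumption_mwh * factors[f.ba] for f in facilities}


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
plants = [
    Plant("BA_A", 1_000_000, 900_000),  # coal-like unit
    Plant("BA_A", 1_000_000, 400_000),  # gas-like unit
    Plant("BA_B", 2_000_000, 0),        # zero-emission unit
    Plant("BA_B", 1_000_000, 400_000),  # gas-like unit
]
facilities = [Facility("dc1", "BA_A", 100_000), Facility("dc2", "BA_B", 300_000)]

factors = ba_emission_factors(plants)       # BA_A: 0.65, BA_B: ~0.133 tCO2/MWh
emissions = attribute(facilities, factors)  # dc1: ~65,000 t, dc2: ~40,000 t
total_mwh = sum(f.consumption_mwh for f in facilities)
# 1 tCO2/MWh equals 1000 gCO2/kWh.
weighted_g_per_kwh = 1000 * sum(emissions.values()) / total_mwh
print(factors, emissions, round(weighted_g_per_kwh, 1))
```

The toy result (about 262.5 gCO2/kWh) has no relation to the paper's 545 gCO2/kWh. It only shows how facilities sited in higher-intensity BAs pull the electricity-weighted average above a simple mean of BA factors.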
2. Methodological Rigor
Strengths in methodology:
Methodological limitations:
3. Potential Impact
The paper addresses a topic of high public and policy relevance. Key impact pathways include:
However, the practical impact is somewhat constrained by the attributional (vs. consequential) framework, which limits the paper's utility for informing siting decisions or policy interventions targeting marginal emissions.
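The attributional/consequential distinction comes down to which emission factor is applied. An attributional study assigns the average intensity of the supplying grid. A consequential study asks how emissions change when load is added, which is the marginal rate. A toy merit-order example makes the gap concrete. The three-unit stack, its emission rates, and the loads are all hypothetical. Real marginal-emissions estimation (unit commitment, transmission constraints, imports) is far more involved than this finite difference.

```python
# Toy merit-order stack: (capacity MW, emission rate tCO2/MWh), cheapest first.
# All values are hypothetical.
STACK = [(500.0, 0.0), (300.0, 0.4), (400.0, 0.9)]  # zero-carbon, gas-like, coal-like


def dispatch(stack, load_mw):
    """Fill load in merit order; return (rate, MW) pairs actually dispatched."""
    remaining = load_mw
    dispatched = []
    for capacity_mw, rate in stack:
        mw = min(capacity_mw, remaining)
        if mw > 0:
            dispatched.append((rate, mw))
        remaining -= mw
    if remaining > 1e-9:
        raise ValueError("load exceeds available capacity")
    return dispatched


def hourly_emissions(stack, load_mw):
    """tCO2 emitted in one hour at the given load."""
    return sum(rate * mw for rate, mw in dispatch(stack, load_mw))


def average_factor(stack, load_mw):
    """Attributional view: average tCO2/MWh of all energy served."""
    return hourly_emissions(stack, load_mw) / load_mw


def marginal_factor(stack, load_mw, delta_mw=1.0):
    """Consequential view: extra tCO2 per extra MWh, by finite difference."""
    before = hourly_emissions(stack, load_mw)
    after = hourly_emissions(stack, load_mw + delta_mw)
    return (after - before) / delta_mw


for load in (700.0, 900.0):
    print(f"load {load:.0f} MW: average {average_factor(STACK, load):.3f}, "
          f"marginal {marginal_factor(STACK, load):.3f} tCO2/MWh")
```

At 700 MW the next increment is met by the gas-like unit, so the marginal rate (0.4) is several times the average (about 0.11). An average-factor accounting therefore cannot, by itself, say what a new facility would add to emissions.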
4. Timeliness & Relevance
This paper is exceptionally timely. The explosive growth of AI workloads has made data center energy consumption a first-order policy concern, yet peer-reviewed evidence has lagged far behind industry reports and journalistic coverage. The most-cited academic estimate (Siddik et al. 2021) used 2018 data, a lifetime ago in AI infrastructure terms. By providing 2024–2025 estimates using a validated methodology, this paper fills a critical gap. The implied 3.5–5x increase over Siddik et al.'s 2018 estimate (37–54 Mt vs. 10.5 Mt CO₂) quantifies the scale of the problem in a way that prior projections (e.g., Xiao et al.'s forward-looking estimates) could not.
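The 3.5–5x range follows from dividing the scenario emissions in the abstract (37–54 Mt CO2) by the 10.5 Mt CO2 that Siddik et al. estimated for 2018. The two studies may differ in facility coverage and method, so the ratio compares estimated totals rather than giving a like-for-like growth rate.

```python
SIDDIK_2018_MT = 10.5                               # Siddik et al. (2021), 2018 estimate
SCENARIOS_2024_25_MT = {"low": 37.0, "high": 54.0}  # this paper's scenario range


def fold_change(current_mt: float, baseline_mt: float) -> float:
    """How many times larger the current estimate is than the baseline."""
    return current_mt / baseline_mt


ratios = {k: fold_change(v, SIDDIK_2018_MT) for k, v in SCENARIOS_2024_25_MT.items()}
for k, r in ratios.items():
    print(f"{k}: {r:.1f}x the 2018 estimate")
```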
5. Strengths & Limitations
Key strengths:
Notable weaknesses:
Additional Observations
The paper is well-written and clearly structured, with extensive supplementary materials. The interdisciplinary team (biostatistics, environmental health, computer science, GIS) brings appropriate expertise. However, the paper reads more as a carefully constructed accounting exercise than a methodological advance — the attribution framework is standard, and the ML component is minor. The primary contribution is the data compilation and validation effort, which, while valuable, is inherently perishable given the rapid pace of HDC construction.
The restriction on facility-level data release is understandable but significantly limits the paper's scientific utility for downstream research. The synthetic datasets provided are a reasonable compromise but cannot substitute for real facility-level analysis.
Generated Jun 5, 2026
Comparison History (20)
Paper 2 likely has higher scientific impact due to strong real-world relevance and timeliness (AI-driven data-center growth), clear policy and industry applications (energy planning, emissions accounting, regulation), and broad cross-field reach (energy systems, climate science, CS/AI infrastructure, economics). Its facility-level dataset and transparent attributional methodology using current EPA eGRID data support rigor and reuse. Paper 1 is novel within LLM-agent memory evaluation and may influence agent design, but its impact is narrower and more contingent on rapid shifts in LLM tooling and benchmarks.
Paper 2 addresses a critically timely and broadly relevant topic—the environmental footprint of hyperscale data centers driven by AI growth—with novel facility-level empirical data (403 HDCs) that fills a major knowledge gap. Its findings (HDC carbon intensity 48% above grid average) have immediate policy implications and broad interdisciplinary impact across energy, environmental science, computer science, and policy. Paper 1, while technically sound, addresses a narrower ML/epidemiology niche with incremental improvements to LLM-based forecasting. Paper 2's empirical contribution and societal relevance give it substantially broader citation potential and impact.
Paper 2 has higher potential impact: it provides a timely, facility-level, policy-relevant quantification of U.S. hyperscale data center electricity use and CO2 emissions using recent eGRID data, enabling benchmarking and informing regulation, grid planning, and corporate decarbonization across energy, climate, and computing. Its dataset and attributional tool can be widely reused and updated. Paper 1 is technically innovative for real-time trajectory prediction and valuable for autonomy, but its impact is more domain-specific and depends on reproducibility and adoption in deployed stacks.
Paper 1 addresses a critically timely and high-visibility topic—the environmental footprint of AI-driven hyperscale data centers—with novel facility-level empirical data covering 403 US data centers. Its finding that HDC carbon intensity is 48% above the national grid average has immediate policy relevance and broad societal impact across energy, environmental, and technology domains. Paper 2, while technically sound, presents an incremental improvement (~1.95% MASE reduction) to a specific time series forecasting framework, limiting its breadth of impact to a narrower ML audience.
While Paper 1 provides a highly relevant empirical analysis of AI's environmental impact, Paper 2 offers a concrete, novel methodological solution to this challenge by drastically reducing the computational and memory requirements of LLM deployment. The proposed quantization framework achieves remarkable performance gains at ultra-low bit-widths, which will directly drive widespread adoption in AI research and industry, leading to higher technological impact and citations.
Paper 2 likely has higher scientific impact due to its timely, quantitative, facility-level assessment of AI-driven hyperscale data center electricity use and CO2 emissions, producing actionable national-scale estimates and a replicable attributional methodology using current eGRID data. Its results directly inform energy policy, grid planning, corporate reporting, and climate research, with broad cross-field relevance (computing, power systems, environmental science, economics). Paper 1 is integrative and relevant but relies largely on secondary datasets and yields more qualitative, less methodologically novel conclusions, limiting its incremental impact.
Paper 2 addresses a highly timely and socially significant topic—the environmental impact of hyperscale data centers driven by AI growth. It provides the first comprehensive facility-level assessment of 403 US hyperscale data centers, offering concrete empirical data (68-99 TWh consumption, carbon intensity 48% above grid average) that will be widely cited in policy, sustainability, and computing research. Its broad interdisciplinary relevance spanning energy policy, environmental science, and computer science, combined with immediate real-world policy implications, gives it higher potential impact than Paper 1's more incremental technical contribution to multimodal time series modeling.
Paper 2 addresses a highly urgent and globally relevant issue: the environmental impact and energy consumption of hyperscale data centers driven by the AI boom. Its findings have broad implications for climate policy, energy grid planning, and the tech industry. While Paper 1 presents an interesting AI/robotics framework for spatial reasoning, Paper 2's potential to influence real-world sustainability efforts, policy decisions, and cross-disciplinary research gives it a higher scientific and societal impact.
Paper 2 addresses a critical, globally relevant issue—the environmental footprint of AI infrastructure. Its empirical assessment of hyperscale data centers has broad implications across environmental science, energy policy, and the tech industry, offering foundational data for future sustainability efforts. Paper 1 presents a practical but relatively incremental solution for LLM security, which, while useful, is narrower in scope and cross-disciplinary impact compared to the macro-level climate and energy insights provided by Paper 2.
Paper 2 addresses a timely, high-stakes environmental question with rigorous empirical methodology—compiling facility-level data on 403 US hyperscale data centers and quantifying their carbon footprint. Its finding that HDC carbon intensity is 48% above the national grid average is a striking, policy-relevant result with broad implications for AI sustainability, energy policy, and climate discussions. Paper 1 proposes a conceptual framework for entropy-based agent evaluation that, while interesting, lacks empirical validation and offers incremental contributions to an already crowded evaluation-metrics space. Paper 2's cross-disciplinary relevance (CS, environmental science, policy) gives it substantially broader impact potential.
Paper 1 likely has higher scientific impact due to its large-scale, facility-level empirical dataset (403 hyperscale data centers) and direct policy/industry relevance to energy, climate, and infrastructure planning. Its attributional emissions accounting using recent EPA eGRID data is methodologically grounded and broadly useful across environmental science, power systems, and tech policy, with immediate real-world applicability. Paper 2 is timely for AI reliability, but CHARM appears to be an engineering framework evaluated on standard QA benchmarks; its novelty and generalizability beyond specific agentic RAG setups may be narrower, and its relevance shorter-lived in a fast-moving area.
Paper 2 likely has higher scientific impact due to its broad, timely relevance to AI-driven infrastructure growth, energy policy, and climate mitigation, with immediate real-world applicability (facility-level accounting across 403 data centers) and cross-field utility (energy systems, environmental science, policy, computing). Its quantified national-scale estimates and reusable attributional methodology can inform regulation, siting, procurement, and lifecycle assessments. Paper 1 is novel for LLM-agent reliability and efficiency, but its impact is largely confined to tool-using agents and depends on adoption of its contract-based framework; its methodological rigor seems solid, but its societal reach is narrower.
While Paper 1 offers a valuable technical optimization for LLM serving, Paper 2 addresses a highly urgent, cross-disciplinary global issue: the environmental footprint of AI. Its findings on energy consumption and carbon emissions of hyperscale data centers have broad implications for policy, sustainability, and the tech industry, giving it a much wider potential impact across multiple fields compared to the narrowly focused systems optimization in Paper 1.
Paper 2 has higher potential scientific impact due to its extreme timeliness and broad interdisciplinary relevance. The environmental footprint of AI and hyperscale data centers is a critical global issue affecting tech policy, energy grid planning, and climate science. While Paper 1 offers a rigorous methodological advancement in pharmaceutical supply chain optimization, Paper 2 provides crucial empirical data on a rapidly scaling industry with massive societal implications. Paper 2's findings are highly likely to inform policy regulations, guide sustainable AI development, and spark widespread research across multiple disciplines.
Paper 1 addresses a fundamental challenge in AI safety—hallucination detection in LLMs—by establishing a novel theoretical connection between OOD detection and hallucination detection. This geometric reframing offers a principled, training-free methodology with broad applicability across reasoning tasks and potential to spawn new research directions. Paper 2 provides valuable empirical data on data center emissions but is more descriptive and geographically/temporally bounded. Paper 1's methodological innovation and relevance to the rapidly growing LLM safety field give it higher potential for lasting scientific impact and citations.
Paper 1 provides novel, granular facility-level data on a critically important and timely topic—the environmental footprint of US hyperscale data centers driven by AI growth. Its methodology linking 403 specific facilities to EPA eGRID data yields a surprising finding (48% higher carbon intensity than grid average) with immediate policy relevance for energy planning, climate policy, and corporate sustainability. Paper 2 introduces a useful AI benchmark but enters a crowded benchmark landscape, and its impact depends on adoption. Paper 1 addresses a concrete, urgent societal concern with quantitative evidence that will be widely cited across energy, environmental, and policy fields.
Paper 2 addresses a critically timely and societally important question—the environmental footprint of AI-driven hyperscale data centers—with novel facility-level empirical data (403 US HDCs). Its finding that HDC carbon intensity is 48% above the national grid average is striking and policy-relevant. The methodology provides a reusable attributional framework using public EPA data, enabling broad adoption. Its interdisciplinary impact spans energy policy, environmental science, and computer science. Paper 1, while technically solid, is more incremental—another LLM-agent framework for time series—in an increasingly crowded space with less distinctive empirical contribution.
While Paper 1 provides highly timely empirical data on a critical environmental issue, Paper 2 introduces a foundational benchmark for continual learning in AI. In the rapidly advancing AI field, comprehensive, expert-validated benchmarks typically drive extensive future research, establishing the standard for evaluating new models. Consequently, Paper 2 is likely to generate a massive citation volume and broadly shape future AI development methodologies.
Paper 2 is more novel methodologically, proposing a new training-free, token-selective decoding approach (FIDES) using internal model signals to resolve retrieval–memory conflict in RAG, with broad applicability to many LLM systems and tasks. It reports extensive benchmarking across multiple datasets and model scales up to 70B, suggesting strong rigor and immediate relevance to a fast-moving area with wide cross-field impact (NLP, IR, trustworthy AI). Paper 1 is timely and societally important but is primarily an attributional measurement study with narrower methodological innovation and more limited transferability.
Paper 2 likely has higher scientific impact due to greater novelty (a general framework for distilling transferable agent skills from trajectories), broad applicability across many LLM-agent domains, and strong evidence of cross-model and out-of-distribution transfer—suggesting a reusable method that can influence multiple subfields (agentic AI, tool use, instruction/skill learning). Paper 1 is timely and societally relevant with clear real-world applications, but is primarily an attributional assessment using existing data sources, with narrower methodological innovation and field breadth.