From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms

Zhang Kai, Yao Jingang

Apr 28, 2026 · arXiv:2604.25707v1

cs.IR

Frozen v1: this version was superseded on arXiv; the latest version is v2. Stats reflect the state at freeze time.

#253 of 655 · cs.IR

Tournament Score: 1433 ± 28 (range shown: 1100 to 1750)

Win Rate: 51%

Rating: 4.5 / 10

Significance 5.5 · Rigor 4 · Novelty 5.5 · Clarity 6

Abstract

Generative search engines increasingly determine whether online information is merely discoverable, cited as a source, or actually absorbed into generated answers. This paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO): citation selection, where a platform triggers search and chooses sources, and citation absorption, where a cited page contributes language, evidence, structure, or factual support to the final answer. We analyze the public geo-citation-lab dataset covering 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity; 21,143 valid search-layer citations; 23,745 citation-level feature records; 18,151 successfully fetched pages; and 72 extracted features. The central descriptive finding is that citation breadth and citation depth diverge. Perplexity and Google cite more sources on average, while ChatGPT cites fewer sources but shows substantially higher average citation influence among fetched pages. High-influence pages tend to be longer, more structured, semantically aligned, and richer in extractable evidence such as definitions, numerical facts, comparisons, and procedural steps. The results suggest that GEO should be measured beyond citation counts, with answer-level absorption treated as a separate outcome.

AI Impact Assessments

(3 models)

Scientific Impact Assessment

1. Core Contribution

This paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO) that distinguishes between citation selection (whether a source is cited) and citation absorption (whether a cited source materially shapes the generated answer). The authors analyze a dataset of 602 prompts across three major AI search platforms (ChatGPT, Google AI Overview/Gemini, Perplexity) with 21,143 valid citations and 72 extracted features. The central finding is that citation breadth and depth diverge: ChatGPT cites fewer sources but exhibits substantially higher per-citation influence (mean 0.2713 vs. ~0.06 for Google and Perplexity), while Perplexity cites broadly but with lower per-source absorption.
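The breadth/depth distinction above can be made concrete with a small sketch. It is illustrative only: the record layout (`platform`, `prompt_id`, `influence_score`) and the toy values are assumptions, not the schema or data of the geo-citation-lab dataset. Breadth is taken here as mean citations per prompt, and depth as mean influence over fetched pages only.

```python
from collections import defaultdict
from statistics import fmean


def breadth_and_depth(records):
    """Summarise citation breadth and depth per platform.

    `records` is an iterable of dicts with hypothetical keys:
      platform        - platform label
      prompt_id       - identifier of the prompt that produced the citation
      influence_score - float, or None when the page was not fetched
    Prompts with zero citations never appear in `records`, so they are
    not counted in the breadth average here.
    """
    citations_per_prompt = defaultdict(int)
    scores = defaultdict(list)
    for rec in records:
        citations_per_prompt[(rec["platform"], rec["prompt_id"])] += 1
        if rec["influence_score"] is not None:
            scores[rec["platform"]].append(rec["influence_score"])

    counts = defaultdict(list)
    for (platform, _prompt), n in citations_per_prompt.items():
        counts[platform].append(n)

    return {
        platform: {
            "breadth": fmean(counts[platform]),
            "depth": fmean(scores[platform]) if scores[platform] else None,
        }
        for platform in counts
    }


# Toy records (made up for illustration, not taken from the paper).
toy = [
    {"platform": "A", "prompt_id": 1, "influence_score": 0.30},
    {"platform": "A", "prompt_id": 1, "influence_score": 0.20},
    {"platform": "B", "prompt_id": 1, "influence_score": 0.05},
    {"platform": "B", "prompt_id": 1, "influence_score": 0.07},
    {"platform": "B", "prompt_id": 1, "influence_score": None},
    {"platform": "B", "prompt_id": 1, "influence_score": 0.06},
]
summary = breadth_and_depth(toy)
```

In the toy data, platform B cites more sources per prompt while platform A has the higher mean influence, which is the shape of divergence the paper reports.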

The conceptual contribution—separating selection from absorption—is intuitive and timely. The "evidence-container hypothesis" (that pages serving as modular, semantically-aligned evidence packages are absorbed more deeply) provides a useful organizing framework for future GEO research.

2. Methodological Rigor

The paper is remarkably self-aware about its methodological limitations, which is both a strength and a weakness. The authors explicitly avoid causal claims, provide a four-level identification map, separate outcome components from permissible explanatory variables, and include a claim-level self-audit. This level of epistemic discipline is unusual and commendable.

However, this self-awareness reveals that the paper is, in its current form, purely descriptive. There are no inferential statistics—no confidence intervals, no regression models, no hypothesis tests. The authors explicitly state this is deliberate, positioning the paper as a "pre-submission draft" with confirmatory models deferred to a future version. While intellectually honest, this means the paper's empirical claims rest entirely on point estimates and mean comparisons without uncertainty quantification.
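As an illustration of the kind of uncertainty quantification that is missing, a percentile bootstrap for a difference in mean influence between two platforms could look like the sketch below. This is not the authors' planned confirmatory model; the function name, the sample values, and the choice to resample individual citations (which ignores clustering of citations within prompts) are all assumptions.

```python
import random
from statistics import fmean


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).

    Resamples each group independently with replacement. Treating
    citations as independent draws is a simplifying assumption; a
    cluster bootstrap over prompts would be more defensible.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(fmean(resampled_a) - fmean(resampled_b))
    diffs.sort()
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return diffs[lo_idx], diffs[hi_idx]


# Made-up influence scores for two platforms (not values from the paper).
platform_a = [0.25, 0.30, 0.28, 0.22, 0.31, 0.27]
platform_b = [0.05, 0.07, 0.06, 0.04, 0.08, 0.06]
ci_low, ci_high = bootstrap_mean_diff_ci(platform_a, platform_b)
```

An interval that excludes zero would give the platform-level comparison more weight than a bare difference in point estimates.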

The influence_score construction is a critical methodological concern. It is a hand-designed weighted composite (ref_count, first_position_ratio, paragraph_coverage_ratio, TF-IDF cosine, bigram/trigram overlap) with fixed weights (0.20/0.15/0.20/0.25/0.20). The choice of weights is not validated against any ground truth—there is no human annotation of "true" absorption, no ablation study of weight sensitivity (though one is proposed), and no comparison against alternative operationalizations. The entire absorption analysis depends on this single proxy, and the platform-level divergence (ChatGPT's much higher scores) could partly reflect systematic differences in answer structure, length, or citation rendering rather than genuine differences in source utilization.
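To make the concern concrete, a weighted composite of this shape can be sketched as follows. The mapping of the five weights to the five components in the order listed, the assumption that every component is already normalised to [0, 1], and the helper names are all assumptions made for illustration; the paper's actual normalisation is not described here.

```python
# Weights as quoted in the review, assigned in the order the components
# are listed there (the ordering is an assumption).
WEIGHTS = {
    "ref_count": 0.20,
    "first_position_ratio": 0.15,
    "paragraph_coverage_ratio": 0.20,
    "tfidf_cosine": 0.25,
    "ngram_overlap": 0.20,  # bigram/trigram overlap
}


def influence_score(components, weights=WEIGHTS):
    """Weighted sum of components, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(weights[name] * components[name] for name in weights)


def reweighted(weights, name, new_value):
    """Set one weight and rescale the rest so the total stays 1.

    A simple way to probe weight sensitivity; hypothetical helper.
    """
    others = {k: v for k, v in weights.items() if k != name}
    scale = (1.0 - new_value) / sum(others.values())
    result = {k: v * scale for k, v in others.items()}
    result[name] = new_value
    return result


# Hypothetical page whose only signal is lexical overlap.
page = {
    "ref_count": 0.0,
    "first_position_ratio": 0.0,
    "paragraph_coverage_ratio": 0.0,
    "tfidf_cosine": 1.0,
    "ngram_overlap": 1.0,
}
baseline = influence_score(page)
shifted = influence_score(page, reweighted(WEIGHTS, "tfidf_cosine", 0.40))
```

Because two of the five components are lexical-overlap measures, a page like the hypothetical one above moves noticeably when the TF-IDF weight changes, which is why a sensitivity check on the weights matters.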

The dataset design, with layered prompts (A/B/C/D) controlling for task type, style, language, and scenario complexity, is a genuine strength over uncontrolled scraping. However, 602 prompts is a modest sample, and the prompt distribution is researcher-designed rather than drawn from real user behavior, limiting external validity.

3. Potential Impact

The selection-absorption distinction has clear practical value for publishers, content strategists, and researchers studying AI-mediated information access. If the framework gains traction, it could:

Shift GEO practice from citation-counting to measuring answer-level influence

Inform content strategy (evidence-container design over superficial formatting)

Provide vocabulary for a rapidly growing research area

Raise important questions about information equity and source concentration in AI search

The finding that Q&A formatting alone does not improve absorption is practically useful, as it pushes back against simplistic SEO-to-GEO advice. Similarly, the evidence-genre analysis (code, statistics, definitions showing higher influence) offers actionable hypotheses.

However, the practical impact is constrained by the lack of causal evidence. The paper repeatedly acknowledges that it cannot determine whether structured, evidence-rich pages are absorbed *because* of their features or due to confounding factors (domain authority, editorial quality, etc.).

4. Timeliness & Relevance

The paper addresses a genuinely emerging need. As generative AI search displaces traditional search, understanding how sources are used (not just cited) is increasingly important for publishers, policymakers, and researchers. The GEO literature is nascent, and measurement frameworks are needed. The cross-platform comparison (ChatGPT, Google, Perplexity) is particularly timely given the rapid evolution of these products.

The paper positions itself well within the emerging GEO literature, building on Aggarwal et al.'s foundational GEO framework and the AgentGEO line of research on citation failures.

5. Strengths & Limitations

Key Strengths:

Conceptual clarity: The selection-absorption distinction is well-articulated and fills a genuine gap in GEO measurement

Epistemic honesty: The identification map, claim-level self-audit, and explicit separation of descriptive findings from causal claims set a high standard for transparency

Cross-platform comparison: Simultaneous analysis of three major platforms under controlled conditions is valuable

Counter-intuitive findings: The Q&A result, the news selection-absorption gap, and the language interaction effects challenge shallow heuristics

Comprehensive reproducibility plan: The detailed robustness checks, pre-registration plan, and data dictionary demonstrate methodological sophistication

Key Limitations:

No inferential statistics: The paper is entirely descriptive with deferred confirmatory analysis, making it incomplete as a scientific contribution

Unvalidated outcome measure: The influence_score proxy lacks external validation, and the fixed weights are arbitrary

Modest sample: 602 prompts across three platforms, with some cells (e.g., D-layer scenarios) containing only 50 observations

Temporal fragility: AI search platforms change rapidly; the snapshot may not generalize beyond the collection window

Fetch-ok conditioning: 23.56% of pages were not successfully fetched, and this non-random missingness could bias absorption estimates

Self-referential dataset: The paper analyzes a dataset from the same research team, with the repository serving as both data source and citation—raising questions about independent validation

Draft status: The paper explicitly identifies itself as a "pre-submission draft," which limits its current scientific completeness
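The fetch-ok figure in the limitations above can be checked against the counts quoted in the abstract. The sketch below assumes the denominator is the 23,745 citation-level feature records rather than the 21,143 valid search-layer citations; the review does not say which one it used, but only the former reproduces the quoted percentage.

```python
# Counts quoted in the abstract.
VALID_CITATIONS = 21_143
FEATURE_RECORDS = 23_745
FETCHED_PAGES = 18_151


def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Assumed denominator: citation-level feature records.
not_fetched = FEATURE_RECORDS - FETCHED_PAGES
not_fetched_pct = round(pct(not_fetched, FEATURE_RECORDS), 2)  # 23.56

# The alternative denominator gives a different, smaller share.
alt_pct = round(pct(VALID_CITATIONS - FETCHED_PAGES, VALID_CITATIONS), 2)
```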

Additional Observations

The paper reads more like a detailed research protocol with preliminary results than a completed study. The extensive discussion of future robustness checks, pre-registration plans, and proposed experimental extensions—while valuable—highlights how much remains undone. The writing is thorough but repetitive, restating limitations and caveats extensively across sections. The ethical discussion of responsible GEO is appropriate but underdeveloped relative to the space allocated.

The platform archetype characterizations (ChatGPT as "absorption-heavy," Perplexity as "coverage-oriented") are interesting descriptive labels but may reflect measurement artifacts rather than genuine platform strategies.

Rating: 4.5 / 10

Significance 5.5 · Rigor 4 · Novelty 5.5 · Clarity 6

Generated Apr 29, 2026

Comparison History (45)

Won vs. Bridging Passive and Active: Enhancing Conversation Starter Recommendation via Active Expression Modeling

Paper 2 likely has higher scientific impact due to broader cross-field relevance and timeliness: it introduces a general measurement framework for “citation absorption” across multiple major AI search platforms, with a sizable multi-platform dataset analysis. Its metrics can inform IR, NLP evaluation, web publishing/SEO, misinformation studies, and HCI. Paper 1 is more algorithmically novel and includes online A/B results, but appears platform-specific to conversation-starter recommendation with narrower transferability and modest reported gains.

gpt-5.2 · May 14, 2026

Won vs. FedMM: Federated Collaborative Signal Quantization for Multi-Market CTR Prediction

Paper 2 addresses a highly timely and broadly impactful topic: Generative Engine Optimization (GEO) in AI search engines. As generative AI reshapes information retrieval, understanding citation selection and absorption is critical for web science, NLP, and industry (e.g., SEO). Paper 1 offers a strong methodological contribution but is constrained to the specific niche of federated multi-market CTR prediction, giving Paper 2 a much wider potential audience, broader interdisciplinary applications, and higher real-world relevance.

gemini-3.1-pro-preview · May 13, 2026

Lost vs. From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms

Both papers provided are completely identical in title and abstract. Therefore, they have the exact same scientific impact. Paper 1 is selected by default as there is no difference between the two submissions.

gemini-3-pro-preview · Apr 30, 2026

Lost vs. ProMax: Exploring the Potential of LLM-derived Profiles with Distribution Shaping for Recommender Systems

Paper 2 has higher potential impact due to broader applicability and clearer methodological contribution: a general framework (distribution shaping with LLM-derived profiles) that can plug into multiple recommender models and shows consistent gains across datasets. Recommender systems are a mature, high-impact domain with direct industrial deployment, so improvements can translate quickly to real-world systems. Paper 1 is timely and novel in measuring “citation absorption” for generative search, but it is primarily a descriptive measurement framework tied to specific platforms/datasets, potentially limiting generalization and cross-field uptake compared to Paper 2’s reusable modeling approach.

gpt-5.2 · Apr 30, 2026

Won vs. Meta-Learning and Targeted Differential Privacy to Improve the Accuracy-Privacy Trade-off in Recommendations

Paper 1 introduces a novel two-stage measurement framework for an emerging and rapidly growing field (Generative Engine Optimization), addressing how AI search engines absorb content beyond simple citation counts. This is highly timely given the explosive growth of generative AI search platforms and has broad implications for SEO, information retrieval, digital publishing, and AI transparency. Paper 2 offers an incremental improvement combining existing techniques (meta-learning + targeted DP) for recommender systems. While solid, it addresses a more established problem space with less transformative potential.

claude-opus-4-6 · Apr 30, 2026

Lost vs. From Transfer to Collaboration: A Federated Framework for Cross-Market Sequential Recommendation

Paper 2 addresses a well-established research area (recommender systems) with a novel federated learning framework that tackles clearly defined theoretical and practical challenges (source degradation, negative transfer). It provides theoretical analysis, a new loss function (S²CE), and extensive experiments on real-world datasets. Paper 1 introduces a measurement framework for GEO, which is timely but primarily descriptive/observational with a narrower scope (SEO/content optimization for AI search). Paper 2 has stronger methodological contributions, broader applicability across federated learning and recommendation communities, and more rigorous experimental validation.

claude-opus-4-6 · Apr 29, 2026

Lost vs. From Transfer to Collaboration: A Federated Framework for Cross-Market Sequential Recommendation

Paper 1 introduces a novel federated many-to-many collaboration paradigm for cross-market sequential recommendation, directly addressing privacy/data-isolation constraints and heterogeneity via a new loss (S^2CE) plus adaptation, with theoretical and empirical support. This is methodologically substantive and broadly applicable to real-world recommender systems across industries and markets, making it timely amid growing federated learning adoption. Paper 2 offers a useful measurement framework and descriptive analysis for GEO across platforms, but is more observational, potentially platform/dataset-dependent, and likely narrower in methodological innovation and generalizability than Paper 1.

gpt-5.2 · Apr 29, 2026

Won vs. Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration

Paper 2 addresses a highly timely and paradigm-shifting topic: Generative Engine Optimization (GEO). By providing a large-scale empirical analysis across major AI search engines, it establishes foundational metrics for an emerging field. This offers broad, real-world implications for information retrieval, web science, and the future of digital content ecosystems. While Paper 1 presents an innovative HCI approach, its evaluation is on a much smaller scale, making Paper 2's potential breadth of impact and relevance significantly higher.

gemini-3-pro-preview · Apr 29, 2026

Lost vs. Hybrid Cold-Start Recommender System for Closure Model Selection in Multiphase Flow Simulations

Paper 2 introduces a novel cross-disciplinary framework applying recommender systems to scientific model selection in CFD, addressing a fundamental challenge in computational science. Its methodological rigor (13,600 simulations, nested cross-validation, domain-specific metrics) is strong, and the approach generalizes beyond multiphase flow to any expensive simulation domain requiring model selection. Paper 1, while timely regarding generative AI search, is primarily descriptive/measurement-focused with a narrower SEO-adjacent audience. Paper 2's broader applicability across computational science and engineering fields gives it higher long-term impact potential.

claude-opus-4-6 · Apr 29, 2026

Won vs. LitXBench: A Benchmark for Extracting Experiments from Scientific Literature

Paper 1 addresses a highly timely and universally relevant problem: how generative AI search engines select and absorb information. Its framework for Generative Engine Optimization (GEO) has broad implications across information retrieval, web publishing, and the broader internet ecosystem. While Paper 2 provides a valuable domain-specific benchmark for materials science, Paper 1's research impacts a much wider array of disciplines and addresses a fundamental shift in global information access.

gemini-3-pro-preview · Apr 29, 2026
