Zhang Kai, Yao Jingang
Generative search engines increasingly determine whether online information is merely discoverable, cited as a source, or actually absorbed into generated answers. This paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO): citation selection, where a platform triggers search and chooses sources, and citation absorption, where a cited page contributes language, evidence, structure, or factual support to the final answer. We analyze the public geo-citation-lab dataset covering 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity; 21,143 valid search-layer citations; 23,745 citation-level feature records; 18,151 successfully fetched pages; and 72 extracted features. The central descriptive finding is that citation breadth and citation depth diverge. Perplexity and Google cite more sources on average, while ChatGPT cites fewer sources but shows substantially higher average citation influence among fetched pages. High-influence pages tend to be longer, more structured, semantically aligned, and richer in extractable evidence such as definitions, numerical facts, comparisons, and procedural steps. The results suggest that GEO should be measured beyond citation counts, with answer-level absorption treated as a separate outcome.
This paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO) that distinguishes between citation selection (whether a source is cited) and citation absorption (whether a cited source materially shapes the generated answer). The authors analyze a dataset of 602 prompts across three major AI search platforms (ChatGPT, Google AI Overview/Gemini, Perplexity) with 21,143 valid citations and 72 extracted features. The central finding is that citation breadth and depth diverge: ChatGPT cites fewer sources but exhibits substantially higher per-citation influence (mean 0.2713 vs. ~0.06 for Google and Perplexity), while Perplexity cites broadly but with lower per-source absorption.
The conceptual contribution—separating selection from absorption—is intuitive and timely. The "evidence-container hypothesis" (that pages serving as modular, semantically-aligned evidence packages are absorbed more deeply) provides a useful organizing framework for future GEO research.
The paper is remarkably self-aware about its methodological limitations, which is both a strength and a weakness. The authors explicitly avoid causal claims, provide a four-level identification map, separate outcome components from permissible explanatory variables, and include a claim-level self-audit. This level of epistemic discipline is unusual and commendable.
However, this self-awareness reveals that the paper is, in its current form, purely descriptive. It reports no inferential statistics: no confidence intervals, no regression models, no hypothesis tests. The authors state that this is deliberate, positioning the paper as a "pre-submission draft" with confirmatory models deferred to a future version. This is intellectually honest, but it means the paper's empirical claims rest entirely on point estimates and mean comparisons without uncertainty quantification.
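As a rough illustration of the kind of uncertainty quantification that is missing, a percentile bootstrap could put an interval around the difference in mean influence between two platforms. This is only a sketch: the function name, the parameters, and the toy score lists are hypothetical and are not taken from the paper or the dataset. Because citations are nested within prompts, resampling individual citations as done here likely understates the uncertainty; resampling whole prompts would be the more defensible variant.

```python
import random
from statistics import mean


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for mean(a) - mean(b).

    `a` and `b` are lists of per-citation influence scores for two
    platforms (hypothetical inputs). Resampling is at the citation level,
    which ignores clustering of citations within prompts.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Toy values for illustration only; these are NOT data from the paper.
platform_a_scores = [0.31, 0.22, 0.40, 0.18, 0.27, 0.35]
platform_b_scores = [0.05, 0.09, 0.03, 0.07, 0.06, 0.08]
low, high = bootstrap_mean_diff_ci(platform_a_scores, platform_b_scores)
```

An interval that excludes zero would support the reported platform gap more firmly than the point estimates alone, though it would not address whether the proxy itself is valid.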
The influence_score construction is a critical methodological concern. It is a hand-designed weighted composite (ref_count, first_position_ratio, paragraph_coverage_ratio, TF-IDF cosine, bigram/trigram overlap) with fixed weights (0.20/0.15/0.20/0.25/0.20). The choice of weights is not validated against any ground truth—there is no human annotation of "true" absorption, no ablation study of weight sensitivity (though one is proposed), and no comparison against alternative operationalizations. The entire absorption analysis depends on this single proxy, and the platform-level divergence (ChatGPT's much higher scores) could partly reflect systematic differences in answer structure, length, or citation rendering rather than genuine differences in source utilization.
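To make the concern concrete, the composite can be sketched as a plain weighted sum, together with a one-weight-at-a-time perturbation of the sort a sensitivity check might use. Several things here are assumptions rather than the authors' implementation: that the weights map to the components in the order listed above, that every component is already scaled to [0, 1], and the names `tfidf_cosine`, `ngram_overlap`, and `perturbed_weights`, which are illustrative.

```python
# Weights as listed in the review; pairing them with components by
# position is an assumption, not something confirmed by the paper.
WEIGHTS = {
    "ref_count": 0.20,
    "first_position_ratio": 0.15,
    "paragraph_coverage_ratio": 0.20,
    "tfidf_cosine": 0.25,
    "ngram_overlap": 0.20,  # bigram/trigram overlap, treated as one term
}


def influence_score(components, weights=WEIGHTS):
    """Weighted sum of components, each assumed pre-normalized to [0, 1]."""
    return sum(weights[name] * components[name] for name in weights)


def perturbed_weights(weights, name, delta):
    """Shift one weight by `delta`, then renormalize so weights sum to 1."""
    shifted = dict(weights)
    shifted[name] = max(0.0, shifted[name] + delta)
    total = sum(shifted.values())
    return {key: value / total for key, value in shifted.items()}


# Hypothetical component values for a single cited page (not real data).
page = {
    "ref_count": 0.5,
    "first_position_ratio": 0.2,
    "paragraph_coverage_ratio": 0.4,
    "tfidf_cosine": 0.6,
    "ngram_overlap": 0.1,
}
baseline = influence_score(page)
lexical_heavy = influence_score(page, perturbed_weights(WEIGHTS, "tfidf_cosine", 0.10))
```

If platform rankings or feature associations change materially under modest perturbations like this, the reported divergence would owe more to the weighting scheme than to source utilization.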
The dataset design, with layered prompts (A/B/C/D) controlling for task type, style, language, and scenario complexity, is a genuine strength over uncontrolled scraping. However, 602 prompts is a modest sample, and the prompt distribution is researcher-designed rather than drawn from real user behavior, limiting external validity.
The selection-absorption distinction has clear practical value for publishers, content strategists, and researchers studying AI-mediated information access. If the framework gains traction, it could:
The finding that Q&A formatting alone does not improve absorption is practically useful, as it pushes back against simplistic SEO-to-GEO advice. Similarly, the evidence-genre analysis (code, statistics, definitions showing higher influence) offers actionable hypotheses.
However, the practical impact is constrained by the lack of causal evidence. The paper repeatedly acknowledges that it cannot determine whether structured, evidence-rich pages are absorbed *because* of their features or due to confounding factors (domain authority, editorial quality, etc.).
The paper addresses a genuinely emerging need. As generative AI search displaces traditional search, understanding how sources are used (not just cited) is increasingly important for publishers, policymakers, and researchers. The GEO literature is nascent, and measurement frameworks are needed. The cross-platform comparison (ChatGPT, Google, Perplexity) is particularly timely given the rapid evolution of these products.
The paper positions itself well within the emerging GEO literature, building on Aggarwal et al.'s foundational GEO framework and the AgentGEO line of research on citation failures.
The paper reads more like a detailed research protocol with preliminary results than a completed study. The extensive discussion of future robustness checks, pre-registration plans, and proposed experimental extensions—while valuable—highlights how much remains undone. The writing is thorough but repetitive, restating limitations and caveats extensively across sections. The ethical discussion of responsible GEO is appropriate but underdeveloped relative to the space allocated.
The platform archetype characterizations (ChatGPT as "absorption-heavy," Perplexity as "coverage-oriented") are interesting descriptive labels but may reflect measurement artifacts rather than genuine platform strategies.
Generated Apr 29, 2026
Paper 2 likely has higher scientific impact due to broader cross-field relevance and timeliness: it introduces a general measurement framework for “citation absorption” across multiple major AI search platforms, with a sizable multi-platform dataset analysis. Its metrics can inform IR, NLP evaluation, web publishing/SEO, misinformation studies, and HCI. Paper 1 is more algorithmically novel and includes online A/B results, but appears platform-specific to conversation-starter recommendation with narrower transferability and modest reported gains.
Paper 2 addresses a highly timely and broadly impactful topic: Generative Engine Optimization (GEO) in AI search engines. As generative AI reshapes information retrieval, understanding citation selection and absorption is critical for web science, NLP, and industry (e.g., SEO). Paper 1 offers a strong methodological contribution but is constrained to the specific niche of federated multi-market CTR prediction, giving Paper 2 a much wider potential audience, broader interdisciplinary applications, and higher real-world relevance.
Both papers provided are completely identical in title and abstract. Therefore, they have the exact same scientific impact. Paper 1 is selected by default as there is no difference between the two submissions.
Paper 2 has higher potential impact due to broader applicability and clearer methodological contribution: a general framework (distribution shaping with LLM-derived profiles) that can plug into multiple recommender models and shows consistent gains across datasets. Recommender systems are a mature, high-impact domain with direct industrial deployment, so improvements can translate quickly to real-world systems. Paper 1 is timely and novel in measuring “citation absorption” for generative search, but it is primarily a descriptive measurement framework tied to specific platforms/datasets, potentially limiting generalization and cross-field uptake compared to Paper 2’s reusable modeling approach.
Paper 1 introduces a novel two-stage measurement framework for an emerging and rapidly growing field (Generative Engine Optimization), addressing how AI search engines absorb content beyond simple citation counts. This is highly timely given the explosive growth of generative AI search platforms and has broad implications for SEO, information retrieval, digital publishing, and AI transparency. Paper 2 offers an incremental improvement combining existing techniques (meta-learning + targeted DP) for recommender systems. While solid, it addresses a more established problem space with less transformative potential.
Paper 2 addresses a well-established research area (recommender systems) with a novel federated learning framework that tackles clearly defined theoretical and practical challenges (source degradation, negative transfer). It provides theoretical analysis, a new loss function (S²CE), and extensive experiments on real-world datasets. Paper 1 introduces a measurement framework for GEO, which is timely but primarily descriptive/observational with a narrower scope (SEO/content optimization for AI search). Paper 2 has stronger methodological contributions, broader applicability across federated learning and recommendation communities, and more rigorous experimental validation.
Paper 1 introduces a novel federated many-to-many collaboration paradigm for cross-market sequential recommendation, directly addressing privacy/data-isolation constraints and heterogeneity via a new loss (S^2CE) plus adaptation, with theoretical and empirical support. This is methodologically substantive and broadly applicable to real-world recommender systems across industries and markets, making it timely amid growing federated learning adoption. Paper 2 offers a useful measurement framework and descriptive analysis for GEO across platforms, but is more observational, potentially platform/dataset-dependent, and likely narrower in methodological innovation and generalizability than Paper 1.
Paper 2 addresses a highly timely and paradigm-shifting topic: Generative Engine Optimization (GEO). By providing a large-scale empirical analysis across major AI search engines, it establishes foundational metrics for an emerging field. This offers broad, real-world implications for information retrieval, web science, and the future of digital content ecosystems. While Paper 1 presents an innovative HCI approach, its evaluation is on a much smaller scale, making Paper 2's potential breadth of impact and relevance significantly higher.
Paper 2 introduces a novel cross-disciplinary framework applying recommender systems to scientific model selection in CFD, addressing a fundamental challenge in computational science. Its methodological rigor (13,600 simulations, nested cross-validation, domain-specific metrics) is strong, and the approach generalizes beyond multiphase flow to any expensive simulation domain requiring model selection. Paper 1, while timely regarding generative AI search, is primarily descriptive/measurement-focused with a narrower SEO-adjacent audience. Paper 2's broader applicability across computational science and engineering fields gives it higher long-term impact potential.
Paper 1 addresses a highly timely and universally relevant problem: how generative AI search engines select and absorb information. Its framework for Generative Engine Optimization (GEO) has broad implications across information retrieval, web publishing, and the broader internet ecosystem. While Paper 2 provides a valuable domain-specific benchmark for materials science, Paper 1's research impacts a much wider array of disciplines and addresses a fundamental shift in global information access.