MOC: Multi-Order Communication in LLM-based Multi-Agent Systems
Yao Guan, Lin Wang, Zhihu Lu, Ziyi Wang, Wenzhu Yan, Qiang Duan
Abstract
Despite the remarkable progress of Large Language Model (LLM) based Multi-Agent Systems, most research focuses on optimizing coordination topology while largely underexploring the equally critical problem: how to transmit and optimize messages among agents effectively? Current communication schemes typically rely on the direct concatenation of first-order neighbor responses, which induces a restricted evidence receptive field and leads to the dilution of crucial insights over multi-hop paths. To address these limitations, we propose the Multi-Order Communication (MOC) scheme, which reconstructs the inter-agent communication to capture multi-hop dependencies and incorporates a structural message consolidation strategy to ensure efficiency. Specifically, we formalize the communication mechanism to construct a structured multi-order evidence stream, and subsequently design a Semantic-Topological Merging algorithm to optimize semantic fidelity within token constraints. Extensive experiments across six diverse datasets and LLM backbones of varying parameter scales demonstrate that MOC consistently improves task performance and reduces communication costs. The code is available at https://github.com/yao-guan/MOC.
AI Impact Assessments
Scientific Impact Assessment: MOC: Multi-Order Communication in LLM-based Multi-Agent Systems
1. Core Contribution
MOC addresses a specific gap in LLM-based multi-agent systems (MAS): while prior work extensively optimizes *which* agents connect to each other (topology), the *how* of message transmission along those connections has remained primitive—typically just concatenating first-order neighbor outputs. MOC proposes two key innovations: (1) a multi-order evidence stream that exposes target agents to raw upstream responses from multiple hop distances (not just immediate neighbors), organized by topological order; and (2) a Semantic-Topological Merging algorithm that consolidates redundant messages to manage context length growth.
The analogy to MixHop in GNNs is well-drawn: just as MixHop concatenates multi-hop aggregated representations, MOC surfaces multi-hop raw messages. The key insight is that intermediate agents' paraphrasing and summarization inevitably attenuate information ("semantic attenuation"), and providing direct access to upstream sources can mitigate this.
2. Methodological Rigor
Formalization. The paper provides a clean graph-theoretic formalization of intra-round MAS communication via DAGs, topological execution ordering, and k-hop reachability via adjacency matrix powers. The framework is well-structured and the notation is consistent.
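To make the k-hop reachability idea concrete, here is a minimal sketch assuming a 0/1 adjacency matrix in which `adj[i, j] = 1` means agent i sends its response to agent j. The function name `k_hop_upstream`, the toy four-agent DAG, and the choice to group senders by exact path length are illustrative assumptions, not the paper's code or notation.

```python
import numpy as np

def k_hop_upstream(adj: np.ndarray, max_order: int) -> dict[int, dict[int, list[int]]]:
    """For each agent j, list the upstream agents that reach j by a path of
    exactly k edges, for k = 1..max_order (hypothetical helper)."""
    n = adj.shape[0]
    power = np.eye(n, dtype=int)
    result: dict[int, dict[int, list[int]]] = {j: {} for j in range(n)}
    for k in range(1, max_order + 1):
        power = power @ adj  # (A^k)[i, j] counts length-k paths from i to j
        for j in range(n):
            result[j][k] = [int(i) for i in np.nonzero(power[:, j])[0]]
    return result

# Toy DAG (assumed for illustration): 0 -> 1 -> 2 -> 3, plus a shortcut 0 -> 2.
adj = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])
streams = k_hop_upstream(adj, max_order=2)
print(streams[3])  # {1: [2], 2: [0, 1]}
```

In this toy graph, agent 3 sees agent 2 at order 1 and agents 0 and 1 at order 2, which is the kind of order-grouped upstream view a multi-order evidence stream would expose.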
Consolidation Strategy. The Semantic-Topological Merging algorithm uses embedding-based similarity to identify redundant message pairs, applies forward-merging (anchoring at the topologically later message), and distills merged content via a lightweight model with candidate selection optimizing cosine similarity to originals. The topology-aware budget constraint (Eq. 16) and batch-wise ε-approximate merging are practical design choices.
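The redundancy detection and forward-merging steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions: messages are indexed in topological order, the embeddings are hand-made toy vectors, the threshold of 0.9 is arbitrary, and the function names are hypothetical. The distillation model, the budget constraint of Eq. 16, and the batch-wise ε-approximate merging are not reproduced here.

```python
import numpy as np

def redundant_pairs(emb: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose cosine similarity meets the threshold."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(emb)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if sim[i, j] >= threshold]

def forward_merge(pairs: list[tuple[int, int]], n: int) -> list[int]:
    """Map each message to its anchor: the topologically latest message in its
    redundancy group (messages are assumed indexed in topological order)."""
    anchor = list(range(n))

    def find(x: int) -> int:
        while anchor[x] != x:
            x = anchor[x]
        return x

    for i, j in pairs:
        a, b = find(i), find(j)
        if a != b:
            # Point the earlier root at the later one, so the later message anchors the group.
            anchor[min(a, b)] = max(a, b)
    return [find(i) for i in range(n)]

# Toy embeddings (assumed): messages 0, 1, and 3 are near-duplicates; message 2 is distinct.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05]])
pairs = redundant_pairs(emb, threshold=0.9)
anchors = forward_merge(pairs, n=len(emb))
print(pairs)    # [(0, 1), (0, 3), (1, 3)]
print(anchors)  # [3, 3, 2, 3]
```

Here messages 0 and 1 fold into message 3, the latest member of their group, while message 2 stays on its own. In a full pipeline the merged group would then be rewritten by a distillation step, which this sketch leaves out.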
Experimental Design. The evaluation covers six datasets across three task categories (general reasoning, math, code), three LLM backbones of different scales (27B, 32B, 685B), and multiple edge densities. This breadth is commendable. However, several concerns arise:
3. Potential Impact
Practical Relevance. As LLM-based MAS grow in complexity with more agents and deeper coordination chains, the communication bottleneck becomes real. MOC's approach of treating communication as a first-class design dimension—separate from topology—is a useful conceptual contribution that could influence how MAS frameworks are architected.
Generalizability. MOC is designed as a plug-in module compatible with any DAG-based MAS topology, demonstrated across random DAGs and task-adaptive topologies. This modularity enhances its practical adoption potential.
Limitations of Impact. The reliance on a separate distillation model for merging adds infrastructure complexity. The fixed K=2 as the "robust default" raises questions about adaptivity. The paper acknowledges that adaptive K selection is future work, but this seems critical for real deployment. Furthermore, the improvements on state-of-the-art models (DeepSeek-V3.2) are quite small, suggesting diminishing returns as base models become stronger.
4. Timeliness & Relevance
The paper addresses a timely topic. LLM-based MAS have proliferated rapidly (CAMEL, AutoGen, MetaGPT, etc.), and the community has indeed focused primarily on topology optimization while treating communication as a solved problem. The observation that naive concatenation of neighbor outputs loses multi-hop information is valid and practically relevant. The growing interest in agentic AI systems makes communication scheme optimization an increasingly important research direction.
5. Strengths & Limitations
Key Strengths:
Notable Weaknesses:
Summary
MOC makes a reasonable contribution by formalizing and improving the communication scheme in LLM-based MAS, an underexplored dimension alongside topology design. The multi-order evidence exposure idea is intuitive and well-motivated, and the consolidation strategy is practical. However, the empirical improvements are often modest and statistical rigor is lacking, the computational overhead of the merging process is non-trivial, and the fixed communication order limits adaptivity. The paper is a solid incremental contribution to the MAS communication literature but falls short of being transformative.
Generated Jun 2, 2026
Comparison History (19)
AbaqusAgent addresses a concrete, high-impact application by automating finite element analysis through multi-agent LLMs, directly lowering barriers in computational mechanics education and engineering practice. It validates on 50 real problems with an 86% success rate, demonstrating practical utility. While Paper 1 (MOC) makes a solid contribution to multi-agent communication theory with broad applicability, Paper 2 (AbaqusAgent) opens a new paradigm for human-simulation interaction in engineering, has clearer real-world applications, and bridges AI with an established domain (FEA/solid mechanics), giving it broader interdisciplinary impact and immediate practical relevance.
Paper 1 addresses a fundamental bottleneck in LLM-based Multi-Agent Systems (communication efficiency and multi-hop dependencies). Its proposed scheme has broad applicability across numerous domains utilizing multi-agent setups. While Paper 2 offers an innovative neuro-symbolic approach, its direct impact is currently confined to geometry problem solving. Thus, Paper 1 demonstrates greater breadth of impact, generalizability, and potential for widespread adoption in a rapidly expanding field.
Paper 1 addresses a fundamental and pervasive challenge in modern LLMs: following complex, multi-constraint instructions. By formalizing this as a Constraint Adherence Problem and utilizing a novel knowledge graph-based bridging method, it offers a highly innovative solution to a widespread limitation. While Paper 2 presents strong methodological advancements for multi-agent systems, Paper 1's focus on core reasoning and instruction-following capabilities promises broader immediate applicability and impact across almost all domains utilizing large language models.
Paper 2 addresses a fundamental bottleneck in LLM-based multi-agent systems (communication efficiency and multi-hop dependencies), offering a generalizable solution applicable across numerous domains. Paper 1, while providing a rigorous and valuable benchmark, is highly specialized to financial reasoning. The broad applicability and foundational nature of Paper 2's methodological advancements give it a higher potential for widespread scientific impact across the AI research community.
Paper 2 introduces a novel framework for analyzing reasoning structure in LLMs by converting unstructured traces into verifiable reasoning graphs, defining new efficiency metrics that reveal insights hidden by standard accuracy/token metrics. This addresses a fundamental gap in LLM evaluation methodology with broad applicability across the field. Paper 1, while solid, is more incremental—improving multi-agent communication via multi-hop message passing is a narrower contribution. Paper 2's structural analysis tools have potential to become widely adopted diagnostic instruments for the rapidly growing reasoning model ecosystem.
Paper 1 introduces a novel, modality-aligned supervision mechanism (Imaginative Perception Tokens) targeting a well-known weakness of VLMs: spatial reasoning under partial observability. It contributes new task formulations plus ~20K examples with explicit intermediate “imagination” ground truth, enabling reproducible study and likely follow-on benchmarks. The approach is broadly applicable to robotics, embodied AI, AR/VR, and navigation, and its finding that text CoT can harm spatial tasks is timely and influential. Paper 2 is useful for multi-agent LLM systems, but is closer to an engineering refinement of messaging schemes with narrower cross-field impact.
EVA-Net introduces a novel cross-modal framework using video as semantic priors for EEG decoding, addressing a critical challenge in BCI systems (subject-independent generalization). This tackles a fundamental bottleneck in neural engineering with clear real-world applications (assistive devices, neurorehabilitation). The approach is methodologically innovative, using video instead of text for dynamic motor process alignment, and demonstrates significant performance gains (8.66% LOSO accuracy improvement). Paper 2 proposes an incremental communication optimization for LLM multi-agent systems, which is useful but narrower and builds on a rapidly shifting landscape where architectural innovations quickly become obsolete.
Paper 1 addresses a critical and widespread bottleneck in Retrieval-Augmented Generation (RAG)—handling semi-structured data—which has massive real-world applications in enterprise and e-commerce. By introducing a novel dual-view framework combining symbolic and semantic retrieval alongside a new benchmark dataset, it offers higher immediate utility and broader applicability across industries compared to the communication optimization in multi-agent systems presented in Paper 2.
Paper 1 addresses a fundamental limitation in the core reasoning capabilities of frontier large language models, revealing a critical 'production-evaluation gap' and confirmation bias. This insight has profound implications for AI safety, alignment, and future training paradigms (like RLHF). Paper 2, while offering a solid methodological improvement for multi-agent communication efficiency, has a narrower scope and application. Paper 1's findings challenge dominant assumptions about LLM reasoning, giving it broader relevance and higher potential for widespread scientific impact across the AI community.
Paper 2 likely has higher impact: it identifies a general, underappreciated failure mode (misalignment between confidence-based decoding/training and reasoning trajectories) in masked diffusion language models, with clear evidence across multiple reasoning tasks and an actionable takeaway (random masking is more robust for the challenging tail). This has broad implications for inference policies, training objectives, and evaluation of non-autoregressive LMs, and is timely as diffusion/MDM approaches gain adoption. Paper 1 is useful and engineering-relevant, but more incremental within multi-agent LLM communication design.
Paper 2 (MOC) likely has higher impact: it targets a broadly relevant, fast-growing area (LLM multi-agent systems) and introduces a general communication paradigm (multi-order evidence + semantic-topological merging) applicable across tasks, datasets, and backbones, with direct implications for scalability, efficiency, and robustness of agentic systems. Paper 1 (LFQ) is valuable but more specialized—an incremental PTQ objective/coverage improvement focused on final-block/logit alignment for generation quality—primarily impacting deployment efficiency rather than expanding system capabilities across fields.
Paper 1 introduces a generally applicable communication/aggregation scheme for LLM multi-agent systems (multi-hop evidence + constrained merging), likely reusable across many agent architectures and tasks, with broad downstream impact in AI systems and coordination. It reports extensive experiments across multiple datasets and model scales and offers code, strengthening rigor and adoption. Paper 2 is timely and valuable for a specific biocuration bottleneck and provides a strong benchmark-based evaluation, but its core contribution is primarily an application/assessment of existing frontier LLMs in one domain, with impact more localized and potentially sensitive to rapid model updates.
SAAS addresses a highly practical and timely problem—over-search in agentic LLM systems—that directly impacts inference costs and latency at scale. Its RL-based framework for cultivating self-awareness in agents is more novel and broadly applicable, as agentic search is a rapidly growing paradigm. The three-component design (boundary modeling, boundary-aware rewards, stage-wise optimization) offers a principled methodology. While MOC makes solid contributions to multi-agent communication, it addresses a more incremental improvement in message passing topology. SAAS's focus on computational efficiency and self-regulation aligns with urgent industry needs, giving it broader real-world impact potential.
Paper 1 likely has higher scientific impact due to its novelty in addressing an underexplored bottleneck (message transmission/optimization) in LLM multi-agent systems, a rapidly growing and broadly applicable area. Its method (multi-hop structured evidence + semantic-topological merging under token constraints) is general-purpose, evaluated across multiple datasets and LLM scales, and accompanied by code—supporting rigor and adoption. Paper 2 is methodologically solid and impactful for energy scheduling under uncertainty, but its evaluation appears more domain-specific and narrower in cross-field reach compared to advances in LLM-agent communication.
Paper 2 (MOC) addresses a fundamental and generalizable problem in LLM-based multi-agent systems—how agents communicate effectively—which has broad applicability across many domains. Its multi-order communication scheme with semantic-topological merging is a novel contribution that can improve any multi-agent system. Paper 1 (TravelEval), while thorough, is a domain-specific benchmark for travel planning with narrower impact. MOC's cross-domain applicability, demonstrated improvements across six datasets and multiple LLM scales, and its foundational contribution to multi-agent communication give it higher potential for broad scientific impact.
While Paper 2 presents a highly innovative neuro-symbolic approach for physics diagrams, Paper 1 addresses a fundamental bottleneck in LLM-based multi-agent systems. By optimizing multi-hop message transmission and merging, Paper 1 offers a domain-agnostic, scalable improvement. Because multi-agent LLM architectures are being adopted across virtually all AI applications, overcoming their communication inefficiencies yields a much broader scientific and practical impact compared to the niche, domain-specific generation tasks tackled in Paper 2.
Paper 2 likely has higher impact: it targets a core blocker for real-world LLM deployment—reliability of tool-augmented agents—introducing a self-healing, budgeted control framework with verification and observability. Its controlled fault-injection benchmark and strong gains over common baselines suggest methodological rigor and actionable engineering relevance across many domains using tool-calling agents (enterprise workflows, retrieval, automation). Paper 1 is novel for multi-agent communication efficiency, but its impact may be narrower to multi-agent coordination settings and more sensitive to benchmark/task design. Paper 2 is more timely and broadly applicable.
Paper 2 (MOC) addresses a broader, more generalizable problem in multi-agent LLM systems—inter-agent communication optimization—with a novel framework applicable across diverse tasks and model scales. It demonstrates consistent improvements across six datasets with available code. Paper 1, while methodologically sound and showing strong results, addresses a narrower benchmark-specific problem (memory conflict resolution via deterministic serial comparison) with findings that are somewhat intuitive (deterministic rules beat LLM judgment for simple versioning). Paper 2's multi-order communication scheme has wider applicability and relevance to the growing multi-agent systems field.
Paper 1 has higher impact potential due to a more novel, broadly applicable evaluation/diagnostic framework (shared decision landscapes) that can reinterpret existing trajectory datasets across benchmarks and yields actionable failure-region interventions. It contributes methodological tools (graph construction, trap/core overlay, event vocabulary) plus a demonstrated improvement pipeline on a high-profile, timely benchmark (SWE-bench) with measurable gains. Paper 2 is a solid systems/algorithm contribution for multi-agent LLM communication, but multi-hop/message-merging schemes are closer to incremental advances and its impact is narrower to MAS communication settings.