Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
Ye Yu, Heming Liu, Haibo Jin, Xiaopeng Yuan, Peng Kuang, Haohan Wang
Abstract
Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations such as key-value caches offers a promising alternative to text-based protocols, but existing approaches do not jointly optimize communication with multi-agent reasoning. We therefore propose DiffMAS, a training framework that treats latent communication as a learnable component of multi-agent systems. DiffMAS performs parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to jointly learn how information should be encoded and interpreted across interactions. Experiments on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks show that DiffMAS consistently improves reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods, achieving 26.7% on AIME24 and 20.2% on GPQA-Diamond.
AI Impact Assessments (3 models)
Scientific Impact Assessment: DiffMAS – Learning to Communicate in Multi-Agent Language Systems
1. Core Contribution
DiffMAS proposes treating inter-agent communication in LLM-based multi-agent systems (MAS) as a learnable, differentiable component rather than a fixed interface. The key idea is to use KV (key-value) caches as a continuous latent communication medium between sequential agents, then apply parameter-efficient supervised fine-tuning (LoRA) over multi-agent latent trajectories so that the system jointly learns how to encode and interpret information across agent boundaries. The framework operates in two stages: upstream agents construct a shared KV trace without gradient updates, and then the final agent performs autoregressive decoding with SFT, backpropagating through the accumulated latent trace.
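The two-stage flow described here can be illustrated with a toy sketch. It is not the authors' implementation: `agent_emit`, `build_trace`, and `attend` are hypothetical names, the key/value vectors are synthetic, and no training or LoRA update is performed. It only shows upstream agents appending to a shared trace without overwriting earlier entries, and a final agent reading the whole trace through softmax attention.

```python
import math

def agent_emit(agent_id, n_slots, dim):
    """Hypothetical stand-in for an upstream agent: returns n_slots (key, value) pairs."""
    pairs = []
    for s in range(n_slots):
        key = [math.sin(agent_id + s + d) for d in range(dim)]
        value = [math.cos(agent_id * (s + 1) + d) for d in range(dim)]
        pairs.append((key, value))
    return pairs

def build_trace(slots_per_agent, dim):
    """Stage 1 (toy): agents append to a shared trace; earlier entries are never overwritten."""
    trace = []
    for agent_id, n_slots in enumerate(slots_per_agent):
        trace.extend(agent_emit(agent_id, n_slots, dim))
    return trace

def attend(query, trace):
    """Stage 2 read (toy): softmax attention of the final agent's query over the full trace."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key, _ in trace]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * value[d] for w, (_, value) in zip(weights, trace)) for d in range(dim)]
    return weights, context

# Three upstream agents contribute 3, 2, and 4 slots respectively (arbitrary toy sizes).
trace = build_trace(slots_per_agent=[3, 2, 4], dim=4)
weights, context = attend(query=[0.5, -0.25, 1.0, 0.0], trace=trace)
print(len(trace), round(sum(weights), 6))
```

In a real system the trace would hold per-layer KV tensors produced by the language model, and the final agent's read would be its ordinary attention over that cache.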
The core novelty lies at the intersection of latent reasoning and multi-agent optimization. While prior work has explored either training-free latent communication (e.g., sharing KV caches directly) or learned communication modules (e.g., Cache-to-Cache), DiffMAS specifically optimizes the communication interface end-to-end with the downstream reasoning task. The formalization of multi-agent communication as a composition of differentiable stage operators with non-overwriting trace concatenation is a clean abstraction.
2. Methodological Rigor
Theoretical grounding: Proposition 3.1 provides an interface-level analysis showing that concatenative (non-overwriting) communication avoids the depth-dependent gradient attenuation that overwriting communication incurs. This is a relatively straightforward observation, since concatenation preserves direct gradient paths by construction, but it is correctly scoped as an interface-level guarantee rather than an end-to-end claim. The authors appropriately acknowledge that attention weights in the decoder can still introduce attenuation.
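A scalar toy model makes the intuition behind this point concrete. It is an illustration under simplifying assumptions, not a restatement of Proposition 3.1: the overwriting channel is modeled as a linear recurrence with a hypothetical contraction factor `a`, and the concatenative channel as a weighted sum with fixed read weights standing in for attention. Gradients are estimated by central finite differences.

```python
def overwrite_output(contribs, a, r):
    """Overwriting channel (toy): each stage replaces the message with a * message + contribution."""
    m = 0.0
    for u in contribs:
        m = a * m + u
    return r * m

def concat_output(contribs, read_weights):
    """Concatenative channel (toy): the reader sees every contribution directly."""
    return sum(w * u for w, u in zip(read_weights, contribs))

def grad_wrt_first(fn, contribs, eps=1e-6):
    """Central finite-difference gradient of fn with respect to the first agent's contribution."""
    bumped_up = [contribs[0] + eps] + contribs[1:]
    bumped_down = [contribs[0] - eps] + contribs[1:]
    return (fn(bumped_up) - fn(bumped_down)) / (2 * eps)

a, r = 0.5, 1.0  # assumed contraction factor and read weight, chosen only for illustration
for depth in (2, 4, 8):
    contribs = [1.0] * depth
    g_over = grad_wrt_first(lambda c: overwrite_output(c, a, r), contribs)
    g_cat = grad_wrt_first(lambda c: concat_output(c, [r] * depth), contribs)
    print(depth, round(g_over, 6), round(g_cat, 6))
```

In this toy the overwriting gradient scales as `a ** (depth - 1)`, while the concatenative gradient equals the read weight regardless of depth. If the read weights themselves shrink, as attention weights can, the concatenative path attenuates too, which matches the caveat the authors acknowledge.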
Experimental design: The evaluation spans five model scales (4B to 32B), six benchmarks across math, science, code, and commonsense reasoning, and four baselines (single-agent, TextMAS, LatentMAS, C2C). This breadth is commendable. However, several methodological concerns arise:
3. Potential Impact
The paper addresses a genuine gap: most MAS research treats communication as a fixed protocol, and making it learnable is a natural and important direction. The practical implications include:
However, several factors may limit the impact: the approach requires all agents to share the same model architecture (for KV cache compatibility), it is currently restricted to sequential pipelines, and training updates only the final agent's parameters. Extending to heterogeneous agents, non-sequential topologies, or full multi-agent gradient propagation would significantly increase impact.
4. Timeliness & Relevance
This work is highly timely. Multi-agent LLM systems are rapidly gaining adoption (AutoGen, MetaGPT, ChatDev), and the field is transitioning from prompt engineering to systematic optimization. The observation that communication itself should be optimized—not just agent capabilities—fills an important conceptual gap. The concurrent emergence of latent reasoning research (Quiet-STaR, COCONUT) makes this a natural convergence point.
5. Strengths & Limitations
Strengths:
Limitations:
Overall Assessment
DiffMAS presents a well-motivated and cleanly executed contribution to an important emerging problem. The core idea of making latent communication learnable is sound, and the empirical results are promising across a good range of benchmarks. The paper's main limitation is that its "end-to-end" claims overstate the actual optimization scope: only the final agent's LoRA parameters are trained. The work opens interesting directions for fully differentiable multi-agent systems but represents an incremental step rather than a paradigm shift. The small evaluation sets and limited statistical analysis also weaken confidence in the reported gains.
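The concern about small evaluation sets can be made concrete with a rough calculation. The sketch below rests on assumptions not stated in the assessment: the commonly used set sizes of 30 problems for AIME24 and 198 questions for GPQA-Diamond, single-run scoring, and correct-answer counts (8 and 40) inferred from the reported 26.7% and 20.2%. It computes a 95% Wilson score interval for each accuracy.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95% coverage)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Set sizes and counts are assumptions inferred from the reported percentages, not paper data.
for name, correct, n in [("AIME24", 8, 30), ("GPQA-Diamond", 40, 198)]:
    lo, hi = wilson_interval(correct, n)
    print(f"{name}: {correct}/{n} = {correct / n:.3f}, 95% interval ~ [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions, the interval on the 30-problem set spans tens of percentage points, so a difference of a few problems between methods on that benchmark is hard to separate from sampling noise without repeated runs or significance tests.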
Generated Apr 24, 2026
Comparison History (43)
Paper 1 establishes fundamental mathematical theorems linking AI architectural constraints to human cognitive biases, bridging computer science, psychology, and cognitive science. Its theoretical depth, impossibility theorems, and cross-disciplinary validation offer broader foundational impact than Paper 2, which presents a valuable but narrower methodological improvement in multi-agent LLM systems.
Paper 1 (FATE) addresses a critical and timely problem—safety alignment for tool-using LLM agents—with a novel framework that tackles trajectory-level failures rather than just response-level issues. Its Pareto-front optimization for balancing safety and utility is methodologically innovative. The problem has immediate real-world implications as agents are deployed in high-stakes settings. Paper 2 (DiffMAS) proposes interesting latent communication optimization for multi-agent systems, but addresses a less urgent problem with more incremental gains. FATE's broader safety implications, novel self-evolution approach without expert demonstrations, and strong empirical results across multiple safety benchmarks give it higher potential impact.
Paper 1 likely has higher impact due to a broadly applicable, novel training framework (DiffMAS) that makes inter-agent communication itself learnable via latent trajectories, potentially affecting many multi-agent LLM settings beyond any single environment. It reports concrete, strong benchmark gains on widely watched reasoning tasks (e.g., AIME24, GPQA), suggesting timely relevance and easier downstream adoption. Paper 2 (GLANCE) is innovative for VLM exploration and intrinsic motivation, but its impact is more domain-specific to partially observable RL/embodied tasks and may depend more on environment design and RL stability.
Paper 2 is likely higher impact due to a more novel, generalizable methodological contribution: end-to-end learnable latent inter-agent communication (DiffMAS) with parameter-efficient training over multi-agent trajectories. This directly advances the core bottleneck of multi-agent LLM systems—communication bandwidth and coordination—and is broadly applicable across tasks and model families, with strong benchmark evidence on diverse, timely evaluations (math, QA, code, commonsense). Paper 1 offers an insightful governance-inspired design space and useful empirical comparisons, but it is largely an architecture selection/benchmarking framework rather than a new trainable mechanism, potentially limiting transformative impact.
Paper 1 introduces a novel, fundamental methodological advancement by treating multi-agent latent communication as a learnable component, shifting away from fixed text-based protocols. This has broad implications for improving complex reasoning capabilities across diverse AI domains. In contrast, Paper 2 presents an important but narrower benchmarking study on safety for a specific application (robotic health attendants). The foundational architectural innovation and strong empirical results on difficult benchmarks (AIME, GPQA) in Paper 1 give it a higher potential for widespread scientific impact across the broader AI research community.
Paper 2 introduces DiffMAS, a novel training framework that addresses a fundamental gap in multi-agent LLM systems by making inter-agent communication a learnable, jointly optimized component via latent representations. This is a broadly applicable methodological contribution with clear benchmarks across multiple domains. Paper 1, while impressive in its real-world deployment scale and engineering rigor, is more of a systems/engineering case study specific to onchain trading agents. Paper 2's contribution to differentiable multi-agent communication has broader theoretical and practical implications across the AI research community.
Paper 1 offers an unprecedented, large-scale empirical study of autonomous agents deploying real capital. Its focus on system-level reliability and the 'operating layer' exposes critical real-world failure modes absent in standard text benchmarks. This real-world grounding and massive scale ($20M volume) provide highly actionable insights for safe agent deployment, giving it greater potential impact than Paper 2's methodological improvements on standard reasoning benchmarks.
Paper 1 addresses a fundamental challenge in multi-agent LLM systems—jointly optimizing inter-agent communication—with broad applicability across reasoning tasks. The DiffMAS framework introduces a novel paradigm (learnable latent communication) with strong empirical results across multiple benchmarks. Given the explosive growth of LLM-based multi-agent systems, this work is highly timely and could influence a large research community. Paper 2, while valuable for regulatory compliance in clinical AI, addresses a narrower niche (epistemological validation in clinical DSLs) with a single demonstration domain, limiting its breadth of impact.
Paper 2 has higher likely impact due to a more broadly applicable, end-to-end learning framework (DiffMAS) that upgrades a core bottleneck in multi-agent LLM systems—communication—via jointly optimized latent protocols. It reports strong, benchmarked gains across multiple high-salience tasks (math, scientific QA, code, commonsense), suggesting immediate relevance and adoption potential across AI subfields. Paper 1 is novel in modeling prompt adaptation with a POMDP and cognitive states, but appears more domain- and system-specific (task planning explanations) and proof-of-concept, with narrower breadth despite clear real-world utility in human-centered AI.
Paper 2 addresses a timely and broadly impactful problem—optimizing communication in LLM-based multi-agent systems—which sits at the intersection of rapidly growing fields (LLMs, multi-agent AI, emergent communication). Its end-to-end differentiable framework for latent communication is novel and has wide applicability across reasoning, QA, and code generation. Paper 1 makes solid contributions to graph mining with strong theoretical grounding, but addresses a more niche problem. Paper 2's relevance to the current AI landscape and potential to influence how multi-agent LLM systems are designed gives it higher expected impact.
Paper 1 (DiffMAS) introduces a novel framework for jointly optimizing latent communication in multi-agent LLM systems, addressing a fundamental gap in how agents share information. It demonstrates consistent improvements across diverse benchmarks and has broad applicability to the rapidly growing field of multi-agent AI systems. Paper 2 makes a valid but relatively narrow observation about clock skew in distributed inference observability—an important systems engineering concern but with limited novelty and narrower impact scope compared to the paradigm-shifting potential of learnable inter-agent communication.
Paper 2 (DiffMAS) introduces a more fundamentally novel concept—jointly optimizing latent communication in multi-agent LLM systems—which opens a new research direction at the intersection of multi-agent systems and representation learning. While Paper 1 (SAT) addresses the important but incremental problem of reasoning efficiency with a well-engineered FSM-based pruning framework, Paper 2 challenges a core assumption (fixed text-based communication) and proposes end-to-end differentiable multi-agent training. This has broader implications for how multi-agent AI systems are designed and trained, potentially impacting diverse fields beyond reasoning efficiency.
Paper 1 introduces DiffMAS, a novel framework for jointly optimizing latent communication in multi-agent LLM systems—a largely unexplored direction with broad applicability across reasoning tasks. It demonstrates consistent empirical improvements across multiple benchmarks, suggesting strong methodological contribution. Paper 2 provides valuable critique of LLM-as-judge methodology for disinformation evaluation, but is more narrowly scoped as an audit study. Paper 1's technical innovation in differentiable multi-agent communication has greater potential to influence future research directions in AI systems design and multi-agent reasoning.
Paper 2 introduces a fundamental methodological innovation by shifting multi-agent communication from fixed text interfaces to learnable latent representations. End-to-end optimization of these systems addresses a major bottleneck in current multi-agent architectures, and the strong empirical gains on rigorous benchmarks (AIME24, GPQA) suggest broad applicability. While Paper 1 provides valuable critical insights into LLM evaluation, Paper 2 offers a foundational advancement in AI capability and system design with wider transformative potential across the field.
Paper 1 is more scientifically novel and broadly impactful: it introduces an end-to-end trainable latent communication framework for multi-agent LLMs and validates across diverse, standard reasoning benchmarks with measurable accuracy gains. This advances core methodology (learning communication protocols) likely relevant to multi-agent learning, representation learning, and efficient inference. Paper 2 targets an important engineering bottleneck (tool/schema token overhead) with a practical middleware design, but its evaluation is largely simulated and relies on projected downstream metrics rather than demonstrated end-to-end agent improvements, reducing methodological rigor and general scientific contribution.
Paper 2 (DiffMAS) introduces a novel, concrete technical framework for optimizing latent communication in multi-agent LLM systems, addressing a clear gap in the field. It demonstrates strong empirical results across multiple benchmarks and opens a new research direction (differentiable multi-agent communication). Paper 1 provides valuable methodological contributions for AI safety evaluation but is more incremental and observational in nature, with findings that are somewhat inconclusive (approximately equal contributions, no clear trends). Paper 2's approach has broader applicability across NLP tasks and is more likely to inspire follow-up work in multi-agent systems.
Paper 1 introduces DiffMAS, a novel framework for jointly optimizing latent communication in multi-agent LLM systems—a largely unexplored direction with broad applicability across reasoning tasks. It demonstrates consistent empirical gains on multiple benchmarks, addressing a fundamental architectural limitation. Paper 2 makes valuable methodological contributions to AI safety evaluation but is more incremental and narrower in scope, primarily offering observational analysis rather than a new capability. Paper 1's potential to reshape how multi-agent LLM systems are designed and trained gives it broader and more immediate scientific impact.
Paper 1 introduces a paradigm-shifting approach by replacing fixed text-based communication in multi-agent LLMs with learnable latent representations (DiffMAS). This foundational algorithmic breakthrough has broad theoretical and empirical implications for AI architecture, demonstrating significant performance gains. While Paper 2 offers a crucial methodological tool for mitigating bias in applied LLM analysis, Paper 1 represents a more fundamental advancement in AI capabilities and system design, likely leading to broader technological impact.
Paper 1 addresses a fundamental epistemological bottleneck in AI-assisted research: distinguishing between data-driven inference and parametric memory bias. By introducing 'epistemic blinding,' it provides a broadly applicable, methodology-improving protocol that enhances auditability and trust across fields like biology and finance. While Paper 2 offers a strong architectural advancement for multi-agent systems, Paper 1 has higher cross-disciplinary impact by directly safeguarding the scientific method and reproducibility in the rapidly growing domain of LLM-assisted analysis.
Paper 1 introduces a conceptual paradigm shift in AI safety through unsupervised monitoring, addressing the critical limitation of relying on predefined rules for detecting novel misbehaviors. Its practical demonstration of discovering a previously unknown benchmark vulnerability highlights its immediate real-world utility and broad implications for AI evaluation and safety, giving it a broader potential impact than the architectural optimizations in Paper 2.