What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
Chen Huang, Yuhao Wu, Wenxuan Zhang
Abstract
Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system performance and inference cost. We analyze five common inter-agent communication strategies across two MAS topologies, finding that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve the action-centered information needed by downstream agents. Building on this, we propose PACT (Protocolized Action-state Communication and Transmission), which treats inter-agent communication as a public state-update problem and projects each raw agent output into a compact action-state record before it enters shared history. Across different MAS topologies, PACT consistently improves the performance-cost trade-off, achieving comparable or stronger task performance with substantially fewer tokens. The gains extend to production coding harnesses: PACT lifts OpenHands' resolve rate at -10% tokens-per-resolved, and is resolve-neutral on SWE-agent while halving input tokens. Our code is publicly available at https://github.com/iNLP-Lab/PACT.
AI Impact Assessments
Scientific Impact Assessment
Core Contribution
This paper tackles a practical but underexplored problem in LLM-based multi-agent systems (MAS): what content should agents communicate to each other? While prior MAS research has focused on agent roles, topologies, and turn-taking, the actual message content has been left as unconstrained natural language. The paper makes two contributions: (1) a diagnostic analysis of five common inter-agent communication strategies across two MAS topologies, revealing that no single strategy is universally optimal but that action-centered information is consistently valuable; and (2) PACT, a training-free communication protocol that projects raw agent outputs into compact three-field records (ACTION, STATE, RESULT) before they enter shared history.
The conceptual framing — treating inter-agent communication as a "public state-update problem" and drawing a boundary between private computation and public communication — is clean and intuitive. The idea itself is not deeply novel (structured message passing is well-established in distributed systems), but its application to LLM-based MAS is timely and well-motivated.
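To make the three-field record concrete, here is a minimal Python sketch of projecting a raw agent output into an ACTION/STATE/RESULT record before it is appended to shared history. Only the three field names come from the paper as summarized above. The class ActionStateRecord, the function project, the tagged-line parsing rule, and the sample agent output are all hypothetical stand-ins; the paper's actual projection step may well be an LLM call or a prompt addition rather than string parsing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionStateRecord:
    """Compact public record. Field names follow the review (ACTION, STATE, RESULT)."""
    agent: str
    action: str
    state: str
    result: str

    def render(self) -> str:
        # Single-line form that would be written into the shared history.
        return (f"[{self.agent}] ACTION: {self.action} | "
                f"STATE: {self.state} | RESULT: {self.result}")


def project(agent: str, raw_output: str) -> ActionStateRecord:
    """Hypothetical projection: keep only lines tagged with the three field names.

    Assumption: the agent was prompted to emit tagged lines. Everything else
    (reasoning traces, drafts) is treated as private and dropped.
    """
    fields = {"ACTION": "", "STATE": "", "RESULT": ""}
    for line in raw_output.splitlines():
        key, sep, value = line.partition(":")
        tag = key.strip().upper()
        if sep and tag in fields:
            fields[tag] = value.strip()
    return ActionStateRecord(agent, fields["ACTION"], fields["STATE"], fields["RESULT"])


# Illustrative raw output (invented for this sketch, not taken from the paper).
raw = """Let me think step by step about the failing test...
(long private reasoning omitted)
ACTION: patched off-by-one in parse_range
STATE: tests/test_range.py still has 1 failure
RESULT: 11 of 12 tests pass"""

shared_history: List[str] = []
record = project("coder", raw)
shared_history.append(record.render())
print(shared_history[0])
```

The point of the sketch is the boundary, not the parser: the private reasoning never reaches shared_history, so downstream agents pay only for the compact record.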
Methodological Rigor
Diagnostic analysis. The five-strategy comparison across two topologies and three model scales (Qwen3-8B/14B/32B) is systematic and informative. The analysis credibly establishes that full-content forwarding is wasteful, generic shortening is unreliable, and artifact-only messages identify the right content type but lack protocol structure. The findings are intuitive but empirically grounded.
Experimental evaluation. The comparison against CoA, TextMAS, and Multi-Agent Debate baselines is reasonable, though the baseline selection could be broader. The results consistently show PACT achieving comparable or better performance at substantially lower token cost (38.7% average reduction). The ablation study in Table 3 demonstrates that all three PACT fields contribute, with the largest degradation when both ACTION and STATE are removed.
Limitations in rigor:
Potential Impact
The practical relevance is notable. Token cost is a genuine bottleneck in production MAS deployments, and the paper demonstrates PACT on real-world coding harnesses (OpenHands and SWE-agent on SWE-bench Verified). The OpenHands result (+3.6 pp resolve rate at −10.3% tokens-per-resolved) is compelling. The SWE-agent result (−1.4 pp resolve rate with −50.4% input tokens) is an efficiency gain rather than a performance improvement: it trades a slightly lower resolve rate for roughly half the input tokens.
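Because tokens-per-resolved combines a cost change with a resolve-rate change, it helps to spell out the arithmetic. The sketch below assumes the metric is total tokens divided by the number of resolved tasks over a fixed task set; the function name and the baseline resolve rate used in the example are hypothetical, since the baseline rates are not reported in this assessment.

```python
def tokens_per_resolved_change(base_rate: float, new_rate: float,
                               token_change: float) -> float:
    """Relative change in tokens spent per resolved task.

    base_rate, new_rate: resolve rates in (0, 1] on the same task set.
    token_change: relative change in total tokens (e.g. -0.5 for half the tokens).
    Assumes tokens-per-resolved = total_tokens / number_of_resolved_tasks.
    """
    if not (0 < base_rate <= 1 and 0 < new_rate <= 1):
        raise ValueError("resolve rates must lie in (0, 1]")
    # (T' / R') / (T / R) - 1, with T' = T * (1 + token_change)
    return (1 + token_change) / (new_rate / base_rate) - 1


# Hypothetical baseline of 50% resolved; a 1.4 pp drop and 50.4% fewer tokens.
# Treating the input-token reduction as the total-token change is an assumption.
change = tokens_per_resolved_change(0.50, 0.486, -0.504)
print(f"{change:+.1%}")
```

Under these assumed inputs, a small drop in resolve rate barely offsets a large token reduction, which is why the SWE-agent result still reads as a net efficiency gain.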
The protocol is lightweight (implemented as a proxy hook requiring no model training or architecture changes), which lowers the adoption barrier. However, the impact may be bounded by several factors:
Timeliness & Relevance
The paper is highly timely. With the explosion of agentic AI systems (Claude Code, Codex, etc.) and reasoning models that produce verbose outputs, token cost management is a pressing concern. The observation that reasoning traces compound across multi-turn histories is particularly relevant given the trend toward extended thinking models. The paper addresses a real engineering bottleneck that practitioners face daily.
Strengths
1. Clear problem framing: The paper precisely identifies the gap — inter-agent message content is an underexplored design dimension — and builds a coherent argument from analysis to solution.
2. Practical applicability: PACT requires no training, no new agents, and no changes to existing agent architectures. The proxy hook implementation for production harnesses is elegant.
3. Comprehensive diagnostic: The five-strategy analysis provides genuine insight (not just a baseline comparison) and the findings about topology-dependent effectiveness are useful for the community.
4. Real-world validation: Testing on SWE-bench Verified with OpenHands and SWE-agent goes beyond toy benchmarks and demonstrates practical utility.
Limitations & Weaknesses
1. Limited model diversity: All experiments use Qwen3 only. The protocol's effectiveness might vary significantly with models that have different verbosity patterns or instruction-following capabilities.
2. Projection cost opacity: The computational cost of the PACT projection step (presumably an additional LLM call or prompt addition) is not isolated or discussed.
3. Narrow topology coverage: Only two MAS topologies are tested. More complex settings (dynamic routing, hierarchical teams, open-ended debate) are acknowledged as unexplored.
4. Modest performance gains: While token reductions are substantial, performance improvements are often marginal, making it unclear whether PACT helps agents reason better or simply maintains performance while being cheaper.
5. No comparison with compression baselines: The paper doesn't compare against prompt compression methods (e.g., LLMLingua) or memory-based approaches that could achieve similar token reductions.
6. Scalability analysis missing: How does PACT perform as the number of agents or interaction turns increases significantly?
Overall Assessment
This is a solid engineering contribution that addresses a real and timely problem with a clean, practical solution. The diagnostic analysis provides useful insights, and the PACT protocol is well-designed for adoption. However, the novelty is incremental — structured message protocols are a natural engineering step — and the evaluation, while competent, lacks the breadth and statistical rigor to fully establish generalizability. The paper would benefit from cross-model evaluation, projection cost analysis, and comparison with compression baselines.
Generated Jun 5, 2026
Comparison History (17)
Paper 1 explores fundamental aspects of AI reasoning, alignment blind spots, and epistemic behavior, offering profound insights into model cognition and safety. While Paper 2 provides a valuable engineering solution for cost and efficiency in multi-agent systems, Paper 1's findings on RLHF biases and cognitive personas have a broader, more transformative potential impact across AI safety, alignment, and cognitive modeling fields.
Paper 2 likely has higher impact: it introduces a new benchmark targeting an under-measured, timely capability (long-running monitoring with cost–responsiveness tradeoffs), enabling broad, comparable evaluation across agent designs, models, and web-agent harnesses. Benchmarks tend to catalyze follow-on work across academia and industry, with clear real-world applicability (notifications, ops, finance, scheduling). Paper 1 is a solid, practical communication/protocol contribution with demonstrated token/performance gains, but it is more specialized to MAS message design and may have narrower cross-field adoption than a widely usable benchmark.
Paper 2 addresses a critical, high-stakes problem in AI safety by benchmarking covert psychological manipulation in LLMs. This fills a significant gap in current safety evaluations and has profound implications for AI alignment, policy, and human-AI interaction, giving it a broader and more fundamental scientific impact compared to the practical efficiency optimizations for multi-agent systems in Paper 1.
Paper 1 addresses a highly practical and timely problem in multi-agent LLM systems—communication efficiency—with concrete, measurable improvements on production benchmarks (OpenHands, SWE-agent). Its PACT framework offers immediately actionable design principles for the rapidly growing MAS community, with public code and clear cost-performance trade-offs. Paper 2 presents a useful conceptual framework for knowledge infusion in generative models, but its contribution is more taxonomic/organizational, with narrower empirical validation (safety alignment in diffusion models). Paper 1's broader applicability across MAS topologies and direct relevance to production systems gives it higher potential impact.
Paper 2 addresses a broadly relevant problem in multi-agent LLM systems—efficient inter-agent communication—which is timely given the rapid growth of LLM-based multi-agent frameworks. It proposes a general, reusable protocol (PACT) with clear practical benefits (reduced token costs, maintained performance) demonstrated across multiple topologies and production systems (OpenHands, SWE-agent). Paper 1, while technically rigorous, targets a narrow industrial application (circular factory angle grinder reuse) combining known techniques (CNN-LSTM, S-N curves, Paris law) with limited generalizability beyond its specific domain.
Paper 2 addresses a critical and immediate bottleneck in multi-agent LLM systems: context window exhaustion and high token inference costs. By proposing a protocolized communication strategy (PACT) and demonstrating empirical cost-performance improvements on industry-standard production harnesses like SWE-agent and OpenHands, it offers highly practical and scalable value. Paper 1 presents a useful but more abstract evaluation methodology based on existing information theory concepts. Paper 2's direct optimization of system efficiency and proven results on state-of-the-art benchmarks give it broader applicability and higher potential for immediate scientific and industry impact.
Paper 1 proposes a foundational architectural shift by natively fusing task-specific perceptual models into a transformer decoder, addressing major inefficiencies in monolithic multimodal LLMs. Its broad applicability across vision, audio, and structured data, combined with state-of-the-art benchmark performance against next-generation models, suggests a wider scientific and industrial impact. While Paper 2 offers a valuable optimization protocol for multi-agent systems, Paper 1 represents a more significant leap in fundamental AI model design.
Paper 1 (AbaqusAgent) has higher potential scientific impact because it addresses a concrete, high-value problem in computational mechanics—automating FEA workflows via LLM agents—bridging AI and engineering simulation in a novel way. It demonstrates practical end-to-end capability across 50 validated problems with 86% success, directly enabling real-world applications in engineering design, education, and optimization. Paper 2 (PACT) addresses important but more incremental concerns about token efficiency in multi-agent communication. While useful, it optimizes existing MAS infrastructure rather than opening a new application domain, giving it narrower cross-disciplinary impact.
Paper 2 addresses a fundamental and broadly applicable problem in multi-agent LLM systems—efficient inter-agent communication—proposing a principled framework (PACT) that demonstrates concrete improvements across multiple topologies and production systems. Its impact spans the rapidly growing field of LLM-based multi-agent systems with immediate practical applications (reduced cost, improved performance). Paper 1, while timely and valuable for AI safety benchmarking in companion systems, is more niche in scope, serving primarily as a dataset contribution for a specific safety evaluation domain. Paper 2's methodological contribution has broader applicability and addresses a more fundamental architectural challenge.
Paper 2 addresses a critical and universal bottleneck in LLM-based multi-agent systems: communication efficiency and context window inflation. By proposing a generalizable protocol (PACT), its methodology can be widely adopted across diverse AI domains. In contrast, Paper 1, while methodologically strong, is highly domain-specific (hardware RTL synthesis), limiting its breadth of impact compared to foundational improvements in multi-agent architectures.
ToolSelf introduces a more fundamentally novel paradigm—unifying task execution and self-reconfiguration within a single agent's action space via tool abstraction—addressing a core limitation of LLM agents (static configurations). Its contribution spans architecture design, training methodology (CAT), and demonstrates substantial gains (28.8 points average). Paper 2 addresses inter-agent communication efficiency, which is practically valuable but more incremental, optimizing an existing aspect (token usage) rather than introducing a new capability. ToolSelf's broader conceptual innovation and potential to reshape how agents adapt at runtime gives it higher long-term impact.
Paper 2 is likely to have higher scientific impact because it addresses a broadly shared bottleneck—benchmark saturation and the high cost of creating new agent evaluations—via an automated, scalable task-generation pipeline. Its outputs (TASTE and τ^c-Bench) can become community infrastructure, influencing model development, evaluation practice, and comparisons across labs and domains, and it is timely as agent benchmarks rapidly saturate. Paper 1 is innovative and practically valuable for MAS efficiency, but its impact is more scoped to communication/compression protocols within specific multi-agent architectures.
DeltaMem addresses a fundamental problem in LLM agent memory—redundancy and retrieval conflicts—with a novel residual tree structure that elegantly handles incremental experience variations. The concept of residual experience memory with autonomous consolidation is more architecturally innovative and broadly applicable across agent systems. While PACT offers practical token-efficiency improvements for multi-agent communication, its contribution is more incremental (structured message formatting). DeltaMem's hierarchical memory organization with self-reorganization has deeper implications for continual learning in LLM agents, a rapidly growing research area.
Paper 1 addresses a fundamental bottleneck in multi-agent systems (token inflation and context limits) by proposing a novel, broadly applicable communication protocol (PACT). Its foundational nature and ability to improve efficiency across general AI agent systems give it a higher potential for widespread scientific impact and citations compared to Paper 2, which, despite its impressive real-world deployment, is highly domain-specific to mapping.
Paper 1 addresses a critical and universal bottleneck in the rapidly expanding field of LLM-based multi-agent systems: token consumption and context window limits. By introducing a novel communication protocol that significantly improves the performance-cost trade-off, its findings have immediate, widespread applicability across numerous AI domains. While Paper 2 presents a rigorous and valuable medical AI application, Paper 1 offers a foundational methodology with broader, cross-disciplinary impact in AI development.
Paper 2 presents a foundational infrastructure for distributed, heterogeneous multi-agent reinforcement learning at scale. While Paper 1 offers a highly practical solution for token efficiency in multi-agent communication, Paper 2's AgentJet framework fundamentally expands the capabilities of agentic RL research, enabling multi-model training, fault tolerance, and autonomous long-horizon research workflows. This breadth of application and methodological advancement gives Paper 2 a significantly higher potential impact on how future LLM agent research is conducted.
SkillPyramid addresses a more fundamental challenge in AI agent development—systematic skill construction, accumulation, and transfer for self-evolving agents. Its hierarchical skill consolidation framework with self-evolution mechanisms represents a more novel architectural contribution. The substantial improvements (38.0% reward increase, 27.7% fewer steps) across multiple benchmarks and four backbone models demonstrate broad applicability. While PACT makes a solid engineering contribution to communication efficiency in multi-agent systems, SkillPyramid's focus on enabling agents to continuously learn and generalize skills has broader implications for the long-term development of autonomous AI systems.