Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

Mingxi Zou, Zhihan Guo, Langzhang Liang, Zhuo Wang, Qifan Wang, Qingsong Wen, Irwin King, Lizhen Qu

May 11, 2026

arXiv:2605.10870v1

cs.AI (primary)

#169 of 2292 · Artificial Intelligence

Tournament Score: 1526±45

Win Rate: 86%

Rating: 7.8 / 10

Significance 8 · Rigor 7.5 · Novelty 8 · Clarity 7.5

Abstract

Long-horizon language agents must operate under limited runtime memory, yet existing memory mechanisms often organize experience around descriptive criteria such as relevance, salience, or summary quality. For an agent, however, memory is valuable not because it faithfully describes the past, but because it preserves the distinctions between histories that must remain separated under a fixed budget to support good decisions. We cast this as a decision-centric rate-distortion problem, measuring memory quality by the loss in achievable decision quality induced by compression. This yields an exact forgetting boundary for what can be safely forgotten, and a memory-distortion frontier characterizing the optimal tradeoff between memory budget and decision quality. Motivated by this decision-centric view of memory, we propose DeMem, an online memory learner that refines its partition only when data certify that a shared state would induce decision conflict, and prove near-minimax regret guarantees. On both controlled synthetic diagnostics and long-horizon conversational benchmarks, DeMem yields consistent gains under the same runtime budget, supporting the principle that memory should preserve the distinctions that matter for decisions, not descriptions.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: "Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory"

1. Core Contribution

This paper reframes the agent memory problem from a descriptive compression task to a decision-centric compression problem. The key insight is that memory should preserve distinctions between histories that lead to different optimal actions, rather than preserving descriptive fidelity. The authors formalize this through three main contributions:

An exact forgetting boundary (Theorem 1): histories can share a memory state if and only if they admit a common ε-optimal action—a purely decision-theoretic criterion independent of descriptive similarity.

A memory-distortion frontier with covering/packing bounds (Theorem 2) characterizing the minimum number of memory states needed at any target distortion level.

DeMem, an online memory learner with K runtime slots that refines partitions only upon certified decision conflict, with near-minimax regret guarantees (Theorems 4-5).

The NP-completeness result (Theorem 3) for optimal memory partitioning further motivates the greedy online approach. The conceptual reframing—"remember the decision, not the description"—is both memorable and actionable.
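The forgetting boundary and the greedy motivation above can be illustrated with a small tabular sketch. Everything here is an assumption made for illustration: the table `values[h][a]` (expected decision quality of action `a` after history `h`), the function names, and the greedy rule itself, which is a generic set-cover style heuristic and not the paper's DeMem procedure.

```python
from typing import Dict, List, Set

# Hypothetical toy table: values[h][a] = expected decision quality of action a after history h.
Values = Dict[str, Dict[str, float]]


def eps_optimal_actions(values: Values, history: str, eps: float) -> Set[str]:
    """Actions whose value is within eps of the best action for this history."""
    best = max(values[history].values())
    return {a for a, v in values[history].items() if v >= best - eps}


def can_share_state(values: Values, histories: List[str], eps: float) -> bool:
    """Forgetting-boundary test: do all histories admit a common eps-optimal action?"""
    common = set.intersection(*(eps_optimal_actions(values, h, eps) for h in histories))
    return bool(common)


def greedy_partition(values: Values, eps: float) -> List[List[str]]:
    """Greedy heuristic (not guaranteed optimal): repeatedly pick the action that is
    eps-optimal for the most uncovered histories and merge those histories."""
    uncovered = set(values)
    actions = sorted({a for h in values for a in values[h]})
    states: List[List[str]] = []
    while uncovered:
        best_action = max(
            actions,
            key=lambda a: sum(a in eps_optimal_actions(values, h, eps) for h in uncovered),
        )
        group = sorted(h for h in uncovered if best_action in eps_optimal_actions(values, h, eps))
        states.append(group)
        uncovered -= set(group)
    return states


toy: Values = {
    "h1": {"a": 1.0, "b": 0.2},
    "h2": {"a": 0.95, "b": 0.9},
    "h3": {"a": 0.1, "b": 1.0},
}
print(can_share_state(toy, ["h1", "h2"], eps=0.1))
print(can_share_state(toy, ["h1", "h3"], eps=0.1))
print(greedy_partition(toy, eps=0.1))
```

In this toy table, h1 and h3 look nothing alike in their best actions and cannot be merged, while h2 is compatible with either; the number of groups the heuristic returns is only an upper bound on the minimum number of memory states.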

2. Methodological Rigor

The theoretical development is thorough and well-structured. The progression from the K-state memory constraint → decision distortion → forgetting boundary → covering/packing bounds → computational hardness → online algorithm → regret guarantees follows a natural logical chain.

Strengths of the formal analysis:

The regret decomposition into compression error, statistical learning, and certification exploration (Theorem 4) is clean and interpretable, matching the minimax lower bound (Theorem 5) up to log factors in the statistical term.

The three-term approximation bridge (Proposition 5) connecting abstract theory to the slot-based implementation (ε*∞(K) + η_route + η_read) is practically useful and empirically validated (Appendix E.10 shows 81% of the predicted bound is realized).
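Written out as a display, the three-term bridge might read as follows. This is a reading of the flattened inline notation above, not a quotation of Proposition 5: the left-hand symbol D_impl(K) (distortion of the slot-based implementation) is a hypothetical name introduced here, and the interpretation of the three terms as best achievable K-state distortion, routing error, and read error is an assumption based on their labels.

```latex
\[
  D_{\mathrm{impl}}(K)
  \;\le\;
  \underbrace{\varepsilon^{*}_{\infty}(K)}_{\text{compression}}
  \;+\;
  \underbrace{\eta_{\mathrm{route}}}_{\text{routing}}
  \;+\;
  \underbrace{\eta_{\mathrm{read}}}_{\text{reading}}
\]
```

Under this reading, the 81% figure from Appendix E.10 would be the ratio of the measured left-hand side to the right-hand side.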

Concerns:

The contextual bandit formulation with i.i.d. sampling over answer-time instances is a significant simplification. While acknowledged, the gap between this abstraction and actual sequential dialogue remains substantial.

The certified split criterion for LLM benchmarks (Appendix D.11) involves multiple design choices (safety multipliers, calibration slack, candidate sets) that differ substantially from the clean theoretical certificates. The theory-to-practice bridge, while carefully documented, involves considerable engineering judgment.

The packing/covering bounds use a decision distance that violates the triangle inequality, which limits their combinatorial usefulness beyond the stated results.
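A three-history example shows why a distance of this kind can fail the triangle inequality. The definition used below is an assumption, not the paper's: the distance between two histories is taken to be the smallest ε at which they admit a common ε-optimal action. The toy values are invented.

```python
from typing import Dict

Values = Dict[str, Dict[str, float]]


def gap(values: Values, history: str, action: str) -> float:
    """How far `action` falls short of the best action for `history`."""
    return max(values[history].values()) - values[history][action]


def decision_distance(values: Values, h1: str, h2: str) -> float:
    """Assumed definition: smallest eps at which h1 and h2 share an eps-optimal action."""
    shared = set(values[h1]) & set(values[h2])
    return min(max(gap(values, h1, a), gap(values, h2, a)) for a in shared)


toy: Values = {
    "h1": {"a": 1.0, "b": 0.0},  # only "a" is good
    "h2": {"a": 1.0, "b": 1.0},  # indifferent between "a" and "b"
    "h3": {"a": 0.0, "b": 1.0},  # only "b" is good
}

d12 = decision_distance(toy, "h1", "h2")
d23 = decision_distance(toy, "h2", "h3")
d13 = decision_distance(toy, "h1", "h3")
print(d12, d23, d13)
print("triangle inequality holds:", d13 <= d12 + d23)
```

The indifferent history h2 is at distance zero from both h1 and h3, yet h1 and h3 are far apart, so compatibility is not transitive and standard metric-space covering arguments do not apply directly.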

3. Potential Impact

Theoretical impact: The decision-centric rate-distortion framing offers a principled alternative to the prevalent descriptive-similarity paradigm in agent memory. The forgetting boundary and memory-distortion frontier provide vocabulary and formal tools that could influence how the community thinks about memory budgets. The connection to state abstraction in RL (bisimulation, homomorphisms) is well-drawn and could stimulate cross-pollination.

Practical impact: The empirical results are strong. On LoCoMo, DeMem achieves 91.1% (GPT-4o-mini) vs. 88.8% for the next best method (Mnemis). The modularity result (Appendix E.12) showing that decision-aware selection can be dropped into RAG (+8.8%) and EMem-G (+6.2%) as a component is particularly compelling for adoption. Results generalize across GPT-4o-mini, GPT-4.1-mini, and Llama-3.1-70B backbones.

The mismatch analysis (Appendix E.13) is convincing: description similarity has only ρ=0.103 Spearman correlation with evidence compatibility (AUC=0.548), and 85% of description-based retrieval failures trace to evidence miss or dilution—establishing the practical relevance of the theoretical concern.
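A mismatch diagnostic of this kind could be computed along the following lines. This is a minimal pure-Python sketch under stated assumptions: the inputs are a description-similarity score and a binary evidence-compatibility label per retrieved item, the data below are invented, and none of the numbers correspond to the paper's results.

```python
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: similarity score and whether the item actually supports the answer.
similarity = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
compatible = [1, 0, 0, 1, 0, 1]
print("spearman:", spearman(similarity, compatible))
print("auc:", auc(similarity, compatible))
```

An AUC near 0.5 means the similarity score ranks compatible evidence no better than chance, which is the sense in which the reported AUC=0.548 indicates a weak proxy.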

4. Timeliness & Relevance

This paper addresses a genuine bottleneck. As agents tackle longer horizons (multi-session dialogue, agentic task completion), memory management becomes critical. Recent benchmarks (LoCoMo, LongMemEval, MemoryArena) consistently show that existing memory systems fail at long-term integration. The decision-centric perspective arrives at a time when the community is actively debating how to organize agent memory beyond simple retrieval augmentation.

The paper also connects to the broader "decision-centric AI" movement, providing concrete formal tools where prior work offered programmatic guidance.

5. Strengths & Limitations

Key strengths:

Unusually complete paper: theory (forgetting boundary, frontier, hardness) → algorithm (DeMem with guarantees) → extensive empirical validation (synthetic + 3 benchmarks + ablations + mechanism audits + human validation)

The synthetic environment cleanly isolates the description-decision mismatch mechanism, and Figure 4 directly validates the theoretical distortion object

Split precision of 85% with only 4.6% trigger rate demonstrates that certified refinement is selective and accurate

Human-judge agreement study (κ=0.79, 91.3% agreement) strengthens evaluation credibility

The approximation bridge decomposition (Proposition 5) + empirical validation provides an actionable diagnostic framework
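The selectivity of certified refinement noted above can be illustrated with a generic confidence-interval test. This is a sketch under assumptions, not the paper's certificate: it uses a plain Hoeffding bound on rewards in [0, 1], treats `delta` as a per-estimate failure probability with no union bound, and all names and data are hypothetical. A shared state is split only if every action is confidently rejected as ε-optimal by at least one of the contexts it serves.

```python
import math
from typing import Dict, List

# context -> action -> observed rewards, assumed to lie in [0, 1]
Rewards = Dict[str, Dict[str, List[float]]]


def hoeffding_radius(n: int, delta: float) -> float:
    """Two-sided Hoeffding radius for the mean of n rewards in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def rejects(obs: Dict[str, List[float]], action: str, eps: float, delta: float) -> bool:
    """True if this context's data certify that `action` is not eps-optimal here."""
    samples = obs.get(action, [])
    if not samples:
        return False  # no data for this action: nothing can be certified
    ucb = sum(samples) / len(samples) + hoeffding_radius(len(samples), delta)
    best_lcb = max(
        sum(r) / len(r) - hoeffding_radius(len(r), delta)
        for r in obs.values()
        if r
    )
    return ucb < best_lcb - eps


def certified_conflict(rewards: Rewards, eps: float, delta: float) -> bool:
    """Split only if no single action can be eps-optimal for all contexts in the state."""
    actions = {a for obs in rewards.values() for a in obs}
    return bool(actions) and all(
        any(rejects(obs, a, eps, delta) for obs in rewards.values())
        for a in actions
    )


def make_state(n: int) -> Rewards:
    """Two contexts with opposite best actions, n samples per (context, action)."""
    return {
        "c1": {"a": [1.0] * n, "b": [0.0] * n},
        "c2": {"a": [0.0] * n, "b": [1.0] * n},
    }


print(certified_conflict(make_state(200), eps=0.1, delta=0.05))
print(certified_conflict(make_state(2), eps=0.1, delta=0.05))
```

With only two samples per cell the intervals are too wide to certify anything, so the state stays merged even though the underlying conflict is real; a low trigger rate is the expected behavior of a test built this way.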

Notable limitations:

The i.i.d. contextual bandit formulation may not capture sequential dependencies in dialogue; the block-stationary extension (Appendix C.7) is a partial remedy

K must be set as a hyperparameter; the theory characterizes optimal distortion given K but doesn't prescribe how to choose K adaptively

The MemoryArena results (Table 5) show improvements but absolute performance remains low across all methods—the decision-centric advantage is consistent but modest in this harder setting

The DeMem-Core variant (90.8 vs. 91.1 full) suggests that some gains may come from implementation details beyond the core theoretical mechanism

Scalability to very large context vocabularies relies on proposed but untested approximate methods (locality-sensitive hashing, etc.)

Additional Observations

The paper is exceptionally well-organized given its density—50+ pages of appendices with careful documentation of every design choice, ablation, and diagnostic. The alignment table (Appendix D.8) mapping theory to implementation is a model of transparency. The error attribution analysis (Table 14) closing the diagnostic chain from weak proxy → retrieval gap → downstream failure is particularly well-executed.

The work's broadest contribution may be conceptual: establishing that memory quality should be measured by decision preservation rather than information preservation, with formal backing for when this distinction matters.

Rating: 7.8 / 10

Significance 8 · Rigor 7.5 · Novelty 8 · Clarity 7.5

Generated May 12, 2026

Comparison History (25)

vs. KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science

claude-opus-4.6 · 5/19/2026

Paper 2 (KISS) has higher potential scientific impact due to its broader real-world applicability and interdisciplinary reach. It addresses a critical democratization challenge—making complex Earth science simulation models accessible to communities most affected by climate risk. The empirical validation across 119 knowledge infrastructures spanning 14 Earth-science domains demonstrates remarkable generalizability. While Paper 1 offers elegant theoretical contributions (rate-distortion framework for agent memory), its impact is more narrowly scoped to the AI/agent memory community. Paper 2's potential to transform how scientific simulation knowledge is shared and operationalized across diverse communities gives it substantially wider societal and scientific impact.

vs. Imperfect World Models are Exploitable

claude-opus-4.6 · 5/19/2026

Paper 2 addresses a fundamental theoretical question about the reliability of world models in RL, establishing formal connections between reward hacking and model exploitation with impossibility results and safe horizon bounds. This has broad implications for AI safety, model-based RL, and alignment research—all highly timely topics. While Paper 1 presents a solid contribution with a novel rate-distortion framework for agent memory, Paper 2's theoretical contributions are more foundational, applicable across a wider range of RL settings, and directly relevant to the critical AI safety discourse, giving it higher potential for cross-field impact and citations.

vs. Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

claude-opus-4.6 · 5/16/2026

Paper 2 introduces a novel theoretical framework (decision-centric rate-distortion) for agent memory that is more fundamentally innovative, with formal guarantees and broad applicability across all long-horizon agent settings. While Paper 1 (PyRAG) is a solid engineering contribution improving multi-hop RAG through code-based reasoning, it is more incremental within an already crowded RAG optimization space. Paper 2's theoretical grounding—exact forgetting boundaries, memory-distortion frontiers, and near-minimax regret guarantees—provides deeper foundational insights that could reshape how memory is designed across diverse AI agent architectures.

vs. Coding Agent Is Good As World Simulator

claude-opus-4.6 · 5/16/2026

Paper 1 introduces a novel theoretical framework (rate-distortion theory for agent memory) with formal guarantees, addressing a fundamental problem in long-horizon language agents. It provides both theoretical contributions (exact forgetting boundary, memory-distortion frontier, near-minimax regret guarantees) and practical algorithms (DeMem). This principled reconceptualization of memory from descriptive to decision-centric has broad implications across AI, cognitive science, and reinforcement learning. Paper 2, while practical and timely, is more incremental—combining existing components (LLM agents, physics simulators) in a pipeline without deep theoretical novelty.

vs. Self-ReSET: Learning to Self-Recover from Unsafe Reasoning Trajectories

gpt-5.2 · 5/16/2026

Paper 1 offers a more foundational, novel framing: decision-centric memory as a rate–distortion problem, yielding principled boundaries/frontiers and an online algorithm with near-minimax regret guarantees. This combination of theory + provable algorithmic contributions + applicability to long-horizon agents suggests broad impact across RL, information theory, and agent systems. Paper 2 is timely and practically valuable for LLM safety robustness, but is closer to an incremental alignment/RL recipe (on-policy recovery) with impact likely concentrated in safety engineering and dependent on empirical benchmarks, with less general theoretical advance.

vs. Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge

gemini-3.1 · 5/16/2026

Paper 1 introduces a fundamental, theoretically grounded framework (rate-distortion) for a critical bottleneck in AI: long-horizon agent memory. Its formulation offers a principled shift from descriptive to decision-centric memory, backed by near-minimax regret guarantees. While Paper 2 provides valuable empirical insights into evaluation biases of LLM-as-a-judge, Paper 1's foundational approach to agent architecture is likely to spur broader methodological innovations and long-term advancements across reinforcement learning and language agents.

vs. Attributing Emergence in Million-Agent Systems

gpt-5.2 · 5/16/2026

Paper 2 has higher estimated impact: it introduces a broadly applicable, principled rate–distortion formulation of agent memory tied directly to decision quality, plus an online algorithm (DeMem) with near-minimax regret guarantees and demonstrated benchmark gains. This combines novelty with methodological rigor and clear real-world relevance to long-horizon LLM agents across many domains. Paper 1 is strong and timely for large-scale multi-agent social simulation and attribution, but its impact is narrower (specific to MAS attribution and nonlinear macro indicators) and depends more on access to million-agent settings/data, limiting breadth of adoption.

vs. Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents

gemini-3.1 · 5/12/2026

Paper 1 introduces a fundamental, theoretically grounded rate-distortion framework for agent memory with proven near-minimax regret guarantees. This decision-centric paradigm represents a significant theoretical and architectural shift with broad applicability across LLMs and reinforcement learning. Paper 2 offers a valuable but more narrowly focused evaluation framework for measuring terminal commitment in embodied agents, making Paper 1's foundational contribution likely to have a wider and deeper scientific impact.

vs. Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

claude-opus-4.6 · 5/12/2026

Paper 1 introduces a novel theoretical framework (decision-centric rate-distortion) for agent memory that provides fundamental insights—exact forgetting boundaries, memory-distortion frontiers, and near-minimax regret guarantees. This principled reformulation of memory for language agents has broad applicability across agent architectures and decision-making domains. Paper 2, while valuable as a benchmark, is more incremental—it extends evaluation methodology to workspace tasks. Benchmarks have impact but are more easily superseded, whereas Paper 1's theoretical contributions and the DeMem algorithm offer lasting conceptual and practical advances for the growing field of long-horizon agents.

vs. A canonical generalization of OBDD

gemini-3.1 · 5/12/2026

Paper 1 addresses the highly timely area of long-horizon language agents. By grounding agent memory in rate-distortion theory rather than heuristics, it offers a rigorous, novel, and broadly applicable framework. Its combination of theoretical guarantees and empirical improvements gives it a higher potential for widespread adoption in modern AI compared to Paper 2, which offers a valuable but more niche theoretical advancement in Boolean function representations and formal logic.

vs. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

claude-opus-4.6 · 5/12/2026

Paper 1 offers a more foundational contribution by recasting agent memory as a decision-centric rate-distortion problem, providing theoretical guarantees (exact forgetting boundary, memory-distortion frontier, near-minimax regret bounds) alongside practical algorithms. This principled framework has broader impact potential across memory systems, information theory, and agent design. Paper 2 presents a solid engineering contribution with weighted graph traversal and RL-based optimization, but is more incremental in nature—combining existing techniques (GNNs, RL, vector search) without the same theoretical depth or conceptual novelty that could reshape how the field thinks about agent memory.

vs. The Metacognitive Probe: Five Behavioural Calibration Diagnostics for LLMs

gpt-5.2 · 5/12/2026

Paper 2 is more impactful due to a clearer conceptual reframing (decision-centric memory as rate–distortion), stronger methodological rigor (formal forgetting boundary, memory–distortion frontier, near-minimax regret guarantees), and broader applicability across long-horizon agent design, RL, compression, and systems. Its contributions are likely to generalize beyond specific LLMs and directly influence practical memory architectures under runtime constraints. Paper 1 provides useful diagnostics for calibration pockets, but is explicitly exploratory, partly falsifies its human hypothesis, and its impact is narrower (evaluation/benchmarking) with less theoretical grounding.

vs. SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

gemini-3.1 · 5/12/2026

Paper 2 offers a foundational, mathematically grounded approach to agent memory by framing it as a rate-distortion problem. Its rigorous theoretical contributions, including near-minimax regret guarantees and a shift from descriptive to decision-centric memory, provide a highly generalizable framework. While Paper 1 presents a highly practical systems-level innovation for LLM agents, Paper 2's deep methodological rigor and potential to influence broader fields like reinforcement learning, cognitive modeling, and general AI give it a higher potential for lasting scientific impact.

vs. Alignment as Jurisprudence

gpt-5.2 · 5/12/2026

Paper 1 offers a novel, formal decision-centric rate–distortion framework for agent memory, derives concrete optimality frontiers, and proposes an online algorithm (DeMem) with near-minimax regret guarantees plus empirical validation. This combination of theoretical rigor and actionable methodology is likely to influence work on long-horizon agents, memory systems, and RL/LLM architectures, with clear real-world applicability under runtime constraints. Paper 2 is timely and potentially broad but is primarily conceptual/interpretive with less methodological or empirical grounding, making its direct scientific/technical impact less predictable.

vs. E-TCAV: Formalizing Penultimate Proxies for Efficient Concept Based Interpretability

gemini-3.1 · 5/12/2026

Paper 2 introduces a strong conceptual shift in agent memory from descriptive to decision-centric criteria, grounded in a rigorous rate-distortion framework with theoretical guarantees. Its application to long-horizon language agents addresses a highly relevant and rapidly growing field. While Paper 1 provides a valuable optimization for an existing interpretability method (TCAV), Paper 2 offers a more fundamental methodological innovation with broader implications for AI agent design and reinforcement learning.

vs. ActivityEditor: Learning to Synthesize Physically Valid Human Mobility

claude-opus-4.6 · 5/12/2026

Paper 1 introduces a fundamentally novel theoretical framework (decision-centric rate-distortion) for agent memory that challenges existing paradigms, provides rigorous mathematical foundations including exact forgetting boundaries and near-minimax regret guarantees, and addresses a core challenge in long-horizon language agents. Its breadth of impact spans information theory, reinforcement learning, and LLM agents. Paper 2, while practically useful for urban mobility simulation, is more application-specific and incremental in its dual-LLM-agent approach. Paper 1's theoretical contributions have broader potential to reshape how memory is conceptualized across AI systems.

vs. Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach

gemini-3.1 · 5/12/2026

Paper 2 proposes a fundamental, theoretical framework for agent memory based on rate-distortion theory, offering broad applications across artificial intelligence, reinforcement learning, and language agents. Its rigorous mathematical grounding (minimax regret guarantees) and paradigm shift from descriptive to decision-centric memory provide high potential for broad scientific impact. Paper 1, while innovative in its use of the L3 rule for PPI prediction, addresses a narrower domain within computational biology and bioinformatics.

vs. From Single-Step Edit Response to Multi-Step Molecular Optimization

gpt-5.2 · 5/12/2026

Paper 2 is likely higher impact: it introduces a principled, general rate-distortion formulation of agent memory centered on decision quality, yielding theoretical objects (forgetting boundary, memory–distortion frontier) and an online algorithm (DeMem) with near-minimax regret guarantees plus empirical validation on agent benchmarks. This combination of novelty, rigor, and broad applicability spans RL, LLM agents, information theory, and systems, and is timely given rapidly growing interest in long-horizon agents under tight context/memory limits. Paper 1 is strong but more domain-specific to molecular optimization workflows.

vs. MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

gemini-3.1 · 5/12/2026

Paper 1 introduces a rigorous theoretical framework (rate-distortion) to agent memory with mathematical guarantees, offering broad, fundamental impact on reinforcement learning and LLM agent design. In contrast, Paper 2 is heavily tied to a specific engineering protocol (MCP) and focuses on integrating existing technologies, making its theoretical novelty and long-term scientific impact lower.

vs. NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

gemini-3.1 · 5/12/2026

Paper 1 introduces a fundamental theoretical framework for agent memory using rate-distortion theory, supported by provable regret guarantees. By shifting the paradigm of memory from descriptive to decision-centric, it addresses a core bottleneck in AI agent design with broad, domain-agnostic applicability. Paper 2 presents an impressive applied system for automated research, but lacks the methodological rigor and foundational algorithmic novelty of Paper 1, making Paper 1's long-term scientific impact likely higher across the broader ML and RL communities.