DMF: A Deterministic Memory Framework for Conversational AI Agents
Matteo Stabile, Enrico Zimuel
Abstract
Conversational AI agents require memory systems that are both scalable and semantically coherent across long interaction horizons. Existing approaches rely predominantly on large language model (LLM)-based summarisation at write time, which introduces non-determinism, escalating token costs, and opacity in pruning decisions. We present the Deterministic Memory Framework (DMF), a CPU-first approach that replaces generative memory compression with a fully deterministic pipeline grounded in classical NLP analysis, vector geometry, and mathematical scoring. DMF assigns each conversational interaction a Survival Score computed from deterministic content signals, conversational cues, and structured provenance, combined through a logistic projection. An interaction-count decay law governs how relevance evolves as new turns arrive; its decay variable is the number of newer interactions rather than wall-clock time, preserving full determinism. We present the mathematical formulation of DMF, its structured recall pipeline, the pruning decision procedure, and the evaluation protocol. Experiments are conducted on a purpose-built benchmark using the LoCoMo and LongMemEval datasets. We compare DMF against Mem0, a popular memory layer for AI agents. DMF achieves comparable accuracy while using zero tokens to prepare the memory context and 5x to 242x fewer tokens over the entire conversation. These results show that it is possible to eliminate LLM calls from the memory-management loop, reducing token costs to nearly zero and enabling deterministic memory systems for conversational AI agents.
AI Impact Assessments
Scientific Impact Assessment of DMF: A Deterministic Memory Framework for Conversational AI Agents
1. Core Contribution
DMF proposes replacing LLM-based memory management in conversational AI agents with a fully deterministic, CPU-first pipeline. The central idea is that memory scoring, pruning, archival, and retrieval can be accomplished without any generative model calls by combining classical NLP features (POS-based information density, VADER sentiment, named entity counts), vector similarity, and a mathematically defined survival score with interaction-count-based exponential decay.
The key novelty lies in the *composition* of these ideas: a logistic-projected survival score Ω combining content, operational, and provenance channels; score-dependent inertia modulating decay rates so high-value memories persist longer; interaction-count (rather than wall-clock) decay for full determinism; and a structured multi-channel recall pipeline with answerability-aware reranking. The framework eliminates LLM calls from the memory management loop entirely, achieving zero token cost for memory operations.
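To make this composition concrete, the following is a minimal Python sketch of a logistic-projected score with interaction-count decay and score-dependent inertia. It is not the paper's formulation: the assessment names the ingredients but does not give the equations, so the functional forms, the weights `alpha`, `beta`, `gamma`, the midpoint `x0`, the base rate `lam`, and the inertia strength `eta` are all illustrative assumptions.

```python
import math


def survival_score(content, operational, provenance,
                   alpha=1.0, beta=1.0, gamma=1.0, x0=1.5):
    """Logistic projection of a weighted sum of three channel signals.

    Assumed form: the weights and the midpoint x0 are placeholders,
    not values taken from the paper.
    """
    x = alpha * content + beta * operational + gamma * provenance
    return 1.0 / (1.0 + math.exp(-(x - x0)))


def decayed_relevance(omega, n_newer, lam=0.05, eta=0.8):
    """Exponential decay in the number of newer interactions.

    Score-dependent inertia (assumed form): a higher omega lowers the
    effective decay rate, so high-value memories fade more slowly.
    No clock is read, so the result depends only on the inputs.
    """
    effective_rate = lam * (1.0 - eta * omega)
    return omega * math.exp(-effective_rate * n_newer)


if __name__ == "__main__":
    high = survival_score(1.0, 1.0, 1.0)
    low = survival_score(0.2, 0.1, 0.0)
    for n in (0, 10, 50):
        print(n, decayed_relevance(high, n), decayed_relevance(low, n))
```

Under these assumptions a memory's relevance equals its survival score when no newer turns exist and shrinks monotonically as turns accumulate, with the high-scoring memory retaining a larger fraction of its initial value at every step.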
2. Methodological Rigor
Strengths in formulation: The mathematical framework is clearly presented. The survival score derivation, decay law, pruning mechanisms, and calibration examples are well-specified and reproducible. The interaction-count decay choice is well-motivated — it ensures the memory state is a pure function of the conversation sequence, eliminating temporal non-determinism.
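The "pure function of the conversation sequence" property can be illustrated with a small Python sketch. The toy base score (share of distinct words), the decay rate, and the pruning threshold below are hypothetical stand-ins, not DMF's actual signals or parameters; the point is only that, with no clock and no sampling involved, replaying the same turns always yields the same retained set.

```python
import math


def base_score(text):
    """Toy deterministic signal: share of distinct words in the turn."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def memory_state(turns, rate=0.2, threshold=0.4):
    """Return the (index, text) pairs that survive pruning.

    Relevance decays with the number of newer turns, never with
    wall-clock time, so the output depends only on `turns`.
    """
    total = len(turns)
    kept = []
    for index, text in enumerate(turns):
        n_newer = total - 1 - index
        relevance = base_score(text) * math.exp(-rate * n_newer)
        if relevance >= threshold:
            kept.append((index, text))
    return kept


if __name__ == "__main__":
    conversation = [
        "my dog is named Rex",
        "ok ok ok",
        "I moved to Turin last year",
        "thanks",
    ]
    first = memory_state(conversation)
    second = memory_state(list(conversation))
    print(first == second, [i for i, _ in first])
```

Because the retained set is reproducible from the transcript alone, a pruning decision can be audited by replaying the conversation rather than by inspecting stored model outputs.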
Weaknesses in evaluation: The experimental evaluation has significant limitations:
3. Potential Impact
Token cost reduction is the most compelling practical contribution. The 5–242× reduction in total token usage is substantial and directly translates to cost savings in production deployments. For organizations running conversational agents at scale, this could represent significant operational savings.
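As a worked illustration of how such a reduction factor is computed, the Python sketch below divides a baseline's total token count by DMF's. The token counts are hypothetical inputs chosen only to reproduce the two reported endpoints; they are not measurements from the paper. The ratio is well defined because, although memory preparation uses zero tokens, the final answer-generation call still consumes some.

```python
def reduction_factor(baseline_tokens, dmf_tokens):
    """Ratio of baseline token usage to DMF token usage."""
    if dmf_tokens <= 0:
        raise ValueError("dmf_tokens must be positive to form a ratio")
    return baseline_tokens / dmf_tokens


def tokens_saved(baseline_tokens, dmf_tokens):
    """Absolute number of tokens avoided over the conversation."""
    return baseline_tokens - dmf_tokens


if __name__ == "__main__":
    # Hypothetical totals, picked to land on the 5x and 242x endpoints.
    print(reduction_factor(50_000, 10_000))      # lower end of the range
    print(reduction_factor(1_210_000, 5_000))    # upper end of the range
    print(tokens_saved(1_210_000, 5_000))
```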
Determinism and auditability address a real pain point in production AI systems. The ability to reproduce memory states exactly from conversation sequences is valuable for debugging, compliance, and testing.
CPU-first deployment lowers the infrastructure barrier, enabling memory management on commodity hardware without GPU requirements or API calls.
However, the practical impact may be limited by several factors: (1) the approach is currently English-only; (2) the reliance on rule-based NLP (spaCy POS tags, VADER sentiment, keyword matching) may not capture nuanced semantic content that LLM-based approaches handle naturally; (3) the large number of tunable hyperparameters (α, β, γ, δ, x₀, λ, η, numerous threshold and bonus values) creates a complex configuration surface that may require domain-specific tuning.
4. Timeliness & Relevance
The paper addresses a genuinely important and timely problem. As LLM-based agents move into production, the cost and opacity of memory management become practical bottlenecks. The token cost explosion in long-horizon conversations is a recognized challenge. The push toward deterministic, auditable AI systems aligns with emerging regulatory requirements.
The framing of "zero LLM tokens for memory management" is compelling as a design philosophy, even if the overall system still requires LLM calls for final answer generation. The work contributes to the growing literature on making AI systems more efficient and predictable.
5. Strengths & Limitations
Key Strengths:
Notable Limitations:
6. Additional Observations
The paper reads more as a systems paper with a thorough technical specification than as a research paper with rigorous empirical validation. The mathematical framework, while clearly presented, is largely a composition of well-known techniques (logistic projection, exponential decay, cosine similarity, rule-based NLP). The scientific contribution is in demonstrating that this composition can achieve competitive performance at dramatically lower cost, but this claim needs stronger empirical support.
The future work section is extensive and honest about limitations, which is commendable. The shared-memory and multilingual extensions could significantly broaden applicability.
Generated Jun 3, 2026
Comparison History (20)
Paper 1 addresses a critical bottleneck in the rapidly expanding field of AI agents: the high computational cost and non-determinism of LLM-based memory management. By proposing a highly efficient, CPU-first deterministic framework that drastically reduces token usage (up to 242x), it offers immediate, scalable, and highly impactful real-world applications for conversational AI. While Paper 2 presents an interesting interdisciplinary application of LLMs in epidemiology, Paper 1's methodological innovation and broad implications for AI engineering give it a higher potential for widespread scientific and industrial impact.
Paper 2 is likely to have higher impact: it targets bias mitigation/alignment, a timely and widely relevant problem with broad downstream applications. BiasGRPO offers a clear methodological innovation (group-relative baseline removing critic dependence) that can generalize to other high-variance RLHF settings beyond bias. It also reports improvements over major baselines (DPO, PPO) and releases reusable assets (bias reward model, dataset extension), increasing adoption. Paper 1 is practical and cost-reducing, but its deterministic, classical-NLP memory pipeline may be more incremental and narrower in research reach.
Paper 1 integrates AI, physics-based modeling, and neuroscience to address critical healthcare challenges like Alzheimer's and brain tumors. Its hybrid approach offers profound real-world clinical applications and broad multidisciplinary impact. In contrast, Paper 2 provides significant engineering and efficiency improvements for conversational AI, but its scope is narrower and primarily focused on computational cost reduction rather than transformative real-world health applications.
Paper 1 (DMF) addresses a fundamental infrastructure challenge in conversational AI—memory management—with a novel deterministic, token-free approach that dramatically reduces costs (5x-242x fewer tokens) while maintaining accuracy. This has broad practical impact across all conversational AI systems and introduces a paradigm shift away from LLM-dependent memory. Paper 2 tackles meme understanding with a retrieve-and-reason framework, which is more niche in scope. While solid, its impact is narrower, limited to content moderation and meme analysis. DMF's methodological innovation and cost reduction implications make it more broadly impactful.
Paper 2 (DMF) addresses a broadly relevant problem in conversational AI—memory management for LLM-based agents—with a novel deterministic approach that eliminates LLM calls from the memory loop, achieving dramatic token cost reductions (5x-242x). This has immediate, wide-reaching practical impact given the explosive growth of LLM-based agents. Paper 1 tackles a narrower domain (circular factory reliability for angle grinders) with a competent but incremental combination of existing techniques (CNN-LSTM, FEA, S-N curves). While rigorous, its applicability is limited to specific manufacturing contexts, whereas Paper 2's framework generalizes across all conversational AI applications.
MulFeRL addresses a fundamental limitation in RLVR—sparse, uninformative rewards for failed samples—by introducing verbal feedback in a multi-turn loop with progress credit assignment. This contributes to the rapidly growing and high-impact area of LLM reasoning improvement via RL, with broad applicability across reasoning domains. Paper 2 (DMF) offers a practical engineering contribution for conversational memory management with significant cost savings, but its scope is narrower (memory management optimization) and its novelty is more incremental, replacing LLM summarization with classical NLP techniques. MulFeRL's methodological innovation in combining feedback-guided regeneration with reinforcement learning has broader implications for the field.
Paper 2 likely has higher scientific impact due to greater novelty and breadth: it proposes a generally applicable structured-reasoning and iterative backtracking framework (Thought-ICS) for error localization and self-correction, a central unsolved problem in LLM reliability. The approach is timely, aligns with verification/correction research, and can transfer across tasks (math, logic, tool use) and communities (NLP, alignment, HCI). Paper 1 is practically valuable (deterministic, low-cost memory), but is more engineering-focused and may have narrower conceptual impact beyond agent memory systems.
Paper 1 introduces a novel, domain-agnostic architectural paradigm that directly addresses critical bottlenecks in conversational AI: cost, scalability, and non-determinism in memory management. By eliminating LLM calls for memory processes and achieving up to 242x token reduction, DMF offers massive, immediate real-world utility across the entire AI agent ecosystem. Paper 2 provides a valuable but domain-specific benchmark (finance). While important for evaluation, Paper 1's foundational methodological shift in agent memory architecture presents a significantly broader and more transformative scientific impact.
Paper 1 introduces a comprehensive deterministic memory framework that completely removes LLMs from the memory-management loop, drastically reducing token costs while maintaining accuracy. This offers broader utility, scalability, and cost-efficiency for conversational AI agents compared to Paper 2, which focuses on a specific sub-problem (conflict resolution) using a narrower heuristic approach. Paper 1's methodology represents a more substantial innovation with wider real-world applications.
Paper 1 introduces a novel evaluation dimension ('handoff debt') for coding agents that addresses a real gap between benchmarks and practice. The concept of measuring task resumption costs across agents is original and broadly applicable to the growing field of AI-assisted software engineering. Paper 2, while practically useful in reducing token costs for conversational memory, is more incremental—replacing LLM summarization with classical NLP is a known direction. Paper 1's framing and protocol have greater potential to reshape how the community evaluates and designs coding agents, giving it broader methodological impact.
Paper 2 tackles a critical bottleneck in modern conversational AI: the high token costs and non-determinism of LLM-based memory systems. By proposing a CPU-first, deterministic memory framework that reduces token costs by up to 242x while maintaining comparable accuracy, it offers massive scalability benefits for AI agents across all domains. While Paper 1 presents a solid framework for dynamic profiling and reciprocal matching, Paper 2's fundamental challenge to the prevailing generative memory paradigm has broader applicability, addresses a more urgent cost/efficiency problem in the field, and is likely to see wider adoption in agent architectures.
Paper 1 likely has higher scientific impact due to stronger novelty (agentic, tool-augmented, quantitative TS quality assessment) and broader applicability: time-series quality affects many domains (health, finance, IoT, climate) and can improve downstream modeling/data efficiency. It also contributes a new benchmark (TSQBench) and analyzes core LLM limitations, increasing methodological value and timeliness for LLM evaluation in structured data tasks. Paper 2 is practical and timely for agent systems, but its core ideas rely more on engineering a deterministic pipeline, and it appears narrower in cross-field scientific reach.
Paper 2 offers a broader and more paradigm-shifting contribution by challenging the prevalent use of LLMs for memory management. Its deterministic, CPU-first approach achieves staggering token reductions (up to 242x) while maintaining accuracy, which has profound implications for the scalability, cost, and interpretability of all conversational AI agents, far exceeding the narrower cross-lingual coding scope of Paper 1.
Paper 2 (DMF) addresses a fundamental and timely problem in conversational AI—memory management for LLM-based agents—with a novel deterministic approach that eliminates LLM calls from the memory loop, achieving 5x-242x token cost reduction. This has broad applicability across all conversational AI systems, offers a paradigm shift from generative to deterministic memory management, and directly addresses scalability/cost concerns critical to the field. Paper 1 applies existing CNN architectures and standard ensemble/augmentation techniques to WiFi-HAR with incremental improvements on a small-scale 3-class problem, representing more applied engineering than fundamental contribution.
Paper 2 addresses a fundamental challenge in LLM agents—autonomous skill acquisition—by automatically inducing reusable reasoning primitives from successful traces. This method not only improves reasoning capabilities but also generalizes across multiple tasks, offering a potentially transformative approach to agent self-improvement. Paper 1 offers a highly practical optimization for agent memory costs and determinism, but Paper 2's contribution to automated reasoning decomposition and dynamic tool creation is likely to have a broader and more profound impact on the development of advanced autonomous agents.
Paper 1 (DMF) addresses a fundamental and widely applicable problem in conversational AI—memory management—with a novel deterministic, cost-efficient approach that eliminates LLM calls from the memory loop. It demonstrates dramatic token cost reductions (5x-242x) while maintaining accuracy, offering immediate practical impact for the rapidly growing AI agent ecosystem. Paper 2 (CASTER/MEDEA) introduces an interesting social resonance evaluation paradigm, but targets a narrower application domain (UGC quality assessment) and relies on more incremental innovations (CoT variants, persona simulation). DMF's contribution is more foundational and broadly applicable across conversational AI systems.
Paper 1 addresses a fundamental, paradigm-level challenge in AI safety and alignment—how superintelligent systems can be designed for cooperation rather than solipsistic optimization. This has broad implications across AI safety, multi-agent systems, institutional design, and policy. Its conceptual framework (self-undermining property, non-solipsistic design) could reshape how the field approaches AI development. Paper 2, while practically useful in reducing token costs for conversational memory, is a more incremental engineering contribution with narrower scope. The timeliness of Paper 1's alignment concerns and its breadth of interdisciplinary impact give it substantially higher potential scientific influence.
Paper 2 presents a novel, deterministic framework for conversational AI memory that eliminates LLM calls during memory management, addressing critical bottlenecks in cost and scalability. Its massive reduction in token usage (up to 242x) with comparable accuracy offers broad, immediate impact across the entire AI agent industry. In contrast, Paper 1 is a systematic review limited to the specific domain of dental healthcare, offering synthesis rather than fundamental methodological innovation.
Paper 2 (PropLLM) has higher estimated impact: it introduces a novel propagation-aware, hop-by-hop causal reconstruction paradigm combining LLM reasoning with verifiable KG evidence and a new attention mechanism (TCPA) encoding causal/topological priors. The work targets an important real-world domain (network fault diagnosis) with demonstrated gains on multiple real datasets, including reduced hallucinations—key for deployability. Its ideas (causal tracing, evidence-grounded LLMs, prior-guided attention) are broadly transferable to other diagnosis/monitoring settings. Paper 1 is practical and cost-saving but more incremental, relying on classical deterministic heuristics.
RoleCDE introduces a novel benchmark addressing a previously underexplored problem—role-alignment value conflicts in role-playing agents—and reveals a significant 'Role Value Decoupling' phenomenon with broad implications for AI safety and alignment research. Its large-scale benchmark (24k instances), systematic evaluation across multiple LLMs, and demonstration that fine-tuning can mitigate the identified issues provide both diagnostic and prescriptive contributions. Paper 2, while practically useful in reducing token costs for memory management, addresses a more incremental engineering optimization with narrower conceptual novelty.