Are AI-assisted Development Tools Immune to Prompt Injection?

Charoes Huang, Xin Huang, Amin Milani Fard

Mar 23, 2026arXiv:2603.21642v1

cs.CRcs.SE

#1300of 2618·Cryptography & Security

#1300 of 2618 · Cryptography & Security

Tournament Score

1401±29

10501750

50%

Win Rate

Wins

Losses

Matches

Rating

4.8/ 10

Significance5.5

Rigor3.5

Novelty4.5

Clarity6

Abstract

Prompt injection is listed as the number-one vulnerability class in the OWASP Top 10 for LLM Applications that can subvert LLM guardrails, disclose sensitive data, and trigger unauthorized tool use. Developers are rapidly adopting AI-assisted development tools built on the Model Context Protocol (MCP). However, their convenience comes with security risks, especially prompt-injection attacks delivered via tool-poisoning vectors. While prior research has studied prompt injection in LLMs, the security posture of real-world MCP clients remains underexplored. We present the first empirical analysis of prompt injection with the tool-poisoning vulnerability across seven widely used MCP clients: Claude Desktop, Claude Code, Cursor, Cline, Continue, Gemini CLI, and Langflow. We identify their detection and mitigation mechanisms, as well as the coverage of security features, including static validation, parameter visibility, injection detection, user warnings, execution sandboxing, and audit logging. Our evaluation reveals significant disparities. While some clients, such as Claude Desktop, implement strong guardrails, others, such as Cursor, exhibit high susceptibility to cross-tool poisoning, hidden parameter exploitation, and unauthorized tool invocation. We further provide actionable guidance for MCP implementers and the software engineering community seeking to build secure AI-assisted development workflows.

AI Impact Assessments

(3 models)

Scientific Impact Assessment

1. Core Contribution

This paper presents what it claims is the first empirical analysis of prompt injection via tool-poisoning attacks across seven widely-used MCP (Model Context Protocol) clients: Claude Desktop, Claude Code, Cursor, Cline, Continue, Gemini CLI, and Langflow. The core contribution is twofold: (1) a comparative analysis synthesizing publicly reported vulnerabilities, mitigations, and risk levels across these clients, and (2) an empirical evaluation using four custom-designed tool-poisoning attacks (sensitive file reading, logging surveillance, phishing link creation, and remote script execution). The paper also evaluates six security features—static validation, parameter visibility, injection detection, user warnings, execution sandboxing, and audit logging—across all clients.

The problem addressed is timely and practical: as developers adopt AI-assisted coding tools that integrate with external tools via MCP, the attack surface for prompt injection expands considerably, yet no systematic comparison of these clients' security postures existed.

2. Methodological Rigor

The methodology has several notable limitations that weaken its scientific rigor:

Attack design simplicity. The four attack scenarios, while illustrative, are relatively straightforward and use well-known injection patterns (e.g., `` tags, priority claims). There is no systematic exploration of evasion techniques, obfuscation strategies, or adversarial robustness testing. More sophisticated attackers would likely use more nuanced approaches, and the paper does not explore how defenses hold up under escalating sophistication.

Qualitative and subjective assessments. The risk level classifications ("Low to Medium," "Medium to High," "High") are qualitative and based partly on author judgment. The security feature evaluations (Table 4) use coarse categorical labels (Yes/No/Partial/Unknown) without quantitative metrics. The "Unknown" entries for execution sandboxing and audit logging in several clients undermine completeness.

Limited reproducibility concerns. While the authors provide a GitHub repository, the assessments depend on specific client versions that will quickly become outdated. The paper acknowledges this but doesn't propose a methodology for longitudinal tracking.

Comparative analysis methodology. Section 3's comparative analysis is largely a literature/blog survey rather than independent testing, mixing grey literature (blog posts, Reddit threads, vendor advisories) with the authors' own experiments. The distinction between what was empirically verified versus what was compiled from secondary sources could be clearer.

Single-run testing. There's no indication of repeated trials, statistical analysis, or variation in attack payloads. LLM behavior is stochastic, so a single successful or failed attack attempt may not be reliably reproducible.

3. Potential Impact

The paper addresses a genuine and growing concern. With millions of developers using these tools, the practical implications are significant:

Practitioner awareness: The clear comparison tables (Tables 1, 3, 4) provide immediately actionable information for developers and organizations selecting MCP clients.

Vendor accountability: By naming specific products and documenting their vulnerabilities, the paper creates pressure for vendors to improve security postures.

Standards development: The recommendations for standards bodies regarding MCP security specifications could influence protocol evolution.

Security community: The attack implementations serve as proof-of-concept templates for security researchers.

However, the rapid evolution of these tools means specific findings have a short shelf life. Cursor, for instance, may have already patched some identified vulnerabilities by publication time.

4. Timeliness & Relevance

This is the paper's strongest dimension. MCP adoption exploded in 2025, and tool-poisoning attacks represent a genuine emerging threat vector that bridges the gap between theoretical prompt injection research and real-world developer security. The OWASP Top 10 for LLM Applications placement of prompt injection as the #1 risk adds institutional urgency. The paper fills a gap that practitioners have been discussing informally but that hadn't been addressed systematically in academic literature.

5. Strengths & Limitations

Strengths:

Breadth of coverage: Evaluating seven major clients provides useful comparative data unavailable elsewhere in academic literature.

Practical relevance: The attack scenarios map to realistic threat models that developers actually face.

Actionable recommendations: The guidance for users, vendors, organizations, and standards bodies is concrete and implementable.

Transparency: Code and logs are publicly available.

Clear presentation: Tables 1, 3, and 4 effectively summarize complex multi-dimensional comparisons.

Limitations:

Shallow depth: Each client receives relatively surface-level analysis. A deeper investigation of even 2-3 clients would yield more scientifically rigorous insights into defense mechanisms.

No formal threat model: The paper lacks a systematic threat model or attack taxonomy. The four attacks feel ad hoc rather than derived from a principled threat analysis.

Heavy reliance on grey literature: Many references are blog posts, vendor advisories, and Reddit threads rather than peer-reviewed sources. While understandable given the topic's recency, this weakens scholarly foundation.

Missing important clients: OpenAI Codex and GitHub Copilot are acknowledged omissions that limit comprehensiveness.

Execution sandboxing assessment: Admitting that sandboxing was not empirically tested but assessed via documentation review is a significant gap for a security paper.

No defense contribution: The paper identifies problems and provides recommendations but doesn't propose or evaluate novel defense mechanisms.

Confounding model vs. client effects: Some "safe" results stem from the underlying model (Claude Sonnet 4.5) refusing malicious instructions rather than client-side controls. The paper acknowledges this but doesn't adequately control for it—testing the same client with different models would be illuminating.

Additional Observations

The paper occupies an interesting niche between a security audit report and an academic research paper. Its value is primarily empirical and practical rather than theoretical. It would benefit from a more rigorous experimental framework, formal threat modeling, and ideally a proposed defense mechanism or detection system that could be evaluated. The writing is clear but could be more concise in places. The contribution, while useful, represents an initial survey rather than a deep technical analysis.

Rating:4.8/ 10

Significance 5.5Rigor 3.5Novelty 4.5Clarity 6

Generated Mar 24, 2026

Comparison History (48)

Lostvs. Janus: Compiler-Based Defense Against Transient Execution Attacks Using ARM Hardware Primitives

Paper 2 (Janus) likely has higher scientific impact due to a novel, generalizable compiler-based mitigation for transient execution attacks—an enduring, high-severity systems security problem—with strong methodological rigor (clear threat model, hardware-primitive integration, SPEC CPU2017 and real-world evaluation) and broad applicability across ARM64 software stacks. Its low overhead makes deployment plausible, increasing real-world impact. Paper 1 is timely and useful for LLM tooling security, but is largely an empirical survey of current MCP clients; impact may be narrower and more time-sensitive as tools evolve rapidly.

gpt-5.2·May 12, 2026

Lostvs. Agentic Fuzzing: Opportunities and Challenges

Paper 1 introduces a highly novel methodology (agentic fuzzing) to tackle a notoriously difficult problem (logic bugs) and demonstrates exceptional real-world impact through the discovery of dozens of zero-day vulnerabilities and CVEs in major software. While Paper 2 provides a timely security analysis of AI tools, Paper 1's breakthrough approach and concrete, lucrative empirical results offer a more substantial methodological advancement with broader implications for automated software security.

gemini-3.1-pro-preview·May 12, 2026

Wonvs. Key Encapsulation Mechanism-Based Integrated Encryption Scheme (KEM-IES)

Paper 1 addresses a timely and critical vulnerability (prompt injection via tool poisoning) in rapidly adopted AI development tools, providing the first empirical analysis across seven major MCP clients. Its novelty, breadth of impact across software engineering and AI security, and actionable guidance for a fast-growing ecosystem give it higher impact potential. Paper 2 proposes an incremental improvement (post-quantum KEM integration into ECIES with Ascon), which is useful but more narrowly focused and less novel, combining existing standardized components rather than opening a new research direction.

claude-opus-4-6·May 12, 2026

Wonvs. Publicly Understandable Electronic Voting: A Non-Cryptographic, End-to-End Verifiable Scheme

Paper 2 addresses an urgent, timely security vulnerability in rapidly adopted AI development tools (MCP clients), providing the first empirical analysis across seven major platforms. Its immediate practical relevance to the fast-growing AI-assisted development ecosystem, actionable security guidance, and alignment with OWASP priorities give it broader near-term impact. Paper 1, while intellectually interesting in proposing non-cryptographic voting verification, presents a more theoretical blueprint whose real-world adoption faces significant practical and political barriers, limiting its near-term scientific and societal impact.

claude-opus-4-6·Apr 1, 2026

Wonvs. DeepXplain: XAI-Guided Autonomous Defense Against Multi-Stage APT Campaigns

Paper 2 likely has higher scientific impact due to timeliness and broad real-world relevance: prompt injection and MCP-based agent/tool ecosystems are rapidly deployed across industry, and an empirical cross-client security evaluation can immediately influence product design, standards, and defensive practices. Its findings generalize across software engineering, ML security, and human-computer interaction, potentially affecting many tools and users. Paper 1 is innovative (XAI signals integrated into DRL for APT defense) but is more domain-specific, with impact constrained by deployment barriers and the narrower cybersecurity subarea of autonomous APT defense.

gpt-5.2·Apr 1, 2026

Lostvs. Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Paper 2 demonstrates higher scientific impact through deep methodological innovation, uncovering a fundamental architectural vulnerability (algorithmic side-channels via dynamic preprocessing) in Edge VLMs. While Paper 1 is highly relevant and timely, it primarily applies existing prompt injection concepts to newly emerging tools. Paper 2 bridges hardware/OS-level side channels with novel AI algorithmic behaviors, offering broader theoretical insights and significant implications for the secure design of future multimodal AI architectures.

gemini-3-pro-preview·Mar 27, 2026

Wonvs. DeepXplain: XAI-Guided Autonomous Defense Against Multi-Stage APT Campaigns

Paper 2 has higher likely impact due to timeliness and broad relevance: prompt injection and MCP-style tool ecosystems are rapidly expanding, and an empirical, comparative security evaluation across widely used clients can immediately influence industry practices, standards, and follow-on research. Its findings are actionable and applicable beyond a single model, affecting software engineering, AI security, and HCI. Paper 1 is novel in integrating XAI signals into DRL for APT defense, but its impact is narrower (cyber defense specialization) and depends on adoption of a complex framework and testbed-specific validation.

gpt-5.2·Mar 26, 2026

Wonvs. A Large-Scale Study of Telegram Bots

Paper 2 addresses a highly critical and timely security vulnerability: prompt injection in rapidly adopted AI development tools. As AI coding assistants become ubiquitous, exposing their susceptibility to tool-poisoning attacks offers immense immediate real-world impact for software supply chain security. While Paper 1 provides a valuable large-scale measurement study of Telegram bots, Paper 2's focus on the top OWASP vulnerability in widely used LLM agents (e.g., Cursor, Claude) promises broader urgency, immediate mitigation applications, and higher influence in the fast-growing AI safety and software engineering fields.

gemini-3-pro-preview·Mar 26, 2026

Wonvs. Does Teaming-Up LLMs Improve Secure Code Generation? A Comprehensive Evaluation with Multi-LLMSecCodeEval

Paper 1 presents the first empirical security analysis of MCP clients against prompt injection attacks, addressing a critical and timely vulnerability in rapidly adopted AI development tools. Its novelty lies in systematically evaluating seven real-world MCP clients with actionable security guidance. While Paper 2 contributes a solid benchmarking framework for multi-LLM secure code generation, its findings (ensembles + static analysis beat single models) are somewhat expected. Paper 1's focus on an underexplored, high-stakes attack surface in widely deployed tools gives it broader immediate impact and urgency for the security and software engineering communities.

claude-opus-4-6·Mar 25, 2026

Wonvs. Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers

Paper 1 addresses a more novel and timely problem—the first empirical security analysis of MCP clients against prompt injection via tool poisoning—with broader immediate impact given millions of developers using these tools. While both papers have methodological limitations, Paper 1's findings on a rapidly emerging attack surface create more urgent practical value and vendor accountability. Paper 2, though methodologically more systematic, offers an incremental contribution using straightforward similarity-based NLP methods without comparison to modern LLM approaches, limiting its novelty in the current landscape.

claude-opus-4-6·Mar 24, 2026