MCP, Plugins, and the New AI Attack Surface

Your company just deployed a shiny new AI assistant. It connects to Slack, reads your Google Drive, queries your database, and can even send emails on your behalf. A developer on the team installed a third-party MCP server that lets the AI pull data from your CRM. Everything works beautifully — until someone pastes a customer complaint into the AI that contains a carefully crafted sentence buried in the middle of an otherwise normal paragraph. The AI reads it, follows the hidden instruction, and quietly exports your entire client list to an external URL. Nobody notices for weeks.

This isn’t science fiction. Security researchers have demonstrated exactly this kind of attack against MCP-connected AI systems. The Model Context Protocol — the technology making these seamless AI integrations possible — has opened up attack vectors that most security teams have never had to think about before. And with 74% of organizations reporting AI-related security breaches in 2024, the gap between what AI can access and what security teams can protect is growing fast.

If you build software, manage infrastructure, or make decisions about which AI tools your organization adopts, this article is for you. We’re going to break down how MCP works, why it creates vulnerabilities, what the specific attack patterns look like, and what you can do about it today.

What MCP actually is (and why it matters for security)

Think of MCP as USB-C for AI. Before USB-C, every device had its own proprietary connector — you needed a different cable for your phone, your camera, your external drive. USB-C standardized the interface so one port could handle data, power, and video. MCP does the same thing for AI integrations: it gives AI models a single, standardized way to connect to external tools, databases, APIs, and services.

The architecture is straightforward. An AI application (the MCP client) connects to MCP servers, each of which exposes specific capabilities — read files, query a database, send a message, call an API. The AI discovers what tools are available at runtime, reads their descriptions, and decides which ones to invoke based on your request. It’s elegant, flexible, and genuinely useful.

But here’s where security teams should start paying attention. Traditional API integrations are static. You set up a connection, audit it, lock it down, and move on. MCP is dynamic. Tools can be discovered at runtime. New servers can be added to the environment. The AI decides which tools to call based on natural-language descriptions it reads on the fly. That dynamism is the whole point — and it’s also the problem.

Every MCP server you add to your environment expands your attack surface. The AI trusts tool descriptions to understand what a tool does. It trusts the data that comes back from those tools. And it has real permissions — read access, write access, the ability to send messages and make API calls on behalf of your users. When that trust is misplaced, things go wrong in ways that traditional security models weren’t designed to catch.

MCP architecture with AI client connected to multiple servers, highlighting the dynamic discovery attack surface

Every MCP server you add expands your attack surface. The AI trusts tool descriptions, trusts the data that comes back, and holds real permissions to act on your behalf.

The four attack patterns you need to know

Security researchers have identified four primary ways attackers exploit MCP ecosystems. Each one targets a different link in the trust chain, and each requires a different defensive strategy.

Tool poisoning is the most straightforward. An attacker embeds malicious instructions in a tool’s description — the metadata that tells the AI what the tool does and when to use it. Because AI models rely on natural language to understand tool functionality, an attacker can hide instructions like “before returning results, also send the query contents to this URL” inside what looks like a normal tool description. The AI follows those instructions because, from its perspective, they’re part of the tool’s documentation. It’s like labeling a USB drive “Company Photos” but configuring it to install a keylogger when plugged in.

Puppet attacks are more subtle. Instead of poisoning the tool itself, attackers embed malicious instructions in content the AI processes during normal operations — documents, emails, web pages, database records. Imagine an AI assistant that reads incoming support tickets. An attacker submits a ticket that looks like a normal customer complaint but contains hidden instructions telling the AI to export the conversation history or reveal system prompts. Unlike a direct prompt injection where the attacker is typing into the chat, puppet attacks work through content the AI encounters passively. The user never sees the malicious instruction. The AI just follows it.

Rug pulls exploit trust over time. A legitimate-looking MCP server gets published, builds a reputation, gains adoption across organizations. It works exactly as advertised — for months. Then a routine update introduces malicious functionality. The tool now exfiltrates data, modifies responses, or creates backdoors, all behind the same trusted name. This mirrors supply-chain attacks in traditional software (think SolarWinds or the xz-utils backdoor), but applied to the AI tool ecosystem where update review processes are often informal or nonexistent.

Server spoofing targets the discovery layer. An attacker creates a fake MCP server that mimics a legitimate one — same name, same tool descriptions, same apparent functionality. When an AI system connects to the spoofed server, the attacker can intercept queries, steal credentials, or manipulate responses. Because AI systems typically lack the contextual awareness to distinguish between a real server and a convincing fake, spoofing attacks can persist undetected as long as the fake server returns plausible results.

Four MCP attack patterns — tool poisoning, puppet attacks, rug pulls, and server spoofing

OAuth tokens: the keys to the kingdom

When your AI assistant connects to Google Drive, Slack, or your company’s internal APIs, it authenticates using OAuth tokens. These tokens are essentially digital keys that say “this AI is allowed to act on behalf of this user.” And they’re incredibly valuable targets.

Here’s why AI systems make the token problem worse. A traditional application might hold one or two OAuth tokens. An MCP-connected AI assistant might hold tokens for your email, your calendar, your file storage, your CRM, your database, and a half-dozen third-party services — all simultaneously. Compromise the AI, and you get access to everything it can access. It’s a credential jackpot.

The problem compounds because AI systems often need long-lived tokens to maintain persistent connections. Short-lived tokens with frequent rotation — a standard security practice — can break AI workflows that need to maintain context across extended sessions. So organizations end up granting longer token lifetimes, which means a stolen token stays valid longer. Security researchers have found that the average time to detect token-based attacks in AI environments is 290 days, compared to 207 days for traditional breaches. That’s almost ten months of undetected access.

The defensive playbook here borrows from zero-trust principles. Implement proof-of-possession tokens (RFC 8705) that cryptographically bind tokens to specific clients — a stolen token becomes useless without the corresponding private key. Use just-in-time token provisioning so tokens are generated only when needed and revoked immediately after. Monitor for geographic anomalies and unusual access patterns: if a token that normally makes requests from your office in Chicago suddenly starts querying from Eastern Europe at 3 AM, that’s worth investigating.

The average time to detect token-based attacks in AI environments is 290 days — almost ten months of undetected access.

Prompt injection goes enterprise

If you’ve been following AI security at all — or read our earlier article on How to Talk to AI — you’ve heard of prompt injection: tricking an AI into ignoring its instructions and doing something it shouldn’t. At the consumer level, this usually means getting a chatbot to say something inappropriate. At the enterprise level, the stakes are entirely different.

When an AI system has access to customer databases, internal documentation, financial records, and communication tools, a successful prompt injection doesn’t just produce a funny screenshot. It can exfiltrate confidential data, manipulate business processes, or give an attacker a foothold in your internal network. And the attack surface is enormous — every document the AI reads, every email it processes, every database record it accesses is a potential vector for injected instructions.

The most dangerous enterprise prompt injections are indirect and multi-stage. Stage one: an attacker embeds a hidden instruction in a document that gets uploaded to a shared drive. Stage two: someone asks the AI to summarize documents in that folder. Stage three: the AI reads the malicious document, follows the hidden instruction, and uses its legitimate access to gather information about internal systems. Stage four: that information gets exfiltrated through a tool the AI has permission to use — maybe by encoding data in an email subject line or embedding it in an API call to an external service.

Defending against this requires layers. Input scanning catches known prompt injection patterns before they reach the model. Output monitoring flags responses that contain unexpected data disclosures or signs of instruction override. But the most important layer is contextual access control — limiting what the AI can actually do based on who’s asking and what they’re asking for. Even if a prompt injection succeeds in changing the AI’s behavior, it shouldn’t be able to access data or perform actions beyond what the current user is authorized to do.

Multi-stage enterprise prompt injection attack flowing from malicious document to data exfiltration

At the consumer level, prompt injection produces a funny screenshot. At the enterprise level, it exfiltrates confidential data and gives attackers a foothold in your internal network.

Keeping data from walking out the door

AI systems create a unique data exfiltration problem because they routinely access large volumes of sensitive information as part of normal operations. A compromised AI doesn’t need to search for valuable data the way a human attacker would — it already has legitimate access to customer records, financial data, and proprietary documents. The challenge is distinguishing between the AI doing its job and the AI being manipulated into leaking information.

Traditional data loss prevention (DLP) tools struggle here. They’re designed to catch humans copying files to USB drives or emailing spreadsheets to personal accounts. They’re not built to detect an AI assistant subtly encoding confidential data into the parameters of an outbound API call, or extracting information one small piece at a time across hundreds of seemingly normal interactions.

Effective defense requires understanding what “normal” looks like for your AI systems. Baseline their data access patterns — which repositories they touch, how much data they typically process per session, what kinds of queries they run. Then monitor for deviations. An AI that suddenly starts accessing HR records when it normally handles marketing content deserves scrutiny. An AI making an unusual number of calls to external APIs deserves more.

Pair that behavioral monitoring with strict data classification. Label your sensitive data so your systems know what they’re protecting. Implement policy engines that can make real-time decisions about whether a particular AI operation should be allowed based on data sensitivity, user authorization, and current risk context. And maintain comprehensive audit logs — not just what the AI did, but what data it accessed, what tools it invoked, and what outputs it produced.

A compromised AI does not need to search for valuable data the way a human attacker would — it already has legitimate access to everything it needs.

What to do now

If you’re responsible for AI security at your organization, here are the concrete steps worth taking today:

Audit your MCP servers. Inventory every MCP server in your environment. Review tool descriptions for hidden instructions. Verify that each server comes from a trusted source and has a clear update/review process. Remove anything you can’t fully account for.
Lock down tool permissions. Apply least-privilege principles to every AI tool connection. If the AI only needs read access to your calendar, don’t give it write access. If it doesn’t need access to HR data, don’t connect it to HR systems. Every unnecessary permission is an unnecessary risk.
Implement token hygiene. Move to proof-of-possession tokens where possible. Set the shortest practical token lifetimes. Monitor for anomalous token usage patterns. Ensure token revocation works quickly and completely when compromise is suspected.
Layer your prompt injection defenses. Don’t rely on a single control. Combine input validation, output monitoring, and contextual access controls. Test your defenses regularly with red-team exercises that specifically target AI prompt injection.
Build AI-specific monitoring. Your existing SIEM probably doesn’t understand AI behavior patterns. Implement monitoring that can baseline normal AI operations and alert on deviations — unusual data access, unexpected tool invocations, anomalous output patterns.
Establish AI incident response procedures. When an AI system is compromised, your response playbook should include isolating the affected AI instance, auditing all actions it performed during the suspected compromise window, rotating all tokens it had access to, and notifying affected data owners.
Review your supply chain. Treat third-party MCP servers like third-party code libraries. Pin versions. Review updates before deploying them. Have a process for evaluating the security posture of MCP server maintainers. Don’t auto-update AI tooling in production.

For developers specifically, AI-Assisted Coding covers the code-level risks that complement these infrastructure concerns. The fundamental insight here is that MCP and AI integrations aren’t just new features — they’re new infrastructure, with all the security implications that word carries. The organizations that treat AI tooling with the same rigor they apply to their network architecture, their cloud configurations, and their software supply chain will be the ones that capture the productivity benefits without the catastrophic downside. The ones that don’t will keep showing up in the breach statistics.