EchoLeak (CVE-2025-32711) Part-2: How to build RAG systems that cannot leak like Copilot

Design patterns and engineering controls to make retrieval-based assistants fail predictably instead of dangerously.

Nov 06, 2025

In Part-1, we traced how a crafted email turned Copilot’s retrieval path into a zero-click exfil route. Here we move from analysis to architecture patterns and guardrails to build RAG systems that fail safely instead of leaking.

1) Start with retrieval boundaries, not classifiers

Your first control is scope. Decide what each assistant is allowed to retrieve, then make that deterministic.

For example, a finance Q&A bot retrieves only a curated SharePoint library, not mailboxes. A support bot may read ticket KBs, not HR folders.

Microsoft’s guidance on indirect prompt injection emphasises defence in depth with strong scoping plus output controls. OWASP places prompt injection as LLM01 and calls out input scoping and output filtering as primary mitigations.

assistant: “finance-bot”
retrieval:
  corpus_ids: [”sp-finance-curated”]
  disallow_sources: [”mailbox”,”public-web”]
output:
  external_links: “allowlist”
  require_citations: true

This is vendor-neutral. The point is to encode scope and output policy together so you do not rely on a single heuristic.

2) Treat external rendering paths as an exfil API

EchoLeak worked by making the assistant produce an answer that buried sensitive data in a URL or image reference. Clients or services might auto-fetch that resource which leaks the data without a click.

The fix class sounds boring but effective: allowlists for domains, link scrubbing that removes parameters and headers and blocking images or links in sensitive flows. Microsoft describes this as using deterministic mitigations to bound impact even when prompts slip through probabilistic checks.

Checklist

Strip external links at the post-processor unless the domain is on an allowlist.
Re-write approved links through a redirector that drops query strings.
Disable auto-fetch and previews in clients for high-risk cohorts.
Require citations before answers render in sensitive workflows.

3) Ingestion hygiene for corpora that power RAG

If untrusted text can enter your corpus, an attacker can bias retrieval and steer answers. Research shows a few malicious documents can reliably bubble into top-k then pull the summary toward an attacker’s target answer.

Before indexing, run ingestion through gates: verify provenance, normalise weird Unicode, neutralise prompt-like directives when safe and attach a risk score so retrieval can down-rank or exclude risky content.

Source → Sanitiser → Provenance check → Heuristics for prompt patterns →
Risk score tag → Human approval if sensitive → Indexer → Vector store

LLM04 in OWASP’s LLM Top 10 for 2025, calls for screening and governance at this layer. The goal is not perfect detection. It is to remove low-effort poisoning paths and to make later detections easier.

4) Encode tool and permission boundaries

If the assistant can call tools or APIs, treat those like production capabilities.

Use least privilege, dry-run modes for side-effecting actions and explicit allowlists per tool. Tie tool calls to specific identity similar to how a human would use and log them in the same place.

MITRE ATLAS frames prompt and context manipulation as a path to privilege escalation, which is exactly what happens when a hidden instruction reaches a tool with broad rights.

tools:
  sharepoint_search:
    allowed_scopes: [”sp-finance-curated”]
  http_get:
    allowed_domains: [”intranet.example.com”]
    strip_query_params: true
  email_send:
    mode: “dry-run”
    reviewer_group: “sec-approvers”

5) Build simple detections around side effects

You cannot log a model’s thoughts, but you can log what happens around it.

High-signal queries to start with:

Copilot or assistant session followed by a fetch to a first-seen domain in say under 30 seconds.
Answer object contains any external URL in a workflow that forbids links.
Answers without citations in sensitive flows where citations are required.

Your detections should be about the class of risk, not one specific CVE.

6) Test like an attacker, not a reviewer

Reproduce the trust boundary, not the exact bug.

Put a crafted instruction in a realistic input like an email or shared doc, ask a relevant question so retrieval fetches that item and watch your post-processor and policies. You are validating that scope, ingestion gates and output filters work together.

A light red-team exercise that includes RAG poisoning gives a quick signal on whether your gates hold against known techniques. The PoisonedRAG paper provides practical scenarios to script into a quarterly drill.

7) Rollout strategy that matches the risk

Common mistake: pilot with senior users who have the broadest data access. Do the opposite. Start with narrow corpora and low-risk cohorts. Expand only when your logs show low guardrail hits and your detections stay quiet. OWASP’s LLM Top 10 can double as your acceptance checklist for each expansion wave.

8) Pulling it together

EchoLeak carried a high score in the National Vulnerability Database (NVD) and was addressed by Microsoft, yet it exposed a structural fact.

Retrieval is a trust boundary and output is an API!

If your system cannot retrieve a thing, a prompt cannot leak it. If your post-processor blocks risky links, a hidden instruction cannot turn the client into a courier.

Put scope, ingestion gates and output controls in code, then back them with detections that watch for side effects. That is how you build RAG that fails predictably.

We’re FortifyRoot - the LLM Cost, Safety & Audit Control Layer for Production GenAI.

If you’re facing unpredictable LLM spend, safety risks or need auditability across GenAI workloads - we’d be glad to help.

🔗 Contact Us | FortifyRoot

Discussion about this post

Ready for more?