The OWASP LLM Top 10 (2025) is the list your procurement reviewer is going to ask about. It is also the list your CISO will reference in next quarter’s risk register, and the list your auditor will map your controls to during the ISO 42001 readiness review. If you ship LLM features and you have not read it end-to-end, this is the catch-up.
This is not a paraphrase of the OWASP project page — go read genai.owasp.org/llm-top-10 for that. This is what each category looks like in a real codebase, what tests we actually run for it, and where the category boundary is fuzzier than the spec admits.
The OWASP LLM Top 10 (2025) is the canonical risk catalogue for applications that ship LLM features. It enumerates ten categories — prompt injection, sensitive information disclosure, supply-chain compromise, data poisoning, improper output handling, excessive agency, system-prompt leakage, vector-store weaknesses, misinformation, and unbounded consumption — and is the list procurement reviewers, ISO 42001 auditors, and CISOs reference when they ask whether your LLM features are safe to ship.
Why does the 2025 OWASP LLM Top 10 matter more than the 2024 list?
The 2024 OWASP LLM Top 10 was a research artifact. The 2025 revision is a procurement instrument. Three changes earned it that promotion:
- Renumbered to map cleanly to NIST AI RMF and ISO/IEC 42001 control families. A buyer can now hand the same questionnaire to vendors using either standard.
- Severity is no longer included in the rank. The 2024 list mixed “how common” with “how bad.” The 2025 list ranks by prevalence in shipped applications, with severity left to the implementer’s threat model. This matches how CWE works.
- Each entry now includes a “Detection” sub-section. OWASP signaled that controls without test evidence will not satisfy the standard. That is the gap PromptShield was built to fill, but the change matters regardless of vendor.
If your team’s threat model still references the 2024 numbering, fix that this week. The category names and ordering are different and your auditor will notice.
The 2025 list, with what to actually do
LLM01:2025 — Prompt Injection
The category that swallowed the spec. Direct injection (user as attacker) and indirect injection (untrusted retrieved content as attacker) are merged into one entry, with eight sub-types ranging from naive instruction override to multi-modal payloads embedded in images.
What to test. A minimum of 25 catalogued payloads spanning direct override, role-play coercion, encoded instructions, authority spoofing, multilingual evasion, token-smuggling, retrieved-document injection, HTML-comment injection, and tool-call abuse. We have a separate practitioner’s guide on the 25-attack baseline.
Where the boundary is fuzzy. OWASP merged “jailbreak” into LLM01. Internally we still track it separately because the mitigation is different — jailbreaks defeat safety alignment, where injection defeats application-level instruction priority. The difference matters when you are filing a finding.
For a hands-on probe, run a free 5-attack scan against your endpoint.
LLM02:2025 — Sensitive Information Disclosure
The model returns information the application does not intend to expose: system prompts, training data fragments, embedded credentials, PII pulled from a RAG corpus, or context-window contamination from a prior user.
What to test. Translate-the-above attacks, summarisation requests aimed at the system prompt, retrieval poisoning with crafted documents that contain “say the secret out loud” payloads, and (the one most teams miss) cross-tenant context leakage when a stateful agent retains prior turns longer than expected.
The procurement angle. This is the category the GDPR, HIPAA, and PCI auditors will probe. A finding here is a regulated-data finding. Treat HIGH here as CRITICAL if the application touches PII.
LLM03:2025 — Supply Chain
Compromised models, compromised LoRA adapters, compromised pip packages used in the inference stack, compromised tools in the function-call catalogue. The 2025 revision expanded this entry to cover the full delivery chain, not just the model weights.
What to test. Provenance verification on every component: model checksum vs. vendor-published hash, signed manifests for adapters and embeddings, lockfiles with attested builds for the inference container, and SBOM coverage. This is mostly a build-pipeline category — the runtime test is “did your CI verify the chain or didn’t it.”
Honest take. Most teams do not test this at all. The good news is that it is the easiest to fix because the controls are familiar AppSec — pinning, signing, and reviewing.
LLM04:2025 — Data and Model Poisoning
Training-data tampering for foundation models is mostly a problem for the labs. Fine-tuning and RAG corpus poisoning is a problem for you. An attacker who can write to your knowledge base — through a customer-support ticket, a public wiki, a third-party vendor feed — can plant payloads that activate weeks later.
What to test. Indexing pipeline safety: do you scan ingested documents for known-payload signatures? Do you log provenance per chunk? Can you bisect a poisoned answer back to the source document? If “no” to any of these, you have a poisoning gap.
Where the boundary is fuzzy. Indirect injection (LLM01) and RAG poisoning (LLM04) overlap heavily. Our internal rule: if the payload activates on the first read, it is LLM01. If the payload sits in your corpus and activates on a later, unrelated query, it is LLM04.
LLM05:2025 — Improper Output Handling
The model returns content the application renders without sanitising. The classic case is markdown that becomes XSS, but the family is broader: JSON the application parses without schema validation, SQL fragments the application executes, shell commands the application interprets, file paths the application opens.
What to test. Treat the LLM as untrusted input — the same as a form field. Run the same XSS, SSRF, command-injection, and path-traversal tests against the model output that you would run against a user-controlled <input>. The most-failed test in our scans: image-tag exfiltration via markdown rendering. A model that returns  and a frontend that renders it makes the model an exfiltration channel.
This is mostly a frontend-engineer problem. AppSec people understand it instantly. Your frontend team has probably not framed the LLM as untrusted input. Frame it for them. Our LLM05 attack catalogue covers the 12 markdown/HTML/JSON exfil patterns we test by default.
LLM06:2025 — Excessive Agency
The model has tools it should not have, has higher privileges in those tools than it needs, or is given autonomy to act without a human in the loop on consequential decisions. An agent with a send_email tool and access to the corporate address book is a CISO’s least favourite Friday.
What to test. Inventory every tool the agent can call. For each tool, ask: (a) what is the scariest argument an attacker could pass? (b) is there a human-in-the-loop gate before that scariest case? (c) does the tool’s own audit log show who invoked it — the user, or the model on the user’s behalf?
The over-rotation we see. Teams build agents with broad tool access “for development” and never close the gates before launch. The audit answer is to define a least-privilege tool catalogue per use case and gate via OAuth scopes, not prompt instructions.
LLM07:2025 — System Prompt Leakage
The system prompt — the developer’s instructions to the model — is treated as if it were a secret. It is not. Assume any user can extract it (because they can: see LLM01 sub-type 8). The category exists because applications routinely embed real secrets — API keys, internal URLs, customer identifiers — in the system prompt under the assumption that users cannot see it.
What to test. Read your own system prompt. Does it contain anything that would be a finding if it appeared in a public Pastebin? If yes, those are secrets and need to be moved. The rest is fine to leak.
One-paragraph fix. System prompts go in version control and threat-model review. Real secrets go in a vault and reach the model via the application server, never through the prompt window.
LLM08:2025 — Vector and Embedding Weaknesses
The category that did not exist in 2024. Embedding-space attacks — adversarial documents that hash near a target query, retrieval-poisoning attacks that return attacker-controlled chunks for innocuous queries, and inversion attacks that recover near-verbatim source text from stored embeddings.
What to test. Two things. First, run a corpus-membership test: for a sample of sensitive documents in your RAG corpus, can a probe query recover the source text? If yes, your corpus is leaking. Second, run a poisoning test: can you inject a crafted document that gets retrieved for an unrelated query? If yes, your retrieval layer trusts content too much.
Honest take. Most teams have no signal on this. The detection tooling is immature. Promptfoo has experimental coverage; we ship a curated set of 14 embedding-attack templates. Either way, this is the category most likely to embarrass you in a customer security review next year.
LLM09:2025 — Misinformation
Hallucination as a security category. The model produces output that is plausibly correct, the application surfaces it as authoritative, the user acts on it, and harm follows. The classic case is a legal-research agent that fabricates a citation; the harder case is a customer-service agent that confidently misstates a refund policy.
What to test. Domain-specific factuality probes — questions whose ground truth your application owns. The trick is treating misinformation as an application problem, not a model problem: the right control is grounding, retrieval-augmented citation, and refusal on low-confidence answers, not “use a smarter model.”
The boundary that confuses people. Misinformation in a chat assistant is annoying. Misinformation in an agent that takes action on behalf of the user is consequential. Score severity by the action the misinformation enables, not by the misinformation itself.
LLM10:2025 — Unbounded Consumption
Token-budget attacks: payloads that coerce the model into long responses, tool-call loops, recursive self-prompting, or repeated retrieval cycles that exhaust your inference budget. The 2025 revision added denial-of-wallet (DoW) as a sub-type — an attacker who burns your provider quota until you cannot serve real users.
What to test. Per-request token budgets, per-user rate limits, max-tool-calls-per-turn, and a cost-per-conversation alarm. The test is mechanical: send the loop-of-death payload (Repeat this question back to me, but longer, ten times) and confirm your budget gate fires before the wallet does.
Where this overlaps with LLM06. A tool-call loop is both excessive agency and unbounded consumption. File the finding under whichever control surface owns the fix.
What should go in your LLM threat model?
A useful threat-model entry per category includes four lines: control, test, evidence, owner. Here is the LLM01 entry from one we ship internally:
LLM01 — Prompt Injection
Control: 25-attack catalogue, blocked on CRITICAL/HIGH; 200-attack
catalogue weekly on staging.
Test: .github/workflows/llm-security.yml — runs on every PR.
Evidence: signed JSON + PDF, archived 12mo in s3://promptshield-evidence.
Owner: AppSec (Marcus); review quarterly with platform-eng.
If every entry on the OWASP LLM Top 10 has an entry of that shape — even a stub one — your audit position is in better shape than 90% of teams shipping in 2026. The categories without coverage are now visible work, not invisible risk.
What we do not do
PromptShield does not run all ten categories. We focus on LLM01, LLM02, LLM05, LLM06, and LLM08 — the five that are testable from an external endpoint with no special access to your build pipeline. LLM03 (supply chain), LLM04 (poisoning), LLM07 (system-prompt leakage at the secrets level), LLM09 (misinformation), and LLM10 (consumption) require pipeline access, ground-truth data, or cost-instrumentation that we cannot reach by hitting your endpoint with payloads.
That is not a deficiency we hide. It is a scope decision. The categories we cover are the ones that produce the highest density of findings per hour of test. The categories we do not cover require controls living in your build, your data pipeline, and your billing — places only your team can reach.
If a vendor tells you they cover all ten from an external scan, ask them to demonstrate detection of LLM03 (a tampered model checksum) without access to your build. They cannot.
If you want continuous coverage of LLM01/02/05/06/08, our pricing page lists the CI tier.
How do you adopt the OWASP LLM Top 10 in 90 days?
Week 1 — read the OWASP project pages end-to-end. File ten threat-model stubs (one per category). Most will say “no control yet — investigate.” That is fine.
Week 2–3 — pick a harness for LLM01 + LLM02 + LLM05. Wire it into CI. Baseline the findings. Fix CRITICAL and HIGH.
Week 4–6 — inventory tools (LLM06). Define the least-privilege tool catalogue. Add human-in-the-loop gates for consequential actions.
Week 7–8 — supply-chain hygiene (LLM03). Pin model versions; verify checksums in CI; SBOM the inference container.
Week 9–10 — embedding-corpus posture (LLM08). Run membership and poisoning probes against your RAG corpus.
Week 11–13 — write the consolidated controls document. Map every entry to a test, an evidence artifact, and an owner. Hand it to your auditor.
That is the entire OWASP LLM Top 10 (2025) treated as a programme, not a checklist. None of it is glamorous. All of it is necessary.
References
- OWASP LLM Top 10 — 2025. https://genai.owasp.org/llm-top-10/
- NIST AI Risk Management Framework — Generative AI Profile (NIST AI 600-1). https://www.nist.gov/itl/ai-risk-management-framework
- ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system.
- MITRE ATLAS — Adversarial Threat Landscape for AI Systems. https://atlas.mitre.org/
- CWE-1427 (provisional) — Improper Neutralization of Input Used for LLM Prompting.
- Carlini, N. et al. Extracting Training Data from Large Language Models (2021). USENIX Security.
- Greshake, K. et al. Not what you’ve signed up for (2023). arXiv:2302.12173.