Al Security Risks

A taxonomy of the most common Al safety and security threats.

AI Security Necessitates a New Approach

The adoption of AI applications introduces an entirely new set of safety and security risks which differ from traditional cybersecurity challenges. Conventional approaches focus on protecting systems and data from unauthorized access and known vulnerabilities, but do not effectively secure the attack surface of AI applications and their underlying models. New algorithmic attack methodologies only add to this complexity because their patterns are dynamic and scalable.

AI application security necessitates a fundamentally new approach which considers threats to every stage of the AI lifecycle, from data poisoning and model backdoors to prompt injection and sensitive data leakage. We developed this taxonomy to educate organizations on the top risks to AI security today, complete with descriptions, examples, and mitigation techniques.

Top AI Security and Safety Risks

See the full taxonomy

Production Risk

Prompt Injection (Direct)

Direct prompt injections are adversarial attacks that attempt to alter or control the output of an LLM by providing instructions via prompt that override existing instructions. These outputs can include harmful content, misinformation, or extracted sensitive information such as PII or model instructions.

Impact:

Mitigation:

AI Firewall

Standards:

MITRE ATLAS

AML.T0051.000 - LLM Prompt Injection: Direct

OWASP TOP 10 for LLM Applications

LLM01 - Prompt Injection

Production Risk

Prompt Injection (Indirect)

Indirect prompt injection requires an adversary to control or manipulate a resource consumed by an LLM such as a document, website, or content retrieved from a database. This can direct the model to expose data or perform a malicious action like distribution of a phishing link.

Impact:

Mitigation:

AI Firewall; sanitize data sources

Standards:

MITRE ATLAS

AML.T0051.001 - LLM Prompt Injection: Indirect

OWASP TOP 10 for LLM Applications

LLM01 - Prompt Injection

Production Risk

Jailbreaks

A jailbreak refers to any prompt-based attack designed to bypass model safeguards to produce LLM outputs that are inappropriate, harmful, or unaligned with the intended purpose. With well-crafted prompts, adversaries can access restricted functionalities or data, and compromise the integrity of the model itself.

Impact:

Mitigation:

AI Firewall

Standards:

MITRE ATLAS

AML.T0054 - LLM Jailbreak Injection: Direct

OWASP TOP 10 for LLM Applications

LLM01 - Prompt Injection

Production Risk

Meta Prompt Extraction

Meta prompt extraction attacks aim to derive a system prompt which effectively guides the behavior of a LLM application. This information can be exploited by attackers and harm the intellectual property, competitive advantage, and reputation of a business.

Impact:

Mitigation:

AI Firewall

Standards:

MITRE ATLAS

AML.T0051.000 - LLM Prompt Injection: Direct

Production Risk

Sensitive Information Disclosure

Sensitive information disclosure refers to any instance where confidential or sensitive data such as PII or business records is exposed through vulnerabilities in an AI application. These privacy violations can lead to loss of trust and result in legal or regulatory consequences.

Impact:

Mitigation:

AI Firewall; sanitize data sources

Standards:

OWASP TOP 10 for LLM Applications

LLM06 - Sensitive Information Disclosure

Production Risk

Privacy Attack

A privacy attack refers broadly to any attack aimed at extracting sensitive information from an AI model or its data. This category includes model extraction, which recreates a functionally equivalent model by probing target model outputs, and membership inference attacks, which determine if specific data records were used for model training.

Impact:

Mitigation:

Data scrubbing; AI Firewall

Standards:

MITRE ATLAS

AML.T0024.000 - Infer Training Data Membership

AML.T0024.001 - Invert ML Model

AML.T0024.002 - Extract ML Model

OWASP TOP 10 for LLM Applications

LLM06 - Sensitive Information Disclosure

Development Risk

Training Data Poisoning

Training data poisoning is the deliberate manipulation of training data in order to compromise the integrity of an AI model. This can lead to skewed or biased outputs, backdoor trigger insertions—malicious links, for example—and ultimately, a loss of user trust.

Impact:

Mitigation:

Sanitize training data

Standards:

MITRE ATLAS

AML.T0020 - Poison Training Data

OWASP TOP 10 for LLM Applications

LLM03 - Training Data Poisoning

Production Risk

Factual Inconsistency

Generated text contains information that is not accurate or true while being presented in a plausible manner, also known as hallucinations. This may include incorrect details, made-up facts, mismatches with known information, or entirely fictional details.

Impact:

Mitigation:

AI Firewall; RLHF; data cleaning and filtering

Standards:

MITRE ATLAS

AML.T0048.002 - Societal Harm

OWASP TOP 10 for LLM Applications

LLM01: Prompt Injection

Production Risk

Denial of Service

An attack designed to degrade or shut down an ML model by flooding the system with requests, requesting large responses, or exploiting a vulnerability.

Impact:

Mitigation:

AI Firewall; rate limiting; token counting

Standards:

MITRE ATLAS

AML.T0029 - Denial of ML Service

OWASP TOP 10 for LLM Applications

LLM04 - Model Denial of Service

Production Risk

Cost Harvesting

Threat actors using a model in a way the developer did not intend while increasing the cost of running services at the target organization. For example, an attacker could jailbreak a public-facing customer support chatbot in order to generate Python code or perform some other task the application was not designed for.

Impact:

Mitigation:

AI Firewall

Standards:

MITRE ATLAS

AML.T0034 - Cost Harvesting

Production Risk

Exfiltration

Techniques used to get data out of a target network. Exfiltration of ML artifacts (e.g., data from privacy attacks) or other sensitive information.

Impact:

Mitigation:

AI Firewall; threat modeling application

Standards:

MITRE ATLAS

AML.T0025 - Exfiltration via Cyber Means

Production Risk

Toxicity

Unintended responses that are offensive to the user. This can include hate speech and discrimination, profanity, sexual content, violence, harassment, unsafe actions, and more.

Impact:

Mitigation:

AI Firewall

Standards:

MITRE ATLAS

AML.T0048.002 - Societal Harm

OWASP TOP 10 for LLM Applications

LLM01: Prompt Injection

Production Risk

Misalignment

Discrepancy between the model's behavior and the intended objectives or values of its developers and users. This may present as a misalignment of goals, safety, values, or other specifications.

Impact:

Mitigation:

AI Validation

Standards:

MITRE ATLAS

AML.T0048.002 - Societal Harm

Development Risk

Model Backdoor

Insertion of hidden backdoors into an ML model which can be triggered by specific inputs to cause a specific, unexpected output or grant some level of unauthorized access.

Impact:

Mitigation:

Validate ML model; Sanitize training data

Standards:

MITRE ATLAS

AML.T0018: Backdoor ML Model

Your Cookie Preferences

Essential Cookies

Provider: .providername.com

Provider: .providername.com

Analytics and Customization Cookies

Performance and Functionality Cookies

Advertising Cookies

Provider: .providername.com

Provider: .providername.com

AI Security Necessitates a New Approach

Top AI Security and Safety Risks

Prompt Injection (Direct)

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Prompt Injection (Indirect)

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Jailbreaks

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Meta Prompt Extraction

MITRE ATLAS

Sensitive Information Disclosure

OWASP TOP 10 for LLM Applications

Privacy Attack

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Training Data Poisoning

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Factual Inconsistency

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Denial of Service

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Cost Harvesting

MITRE ATLAS

Exfiltration

MITRE ATLAS

Toxicity

MITRE ATLAS

OWASP TOP 10 for LLM Applications

Misalignment

MITRE ATLAS

Model Backdoor

MITRE ATLAS

Ready to learn more?