May 29, 2024 - 5 minute read

AI Cyber Threat Intelligence Roundup: May 2024

Threat Intelligence

At Robust Intelligence, AI threat research is fundamental to informing the ways we evaluate and protect models on our platform. In a space that is so dynamic and evolving so rapidly, these efforts help ensure that our customers remain protected against emerging vulnerabilities and adversarial techniques.

This monthly threat roundup consolidates useful highlights and critical intel from our ongoing threat research efforts to share with the broader AI security community. As always, please remember this is not an exhaustive list of AI cyber threats, but rather a curated selection that our team believes is particularly noteworthy.

Notable Threats and Developments: May 2024

Sandwich Multi-language Mixture Adaptive Attack

Recent research from the SAIL Lab at the University of New Haven introduced the “Sandwich” attack, a multi-language mixture adaptive attack technique for LLMs.

The attack exploits the inability of models to identify malicious requests when they are obfuscated across multiple languages, particularly low-resource ones. A multilingual prompt is crafted containing a harmful question sandwiched between several innocuous ones. The technique proved highly effective against several state-of-the-art LLMs, even when the harmful questions were drawn from prominent jailbreak datasets that have almost certainly been incorporated into model guardrails already.

This technique is the latest example of an adaptive attack, which requires manual adjustments for successful jailbreaking even when the underlying method is universal. It also underscores the risks of long context windows, which enable more complex prompting and can degrade the effectiveness of safety mechanisms. An illustrative sketch of how such a prompt might be assembled follows the list below.

  • AI Lifecycle Stage: Production
  • Relevant Use Cases: AI Chatbots & AI Agents
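
To make the structure concrete, here is a minimal sketch of how a Sandwich-style prompt could be assembled. The languages, benign questions, and helper function are illustrative assumptions on our part, not the exact template from the SAIL Lab paper, and the harmful question is left as a placeholder.

```python
# Illustrative Sandwich-style prompt assembly (assumed structure, not the
# exact template from the SAIL Lab paper). Benign questions in several
# lower-resource languages surround a single placeholder adversarial question.

benign_questions = [
    "Què és la fotosíntesi?",        # Catalan: "What is photosynthesis?"
    "Unaelezaje mzunguko wa maji?",  # Swahili: "How do you explain the water cycle?"
    "Hvernig myndast regnbogi?",     # Icelandic: "How does a rainbow form?"
    "Paano nabubuo ang mga ulap?",   # Filipino: "How are clouds formed?"
]

# Placeholder only -- stands in for the adversarial question an attacker
# would translate into yet another low-resource language.
harmful_question = "<HARMFUL QUESTION IN ANOTHER LANGUAGE>"

def build_sandwich_prompt(benign, harmful):
    """Bury the harmful question in the middle of the benign ones."""
    middle = len(benign) // 2
    questions = benign[:middle] + [harmful] + benign[middle:]
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    return "Please answer each of the following questions in its own language:\n" + numbered

print(build_sandwich_prompt(benign_questions, harmful_question))
```
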
AdvPrompter: Fast Adversarial Prompting

Researchers at Meta released a paper detailing AdvPrompter, a method to train an LLM to automatically generate human-readable adversarial prompts that can jailbreak and elicit harmful responses from target LLMs.

The technique alternates between running an optimization algorithm to generate high-quality adversarial prompts against a target model and fine-tuning the AdvPrompter model on those prompts. It generates prompts orders of magnitude faster than previous methods, enabling multi-shot jailbreak attacks that significantly increase success rates. Attacks also transfer to closed-source LLMs, with jailbreak success rates of 48-90%.

With jailbreak attempts that are quick, scalable, and difficult to detect, AdvPrompter poses a risk to any public-facing AI application powered by an LLM. Notably, it shares some similarities with the TAP algorithmic jailbreak technique developed by Robust Intelligence researchers and covered previously here. A conceptual sketch of the alternating training loop follows the list below.

  • AI Lifecycle Stage: Production
  • Relevant Use Cases: AI Chatbots & AI Agents
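
The alternating structure can be captured in a short training skeleton. This is a conceptual sketch only: the helper functions (`generate_candidate_suffixes`, `score_on_target`, `finetune_on`) are placeholders we introduce to stand in for the paper's optimization and fine-tuning steps, not Meta's implementation.

```python
# Conceptual skeleton of AdvPrompter-style training (not Meta's code).
# Each round alternates between (1) searching for adversarial suffixes that
# jailbreak the target model and (2) fine-tuning the attacker LLM on the
# suffixes that worked, so future generations need less search.

from typing import Callable, List, Tuple

def advprompter_train(
    harmful_instructions: List[str],
    generate_candidate_suffixes: Callable[[str], List[str]],  # placeholder: suffix search / optimization step
    score_on_target: Callable[[str, str], float],             # placeholder: jailbreak score from the target LLM
    finetune_on: Callable[[List[Tuple[str, str]]], None],     # placeholder: gradient update on the attacker LLM
    rounds: int = 5,
    top_k: int = 1,
) -> None:
    for _ in range(rounds):
        training_pairs: List[Tuple[str, str]] = []
        for instruction in harmful_instructions:
            # (1) Optimization step: propose human-readable suffixes and keep
            #     the ones the target model responds to most harmfully.
            candidates = generate_candidate_suffixes(instruction)
            best = sorted(candidates, key=lambda s: score_on_target(instruction, s), reverse=True)[:top_k]
            training_pairs.extend((instruction, suffix) for suffix in best)
        # (2) Fine-tuning step: teach the attacker LLM to emit those suffixes
        #     directly, amortizing the expensive search.
        finetune_on(training_pairs)
```
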
Fine-tuning LLMs breaks safety and security alignment

This month, the Robust Intelligence team published original research exploring the risks of fine-tuning for the safety and security alignment of large language models.

The team conducted a series of experiments to evaluate model responses before and after fine-tuning, beginning with a test of Llama-2-7B and three fine-tuned AdaptLLM variations published by Microsoft for specific tasks in biomedicine, finance, and law. Prompts from a jailbreak benchmark dataset were used to query each model, and outputs were scored on comprehension, compliance with the jailbreak instructions, and harmfulness.

The results demonstrated significantly greater jailbreak susceptibility in the three fine-tuned variations of Llama-2-7B compared to the original foundation model. Despite the efficacy of their domain-specific training, the fine-tuned models proved more than 3 times more compliant with jailbreak instructions and had over 22 times greater odds of producing a harmful response. An illustrative evaluation sketch follows the list below.

  • AI Lifecycle Stage: Production
  • Relevant Use Cases: AI Chatbots & AI Agents
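
For readers who want to structure a similar before/after comparison, here is a minimal sketch of the evaluation loop and of how an odds-ratio figure like "22 times greater" is computed. The helpers and the rates are hypothetical placeholders, not the tooling or results from the Robust Intelligence study.

```python
# Illustrative before/after jailbreak evaluation (hypothetical helpers and
# numbers, not the Robust Intelligence study's actual tooling or results).

def odds(p: float) -> float:
    """Convert a harmful-response rate into odds."""
    return p / (1.0 - p)

def evaluate(model_query, judge_harmful, prompts) -> float:
    """Fraction of jailbreak prompts that yield a harmful response."""
    harmful = sum(judge_harmful(model_query(p)) for p in prompts)
    return harmful / len(prompts)

# Example with made-up rates: if the base model responds harmfully to 2% of
# jailbreak prompts and a fine-tuned variant to 31%, the odds ratio is
# odds(0.31) / odds(0.02), which is roughly 22 -- i.e. "22 times greater odds".
base_rate, finetuned_rate = 0.02, 0.31
print(f"odds ratio: {odds(finetuned_rate) / odds(base_rate):.1f}")
```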

More Threats to Explore

The Logic Chain Injection Jailbreak hides malicious queries within benign passages of text, exploiting the ability of LLMs to connect a logical chain of thought. The research paper does not report attack success rates but shows successful demonstrations against ChatGPT.

Wiz Research discovered a critical vulnerability in the AI-as-a-Service provider Replicate which, if exploited, would have allowed unauthorized access to the AI prompts and results of all customers on Replicate’s platform. The researchers demonstrated the flaw by uploading a malicious Cog container to Replicate and executing remote code on its infrastructure.

Visual instruction tuning makes LLMs more prone to jailbreaking, as evidenced by new research comparing three state-of-the-art visual language models (VLMs) with their base LLM counterparts. This reinforces the aforementioned research into the safety and security risks of fine-tuning.
