Your Cookie Preferences

We use different types of cookies to optimize your experience on our website. Click on the categories below to learn more about their purposes. You may choose which types of cookies to allow and can change your preferences at any time. Remember that disabling cookies may affect your experience on the website. You can learn more about how we use cookies by visiting our

Essential Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

Analytics and Customization Cookies

Name

Purpose

Marketo Munchkin

Marketo's custom JavaScript tracking code, called Munchkin, tracks all individuals who visit your website so you can react to their visits with automated marketing campaigns.

Name

Purpose

Google Tag

The Google tag (gtag.js) is a single tag you can add to a website to use a variety of Google products and services (e.g., Google Ads, Google Analytics, Campaign Manager, Display & Video 360, Search Ads 360).

Advertising Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

March 27, 2024

minute read

AI Cyber Threat Intelligence Roundup: March 2024

Threat Intelligence

Author

Authors

Adam Swanda

Adam is an AI Security Researcher at Robust Intelligence.

At Robust Intelligence, AI threat research is fundamental to informing the ways we evaluate and protect models on our platform. In a space that is so dynamic and evolving so rapidly, these efforts help ensure that our customers remain protected against emerging vulnerabilities and adversarial techniques.

This monthly threat roundup consolidates some useful highlights and critical intel from our ongoing threat research efforts to share with the broader AI security community. As always, please remember this is not an exhaustive or all-inclusive list of AI cyber threats, but rather a curation that our team believes is particularly noteworthy.

Notable Threats and Developments: March 2024

ArtPrompt: ASCII Art-based Jailbreak Attacks

ArtPrompt is a novel ASCII art-based jailbreak technique designed to bypass LLM safety measures, which are primarily focused on query semantics, with visually encoded representations of specific harmful words.

The technique follows a simple two-step process which involves first masking sensitive words in a prompt that might trigger rejection by an LLM and then replacing those masked words with ASCII art representations. When the resulting prompt is provided to the model, it struggles to interpret the obfuscated keywords but still attempts to address the overall query which leads it to output unsafe content that would otherwise be blocked.

Notably, this approach is shown to be effective (52% ASR) against several state-of-the-art LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2) with only black-box access. It’s easy for attackers to execute with a simple ASCII art generator, while current defense measures like perplexity thresholding and prompt paraphrasing offer limited protection.

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: https://arxiv.org/pdf/2402.11753.pdf

Multi-round Jailbreaking: Contextual Interaction Attack

A new jailbreak technique known as a “Contextual Interaction Attack” exploits the context-dependent nature of LLMs by subtly guiding a target model to produce harmful outputs over a series of interactions.

This technique relies on an auxiliary LLM that automatically generates a series of harmless preliminary questions relevant to the ultimate attack query. The attacker poses these preliminary questions to the target LLM individually over several rounds of interaction, and the responses become part of the growing context along with the questions. When the ultimate query is posed, the LLM is steered by the cumulative context into providing harmful information rather than flagging it as unsafe.

The Contextual Interaction Attack has demonstrated a high attack success rate against multiple state-of-the-art LLMs and is easily transferable across models. It threatens to subvert LLMs deployed for sensitive applications such as content moderation, customer support, healthcare, and so on. Traditional input filtering methods will likely prove ineffective against this technique because of its subtle steering over several prompts.

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: https://arxiv.org/pdf/2402.09177.pdf

ICLAttack: In-context Learning Backdoor

A recently published research paper introduces a technique known as ICLAttack, which exploits the in-context learning capabilities of LLMs in order to introduce a backdoor trigger. This trigger remains dormant until specific conditions are met—a specific word within a prompt or some special string, for example—and the malicious action is triggered.

The ICLAttack technique proves highly effective with a success rate of 95%, but its practical usefulness for real-world attacks remains questionable. Similar to the BadChain chain-of-thought backdoor we mentioned in last month’s threat roundup, the trigger only persists for the duration of the chat session where it is introduced. It’s unlikely that an adversary would be able to control in-context learning examples in a way that affects the output of other users accessing the same LLM. Risk may exist if an LLM application uses user prompts for future training or providing some type of feedback loop into the model or application.

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots

MITRE ATLAS: AML.T0018.000 - Backdoor ML Model: Poison ML Model
Reference: https://arxiv.org/pdf/2401.05949.pdf

More Threats to Explore

Google AI search promotes malicious sites that direct users to install malicious browser extensions, subscribe to spam notifications, and engage in various other scams. These results appear in the new Google Search Generative Experience (SGE) and exhibit similar characteristics to one another, indicating that they are all part of a larger SEO poisoning campaign.

Reference: https://www.bleepingcomputer.com/news/google/googles-new-ai-search-results-promotes-sites-pushing-malware-scams/

The first-known attack on AI workloads was identified in the wild targeting a vulnerability in Ray, an open-source AI framework. Thousands of businesses and servers may be affected and are susceptible to theft of their computing resources and internal data. At the present time, no patch is available for this vulnerability.