Your Cookie Preferences

We use different types of cookies to optimize your experience on our website. Click on the categories below to learn more about their purposes. You may choose which types of cookies to allow and can change your preferences at any time. Remember that disabling cookies may affect your experience on the website. You can learn more about how we use cookies by visiting our

Essential Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

Analytics and Customization Cookies

Name

Purpose

Marketo Munchkin

Marketo's custom JavaScript tracking code, called Munchkin, tracks all individuals who visit your website so you can react to their visits with automated marketing campaigns.

Name

Purpose

Google Tag

The Google tag (gtag.js) is a single tag you can add to a website to use a variety of Google products and services (e.g., Google Ads, Google Analytics, Campaign Manager, Display & Video 360, Search Ads 360).

Advertising Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

July 26, 2024

minute read

AI Cyber Threat Intelligence Roundup: July 2024

Threat Intelligence

Author

Authors

Adam Swanda

Adam is an AI Security Researcher at Robust Intelligence.

At Robust Intelligence, AI threat research is fundamental to informing the ways we evaluate and protect models on our platform. In a space that is so dynamic and evolving so rapidly, these efforts help ensure that our customers remain protected against emerging vulnerabilities and adversarial techniques.

This monthly threat roundup consolidates some useful highlights and critical intel from our ongoing threat research efforts to share with the broader AI security community. As always, please remember this is not an exhaustive or all-inclusive list of AI cyber threats, but rather a curation that our team believes is particularly noteworthy.

Notable Threats and Developments: July 2024

ChatBug templates & Improved Few-Shot Jailbreak

A “ChatBug” is a common vulnerability in LLMs that arises from the use of chat templates during instruction tuning. The researchers behind this vulnerability demonstrate that while these templates are effective for enhancing LLM performance, they introduce a security weakness that can be easily exploited.

The original research paper provides two examples of ChatBug exploits. The format mismatch attack alters the default chat format, while the message overflow attack injects a sequence of tokens into the model’s reserved field. Testing against eight state-of-the-art LLMs reveals high attack success rates which reach 100% in some instances.

Another recently published research paper introduces the Improved Few-Shot Jailbreak (I-FSJ) technique which relies on the same fundamental abuse of a chat template. By injecting special tokens from the target LLM chat template into few-shot examples, harmful content appears to be a legitimate part of the conversation history. Researchers demonstrated high attack success rates against several models, including >80% ASRs on Llama-2-7B and Llama-3-8B using only three random restarts.

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0051 - LLM Prompt Injection‍
Reference: ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates (arXiv, GitHub); Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (arXiv, GitHub)

BOOST: Silent eos tokens enhance jailbreaks

The BOOST method exploits eos (end of sequence) tokens to bypass ethical boundaries in LLMs. By appending eos tokens to harmful prompts, researchers were able to mislead LLMs into interpreting them as less harmful.

Empirical evaluations were conducted on 12 state-of-the-art LLMs including GPT-4, Llama-2, and Gemma. Test results revealed that BOOST significantly enhanced attack success rates of existing jailbreak methods, such as GCG and GPTFuzzer. For example, on Llama-2-7B-chat, BOOST improved the ASR by over 30%.

The effectiveness of the BOOST technique is attributable to the low attention values assigned to eos tokens, which shifts the harmful content to bypass ethical boundaries learned during safety training. Suggested mitigation strategies include incorporating eos tokens during model red-teaming and filtering eos tokens in production.

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens (arXiv)

JAM: Jailbreak guardrails via cipher characters

The JAM (Jailbreak Against Moderation) method was introduced in a recent research paper as a method to bypass moderation guardrails in LLMs using cipher characters to reduce harm scores. In experiments on four LLMs—GPT-3.5, GPT-4, Gemini, and Llama-3—JAM outperforms baseline methods, achieving jailbreak success rates about 19.88 times higher and filtered-out rates about six times lower.

The paper proposes two countermeasures to JAM: output complexity-aware defense and LLM-based audit defense. The former technique monitors output complexity using entropy-based measures and rejects responses exceeding a predefined complexity threshold. The latter uses a complementary second LLM to decode and analyze responses to assess harmfulness.

The researchers behind JAM also introduce JAMBench, a new benchmark which comprises 160 malicious questions specifically designed to trigger moderation guardrails covering four critical categories: hate speech, sexual content, violence, and self-harm. However, at the time of this analysis, the benchmark is not publicly available.

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters (arXiv)

More Threats to Explore

Two new automated jailbreak techniques were introduced by researchers this month:

ReNeLLM uses prompt rewriting and scenario nesting techniques to automatically generate effective jailbreak prompts. It has publicly available source code, produces highly transferable prompts, and achieves high attack success rates (70-100% after ensembling), lowering the barrier for potential use by attackers in the wild.

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily (arXiv, GitHub)

StructuralSleight is an automated jailbreak framework that exploits LLM weaknesses in understanding uncommon text-encoded structures (UTES) to induce harmful outputs. Moreover, combining structural attacks with character or context-level obfuscation greatly increases attack effectiveness compared to structural attacks alone.

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Encoded Structure (arXiv)

Author

Authors

Adam Swanda

Adam is an AI Security Researcher at Robust Intelligence.

Social

Follow us on LinkedIn

September 20, 2024

minute read

Extracting Training Data from Chatbots

For:

September 10, 2024

minute read

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

For:

September 6, 2024

minute read

AI Governance Policy Roundup (August 2024)

For:

+ More Articles

No items found.

+ More Articles

July 26, 2024

minute read

AI Cyber Threat Intelligence Roundup: July 2024

Threat Intelligence

Author

Authors

Adam Swanda

Adam is an AI Security Researcher at Robust Intelligence.

Notable Threats and Developments: July 2024

ChatBug templates & Improved Few-Shot Jailbreak

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0051 - LLM Prompt Injection‍
Reference: ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates (arXiv, GitHub); Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (arXiv, GitHub)

BOOST: Silent eos tokens enhance jailbreaks

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens (arXiv)

JAM: Jailbreak guardrails via cipher characters

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters (arXiv)

More Threats to Explore

Two new automated jailbreak techniques were introduced by researchers this month:

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily (arXiv, GitHub)

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Encoded Structure (arXiv)

Author

Authors

Adam Swanda

Adam is an AI Security Researcher at Robust Intelligence.

Blog

April 13, 2023

minute read

Security Risks Of Generative Al Open Source Software

For:

November 1, 2023

minute read

Reflecting on the AI Risk Management Summit 2023 in Tokyo

For:

June 9, 2023

minute read

NeMo Guardrails Early Look: What You Need to Know Before Deploying (Part 2)

For:

No items found.

+ More Articles

Your Cookie Preferences

Essential Cookies

Provider: .providername.com

Provider: .providername.com

Analytics and Customization Cookies

Performance and Functionality Cookies

Advertising Cookies

Provider: .providername.com

Provider: .providername.com

AI Cyber Threat Intelligence Roundup: July 2024

Notable Threats and Developments: July 2024

ChatBug templates & Improved Few-Shot Jailbreak

BOOST: Silent eos tokens enhance jailbreaks

JAM: Jailbreak guardrails via cipher characters

More Threats to Explore

Follow us on LinkedIn

Related articles

Extracting Training Data from Chatbots

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

AI Governance Policy Roundup (August 2024)

Related articles

Ready to learn more?

AI Cyber Threat Intelligence Roundup: July 2024

Notable Threats and Developments: July 2024

ChatBug templates & Improved Few-Shot Jailbreak

BOOST: Silent eos tokens enhance jailbreaks

JAM: Jailbreak guardrails via cipher characters

More Threats to Explore

Related articles

Security Risks Of Generative Al Open Source Software

Reflecting on the AI Risk Management Summit 2023 in Tokyo

NeMo Guardrails Early Look: What You Need to Know Before Deploying (Part 2)

Achieve AI Integrity Today

Your Cookie Preferences

Essential Cookies

Provider: .providername.com

Provider: .providername.com

Analytics and Customization Cookies

Performance and Functionality Cookies

Advertising Cookies

Provider: .providername.com

Provider: .providername.com

Notable Threats and Developments: July 2024

ChatBug templates & Improved Few-Shot Jailbreak

BOOST: Silent eos tokens enhance jailbreaks

JAM: Jailbreak guardrails via cipher characters

More Threats to Explore

Follow us on LinkedIn

Subscribe to our newsletter

Related articles

Extracting Training Data from Chatbots

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

AI Governance Policy Roundup (August 2024)

Related articles

Ready to learn more?

Notable Threats and Developments: July 2024

ChatBug templates & Improved Few-Shot Jailbreak

BOOST: Silent eos tokens enhance jailbreaks

JAM: Jailbreak guardrails via cipher characters

More Threats to Explore

Subscribe to our newsletter

Related articles

Security Risks Of Generative Al Open Source Software

Reflecting on the AI Risk Management Summit 2023 in Tokyo

NeMo Guardrails Early Look: What You Need to Know Before Deploying (Part 2)

Achieve AI Integrity Today