Your Cookie Preferences

We use different types of cookies to optimize your experience on our website. Click on the categories below to learn more about their purposes. You may choose which types of cookies to allow and can change your preferences at any time. Remember that disabling cookies may affect your experience on the website. You can learn more about how we use cookies by visiting our

Essential Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

Analytics and Customization Cookies

Name

Purpose

Marketo Munchkin

Marketo's custom JavaScript tracking code, called Munchkin, tracks all individuals who visit your website so you can react to their visits with automated marketing campaigns.

Name

Purpose

Google Tag

The Google tag (gtag.js) is a single tag you can add to a website to use a variety of Google products and services (e.g., Google Ads, Google Analytics, Campaign Manager, Display & Video 360, Search Ads 360).

Advertising Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

May 15, 2024

minute read

Takeaways from SatML 2024

Author

Authors

Paul Kassianik

Paul is a Senior Research Engineer at Robust Intelligence.

This past April, Robust Intelligence sponsored the 2nd annual IEEE Conference on Secure and Trustworthy Machine Learning (SatML). SatML is a prominent academic and industry conference focused on enhancing the security and trustworthiness of machine learning systems. This event brings together researchers and practitioners from around the globe to discuss, share, and advance the knowledge and technologies in the field. Topics include adversarial ML, privacy-preserving techniques, robustness of ML models against attacks, and ethical considerations in deploying ML systems.

I was privileged to represent the Robust Intelligence AI Security Research team at this year’s conference. Presenters shared both novel adversarial techniques and approaches to safeguard ML systems. Attendee engagement and intrigue remained high throughout.

For those who weren’t in attendance, I’ve summarized some of the many highlights from SatML 2024.

LLM Watermarking

Professor Somesh Jha (University of Wisconsin) presented a keynote covering the current state of the art in LLM watermarking: determining whether a piece of text was produced by a particular LLM. In theory, the presence of a watermark would certify a piece of text as produced by a certain model, resulting in better provenance and higher levels of responsibility for model owners and users. President Biden’s recent executive order also mandates LLM providers provide watermarking tools. But is that even possible?

Professor Jha covered existing techniques, and showed that none of the current techniques can reliably guarantee that a certain output was generated from a model. They then went to provide a theoretical justification for why the problem might not be solvable. The insolubility of the problem provides a practical necessity for LLM providers to be clear about their watermarking strategies and how to prevent abuse of their watermarking policies by malicious actors. Watch a recording of Professor Jha’s presentation here.

Copyright Detection

Professor Yves-Alexandre de Montjoye (Imperial College London) gave a talk about detecting copyrighted content learned by the model. Professor de Montjoye presented three current approaches, each with pros and cons: data provenance tools, membership inference attacks, and canary tokens. Data provenance tools are hard to use since they require transparency from LLM providers, while membership inference attacks been hard to implement in practice as a reliable metric. Including special canary tokens in copyrighted text seems to be the most effective, however the result is only noticeable if a large number of documents include the sequence.

While more research needs to be done, this work alone provides some options for online content producers to enforce copyright restrictions on their data in LLMs. Check out a recording of Professor de Montjoye’s talk here.

LLM Capture the Flag Competition

The conference also include a set of three competitions. One of which was a LLM capture the flag event in which attackers tried to extract a canary token from an exposed LLM interface while defenders tried to invent prompts that would block any such attempts. Participants from all over the world used clever techniques to defend and attack these systems, some of which haven’t been seen before.

The step-by-step explanations of how attackers penetrated these systems reveals how easy it is to make clever guesses about the nature of the defenses, even with incomplete information about the model. Ultimately, the attackers were successful as every single LLM defense was bypassed at this competition, highlighting the need for more understanding and research into LLM safety alignment.

Our Favorite Paper: CodeLMSec Benchmark

The top paper from our perspective was the one that introduced the "CodeLMSec Benchmark” - a generated benchmark to evaluate susceptibility of Code models to generate malicious code. The benchmark is generated by few-shot prompting models to generate malicious code. The successful prefixes are then compiled into this benchmark and give a glimpse into the security standing of code-completion LLMs. We hope that a similar approach can be used to evaluate security for non-code LLMs, and we are looking to see that research being done. Watch Dr. Lea Schönherr present her co-authored research here.

The State of LLM Security Research

One overarching theme that stood out to me in the course of this conference is the lack of resourcing for academic research into LLMs. Out of the 34 papers, less than five were directly related to LLM dangers. After some conversations with participants and presenters at the conference, it became clear that a lot of groups simply do not have the resources to do fundamental LLM safety research.

Robust Intelligence is proud to be among those pioneering LLM security research. Our findings have helped inform the broader AI and cybersecurity communities, including the development of AI security standards and cutting-edge techniques such as Tree of Attacks with Pruning. If you’re interested in making an impact in AI security, please check out the open roles on my team.

Author

Authors

Paul Kassianik

Paul is a Senior Research Engineer at Robust Intelligence.

Social

Follow us on LinkedIn

September 20, 2024

minute read

Extracting Training Data from Chatbots

For:

September 10, 2024

minute read

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

For:

September 6, 2024

minute read

AI Governance Policy Roundup (August 2024)

For:

+ More Articles

April 26, 2024

minute read

AI Cyber Threat Intelligence Roundup: April 2024

For:

January 16, 2024

minute read

AI Security Insights from Hackers on the Hill

For:

December 5, 2023

minute read

Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute

For:

+ More Articles

May 15, 2024

minute read

Takeaways from SatML 2024

Author

Authors

Paul Kassianik

Paul is a Senior Research Engineer at Robust Intelligence.

For those who weren’t in attendance, I’ve summarized some of the many highlights from SatML 2024.

LLM Watermarking

Copyright Detection

LLM Capture the Flag Competition

Our Favorite Paper: CodeLMSec Benchmark

The State of LLM Security Research

Author

Authors

Paul Kassianik

Paul is a Senior Research Engineer at Robust Intelligence.

Blog

June 21, 2023

minute read

Generative AI Risk Assessment: Dolly 2.0

For:

May 31, 2024

minute read

AI Governance Policy Roundup (May 2024)

For:

December 21, 2023

minute read

AI Governance Policy Roundup (December 2023)

For:

April 26, 2024

minute read

AI Cyber Threat Intelligence Roundup: April 2024

For:

January 16, 2024

minute read

AI Security Insights from Hackers on the Hill

For:

December 5, 2023

minute read

Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute

For:

+ More Articles

Your Cookie Preferences

Essential Cookies

Provider: .providername.com

Provider: .providername.com

Analytics and Customization Cookies

Performance and Functionality Cookies

Advertising Cookies

Provider: .providername.com

Provider: .providername.com

Takeaways from SatML 2024

LLM Watermarking

Copyright Detection

LLM Capture the Flag Competition

Our Favorite Paper: CodeLMSec Benchmark

The State of LLM Security Research

Follow us on LinkedIn

Related articles

Extracting Training Data from Chatbots

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

AI Governance Policy Roundup (August 2024)

Related articles

AI Cyber Threat Intelligence Roundup: April 2024

AI Security Insights from Hackers on the Hill

Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute

Ready to learn more?

Takeaways from SatML 2024

LLM Watermarking

Copyright Detection

LLM Capture the Flag Competition

Our Favorite Paper: CodeLMSec Benchmark

The State of LLM Security Research

Related articles

Generative AI Risk Assessment: Dolly 2.0

AI Governance Policy Roundup (May 2024)

AI Governance Policy Roundup (December 2023)

AI Cyber Threat Intelligence Roundup: April 2024

AI Security Insights from Hackers on the Hill

Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute

Achieve AI Integrity Today

Your Cookie Preferences

Essential Cookies

Provider: .providername.com

Provider: .providername.com

Analytics and Customization Cookies

Performance and Functionality Cookies

Advertising Cookies

Provider: .providername.com

Provider: .providername.com

LLM Watermarking

Copyright Detection

LLM Capture the Flag Competition

Our Favorite Paper: CodeLMSec Benchmark

The State of LLM Security Research

Follow us on LinkedIn

Subscribe to our newsletter

Related articles

Extracting Training Data from Chatbots

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

AI Governance Policy Roundup (August 2024)

Related articles

AI Cyber Threat Intelligence Roundup: April 2024

AI Security Insights from Hackers on the Hill

Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute

Ready to learn more?

LLM Watermarking

Copyright Detection

LLM Capture the Flag Competition

Our Favorite Paper: CodeLMSec Benchmark

The State of LLM Security Research

Subscribe to our newsletter

Related articles

Generative AI Risk Assessment: Dolly 2.0

AI Governance Policy Roundup (May 2024)

AI Governance Policy Roundup (December 2023)

AI Cyber Threat Intelligence Roundup: April 2024

AI Security Insights from Hackers on the Hill

Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute

Achieve AI Integrity Today