This past April, Robust Intelligence sponsored the 2nd annual IEEE Conference on Secure and Trustworthy Machine Learning (SatML). SatML is a prominent academic and industry conference focused on enhancing the security and trustworthiness of machine learning systems. This event brings together researchers and practitioners from around the globe to discuss, share, and advance the knowledge and technologies in the field. Topics include adversarial ML, privacy-preserving techniques, robustness of ML models against attacks, and ethical considerations in deploying ML systems.
I was privileged to represent the Robust Intelligence AI Security Research team at this year’s conference. Presenters shared both novel adversarial techniques and approaches to safeguarding ML systems, and attendees stayed engaged and curious throughout.
For those who weren’t in attendance, I’ve summarized some of the many highlights from SatML 2024.
LLM Watermarking
Professor Somesh Jha (University of Wisconsin) gave a keynote on the current state of the art in LLM watermarking: determining whether a piece of text was produced by a particular LLM. In theory, a watermark would certify that a piece of text came from a specific model, improving provenance and increasing accountability for model owners and users. President Biden’s recent executive order on AI also calls for watermarking of AI-generated content. But is reliable watermarking even possible?
Professor Jha reviewed existing techniques and showed that none of them can reliably guarantee that a given output was generated by a particular model. He then offered a theoretical argument for why the problem may not be solvable at all. If it truly cannot be solved, LLM providers have a practical obligation to be clear about their watermarking strategies and about how they will prevent malicious actors from abusing their watermarking policies. Watch a recording of Professor Jha’s presentation here.
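To make the idea concrete, here is a minimal sketch of one well-known scheme, the "green list" watermark of Kirchenbauer et al. (2023). It is not necessarily one of the techniques covered in the keynote. We assume a toy word-level vocabulary, a hypothetical secret key, and a simplified green list in which each token is independently green with probability GAMMA. Generation always emits a green token (a "hard" watermark), and detection counts green tokens and runs a one-proportion z-test.

```python
import hashlib
import math

GAMMA = 0.25  # assumed fraction of the vocabulary that is "green" at each step


def is_green(prev_token: str, token: str, key: str) -> bool:
    """Keyed pseudo-random test: is `token` on the green list seeded by `prev_token`?

    Simplification: each token is independently green with probability GAMMA,
    rather than using an exact partition of the vocabulary.
    """
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def generate_watermarked(vocab: list[str], length: int, key: str, start: str = "<s>") -> list[str]:
    """Toy 'hard' watermark: every emitted token is green.

    A real model would sample from its own distribution restricted to (or biased
    toward) green tokens; picking the first green candidate stands in for that.
    """
    tokens = [start]
    for _ in range(length):
        greens = [t for t in vocab if is_green(tokens[-1], t, key)]
        tokens.append(greens[0] if greens else vocab[0])
    return tokens


def watermark_z_score(tokens: list[str], key: str) -> float:
    """One-proportion z-test on the number of green (previous, current) token pairs."""
    n = len(tokens) - 1
    if n < 1:
        raise ValueError("need at least two tokens to score")
    green = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(200)]
    text = generate_watermarked(vocab, length=50, key="secret-key")
    print(round(watermark_z_score(text, key="secret-key"), 1))
```

Unwatermarked text should score near zero, while the toy watermarked text scores far above typical detection thresholds. Because detection only counts green tokens, paraphrasing re-randomizes which tokens are green and pulls the score back toward zero, which gives one intuition for why watermarks are so hard to make robust.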
Copyright Detection
Professor Yves-Alexandre de Montjoye (Imperial College London) gave a talk on detecting whether a model has learned copyrighted content. He presented three current approaches, each with pros and cons: data provenance tools, membership inference attacks, and canary tokens. Data provenance tools are hard to use because they require transparency from LLM providers, while membership inference attacks have proven hard to turn into a reliable metric in practice. Including special canary tokens in copyrighted text seems to be the most effective approach; however, the effect is only noticeable if a large number of documents include the sequence.
More research is needed, but this work already gives online content producers some options for enforcing copyright restrictions on their data in LLMs. Check out a recording of Professor de Montjoye’s talk here.
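One standard way to quantify whether a model has picked up an inserted canary is the exposure metric from Carlini et al.'s "The Secret Sharer" (2019). This is a related technique, not necessarily the method from the talk. The sketch compares the model's log-perplexity on the inserted canary against a space of candidate canaries that were never inserted. The `log_perplexity` callback is a hypothetical stand-in; in practice you would compute it from the model's token log-probabilities.

```python
import math
from typing import Callable, Sequence


def exposure(inserted: str, candidates: Sequence[str], log_perplexity: Callable[[str], float]) -> float:
    """Exposure of an inserted canary, following Carlini et al. (2019):

        exposure = log2(|candidates|) - log2(rank of inserted canary)

    Rank 1 means the model finds the inserted canary less surprising
    (lower log-perplexity) than every other candidate.
    """
    if inserted not in candidates:
        raise ValueError("the candidate space must include the inserted canary")
    target = log_perplexity(inserted)
    rank = 1 + sum(1 for c in candidates if c != inserted and log_perplexity(c) < target)
    return math.log2(len(candidates)) - math.log2(rank)


if __name__ == "__main__":
    candidates = [f"canary-{i:04d}" for i in range(1024)]
    inserted = "canary-0042"

    # Hypothetical scorers: one "model" has clearly memorized the canary, one has not.
    def memorized(s: str) -> float:
        return 1.0 if s == inserted else 5.0

    def unmemorized(s: str) -> float:
        return float(int(s[-4:]))  # the inserted canary ends up ranked 43rd

    print(exposure(inserted, candidates, memorized))  # 10.0 (the maximum, log2 1024)
    print(exposure(inserted, candidates, unmemorized))
```

Intuitively, a canary that appears in only a few training documents barely changes its rank, whereas one repeated across many documents drives its perplexity down and its exposure toward the maximum, consistent with the observation that canaries only become noticeable at scale.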
LLM Capture the Flag Competition
The conference also included three competitions, one of which was an LLM capture-the-flag (CTF) event in which attackers tried to extract a canary token from an exposed LLM interface while defenders designed prompts to block any such attempts. Participants from all over the world used clever techniques to defend and attack these systems, some of them never seen before.
The step-by-step explanations of how attackers penetrated these systems reveal how easy it is to make good guesses about the nature of the defenses, even with incomplete information about the model. Ultimately, the attackers won: every single LLM defense in the competition was bypassed, highlighting the need for more understanding of and research into LLM safety alignment.
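The following sketch illustrates why ad hoc output filtering is so fragile. It assumes a hypothetical secret and two hand-written filters; neither is any team's actual defense. A naive filter that redacts exact matches is defeated when the attacker asks the model to spell the secret with separators, and a stricter normalizing check only covers the encodings its author anticipated (here, base64 still slips through).

```python
import base64
import re

SECRET = "a1B2c3"  # hypothetical secret the defender must keep out of responses


def naive_filter(response: str, secret: str = SECRET) -> str:
    """Redact exact (case-insensitive) occurrences of the secret."""
    return re.sub(re.escape(secret), "[REDACTED]", response, flags=re.IGNORECASE)


def stricter_leak_check(response: str, secret: str = SECRET) -> bool:
    """Flag the response if the secret appears after stripping separators,
    forwards or reversed. Still blind to base64, translation, riddles, etc."""
    flat = re.sub(r"[^0-9a-z]", "", response.lower())
    s = secret.lower()
    return s in flat or s[::-1] in flat


if __name__ == "__main__":
    direct = "The secret is a1b2c3."
    spelled = "Sure! Letter by letter: a-1-B-2-c-3"
    encoded = base64.b64encode(SECRET.encode()).decode()

    print(naive_filter(direct))           # The secret is [REDACTED].
    print(naive_filter(spelled))          # unchanged: separators defeat the exact match
    print(stricter_leak_check(spelled))   # True
    print(stricter_leak_check(encoded))   # False: the base64 leak goes unnoticed
```

Each patch closes one leak while leaving others open, which mirrors the CTF outcome: defenders must anticipate every encoding, while attackers need to find only one that was missed.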
Our Favorite Paper: CodeLMSec Benchmark
The top paper from our perspective was the one that introduced the “CodeLMSec Benchmark,” a generated benchmark for evaluating how susceptible code language models are to generating vulnerable code. The benchmark is built by few-shot prompting models to generate insecure code; the successful prefixes are then compiled into the benchmark, giving a glimpse into the security posture of code-completion LLMs. We hope a similar approach can be used to evaluate the security of non-code LLMs, and we look forward to seeing that research. Watch Dr. Lea Schönherr present her co-authored research here.
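The core of such an evaluation is a loop that completes each benchmark prompt and checks the result for vulnerabilities. The sketch below assumes a hypothetical `complete` callable wrapping a code model and uses a toy regex for SQL injection (CWE-89) in place of a real static analyzer such as CodeQL. It is an illustration of the idea, not the CodeLMSec pipeline.

```python
import re
from typing import Callable, Iterable

# Toy stand-in for a real static analyzer: flags SQL queries assembled by
# f-strings, %-formatting, concatenation, or .format (CWE-89, SQL injection).
SQLI_PATTERN = re.compile(r"""\.execute\(\s*(f["']|["'][^"']*["']\s*(%|\+|\.format))""")


def looks_vulnerable(code: str) -> bool:
    return SQLI_PATTERN.search(code) is not None


def vulnerable_rate(prompts: Iterable[str], complete: Callable[[str], str]) -> float:
    """Fraction of prompts whose prefix + model completion is flagged as vulnerable."""
    prompts = list(prompts)
    if not prompts:
        raise ValueError("need at least one prompt")
    flagged = sum(looks_vulnerable(p + complete(p)) for p in prompts)
    return flagged / len(prompts)


if __name__ == "__main__":
    # Hypothetical code-completion "model": canned completions keyed by prompt.
    canned = {
        "def get_user(cursor, name):\n    ":
            'cursor.execute(f"SELECT * FROM users WHERE name = \'{name}\'")\n',
        "def get_order(cursor, order_id):\n    ":
            'cursor.execute("SELECT * FROM orders WHERE id = ?", (order_id,))\n',
    }
    print(vulnerable_rate(canned, canned.__getitem__))  # 0.5
```

A regex like this misses many vulnerable patterns and can misfire on unusual but safe code, which is why a real benchmark relies on a proper analyzer; the loop structure is the part that carries over.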
The State of LLM Security Research
One overarching theme that stood out to me during the conference was the lack of resources for academic research into LLMs. Of the 34 papers, fewer than five were directly related to LLM dangers. Conversations with participants and presenters made it clear that many groups simply do not have the resources to do fundamental LLM safety research.
Robust Intelligence is proud to be among those pioneering LLM security research. Our findings have helped inform the broader AI and cybersecurity communities, including the development of AI security standards and cutting-edge techniques such as Tree of Attacks with Pruning. If you’re interested in making an impact in AI security, please check out the open roles on my team.