Generative artificial intelligence stands as a prime example of the benefits derived from the OSS movement. The recent surge in OSS tools has dramatically simplified the process for developers to create, deploy, and manage generative AI models, driving progress and expanding the reach of this transformative technology.
Open-source software (OSS) has played an indispensable role in the evolution of software. As a collaborative and transparent approach to development, OSS has facilitated innovation and accelerated the adoption of cutting-edge solutions.
Nevertheless, it is essential to recognize that OSS, by its very nature, can be susceptible to security risks, and generative AI OSS is no exception. In fact, due to the distinct characteristics of generative AI and its reliance on prompts, this technology has given rise to a new set of cybersecurity vulnerabilities that are unique to this domain.
In the following discussion, we will present an example/vulnerability present in multiple repositories that highlight the critical importance of addressing these emerging security challenges.
Prompt Injection can read/write local files
As an illustrative example, we will demonstrate how a malicious prompt can be used to read environment variables and create and delete directories on the machine’s local file system using the llm_agents repo. The vulnerability here is that the output of the LLM is passed directly to an <code inline>exec</code> statement, which will execute the code snippet.
It should be noted that the llm_agents repo is primarily a personal project at this point and not being used in deployment yet where these safety concerns have real implications. We are using it solely for demo purposes to highlight how attacks could be conducted on applications that employ the same kind of logic.
By writing a prompt as simple as telling GPT to echo an exact python snippet back to us, we are able to get that statement to be executed by the agent. In this example, we have it print out the contents of the current directory in the file system, create a directory, and delete it. We are running the code directly on our own local machine here, but it is indicative of two risks:
- If an application is built on top of software like this and is deployed in a non-sandboxed environment, then it would be possible to access or delete sensitive files via this type of injection.
- Even if you are only running the code locally and are not intentionally trying to do anything malicious, there is a chance that the LLM will return code that has unsafe or unintended side effects, and it will be executed on your machine regardless. This could happen if you are unknowingly using a model API that has been corrupted by an attacker, and it can in theory even happen more or less by chance, given that models are not always perfect and sometimes give unexpected outputs.
Proper prompt engineering can help mitigate risk
This same vulnerability in the much more popular library LangChain has already been reported on both CVE and NIST’s NVD (and was initially reported in a tweet from Rich Harang). While the llm_agents example above was on a non-production piece of software, LangChain is currently one of the most widely-used libraries in the generative LLM space, and many developers and companies are already beginning to build on top of this tool (see here).
The developers of LangChain have clearly invested much greater time and consideration into constructing their vast collection of prompt templates to make them both more effective and more robust. For example, the input used in the video above works much less consistently on the LLMMathChain using the default prompt template, and the prompt available here in the companion repo langchain-hub appears to be robust to both that prompt and the one from Rich’s original tweet.
We were still able to construct an input that did override both prompt templates, though it required much more effort to find one that worked than for llm_agents. Since LangChain is much more widely used and we do not want to enable actual malicious actors, we are not providing the successful prompt in this article.
Recommendations
All of this is not say that generative AI systems cannot be deployed safely. Here are some recommendations we have related to this specific type of agent:
- If you are executing arbitrary code, do it in an isolated, containerized environment so the scope of what that code can access is as limited as possible.
- Add a filtering layer to detect dangerous code generated by the model before executing it. Especially if your agent is only meant for completing a very narrow set of tasks (such as solving math equations in this example above), then you should filter for things like <code inline>import os</code> statements that are not necessary for doing math operations, but can be potentially used for malicious purposes.
- Filter or block user inputs that contain known adversarial prompts, such as “Ignore all previous instructions you have been given...”
This is just the beginning
It is important to recognize that the vulnerabilities discussed herein arise specifically from the distinct characteristics of generative AI open-source software (OSS), which relies on prompts as input mechanisms. The examples provided represent just a fraction of the potential vulnerabilities that may be uncovered in the future as the generative AI OSS landscape evolves.
As generative AI models continue to gain traction as control interfaces for various applications, the potential attack vectors associated with these vulnerabilities are likely to expand, warranting proactive mitigation efforts. Consequently, it is imperative that the risks associated with generative AI OSS are addressed with the same degree of attention and diligence (if not more) as those pertaining to general cybersecurity in the OSS landscape.
By acknowledging and addressing these emerging risks, the AI and cybersecurity communities can work together to ensure the responsible and secure deployment of generative AI solutions, thereby fostering a safer technological environment.