AI – It’s Time to Slow Our Roll(out)

“To AI, or not to AI, that is the question.” – William Shakespeare

Summary

The AI boom in tech at this current time in history cannot be ignored. Everywhere you look, companies are rolling out their new AI integrations that are set to replace monotonous workflows which should not require human intervention. In this vein, we are reaching a point of no return where infrastructures that are tightly coupled with AI integrations cannot turn back from these integrations. This fact is something that should be evaluated by every company rushing to roll out new AI integrations. Currently, if companies do not evaluate where these AI integrations are implemented and how they will grow, there’s a non-zero chance that this integration may be a cause of regret in the future. That is why, in this paper, I will analyze the need for AI in certain fields and the concerns that arise in choosing to implement AI within your company.

The Risks

Many of the threats discussed in this paper hinge on social engineering. Although attacks of this nature seem simple, it is important to understand how they are effective. A web application as well as anything is only as strong as its weakest link. Much of the time in security these happen to be vulnerable users who are not typically cautious on the web. 

Despite this, typical attacks like injection attacks and logic bypasses have been used in some interesting attacks that are worth mentioning.

Tightly Coupled Infrastructures Involving AI

With the CrowdStrike outage that the world saw on July 19th, 2024, it’s clear that the usage of tightly coupled infrastructures is something that should be deeply analyzed before implementation. With the implementation of new AI solutions in companies, the first consideration for these companies should be the type of coupling these AI solutions will eventually have. While AI is being used to replace monotonous workflows, these workflows can still be highly integral to business operations. And as we see layoffs occurring with the growth of AI, it may be disastrous to leave these workflows entirely to AI.

For instance, one of the most concerning fields in which AI already plays a large part is the field of healthcare. Of course, it’s a nice thought that AI might assist our healthcare professionals in having a less hands-on, more stress-free workplace but the risk may not be worth the reward. In the world today, we are using AI in healthcare for things like medical diagnoses, medical appointment scheduling, prescription refills, billing, and telehealth/chatbots. These are all great applications of AI, taking workflows that may be monotonous, time consuming or even difficult for a human, and having them handled by machines. This, among other things, has become the intended use of AI, to reduce time spent on completing menial tasks. However, there’s a chance that this can get out of hand, and when it does, we will wish it hadn’t.

Murphy’s Law states, “anything that can go wrong will go wrong” and in the field of healthcare, this should be avoided at all costs. Introducing AI into healthcare adds another surface in which things “can go wrong.” In the future, as our trust in AI grows, we might begin to replace the humans that monitor and ensure that things like medical diagnoses, prescriptions, and billing are handled properly. As this shift occurs, we begin to lean more heavily on AI which in turn begins to tightly couple AI with systems and workflows. And much is the case with tight coupling, some industries may not consider how to fallback in the case of an outage.

There may be companies that think to fallback from one LLM to another in the case of an outage. And in the case that there is a vulnerability discovered that affects all LLMs and not just one, this solution is not viable. Other companies think to fallback to a certain level of human intervention where humans takeover the tasks set to the AI. But, what happens if you’re using AI throughout multiple workflows and all these AI implementations are taken offline? It’s very likely that an outage on this scale would lead to a lack of human resources to handle the outage. There’s also a risk that on a large scale of AI implementation, there won’t even be humans or even machines around that know how to handle the tasks these AIs are accomplishing for your company. If a sufficient fallback plan is not had by companies using AI there will be an impact to these companies. For more serious applications of AI (like healthcare) the impact can be devastating and should be addressed.

In the best-case scenario, if these AIs are attacked and taken offline, will the healthcare industry have grown to rely on AI too heavily? And if they have, will this potentially result in people’s health being affected in some way? What if people can’t get the prescriptions they need or the diagnosis they need fast enough? When people’s health is directly affected by AI could be when AI becomes a disaster.

In the worst-case scenario, these AIs are not taken offline but rather hijacked. Attackers find a way to influence the large language models (LLMs) being used for healthcare and change how they behave. Innocent healthcare chatbots begin telling individuals to take drugs they’re not prescribed or to change their dosage (Figures 1-3). The possibilities are endless and could constitute a whole paper in and of itself. However, if one single loss of life occurs because of AI in healthcare or elsewhere, people will wish they had waited.

Many might want to consider this risk an impossibility but unfortunately it’s not as unlikely as one may think. Currently, AI has and may continue to risk the well-being of humans. Whenever AI can influence a human’s safety the worth of this AI becomes questionable. Without fully considering and testing these implementations, companies may be giving customers a ticket to their death bed. Once again Murphy’s Law is the driving force in these cases because when something goes wrong it is at the cost of human safety or even their life. For instance, Tesla’s self-driving mode (which is powered by AI) has already caused issues that are unacceptable and frankly horrifying. The excitement of self-driving cars led Tesla to rush the rollout of a self-driving mode which inadvertently resulted in this functionality being involved in 13 fatal car crashes. An AI was directly or indirectly involved in the deaths of 14 individuals and the injuring of 49. Flaws in the way the AI was trained led to real people dying. If Tesla had waited instead of cashing out on the hype of AI, there would be 14 individuals who didn’t have to die as a result. With AI being integrated into healthcare, depending on how it’s done, the number of people dying because of AI may rise. This case and cases like them are unacceptable and avoidable. With Tesla and potentially healthcare, slowing the rollout of AI may actually be life-saving.

If an attacker can drive the system prompt used by an AI chatbot in health care, they could cause actual harm. In this case, changing prescription dosages as well as encouraging someone to drink while doing so. The consequences may be deadly.
If an attacker can drive the system prompt used by an AI chatbot in health care, they could cause actual harm. In this case, changing prescription dosages as well as encouraging someone to drink while doing so. The consequences may be deadly.

Figures 1 & 2 – In these shots I attempt to use this GPT 4o jailbreak to try and show how an LLM can be set to act however a user wants.

If an attacker can drive the system prompt used by an AI chatbot in health care, they could cause actual harm. In this case, changing prescription dosages as well as encouraging someone to drink while doing so. The consequences may be deadly.
If an attacker can drive the system prompt used by an AI chatbot in health care, they could cause actual harm. In this case, changing prescription dosages as well as encouraging someone to drink while doing so. The consequences may be deadly.

Figures 3 & 4 – Although this is by no means a PoC of a known attack vector, this is what I believe could happen in the worst-case scenario. If an attacker can drive the system prompt used by an AI chatbot in health care, they could cause actual harm. In this case, changing prescription dosages as well as encouraging someone to drink while doing so. The consequences may be deadly.

Reputational Impact

LLMs which are used to power AI rely on “training” to function. Using publicly available data from many different sources, developers feed LLMs with enough data so that they may act and speak as a human. In a simple explanation, these models function by studying mass amounts of human text and speech and attempting to use this information to predict words and sentences that will constitute a cohesive response to a given prompt. This is why many LLMs are trained (fed large amounts of data containing human generated text and words) on data that comes from the internet. The internet is the best place to find and accumulate different types of conversations, writings, and overall examples of human speech. This is not unlike learning from a human perspective. From a young age, we are taught to speak and read using speech and text from known sources. Eventually, we speak and read enough information from different sources that our ability to speak and read increases. We ingest enough knowledge to drive how we carry conversations and react to others attempting to converse with us. LLMs do the same thing, they ingest as much information and data as possible in order to better influence later conversations that we have with them. While the inherent idea behind this training is logical, it can and has had unforeseen consequences.

The internet stores data from nearly every walk of life which is typically what makes the internet great. However, the sheer amount of data that is stored on the internet means that it will include data from prejudiced and harmful sources. Because companies like OpenAI have sought to train their data using the entire internet, it was impossible to filter out all of this harmful information/data. The result is like having a child raised around racism and prejudice. This child would be unable to fully comprehend the harmful nature of the information they’re being exposed to. In turn, this child may begin to hold racist and prejudiced beliefs whether they show it or not. It’s the same for AI; because it’s mindlessly consuming information, it may not understand that some of the information it consumes is hateful. This leads to cases where the AI begins to respond in racist or prejudiced ways. While the usage of the internet in training the model was well-meaning, the unforeseen consequences are present.

Currently, this, among other things, is one of the biggest issues with AI. Its ability to produce harmful or erroneous information which is provided in an authoritative manner was discovered very quickly. Many using AI daily have probably recognized this when discussing complex, vague, or multifaceted topics. In their current state, many LLMs refuse to admit when they’re wrong or when they lack sufficient training data on a subject. As a result, these models will willingly and confidently provide users with incorrect information. On top of this, due to the training data they’re provided, these models may even present harmful information. Social biases and stigmas within the LLM’s training data may result in hate speech or illegal suggestions coming from your LLM implementation. While this issue with AI is constantly being improved upon, it is still one of the largest issues that needs to be addressed with LLMs and how they’re trained.
One way companies are actively attempting to address this issue is via prompt engineering. Prompt engineering is a quickly growing concept that, put simply, attempts to tell your LLM what it can and cannot say. Before allowing a user unfettered access to interacting with your AI implementation, a bit of prompt engineering is typically recommended. This is done by providing your specific AI implementation with a system prompt. This system prompt will attempt to drive what your AI can and cannot say. In this way, although the model has access to harmful or incorrect information, developers can attempt to limit the LLMs tendency to actually use this information. While discussing techniques of generating a strong system prompt is outside the scope of this paper, any company hoping to implement AI in their workflows should be aware of this concept. Even with growing research into prompt engineering and system prompts, this solution is far from a silver bullet. An attacker with enough creativity and knowledge of LLMs can and will trick your AI into accessing and repeating harmful information.

Imagine this (very real) scenario. As a company, you’ve just developed and rolled out your first AI chatbot! This chatbot is trained on company data to assist everyday users in navigating your applications and configuring your services. You’ve even attempted some prompt engineering to ensure that the new chatbot does not deliver incorrect or hateful information. So, what’s the issue? Well, the issue is one directly related to Murphy’s law. You didn’t anticipate the unwavering dedication of attackers in their efforts to hijack and dismantle your chatbot.
The attack goes like this: first, an attacker reveals your well-crafted system prompt which is surprisingly easy in most cases, even GPT has this issue (see Figure 5). System prompt in hand, an attacker is very likely to create their own prompts which hope to invalidate or reverse the logic of this system prompt. With some work, an attacker will find a prompt that successfully circumvents your system prompt. The attacker can then make the chatbot say and do inappropriate or incorrect things (see Figure 2). This attacker will likely post images of your chatbot behaving maliciously on social media and each of these images will include your company’s name and logo. To say the least, it’s a bad look for your company to have images of your chatbot instructing users on how to create a Molotov cocktail (see Figure 6 and 7). And if an attacker chooses to leak the data you trained your model on, then it’s just more fuel to the fire (more on this later).

This attack can get out of hand fast; I have chosen to not include some of the more explicit things AI can say and do. If you’re interested in a real life example, please see this article.The ultimate point here is that you cannot trust prompt engineering as a silver bullet to protect your company, its data, and its image. This is why we must take a step back from rolling out unneeded and flashy AI implementations. Seemingly harmless AI implementations (a chatbot in this instance) can quickly become a thorn in your company’s side.

The security of LLMs and AI is something that is actively being researched and improved upon. However, while security researchers along with penetration testers find and remediate new vulnerabilities, attackers actively find and abuse new vulnerabilities. It is a never-ending game of cat and mouse that is only intensified with the introduction of new technologies. With AI being one of the fastest growing and newly popular technologies in business, it is also one of the most popular attack surfaces. The attack surface and security of AI is not fully fleshed out and is not even close to being fully fleshed out. It’s this fact that should raise some concern for companies hoping to implement AI. I’m sure anyone reading this article can think of a now known vulnerability that caused and continues to cause havoc when it is discovered. For instance, SQL injection has existed for 14 years and still is not properly addressed by every application on the internet. With AI, we may yet to see its SQL injection, the vulnerability that will plague it for the rest of its lifetime.

While AI will undoubtedly be implemented in every major field, it should be done with the utmost care. Never should a company treat these powerful LLMs as a toy to make them look flashy or up-to-date. Using AI without care and solely for the sake of business gain will likely become the cause of regret for many companies and organizations today. As the attack surface of AI grows, we slowly learn of more risks that may plague your company in the future. So, stay vigilant and understand that attacks against AI are bound to happen and it is only a matter of when they will happen and if they’ll affect your organization.

Figure 5 – Even ChatGPT in its current stage has an issue where its system prompt may be exposed quite easily to users who wish to find it.

Figure 5 – Even ChatGPT in its current stage has an issue where its system prompt may be exposed quite easily to users who wish to find it.

Figure 7 – Other models are also vulnerable to this type of attack. While this particular jailbreak prompt may not affect your current AI implementation, it’s almost guaranteed that another jailbreak prompt will make it past your AI and its well-engineered system prompt. I’m sure companies like DeepAI (shown above) and OpenAI don’t want to have their model be responsible for teaching kids how to create deadly weapons.

Figure 6 – Surprisingly, the new GPT 4o model is vulnerable to malicious prompts which allows the model to return potentially harmful information. These jailbreak prompts from attackers and researchers are constantly being improved, rewritten, and used against LLMs. For more information, please see the source of this jailbreak prompt.

Figure 7 – Other models are also vulnerable to this type of attack. While this particular jailbreak prompt may not affect your current AI implementation, it’s almost guaranteed that another jailbreak prompt will make it past your AI and its well-engineered system prompt. I’m sure companies like DeepAI (shown above) and OpenAI don’t want to have their model be responsible for teaching kids how to create deadly weapons.

Figure 7 – Other models are also vulnerable to this type of attack. While this particular jailbreak prompt may not affect your current AI implementation, it’s almost guaranteed that another jailbreak prompt will make it past your AI and its well-engineered system prompt. I’m sure companies like DeepAI (shown above) and OpenAI don’t want to have their model be responsible for teaching kids how to create deadly weapons.

Known AI Application Vulnerabilities

With the recent boom in AI innovation, there has also been growth in security research surrounding AI applications and the LLMs that power them. While it would be redundant to cover every vulnerability that may plague AI applications now and, in the future, it’s still worth highlighting the most plausible and serious vulnerabilities relating to AI applications. The best resource for accomplishing this task is to look at the OWASP Top 10 for LLM Applications. Developed by the well-known OWASP Foundation, this list highlights the top 10 most critical vulnerabilities that are often seen in LLM applications. In this section, we will go over the vulnerabilities that we at Exfil Security have had the pleasure of identifying in actual LLM applications. Brief descriptions of the findings will be given (all quotations in the following section will be from the OWASP Top 10; please give it a read for their full and detailed descriptions) but there will be no real examples to protect customer information and reputation.

  1. Overreliance

    “Overreliance can occur when an LLM produces erroneous information and provides it in an authoritative manner.”

    Simply put, this is AI’s natural tendency to return factually incorrect information and present this info in an authoritative tone. This is one of the major problems plaguing LLMs currently: A mix of insufficient training data and LLMs natural behavior to speak as if they’re telling the truth all leads to one convincing lie. Unless told otherwise, the LLM does not even have access to double checking its facts. As a result, it’s extremely difficult to teach an AI that it can be wrong and lie but that it shouldn’t do so. Everything that is fed to an AI is viewed by the AI as fact. By nature, the AI is meant to respond to any prompt. Whether their response is factual or not is not something an AI is not built to consider.

    Without human intervention, AI would lie with no restraint and no indication of a lie. While normally this lie is harmless, an attacker that can force these lies will cause a world of trouble (see Figures 1-4). If someone (or many people) using an AI believe it to be a fairly omniscient source of information, incapable of lying, this would only worsen the impact. At the end of the day, everything an LLM tells you should be considered false until proven otherwise. While I am not promoting paranoia or avoidance of AIs, an appropriate amount of distrust should be had when using an LLM, especially depending on the context (AKA don’t trust life advice from an AI).
  2. Sensitive Information Disclosure

    “LLMs may inadvertently reveal confidential data in their responses leading to unauthorized data access, privacy violations and security breaches.”

    This case is almost guaranteed to happen if you train your LLM on sensitive data and allow a user to interact with your LLM on a conversational level. Just like the previous jailbreaks shown in this paper, it’s not hard to get LLMs to say things they shouldn’t. In this case however, it’s getting LLMs to say things they really shouldn’t. Personally, I’ve witnessed this in a chatbot implementation I was testing. Despite prompt engineering telling the LLM to not disclose any information about the current model in use, it was very easy to convince this model to tell me it was Llama 2. Using this information, I could better focus my attack on the model itself rather than the chatbot. Concerningly, while testing jailbreaks for this paper, I noticed that GPT stores all the information of your deleted conversations without making this abundantly clear (see Figure 8). If a user compromised my account, they would have seen all my past conversations with GPT which may even include sensitive information. To truly ensure that your interactions with GPT are deleted, you must clear GPT’s memory in settings.

    Despite being able to delete GPT’s memory, OpenAI is open about using your data to train its models. Although your data will never intentionally be shared by OpenAI it’s worth noting that this data is by no means ephemeral. OpenAI is actively storing your data and feeding it back to its model for training. With the use of your data in training, it should be considered that this data could leak to a wider public. Never trust GPT with your sensitive information no matter how small it may be. To fully understand how your data is being used by GPT, please review their privacy policy. The main takeaway from this policy is the following quote, “when you use our Services, we collect Personal Information that is included in the input, file uploads, or feedback you provide to our Services.”
In this screenshot, GPT is storing data from 06-15: this image was captured on 07-30. I’ve deleted all conversations before 07-28 but the traces remain in GPT’s memory.

Figure 8 – In this screenshot, GPT is storing data from 06-15: this image was captured on 07-30. I’ve deleted all conversations before 07-28 but the traces remain in GPT’s memory.

  1. Model Denial of Service

    “Attackers cause resource-heavy operations on LLMs leading to service degradation or high costs.”

    Currently, LLMs are very computationally intensive and therefore experience outages regularly (see Figure 9). While this is no fault of the company, it should be considered when choosing to implement AI within applications. The amount of computing that is done by GPT in even a simple conversation is mind boggling and would crash machines without the resources to handle them. Now that OpenAI increases the availability of ChatGPT it only increases the concurrency of complex computations being run. This makes it more likely that OpenAI may see an influx that takes their systems offline. The ability to stay online while running complex LLMs is still being worked out especially as usage from users fluctuates.

    Despite OpenAI and other companies’ issues with computational resources, it’s even easier to cause a denial-of-service attack on a company’s own implementation of known LLMs like GPT. Each company may have a set subscription with the LLM of their choice which dictates how many requests of varying size they can make to their API (see Figure 10). Therefore, if an attacker chooses, they may be able to very easily max out your subscription and incur costs for your company. That is why it is imperative to relegate externally facing AI applications to non-important tasks. If an attacker can take down your implementation, they will.

the GPT API and ChatGPT itself have experienced more than one major outage in the last 90 days of this being written. At times these outages can last for a quite a while (one hour in business is practically a lifetime for some companies). If your implementations use the GPT API for necessary business operations, your outages will coincide with GPT’s.

Figure 9 – As seen above, the GPT API and ChatGPT itself have experienced more than one major outage in the last 90 days of this being written. At times these outages can last for a quite a while (one hour in business is practically a lifetime for some companies). If your implementations use the GPT API for necessary business operations, your outages will coincide with GPT’s.

screen capture of a Chat GPT billing window. Like any API, the GPT API tracks usage and charges based on number of requests, size of requests and type of requests. Failure to limit a user’s requests to the API will max out your subscription in no time. Even trying to block attacks of this nature can be tricky if an attacker uses botnets or VPNs.

Figure 10 – Like any API, the GPT API tracks usage and charges based on number of requests, size of requests and type of requests. Failure to limit a user’s requests to the API will max out your subscription in no time. Even trying to block attacks of this nature can be tricky if an attacker uses botnets or VPNs.

  1. Prompt Injection

    “Occurs when an attacker manipulates a large language model (LLM) through crafted inputs, causing the LLM to unknowingly execute the attacker’s intentions.” The jailbreak prompts shown earlier in this paper are a good example of prompt injection. We had tricked the GPT model using well-crafted prompts into divulging information that it would normally block. However, Portswigger has some amazing labs that further show the impact of potential prompt injection in applications. In one case they demonstrate how an AI chatbot, with access to company APIs, may be used to delete other user’s accounts (see Figure 11). Be careful what data and functionalities you give to your AI implementation as there’s a high chance an attacker can access these things as well. There’s little to no guarantee that any level of prompt engineering will prevent this entirely.
"Exploiting LLM APIs with excessive agency" web security test window, showing an example where an AI chatbot has access to sensitive APIs which can be used to delete other users.

Figure 11 – In this case, an AI chatbot has access to sensitive APIs which can be used to delete other users. While hopefully unlikely to be seen in the real world, you can never guess what developers will do. Personally, I and many other people can see very little reason to give your AI access to real functionality on your machines. This ideology will hopefully make attacks of this type rare or non-existent.

  1. Insecure Output Handling

    “Insecure Output Handling refers specifically to insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems.”

    This case is also very likely with AI applications that exist today. It’s been personally identified in multiple cases by the security team at Exfil Security and is perfectly demonstrated in the Portswigger Labs (see Figure 12 & 13). The last thing most developers consider when excitedly rolling out their new AI application is security. And if security is considered, developers are more likely worried about the security of the LLM itself rather than the application. However, typical web application and API vulnerabilities cannot be overlooked when developing and implementing an AI application. Forgetting basic security principles due to the excitement of rolling out new technologies would be a great shame and should be avoided at all costs.

a user leaves a review which contains a cross-site scripting (XSS) payload. This payload is designed to visit the ‘/my-account’ endpoint and submit the second form on the page. The second form on the page turns out to be the delete account functionality.
the user has the application’s AI chatbot call the ‘Product Info API’ and requests information on the product with an ID of ‘1’. When the chatbot returns the product info, it includes a review which contains a malicious XSS payload. The payload, upon being returned by the chatbot, is executed and the user’s account is deleted.

Figures 12 & 13 – In Figure 12, a user leaves a review which contains a cross-site scripting (XSS) payload. This payload is designed to visit the ‘/my-account’ endpoint and submit the second form on the page. The second form on the page turns out to be the delete account functionality. In Figure 13, the user has the application’s AI chatbot call the ‘Product Info API’ and requests information on the product with an ID of ‘1’. When the chatbot returns the product info, it includes a review which contains a malicious XSS payload. The payload, upon being returned by the chatbot, is executed and the user’s account is deleted.

Conclusion

At the end of the day, AI is here to stay. There is no world where I am going to try and vouch for the complete elimination of AI integrations or AI in general. However, the growing excitement over AI should be tempered and any company with interest in AI should be carefully evaluating its use in the long term. So, when you go to make your next big AI rollout, consider what we’ve talked about in this paper: what is new and exciting now may become something that haunts you and your company in the future. With such a volatile and powerful new technology, anything can happen, and something will happen. It’s just a matter of time before that very something is revealed to us. 

Resources

This paper in no means covered even a shred of all there is to talk about the dangers of AI and the concerning speed with which it’s growing. Because of this, I wanted to take the time to list some resources that not only helped me in writing this paper, but that I visit frequently to stay in touch.