Securing data in the generative AI era – Security and Privacy Considerations for Gen AI – Building Safe and Secure LLMs

Securing data in the generative AI era

As with any other technology, ensuring security and data protection is important. As we have all likely experienced or know of someone who has, a security exploit – whether identity theft or some ransomware attack – is not a pleasant experience. Even worse, for an organization, any security and/ or privacy exploits can be significant and pronounced. Of course, some of the controls and safeguards we identified earlier will help protect an organization.

As we are truly entering the era of generative AI, we need to ensure these safeguards are in place. How can we tell if they are in place? Red-Teaming, auditing, and reporting can help, and we will take a closer look at what this means. However, first, let’s look at another concept that will help us understand the security footprint and help uncover any potential vulnerabilities.

Red-teaming, auditing, and reporting

The notion of red-teaming has been around for quite some time, from warfare and religious contexts to more recent computer systems and software and, now, generative AI/LLMs.

Red-teaming is generally described as a proactive methodology to determine the possible vulnerabilities within a system/environment by purposefully attacking the system with known threats. Subsequently, these attacks and threats are analyzed to better understand what exploits are possible for a potentially compromising system. In warfare, the enemy was described as the “red team” or the initiators of an attack, and the “blue team” thwarted such attacks.

As per the White House Executive Order on the safe and secure use of AI, the term “AI red-teaming” means a structured testing effort to find flaws and vulnerabilities in an AI system, often in a controlled environment and in collaboration with developers of AI. Artificial Intelligence red-teaming is most often performed by dedicated “red teams” that adopt adversarial methods to identify flaws and vulnerabilities, such as harmful or discriminatory outputs from an AI system, unforeseen or undesirable system behaviors, limitations, or potential risks associated with the misuse of the system.

Earlier in this chapter, we learned about some security threats against generative AI and also the techniques used to address such attacks. Along with these mitigation strategies mentioned previously, red team methodologies represent a powerful approach to identifying vulnerabilities in your LLMs. Red-teaming efforts are focused on using broad threat models, such as producing “harmful” or “offensive” model outputs without constraining these outputs to specific domains. The key questions you must address when designing your red team processes are the following:

  • Definition and scope: What does red-teaming entail, and how do we measure its success?
  • Object of evaluation: What model is being evaluated? Are the specifics about its design (such as its architecture, how it was trained, and its safety features) available to the evaluators?
  • Evaluation criteria: What are the specific risks being assessed (the threat model)? What potential risks might not have been identified during the red-teaming process?
  • Evaluator team composition: Who is conducting the evaluation, and what resources do they have at their disposal, including time, computing power, expertise, and their level of access to the model?
  • Results and impact: What are the outcomes of the red-teaming exercise? To what extent are the findings made public? What actions and preventative measures are recommended based on the red-teaming results? In addition to red-teaming, what other evaluations have been conducted on the model?

Currently, there are no agreed-upon standards or systematic methods for sharing (or not) the results of red-teaming. Typically, a large organization would go through the exercise of red-teaming to then learn from it or take action, such as repair, fix, mitigate, or respond.

Our recommendations are the following:

  • Conduct red-teaming on your generative AI environment not only once prior to deploying it within a production environment but also at agreed-upon regular intervals.
  • As the area of red-teaming to exploit LLMs is still maturing, do your own research on the latest tools and trends, as this is evolving fast. At a minimum, you can find a list of questions to consider while structuring your red-teaming efforts (mentioned in the following) from the Carnegie Mellon University White Paper Red-Teaming for Generative AI: Silver Bullet or Security Theater?; https://arxiv.org/pdf/2401.15897.pdf.

 Figure 8.3 – Essential Consider ations for Structuring Red-Teaming Efforts 

The questions outlined here provide an excellent foundation and guidance for implementing your red team operations. Nonetheless, integrating auditing and reporting techniques into your practice is equally crucial. These topics will be explored in the following section.

Leave a Reply

Your email address will not be published. Required fields are marked *