Applying security controls in your organization
As mentioned a few times already in this chapter, security is a shared responsibility, especially in a cloud environment. Enabling a secure and safe generative AI environment is the responsibility not only of the cloud service provider or the third-party services and solutions you work with, but also of you and your organization. We repeat this often for a reason: the shared security responsibility model is easy to overlook or forget.
In this section, you will learn what additional steps you can take to run a more secure cloud solution environment. The topics and guardrails presented in this section are specific to Azure OpenAI; however, other cloud-based services should provide similar functionality.
Content filtering
Within most large-scale cloud services supporting generative AI, such as Microsoft Azure OpenAI, there are ways to apply security controls and guardrails to deal with potentially harmful or inappropriate material returned by generative AI models/LLMs. One such security control is known as content filtering. As the name implies, content filtering is an additional feature, provided at no cost, that filters out inappropriate or harmful content. With this filtering in place, unsafe content in the form of text and images (and perhaps voice in the near future) can be screened out to prevent triggering, offensive, or unsuitable content from reaching specific audiences.
As you may already know, LLMs can generate harmful content, such as gory or violent material, even in benign contexts and interactions. For example, if you wanted to research a certain time period, LLM-generated completions could describe wartime events in graphic detail. The content filtering we mentioned previously can protect against this; however, you need to know whether an organization has disabled or opted out of such filtering, because if it has, end users could be exposed to details they may not feel comfortable with.
Many generative AI services use a rating system, similar to movie or cinema ratings, to classify the severity of content, and this severity is then used to filter inputs and responses. The image below shows the Microsoft Azure severity levels that you can set for harmful content in Azure OpenAI content filtering:
Figure 8.2 – Severity levels used in Azure OpenAI content filtering
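To make the severity ratings more concrete, here is a minimal sketch of how an application can inspect the per-category severity annotations that Azure OpenAI returns alongside a chat completion. It uses Python with the requests library against the chat completions REST endpoint; the endpoint, API key, deployment name, and API version are placeholders for your own values, and the exact shape of the content_filter_results field can vary by API version, so treat this as an illustrative sketch rather than a definitive implementation:

```python
# Minimal sketch: inspect per-category content filter severities on a completion.
# The endpoint, deployment name, and API version below are placeholders;
# substitute the values from your own Azure OpenAI resource.
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]   # e.g. https://my-resource.openai.azure.com
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-gpt-deployment"                 # hypothetical deployment name
api_version = "2024-02-01"                       # example API version; use one your resource supports

url = f"{endpoint}/openai/deployments/{deployment}/chat/completions?api-version={api_version}"
payload = {"messages": [{"role": "user", "content": "Summarize the causes of World War I."}]}

response = requests.post(url, headers={"api-key": api_key}, json=payload, timeout=30)
response.raise_for_status()
body = response.json()

# Each choice may carry a content_filter_results block with a severity rating
# (for example "safe", "low", "medium", or "high") per harm category.
for choice in body.get("choices", []):
    for category, verdict in choice.get("content_filter_results", {}).items():
        print(f"{category}: severity={verdict.get('severity')}, filtered={verdict.get('filtered')}")
```

In practice, you would use these annotations to log, audit, or route borderline responses rather than simply printing them.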
In Microsoft Azure OpenAI, there are safeguards in place to protect your and your organization's privacy. To understand how this protection is balanced against service operation, here are a few key items to keep in mind:
- Retraining of Azure OpenAI content filtering models: Customer prompt data is never used for model training, regardless of any feature flags. It is also not persisted, except in the content-logging scenario described in the third item below.
- Automatic content filtering: Azure OpenAI will, by default, filter out prompts or completions that may violate the terms and conditions. This flagging is done by automated language classification software and results in an HTTP 400 error when content is flagged (see the sketch after this list for how an application can handle this). This feature can be disabled through a support request.
- Automatic content logging: This is tied to the previous feature. If content filtering is triggered, an additional logging step may occur (if enabled), in which Microsoft reviews the content for violations of the terms and conditions. Even in this scenario, your data is not used to improve the services.
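Because a flagged prompt or completion surfaces as an HTTP 400 error, your application should handle that case gracefully instead of failing with a raw error. The following is a minimal sketch of such handling, again using Python and requests with placeholder endpoint, deployment, and API version values; the error code "content_filter" shown in the check is the value Azure OpenAI commonly returns for blocked content, but you should verify it against the API version you use:

```python
# Minimal sketch: handle the HTTP 400 returned when content filtering blocks a request.
# Endpoint, deployment name, and API version are placeholders for your own values.
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-gpt-deployment"     # hypothetical deployment name
api_version = "2024-02-01"           # example API version; use one your resource supports

url = f"{endpoint}/openai/deployments/{deployment}/chat/completions?api-version={api_version}"
payload = {"messages": [{"role": "user", "content": "<user-supplied prompt goes here>"}]}

response = requests.post(url, headers={"api-key": api_key}, json=payload, timeout=30)

if response.status_code == 400:
    error = response.json().get("error", {})
    # Azure OpenAI commonly reports blocked content with the error code "content_filter";
    # verify this value against the API version you are using.
    if error.get("code") == "content_filter":
        print("The request was blocked by content filtering. Please rephrase and try again.")
    else:
        response.raise_for_status()
else:
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])
```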
As you can see, content filtering is designed to help protect you and your organization through security controls that are easy to set and manage, resulting in a more secure Azure OpenAI (AOAI) environment.
As we further our understanding of security controls, the next section covers managed identities and key management, which provide additional layers of security and protection at the access layer of an Azure OpenAI service account.