What is privacy?

The National Institute of Standards and Technology (NIST), part of the US Department of Commerce, defines privacy as “Assurance that the confidentiality of, and access to, certain information about an entity is protected” (taken directly from the NIST website).

First, let’s revisit two important components of an LLM architecture: the prompt and the completion (the response).

As we have learned, a prompt is the input provided to an LLM, whereas a completion refers to the output of the LLM. The structure and content of a prompt can vary based on the type of LLM (e.g., a text or image generation model), the specific use case, and the desired output of the language model.

A completion is the response an LLM such as ChatGPT generates for a given prompt; in other words, it is the output you get back.
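To make this concrete, here is a minimal sketch of a prompt/completion round trip. It assumes the openai Python SDK (v1 or later) and an OPENAI_API_KEY environment variable; the model name and prompt text are illustrative:

```python
# A minimal prompt/completion round trip. Assumes the openai Python SDK (v1+)
# and an OPENAI_API_KEY environment variable; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# The prompt: the input we provide to the LLM
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain data privacy in one sentence."}],
)

# The completion: the output the LLM returns
print(response.choices[0].message.content)
```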

What happens if you send a prompt to a cloud-based generative AI service such as ChatGPT? Is it saved somewhere? Do ChatGPT or other LLM services use your data for training, or use it for further fine-tuning? For how long is your or your organization’s data (prompts and completions) saved?

Corporate and organizational privacy is one of the most cherished and closely guarded privileges within an organization. It is this privacy that is leveraged as a value proposition against competitors, and, in terms of intellectual property, it also carries a monetary value.

Privacy in the cloud

Quite often, we hear concerns from organizations using OpenAI services about whether the prompts they send are retained by the cloud vendor. What are they doing with my prompts? Are they subsequently mining them and extracting information about me and/or my organization? Will they share my prompts with others, perhaps even with my competitors?

Microsoft’s “Data, privacy, and security for Azure OpenAI Service” page specifically states that customer data, and thus data privacy, is protected by four different criteria.

You can see these criteria on the Microsoft website at https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy.
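As one illustration of the enterprise pattern, here is a minimal sketch of calling a model through your own Azure OpenAI resource using the same SDK. The endpoint, API version, and deployment name are placeholders; the API key is assumed to be in an environment variable:

```python
# Calling a model through an Azure OpenAI resource, a sketch assuming the
# openai Python SDK (v1+). The endpoint, API version, and deployment name
# below are placeholders, for illustration only.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # hypothetical resource
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="my-gpt4-deployment",  # the deployment name, not the base model name
    messages=[{"role": "user", "content": "Draft a privacy notice for our app."}],
)
print(response.choices[0].message.content)
```

The point of this pattern is that prompts and completions flow to your organization’s own Azure resource under Microsoft’s stated data-handling terms, rather than to the consumer ChatGPT service.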

The cloud vendors take measures to safeguard your privacy. Is that enough? What can go wrong even when your privacy is protected by an enterprise service such as Microsoft Azure?

For one, LLMs themselves have no memory of their own and know nothing about data contracts, privacy, or confidentiality, so a model can potentially share confidential information, especially if it is grounded against your own data. This does not necessarily mean information is shared publicly; it might mean information is shared with other groups within an organization, including some that would not normally be privy to such privileged information. An example here would be a member of the human resources (HR) department prompting for personnel records and details. How is this information subsequently accessed? Who has access to a confidential document? One common mitigation is to check the requester’s entitlements before any grounded documents ever reach the model, as in the sketch below.
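This is a minimal sketch of such a pre-retrieval access check for a grounded (RAG-style) application; the Document structure, the roles, and the answer_with_context() helper are all hypothetical, for illustration only:

```python
# A sketch of a pre-retrieval access check for a grounded (RAG-style) LLM app.
# The model never sees documents the requesting user is not entitled to read.
# The Document structure, roles, and answer_with_context() helper are all
# hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    content: str
    allowed_roles: set[str]  # e.g., {"hr"} for personnel records

def filter_by_role(docs: list[Document], user_roles: set[str]) -> list[Document]:
    """Keep only documents the user's roles entitle them to see."""
    return [d for d in docs if d.allowed_roles & user_roles]

def grounded_answer(question: str, docs: list[Document], user_roles: set[str]) -> str:
    permitted = filter_by_role(docs, user_roles)
    context = "\n".join(d.content for d in permitted)
    # answer_with_context() stands in for the actual LLM call
    return answer_with_context(question, context)
```

In the next section, we will look at the details of auditing and reporting to give us a better understanding.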

Even with privacy settings and access restrictions, or controls, in place, it is important to always audit and log interactions with generative AI so that you can spot security risks, leaks, or gaps against regulatory or organizational requirements. A simple pattern is to wrap every LLM call in an audit record, as in the sketch below.
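Here is a minimal audit-logging wrapper, assuming an openai-style client like the ones shown earlier; the log fields and destination are illustrative and would be driven by your own regulatory requirements:

```python
# A minimal audit-logging wrapper, a sketch: record who prompted what, and
# when, for every LLM call. The log fields and destination are illustrative.
import datetime
import json
import logging

audit_log = logging.getLogger("genai.audit")
logging.basicConfig(level=logging.INFO)

def audited_completion(client, user_id: str, prompt: str, model: str) -> str:
    """Call the LLM and write an audit record for the interaction."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    completion = response.choices[0].message.content
    audit_log.info(json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user_id,
        "model": model,
        "prompt": prompt,  # or a hash, if prompts are themselves sensitive
        "completion_length": len(completion),
    }))
    return completion
```

Let’s delve a bit deeper into the auditing and reporting aspects of generative AI to understand them better.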
