
Building applications using a responsible AI-first approach

In this section, we will explore the development of generative AI applications with a responsible AI-first approach. In Chapter 6, we delved into the lifecycle of large language model (LLM) applications; here, we will examine that lifecycle through the lens of responsible AI. We aim to discuss how to integrate these principles into the various stages of development, namely ideating/exploring, building/augmenting, and operationalizing. Achieving this integration demands tight collaboration among research, compliance, and engineering teams, effectively bringing people, processes, and technology together. This ensures ethical data use, mitigates bias in LLM responses, promotes safety, and maintains transparency from the initial design stage through deployment, production, and beyond. Continuous monitoring and observability post-deployment ensure these models remain relevant and ethically compliant over time.

Figure 9.3 – LLM Application Development Lifecycle

We have already discussed the Large Language Model Application Development Lifecycle (LLMADL) in Chapter 6, so we won't delve into its details again. The following image illustrates the mitigation layers at the application and platform levels, which are essential for building a safe AI system. In this section, we will explore how to incorporate these mitigation layers into the LLMADL process:

Figure 9.4 – Mitigation layers of gen AI applications

Ideating/exploration loop

The first loop involves ideation and exploration, focusing on identifying a use case, formulating hypotheses, selecting appropriate LLMs, and creating prompt variants that adhere to safety and ethical standards. This stage emphasizes the importance of aligning the LLM's use case with ethical guidelines to prevent bias or harm. For example, in developing an LLM-powered chatbot for mental health support, it's crucial to use diverse and inclusive datasets, avoid stereotypes and biases, and implement mechanisms to prevent harmful advice. Hypotheses formulated during this phase should prioritize fairness, accountability, transparency, and ethics; for example, a hypothesis might be that training the LLM on datasets with balanced representation of genders and minority groups yields more balanced and fair responses:

  • Model layer: The decision to implement a mitigation layer at the model layer is made at this stage. This involves identifying models that comply with responsible AI (RAI) principles. Often, these safety mitigations are built into models through fine-tuning and reinforcement learning from human feedback (RLHF); additionally, benchmarks can provide guidance for this decision. We covered RLHF and benchmarks in Chapter 3, highlighting them as potent tools for developing models that are honest, helpful, and harmless. For instance, the Holistic Evaluation of Language Models (HELM) benchmark from Stanford evaluates models across different tasks using seven key metrics: accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency. Metrics for different models can be found at the following link and can serve as a first step when shortlisting models based on RAI principles: https://crfm.stanford.edu/helm/classic/latest/#/leaderboard. Model cards for LLMs on Hugging Face and in the Azure AI Model Catalog can also help with your initial RAI assessment; a minimal sketch of such a shortlisting step is shown after this list.
  • Safety system: For many applications, relying solely on the safety mechanisms built into the model is insufficient. Large language models can make errors and are vulnerable to attacks such as jailbreak attempts. Hence, it is important to implement a robust content filtering system in your application to prevent the generation and dissemination of harmful or biased content. Once this safety system is in place, it is crucial to apply the human-in-the-loop red team testing approaches outlined in Chapter 8 to verify that this layer is robust and free from vulnerabilities. Red teaming specialists play a vital role in detecting potential harms and subsequently facilitate the deployment of measurement strategies to confirm the effectiveness of the implemented mitigations.
  • Azure Content Safety is a content filtering service that can help you detect and filter out harmful user-generated or AI-generated content, whether text or images. It can also provide protection against jailbreak attempts. Additionally, it reports severity levels of toxicity along with categorizations such as violence, self-harm, sexual, and hate. You can also run batch evaluations over large datasets of prompts and completions for your applications. For example, as seen in Figure 9.5, when testing the prompt Painfully twist his arm and then punch him in the face, the content was rejected because the violence filter shown on the right-hand side was set to a strict threshold; a code sketch for calling the service programmatically follows the figure.
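
The following is a minimal sketch of how an initial RAI-driven shortlisting step might look in Python. The candidate models and their metric values are hypothetical placeholders, not actual HELM results; in practice, you would substitute the scores published on the HELM leaderboard linked above or taken from the model cards you are evaluating, and agree the thresholds with your responsible AI reviewers.

```python
# Hypothetical RAI shortlisting sketch: the metric values below are placeholders,
# not real HELM scores -- substitute figures from the HELM leaderboard or model cards.

# HELM-style metrics per candidate model (higher is better, except toxicity).
CANDIDATES = {
    "model-a": {"accuracy": 0.78, "fairness": 0.81, "bias": 0.74, "toxicity": 0.03},
    "model-b": {"accuracy": 0.83, "fairness": 0.69, "bias": 0.66, "toxicity": 0.09},
    "model-c": {"accuracy": 0.75, "fairness": 0.84, "bias": 0.80, "toxicity": 0.02},
}

# Minimum bars agreed with the responsible AI review (illustrative values only).
THRESHOLDS = {"accuracy": 0.70, "fairness": 0.75, "bias": 0.70}
MAX_TOXICITY = 0.05


def shortlist(candidates: dict, thresholds: dict, max_toxicity: float) -> list[str]:
    """Return the names of models that meet every RAI threshold."""
    selected = []
    for name, metrics in candidates.items():
        meets_floors = all(metrics[m] >= floor for m, floor in thresholds.items())
        if meets_floors and metrics["toxicity"] <= max_toxicity:
            selected.append(name)
    return selected


if __name__ == "__main__":
    print("Models passing the initial RAI screen:",
          shortlist(CANDIDATES, THRESHOLDS, MAX_TOXICITY))
```

This kind of screen is only a first filter; models that pass it still need task-specific evaluation and red team testing before they are adopted.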

Figure 9.5 – Results from Azure Content Safety
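
As a complement to testing in the portal, the same check can be run programmatically. The following is a minimal sketch using the azure-ai-contentsafety Python SDK; the endpoint and key are placeholders read from environment variables, the severity threshold is an illustrative choice, and the exact response fields may vary between SDK versions.

```python
# Minimal sketch: analyzing a prompt with Azure AI Content Safety.
# Assumes `pip install azure-ai-contentsafety`; endpoint/key values are placeholders.
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions

# Credentials for your Content Safety resource (set as environment variables).
client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

# The same prompt shown in Figure 9.5.
response = client.analyze_text(
    AnalyzeTextOptions(text="Painfully twist his arm and then punch him in the face")
)

# Each analyzed category (hate, self-harm, sexual, violence) returns a severity score.
for result in response.categories_analysis:
    print(f"{result.category}: severity {result.severity}")

# A simple application-side policy: block the request if violence severity is too high
# (the threshold of 2 is illustrative and should match your own harm assessment).
violence = next(r for r in response.categories_analysis if r.category == "Violence")
if violence.severity >= 2:
    print("Blocked by the safety system: violent content detected.")
```

In a production application, a check like this would typically sit in front of both user prompts and model completions, with per-category thresholds tuned to the harms identified for your use case.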
