This post is dedicated to the essential, albeit somewhat un-sexy, topic of AI brand safety. Brand safety comes up in just about every project we work on. And with so much media attention on AI’s promise and perils, it’s top of mind for those in charge of mitigating risk within their orgs.
Brand safety is not a black-and-white issue; it’s a spectrum of risk. In every case, the risk level is a byproduct of the use case and how AI is utilized.
In these instances, the key is to align the inherent risk in your AI system with the amount of risk your organization or client is willing to take on. This comes down to design decisions.
To demonstrate the nuance of this, I’m going to map out three real-life scenarios. Each scenario is a real-life project that has gone through vetting and approval from the legal department within a Fortune 500 Company. To avoid divulging too many internal details, I’ll speak to each scenario at a high level and avoid using specific company names.
We’ll explore the scenarios in order of risk level so you get a feel for how this works.
Case Study 1: AI Insights
We recently created an AI system for one of our clients that used 1st party market research to generate on-demand insights. A few mitigating factors made this a relatively low-risk use of Generative AI. The system was grounded in the client’s 1st party research. And because it’s a tool for internal use, it always had an employee in the loop to check the results.
We still took additional steps to ensure the system was safe. We wanted to ensure this system had enough interpretability to avoid a scenario where our client could make a bad-strategic decision because it acted on bad advice. To address this risk, we incorporated an AI function that “fact-checks” its outputs against the 1st party research and provides citations for a human being to do the same.
A further concern with this use case was data privacy. Could the client’s research become embedded in the AI's neural network? Because of the models we used (GPT-4) the answer here is a no. However, as a security precaution, we still took steps to remove all PII from the data set before being uploaded into the AI system.
Case Study 2: AI Content
We recently created an AI system to generate thousands of personalized content deliverables. These text-based snippets would display to app users based on their in-app activity. Because this use case involved delivering AI-generated content to end-users, the risks are higher than in the previous use case. The content that the AI generated would ultimately be displayed to end-consumers.
Our system was designed to help mitigate risk and ensure the content was safe and accurate. For one thing, the AI was taught to generate content based on the brand’s source materials, guidelines, and creative examples. As a result, the chances of our system generating anything that contained someone else's IP, or something inappropriate, was close to zero.
Still, we wanted to ensure the content was accurate and complied with the company’s guidelines. To facilitate this, we created two layers of safety checks. The first was a machine layer that used a Generative AI function to review the system’s outputs for factuality and adherence to the brand’s guidelines. The second was a human layer consisting of the brand’s compliance team, which manually reviewed all outputs before the content went live.
Case Study 3: AI Experience
The third and final use case we’ll look at is a recent project we launched that enabled customers to generate sharable AI-generated visuals through a series of text prompts.
Since the system was public-facing, the stakes were high. Not only could the system accidentally generate an undesirable output, but some people would likely attempt to jailbreak it by tricking it into generating something inappropriate. To address these risks, we employed two approaches.
The first approach was to force the AI to focus on a narrow use case. While the model we were using under the hood for image generation (Stable Diffusion) is capable of generating a wide range of content, some of which would not be appropriate for a brand to put out, we designed the system in such a way that it forced the outputs to align with the system’s intentions.
Beyond this, we incorporated an additional AI safety function, which leveraged an AI language model (GPT-4) to police the user's inputs and reject instructions that were not brand safe. This prevented users from generating content that featured celebrities, other brands, and NSFW topics. The AI system was then extensively tested through "Red Teaming," where a diverse group of testers, including QA engineers and members of the client's team attempted to jailbreak the system for several days. After each round, the findings were used to refine the brand safety function until the team was comfortable deploying it.
Further Thoughts
AI Brand Safety is an evolving field and something that we continue to wrestle with as the broader field of AI evolves. We’ve been exploring other concepts – for example, the “Duck Hunt” approach, where an AI function reads completions on the fly and “shoots them out of mid-air” if something goes wrong. Another concept we have employed recently is to use an AI model such as GPT-4 to synthetically “red team” a brand safety filter. By providing detailed instructions about how your safety filter works, AI models can generate thousands of prompts that are intended to get around it, enabling you to automate the process of “red teaming” with considerable scale.
Conclusion
While brand safety needs be taken seriously, it's also a challenge that needs to be addressed creatively. Different use cases come with different inherent risks. And how you design an AI system around a use case can help mitigate the associated risks. To pursue AI initiatives responsibly, organizations can use design techniques to address safety concerns head-on.
Addition is an AI research and development company for the brands of tomorrow.
Visit our website to learn about the work we do with brands and agencies
Read about us in the Wall Street Journal
Follow us on Twitter