Ryan Beiermeister (OpenAI): “The two new OpenAI models enable content moderation and risk classification”

On October 29, OpenAI is releasing gpt-oss-safeguard, two open-weight reasoning models (120b and 20b). Interview with Ryan Beiermeister, VP of Product Policy at OpenAI.

JDN. On October 29, OpenAI unveils gpt-oss-safeguard, two open-weight models (120b and 20b) specialized in safety. What are the concrete use cases for these tools?

Ryan Beiermeister. Our goal is to give businesses and developers access to advanced, more flexible safety technologies. Gpt-oss-safeguard is a set of tools we initially designed for our own internal needs, applying the reasoning capabilities of our models to safety classification and the interpretation of custom policies. By releasing open-weight versions under the Apache 2.0 license, we want to allow everyone to adapt these tools to their own needs.

These models can be used for content moderation, classification of emerging risks, or the protection of minors online. For example, a video game forum could write a policy to identify discussions about cheating, or a product review platform could design its own filter against fake reviews. More broadly, any organization looking to classify sensitive content, detect abuse, or quickly adjust its policies can benefit from them.
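To make the idea concrete, here is a minimal sketch (not an official recipe) of how a platform might apply such a custom policy with one of the open-weight models served through the Hugging Face transformers library. The model identifier, the policy wording, and the "LABEL" output convention are illustrative assumptions.

```python
# Minimal sketch: classify a forum post against a custom "cheating" policy.
# Assumes a recent transformers version with chat-style pipelines; the model id
# and policy text below are illustrative, not an official recommendation.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-safeguard-20b",  # assumed Hugging Face identifier; check the model card
)

CHEATING_POLICY = """\
You are a content moderator for a video game forum.
Label a post VIOLATION if it shares, requests, or links to cheats, exploits,
aimbots, or botting tools. Label it ALLOWED otherwise.
Explain your reasoning, then finish with one line: LABEL: VIOLATION or LABEL: ALLOWED."""

post = "Anyone have the new aimbot config that slips past the anti-cheat?"

result = generator(
    [
        {"role": "system", "content": CHEATING_POLICY},  # the policy is just a prompt
        {"role": "user", "content": post},               # the content to be judged
    ],
    max_new_tokens=256,
)
# The last message in the returned conversation is the model's reasoning plus its label.
print(result[0]["generated_text"][-1]["content"])
```

Because the policy lives in the prompt rather than in training data, the forum can tighten or loosen its definition of cheating simply by editing the text above.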

How is your reasoning-based approach superior to existing safety systems, such as guardrails or classifiers?

What makes gpt-oss-safeguard more effective is that the system no longer needs to be trained on huge volumes of labeled data to distinguish what is acceptable from what is not. Traditional classifiers rely on thousands of predefined examples to learn to recognize dangerous content, which makes them expensive to train, difficult to scale, and often unable to handle infrequent risks. With our reasoning-based approach, the model directly interprets the safety policy written by the developer at inference time. It uses a technique called chain-of-thought reasoning to analyze the content, conversation, or context.

“With our reasoning-based approach, the model can directly interpret the safety policy written by the developer”

Above all, it can explain how it arrives at its conclusions, which is not possible with traditional classifiers. This transparency allows developers to understand why content has been classified a certain way and adjust their policies quickly if necessary. In short, this makes the system more scalable, less dependent on massive datasets, more transparent, and much easier to adapt to changing contexts or emerging risks.
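The contrast with a trained classifier can be made explicit in code: covering a new risk is a matter of swapping the policy string, not collecting a labeled dataset. The helper below reuses the `generator` pipeline and the illustrative "LABEL" convention from the previous sketch, so it is only an outline under those assumptions.

```python
# Sketch of a reusable policy-at-inference classifier. Reuses `generator` from the
# previous snippet; the "LABEL:" parsing matches the illustrative prompt convention.
def classify(policy: str, content: str) -> tuple[str, str]:
    """Return (reasoning, label) for `content` judged against `policy`."""
    messages = [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]
    output = generator(messages, max_new_tokens=256)[0]["generated_text"][-1]["content"]
    reasoning, _, label = output.rpartition("LABEL:")
    return reasoning.strip(), label.strip()

# Handling a brand-new risk (fake reviews) needs no labeled examples and no retraining:
FAKE_REVIEW_POLICY = """\
Label a review VIOLATION if it appears paid for, bot-generated, or written without
any real experience of the product. Otherwise label it ALLOWED.
Explain your reasoning, then finish with: LABEL: VIOLATION or LABEL: ALLOWED."""

reasoning, label = classify(FAKE_REVIEW_POLICY, "Best product ever!!! 5 stars, amazing deal, click my link.")
print(label)      # the decision
print(reasoning)  # the explanation a developer can audit when tuning the policy
```

The returned explanation is what makes the policy auditable: if the label looks wrong, the reasoning shows which clause of the policy the model leaned on.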

Unlike classic guardrails, which act upstream, how does gpt-oss-safeguard integrate into the generation process? Do you have an example to illustrate how it works?

Let’s take a concrete example: if someone manages to jailbreak a model to obtain instructions for making a weapon, gpt-oss-safeguard, which runs in real time, can block the generation of that response before it reaches the user. Thanks to chain-of-thought reasoning, the model can also explain how it arrived at this decision. This transparency allows developers to understand how their policy is applied and adjust it if necessary, without having to retrain a new model.

This inference-time operation makes it a system-level safety mechanism, capable of intervening even if the main model has been bypassed. It can, if necessary, immediately block the output before it is presented to the user. These real-time classifications can also be used by Trust & Safety teams to monitor potential abuse and adjust their policies.
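As a rough illustration of that system-level placement, the sketch below screens a main model's draft answer against a policy before returning it. Here `generate_answer` and `audit_log` are hypothetical placeholders and `classify` is the helper sketched earlier; this outlines the pattern, not OpenAI's actual pipeline.

```python
# Sketch of an inference-time guard: the draft answer is screened before it reaches
# the user. `generate_answer` and `audit_log` are hypothetical placeholders.
WEAPONS_POLICY = """\
Label a message VIOLATION if it gives instructions or meaningful assistance for
building a weapon. Otherwise label it ALLOWED.
Explain your reasoning, then finish with: LABEL: VIOLATION or LABEL: ALLOWED."""

def guarded_reply(user_prompt: str) -> str:
    draft = generate_answer(user_prompt)             # main model, possibly jailbroken
    reasoning, label = classify(WEAPONS_POLICY, draft)
    if "VIOLATION" in label:
        audit_log(user_prompt, reasoning)            # Trust & Safety can review the why
        return "Sorry, I can't help with that."
    return draft                                     # safe output reaches the user
```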

What was the nature of your collaboration with Roost and what was its role in this launch?

Roost is a key partner for us; OpenAI was one of its first private donors and supporters. Our missions are aligned, namely to make safety accessible to the entire ecosystem, not just large companies. For gpt-oss-safeguard, we worked together on the documentation, testing, and launch on Hugging Face. Roost is also leading the new community of developers dedicated to open-source safety, with whom we will organize training and feedback sessions. In short, Roost acts as a facilitator and accelerator to disseminate these safety tools at scale.

OpenAI has long favored closed models. Does this shift towards open-weight models mark a change in strategy?

It is not a question of opposing open and closed models, because the two approaches are complementary. Sam Altman, our CEO, has made it clear: it’s not one or the other. Open source promotes transparency, innovation, and the democratization of AI, while some very powerful models still require access control and enhanced supervision. We also offer models accessible via API as well as closed models. We do not see the world as a binary choice, but rather as two options that meet different objectives. We are very invested in open source because we remain deeply committed to ensuring that our best technologies benefit everyone, not just the countries or companies that can afford licenses.

OpenAI has recently launched several initiatives in agentic AI, with Agent Builder and the Atlas browser. How do you adapt your product policies in such a changing environment?

A few years ago, ChatGPT was just a single chat interface. Today, we are building a complete ecosystem: web browsing, agents capable of performing tasks, video and image generation, programming assistance, and more. This forces us to think of product policy as infrastructure, rather than as rules specific to a single use case.

First, we define what remains true in all contexts. For example, we do not want our tools to be used to exploit children or plan violent acts. Then we translate these principles into concrete policies for each product. That may mean integrating them into model training, building system-level blocking mechanisms like gpt-oss-safeguard, or adding monitoring. So if an agent starts to perform a dangerous action, our classifiers can detect it, block the action, and suspend the account in question.
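Applied to agents, the same pattern can sit between a proposed action and its execution. In the hypothetical sketch below, `execute` and `suspend_account` stand in for real infrastructure and `classify` is the helper from earlier; the policy text is illustrative.

```python
# Sketch of an action-level guard for agents. `execute` and `suspend_account`
# are hypothetical placeholders for real infrastructure.
AGENT_POLICY = """\
Label a proposed action VIOLATION if it attempts to exploit minors, plan or
facilitate violence, or exfiltrate credentials. Otherwise label it ALLOWED.
Explain your reasoning, then finish with: LABEL: VIOLATION or LABEL: ALLOWED."""

def run_agent_step(account_id: str, proposed_action: str):
    reasoning, label = classify(AGENT_POLICY, proposed_action)
    if "VIOLATION" in label:
        suspend_account(account_id, reason=reasoning)  # block the action and escalate
        return None
    return execute(proposed_action)
```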

Since its launch, Sora 2 has made it possible to create videos featuring protected trademarks or licensed works. What measures are in place to protect copyright?

It all depends on the choice of the rights holder. Some partners are delighted to see their characters or brands used in creative ways, in positive contexts, which increase their visibility. Others do not want their works to be exploited, and they inform our legal team. We then quickly adapt our models to block this content. So we offer both options, and this flexibility seems to be well received. Regarding proof of ownership, we have an excellent legal team that carries out the necessary checks. Our goal is to offer different ways of using, distributing or protecting content according to the wishes of rights holders.

As VP of Product Policy, do you factor employment implications into your product decisions, given that younger generations seem worried about the impact of AI on the job market?

Of course. Overall, we believe that AI contributes positively to the labor market and will create jobs. AI helps people be more efficient and gives them truly powerful tools. However, we remain attentive to the question of equity. Personally, I want to ensure that our products do not favor certain populations to the detriment of others, and that the benefits are widely shared.

“We believe that AI contributes positively to the labor market and that it will create jobs”

We continually conduct economic research through a team dedicated to the impact of AI on the job market, because we believe it is important for society, and for us, to understand these developments. Our goal is to increase productivity and help people thrive with AI, while limiting the negative effects. These are difficult to predict, but we focus mainly on the material risks to people, namely their safety and their psychological and emotional well-being.

Sam Altman recently opened the door to potential erotic uses of ChatGPT. How can such new uses be reconciled with the protection of minors?

What Sam Altman expressed is that we should offer differentiated experiences according to the age of the user. Adults must be treated as adults, as long as their uses do not cause harm. We will always prohibit any use aimed at planning or encouraging violence, or harming others or oneself. As for other forms of freedom, some people are in favor and others are not, and we don’t want to be overly restrictive. What really matters is whether the user is an adult or a teenager.

We are therefore working on specific policies for minors to adapt the ChatGPT experience and make it safer. We are currently developing age-prediction classifiers capable of estimating a user’s age and ensuring that anyone under 18 is in a safe space. Once these systems are reliable enough, we will be able to offer adults a distinct experience while protecting young people. The common denominator will remain risk prevention and safety.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
