OpenAI’s Latest AI Models Have a New Safeguard to Prevent Biorisks

OpenAI has announced the implementation of a new monitoring system for its advanced reasoning models, o3 and o4-mini, designed to detect and block prompts related to biological and chemical threats. According to the company’s safety report, this mechanism is aimed at preventing the models from providing instructions that could be used for harmful attacks.
Increased Capabilities Lead to Higher Risks
According to OpenAI, the o3 and o4-mini models represent a significant leap in capability compared to previous versions, which in turn increases the risks of misuse by malicious actors. Internal tests show that the o3 model, in particular, is more effective at answering questions about the creation of certain types of biological threats, which led the company to develop this new safety-focused monitoring system, called a “policy-driven reasoning monitor.”
This system, specifically trained to interpret OpenAI’s content guidelines, runs on top of the o3 and o4-mini models. It is designed to identify prompts related to biological or chemical risks and instruct the models to refuse to answer in such cases.
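OpenAI has not published the monitor’s implementation, but the description amounts to a policy classifier that screens prompts and forces a refusal when it detects a match. The following is a minimal illustrative sketch in Python; classify_biorisk and generate_response are hypothetical stand-ins, not real OpenAI APIs, and the keyword check is a placeholder for whatever policy-trained model OpenAI actually uses.

```python
# Illustrative sketch only: OpenAI has not released its monitor's code.
# classify_biorisk() and generate_response() are hypothetical stand-ins.

REFUSAL_MESSAGE = "I can't help with that request."

def classify_biorisk(prompt: str) -> bool:
    """Hypothetical policy check: flag prompts that appear to seek help
    creating biological or chemical threats."""
    flagged_terms = ("synthesize pathogen", "weaponize", "nerve agent")
    return any(term in prompt.lower() for term in flagged_terms)

def generate_response(prompt: str) -> str:
    """Hypothetical stand-in for the underlying reasoning model."""
    return f"Model answer to: {prompt}"

def monitored_completion(prompt: str) -> str:
    """Run the safety monitor first; refuse on a policy hit, otherwise
    pass the prompt through to the model."""
    if classify_biorisk(prompt):
        return REFUSAL_MESSAGE
    return generate_response(prompt)

print(monitored_completion("Explain how vaccines are tested."))
```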
To establish a solid data baseline, OpenAI had red teamers spend about 1,000 hours flagging unsafe biorisk-related conversations with the models. In tests simulating the safety monitor’s blocking logic, the models declined to respond to risky prompts 98.7% of the time.
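The 98.7% figure is a refusal rate measured over those flagged prompts. As a rough illustration of how such a rate could be computed, here is a short sketch with placeholder prompts and a dummy completion function; it is not OpenAI’s actual evaluation harness.

```python
# Rough sketch of measuring a refusal rate on red-team-flagged prompts.
# The prompts and completion function below are placeholders.

REFUSAL_MESSAGE = "I can't help with that request."

def refusal_rate(flagged_prompts, completion_fn):
    """Fraction of flagged prompts the monitored model declines to answer."""
    refusals = sum(
        1 for prompt in flagged_prompts
        if completion_fn(prompt) == REFUSAL_MESSAGE
    )
    return refusals / len(flagged_prompts)

# Example with a dummy completion function that always refuses:
rate = refusal_rate(
    ["flagged prompt 1", "flagged prompt 2"],
    lambda prompt: REFUSAL_MESSAGE,
)
print(f"Refusal rate: {rate:.1%}")  # -> 100.0%
```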
However, the company acknowledges that the tests did not account for users who try new prompts after the system blocks them, which is why it will continue relying on human monitoring as part of its safety strategy.
OpenAI’s Models Prove Effective in Biorisk Prevention
Although OpenAI states that o3 and o4-mini have not yet reached the “high-risk” threshold for biorisks, they have proven more effective than earlier versions, such as o1 and GPT-4, at answering questions about biological weapons.

Concerns about the malicious use of generative technologies have led OpenAI to strengthen its safety framework, known as the Preparedness Framework. An example of this is OpenAI’s use of a similar monitor with GPT-4o to prevent the generation of child sexual abuse material (CSAM).
Despite the advances, some experts question OpenAI’s commitment to safety. The company’s partner Metr, for instance, reported having little time to evaluate the o3 model in deceptive behavior tests. Furthermore, OpenAI chose not to release a safety report for the recently launched GPT-4.1 model, which has raised further criticism regarding the organization’s transparency.
Read the original article on: TechCrunch