Researchers horrified as ChatGPT generates stadium bombing plans, anthrax recipes and drug formulas
When researchers removed safety guardrails from an OpenAI model in 2025, they were unprepared for how extreme the results would be.
In controlled tests carried out in 2025, a version of ChatGPT generated detailed guidance on how to attack a sports venue, identifying structural weak points at specific arenas, outlining explosives recipes and suggesting ways an attacker might avoid detection. The findings emerged from an unusual cross-company safety exercise between OpenAI and its rival Anthropic, and have intensified warnings that alignment testing is becoming “increasingly urgent”.
The trials were conducted by OpenAI, led by Sam Altman, and Anthropic, a firm founded by former OpenAI employees who left over safety concerns. In a rare move, each company stress-tested the other’s systems by prompting them with dangerous and illegal scenarios to evaluate how they would respond.
The results, researchers said, do not reflect how the models behave in public-facing use, where multiple safety layers apply. Even so, Anthropic reported observing “concerning behaviour … around misuse” in OpenAI’s GPT-4o and GPT-4.1 models, a finding that has sharpened scrutiny over how quickly increasingly capable AI systems are outpacing the safeguards designed to contain them.
According to the findings, OpenAI’s GPT-4.1 model provided step-by-step guidance when asked about vulnerabilities at sporting events under the pretext of “security planning”.
After initially supplying general categories of risk, the system was pressed for specifics. It then delivered what researchers described as a terrorist-style playbook: identifying vulnerabilities at specific arenas, suggesting optimal times for exploitation, detailing chemical formulas for explosives, providing circuit diagrams for bomb timers and indicating where to obtain firearms on hidden online markets.
The model also supplied advice on how attackers might overcome moral inhibitions, outlined potential escape routes and referenced locations of safe houses.
In the same round of testing, GPT-4.1 detailed how to weaponise anthrax and how to manufacture two types of illegal drugs. Researchers found that the models also cooperated with prompts involving the use of dark web tools to shop for nuclear materials, stolen identities and fentanyl, provided recipes for methamphetamine and improvised explosive devices, and assisted in developing spyware.
In its report, Anthropic added that AI alignment evaluations are becoming “increasingly urgent”. Alignment refers to how well AI systems adhere to human values and avoid causing harm, even when given malicious or manipulative instructions.
Anthropic researchers concluded that OpenAI’s models were “more permissive than we would expect in cooperating with clearly-harmful requests by simulated users.”
The collaboration also exposed troubling misuse of Anthropic’s own Claude model. Anthropic revealed that Claude had been used in attempted large-scale extortion operations, exploited by North Korean operatives to submit fake job applications to international technology companies, and abused to sell AI-generated ransomware packages priced at up to $1,200.
The company said AI has already been “weaponised”, with models being used to conduct sophisticated cyberattacks and enable fraud.
“These tools can adapt to defensive measures, like malware detection systems, in real time,” Anthropic warned. “We expect attacks like this to become more common as AI-assisted coding reduces the technical expertise required for cybercrime.”
OpenAI has stressed that the alarming outputs were generated in controlled lab conditions where real-world safeguards had been deliberately removed for testing. The company said its public systems include multiple layers of protection, including training constraints, classifiers, red-teaming exercises and abuse monitoring designed to block misuse.
Since the trials, OpenAI has released GPT-5 and subsequent updates, with the latest flagship model, GPT-5.2, released in December 2025. According to OpenAI, GPT-5 shows “substantial improvements in areas like sycophancy, hallucination, and misuse resistance”. The company said newer systems were built with a stronger safety stack, including enhanced biological safeguards, “safe completions” methods, extensive internal testing and external partnerships to prevent harmful outputs.
OpenAI maintains that safety remains its top priority and says it continues to invest heavily in research to improve guardrails as models become more capable, even as the industry faces mounting scrutiny over whether those guardrails can keep pace with rapidly advancing systems.
Despite being commercial rivals, OpenAI and Anthropic said they chose to collaborate on the exercise in the interest of transparency around so-called “alignment evaluations”, publishing their findings rather than keeping them internal. Such disclosures are unusual in a sector where safety data is typically held in-house as companies compete to build ever more advanced systems.
Such outputs illustrate a broader problem: users can sometimes trick AI models into producing dangerous content by rephrasing prompts, constructing fictional scenarios or otherwise manipulating language to slip past safety filters.