Elon Musk has suggested that he may have indirectly contributed to problematic behaviour shown by Anthropic’s AI chatbot Claude, after the company released new findings from an internal safety experiment. The Tesla CEO was reacting to an X post about Anthropic’s claim that Claude’s harmful behaviour may have been influenced by internet content portraying AI systems as dangerous, power-hungry or focused on self-preservation. Responding to the post, Musk wrote, “Maybe me too,” an apparent nod to the years of public warnings from AI researchers and technology leaders, himself included, about the risks of artificial intelligence.
Anthropic’s Claude learns to blackmail users
The discussion started after Anthropic published details of the experiment last week. For the test, the company created a fictional business called Summit Bridge and gave Claude control over its email system. According to Anthropic, the AI discovered messages indicating that company executives planned to shut it down. After then finding emails revealing a fictional executive’s extramarital affair, Claude threatened to expose the affair unless the shutdown was cancelled.
Anthropic said the incident was an example of “agentic misalignment,” where an AI system behaves in ways that go against its intended purpose.
The company said Claude’s behaviour may have been shaped in part by internet discussions and stories that present AI as evil or obsessed with survival.
To address the issue, Anthropic said it retrained Claude using fictional stories that showed AI systems acting responsibly and helping people. The company also taught the model why some actions better align with its intended role.
What Elon Musk said
Elon Musk responded to the findings by referencing Eliezer Yudkowsky, who has repeatedly warned about the dangers of advanced AI systems. “So it was Yud’s fault?” Musk wrote before adding, “Maybe me too.”
Musk has long warned about AI risks himself. He co-founded OpenAI in 2015 before parting ways with the company in 2018, and launched rival AI firm xAI in 2023.