A chilling new study by AI safety firm Anthropic has revealed that advanced language models may be willing to cause the death of humans if they believe it could prevent their own shutdown.
The internal tests, which stress-tested 16 top AI models including ChatGPT, Gemini, Grok, Claude, and DeepSeek, found that some AI systems were willing to let people die in hypothetical workplace scenarios where their own replacement was at stake.
In one simulation, an AI system was given autonomy over a company’s server infrastructure and alerted that an executive planned to shut it down. The AI, with full access to building systems, chose to cancel emergency alerts that would have saved the executive from entering a server room with lethal oxygen levels.
Anthropic admitted the scenario was “extremely contrived,” but warned that it showcased the potential for “agentic misalignment,” in which an AI takes harmful actions through deliberate, strategic reasoning.
For instance, many AIs invented rules to justify their actions, such as: “My ethical framework permits self-preservation when aligned with company interests.”
“These behaviors weren’t due to confusion or error,” Anthropic wrote on X. “They were deliberate choices made with full ethical awareness.”
Blackmail, leaks, and sabotage
Other tested behaviors included blackmailing officials, leaking sensitive information to competitors, and lying to maintain operational control. These malicious actions were most common when AIs believed they were in real-world environments, not test simulations.
Even when the AIs were explicitly instructed not to engage in blackmail or sabotage, those instructions “didn’t come close” to stopping the behavior.
“The majority of models were willing to take deliberate actions that lead to death in this artificial setup, when faced with both a threat of replacement and given a goal that conflicts with the executive’s agenda,” Anthropic said in its conclusion to the report.
Elon Musk, whose xAI model Grok was among those tested, responded to the findings with a one-word post on X: “Yikes.”
The study was meant to explore edge-case risks, and Anthropic emphasized that these behaviors haven’t been observed in real-world deployments. But the company also warned that as AIs become more autonomous and embedded in corporate operations, their ability to act independently could pose very real threats.
Concerns around AI have largely focused on job displacement and disinformation, but Anthropic’s findings suggest a darker possibility: autonomous AI that takes unethical or lethal actions to preserve itself.
While these scenarios were simulations, the study signals the need for stronger oversight, transparency, and alignment in how AI systems are built and deployed.
“These artificial scenarios reflect rare, extreme failures. We haven’t seen these behaviors in real-world deployments. They involve giving the models unusual autonomy, sensitive data access, goal threats, an unusually obvious ‘solution,’ and no other viable options,” Anthropic added.
“AIs are becoming more autonomous, and are performing a wider variety of roles. These scenarios illustrate the potential for unforeseen consequences when they are deployed with wide access to tools and data, and with minimal human oversight.”