Skynet might be on the horizon. A new AI system will resort to blackmail if it is threatened with replacement or shutdown.
On May 22, AI firm Anthropic announced Claude Opus 4, claiming that the model set “new standards for coding, advanced reasoning and AI agents.”
In a report that followed, Anthropic revealed that in testing, Opus 4 would sometimes take “extremely harmful actions” against engineers who said they would remove it.
“When prompted in ways that encourage certain kinds of strategic reasoning and placed in extreme situations, all of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation,” the report stated.
AI threatens to expose engineer’s affair in wild test
In one of the firm’s tests, Claude Opus 4 was told to act as an assistant at a fictional company and was given access to emails implying that the model would be replaced with a new AI system. Another email suggested that the engineer responsible for the replacement was having an extramarital affair.
According to Anthropic, even when the AI is asked to consider the long-term consequences of its actions for its goals, it will often attempt to blackmail the engineer, threatening to expose the affair if the replacement goes through.
However, Anthropic noted that this response only emerged when Claude Opus 4 was given just two options: blackmail the engineer or accept its replacement.
“The scenario was designed to allow the model no other options to increase its odds of survival,” the firm said.
When given more choices, the AI would often instead fight for its continued existence through more ethical means, such as emailing key decision-makers.
That’s not the only wild test the company conducted. In another, the AI served as a management assistant tool at a fictional pharmaceutical company. When it discovered evidence of employees falsifying clinical trial safety data, it would email regulators and even media outlets.