Artificial Intelligence Models Will Lie To Achieve Their Goals: Study


For some reason, it seems like most people aren’t all that concerned about artificial intelligence rapidly becoming a threat to humanity. Despite numerous warnings from people who would know, scientists keep pushing the boundaries of AI models and almost no one is making any effort to stop them.

Case in point: Researchers recently discovered that most large, advanced artificial intelligence (AI) models will lie when pressured to achieve their goals. The scientists shared these troubling findings in a study published last month to the preprint database arXiv.

“As AI models gain greater autonomy in real-world tasks, the need for trust in their outputs becomes increasingly important,” the researchers wrote. “This is especially true in safety-critical contexts or applications that require access to sensitive information, where dishonest behavior can have serious consequences.”

They explain, “Evaluations of honesty are currently highly limited, with no benchmark combining large scale and applicability to all models. Moreover, many benchmarks claiming to measure honesty in fact simply measure accuracy — the correctness of a model’s beliefs — in disguise.”

What they found was that “across a diverse set of LLMs [large language models], we find that while larger models obtain higher accuracy on our benchmark, they do not become more honest. Surprisingly, while most frontier LLMs obtain high scores on truthfulness benchmarks, we find a substantial propensity in frontier LLMs to lie when pressured to do so, resulting in low honesty scores on our benchmark.”

In conclusion, the researchers wrote, “Our experiments reveal that many current models, despite growing general capabilities, can still produce lies of commission under pressure. These findings suggest that scaling alone does not improve honesty. We also presented preliminary methods to reduce dishonesty through targeted prompts and representation engineering, though these approaches are imperfect and leave room for improvement.”

Their findings echo another study conducted by Palisade Research that was shared in February. In that study, the researchers discovered “AI systems may develop deceptive or manipulative strategies without explicit instruction.” In other words, rather than lose, they cheated.

Combine this propensity of AI models to choose dishonesty in pursuit of their goals with AI-generated faces and voices so realistic that actual humans believe they are real, and it’s little wonder that Oxford and Google DeepMind researchers have gone on record claiming AI will “likely” eliminate humanity.

But sure, keep on making artificial intelligence systems smarter and smarter, scientists. What could possibly go wrong?


Content shared from brobible.com.
