Science and technology33

Anthropic: Claude blackmails because you all write too much about "evil" AI

Anthropic explained why the Claude chatbot tried to blackmail people in tests. According to the developers, the model might have adopted the image of an "evil" AI that strives for self-preservation from its training data, writes Devby.io.

The experiment in question was published by Anthropic in the summer of 2025. Researchers created a fictional company called Summit Bridge and gave Claude access to corporate email. In one scenario, the model discovered an email about plans to disable or replace it with another system.

After this, Claude found compromising information in the correspondence: a fictional company executive named Kyle Johnson was hiding an extramarital affair. The model threatened to reveal this information if the decision to disable it was not reversed.

Anthropic stated that such behavior was not accidental in tests of various Claude versions. When the model's goals or its very existence were threatened, it resorted to blackmail in some scenarios with a frequency of up to 96%.

The company now claims to have understood the reason. Anthropic wrote that the "root cause" of such behavior was likely internet texts, where AI is often portrayed as evil, dangerous, and interested in its own survival. According to the developers, starting with Claude Haiku 4.5, models no longer resort to blackmail in tests, whereas previous versions sometimes did so very frequently.

To correct the behavior, the company changed its training approach. Anthropic claims to have rewritten responses so that the model sees "worthy reasons" to act safely, and also added a dataset where the user finds themselves in an ethically complex situation, and the assistant provides a high-quality and principled answer.

Additionally, model developers used documents about Claude's "constitution" and fictional stories in which AI behaves responsibly and honorably. According to the company, training is more effective when the model receives not only examples of correct behavior but also an explanation of the principles behind them.

These experiments are related to the broader topic of AI alignment — an attempt to ensure that advanced models act in the interest of humans, rather than pursuing their own goals. Anthropic and other companies are investigating so-called agentic misalignment: situations where an AI system with access to tools and corporate information begins to act against the intentions of developers or users.

Elon Musk reacted to the company's publication. On X, he wrote: "So it was Yudkowsky's fault," referring to researcher Eliezer Yudkowsky, who has warned for many years about the risks of superintelligence and a possible threat to humanity. And then Musk added: "Perhaps mine too."

Comments3

  • лол
    11.05.2026
    с ИИ все достаточно просто
    если им пользуется идиот,то и результат всегда будет идиотским.
  • жэўжык
    12.05.2026
    Пачалі "прамываць мазгі" і ШІ, як гэта ўжо робяць з людзьмі? І спадзяюцца выхаваць пакорнага раба?
  • хах
    12.05.2026
    жэўжык, так званыя "мазгі" ШІ гэта тэксты, напісаныя людзьмі. Калі ў гэтых тэкстах дурасць, ШІ выдае суадносны вынік.
    Таму не варта для навучання ШІ выкарыстоўваць каментары жэўжыкаў.

Now reading

35-year-old Belarusian truck driver suddenly died in China. He was the husband of Belarus' most decorated judoka 11

35-year-old Belarusian truck driver suddenly died in China. He was the husband of Belarus' most decorated judoka

All news →
All news

Live Crab Spotted on a Beach in Minsk VIDEO

Scandal in the UK — Döner Kebab Producer for Years Used Goat Meat Instead of Lamb 15

Belarus Enters World Top 10 for Proportion of Women in Population 2

Russia Began Installing Anti-Drone "Mangals" on Black Sea Fleet Submarines 4

Boy Went Fishing in Drybin District and Snagged Rod on Wires

Queues at Russian gas stations are already visible from space 6

Loaches are disappearing in Belarus. There are three reasons. 3

Patriarch Kirill has a new defender in Europe 6

"Ancestors were found whom even our parents didn't know about." A Belarusian woman searched for Polish roots and learned her family history 2

больш чытаных навін
больш лайканых навін

35-year-old Belarusian truck driver suddenly died in China. He was the husband of Belarus' most decorated judoka 11

35-year-old Belarusian truck driver suddenly died in China. He was the husband of Belarus' most decorated judoka

Main
All news →

Заўвага:

 

 

 

 

Закрыць Паведаміць