Talk of superhuman artificial intelligence (AI) is on the rise. But research has revealed weaknesses in one of the most successful AI systems - a bot that plays the board game Go and can beat the world's best human players - showing that such superiority can be fragile. The study raises questions about whether more general AI systems could harbor similar weaknesses that would compromise their safety and reliability, and even their claim to be 'superhuman'.
“The paper leaves a big question mark about how to achieve the ambitious goal of building robust, real-world AI agents that people can trust,” says Huan Zhang, a computer scientist at the University of Illinois Urbana-Champaign. Stephen Casper, a computer scientist at the Massachusetts Institute of Technology in Cambridge, adds: "It provides some of the strongest evidence yet that it is hard to deploy advanced models as reliably as one would like."
The analysis, published online in June as a preprint 1 and not yet peer-reviewed, uses so-called adversarial attacks - inputs to AI systems that are designed to make the systems commit errors, whether for research purposes or for malicious ones. For example, certain prompts can 'jailbreak' chatbots, getting them to give out harmful information that they would normally suppress.
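To make the idea concrete, here is a minimal sketch of an adversarial attack in Python, using the fast gradient sign method (FGSM) against a toy linear classifier. FGSM is a standard textbook technique, not the method used in the study, and every name and parameter below is illustrative.

```python
import numpy as np

# Toy linear classifier: positive score -> class 1. (Illustrative only;
# the study attacks full Go-playing agents, not a linear model.)
rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.1

def predict(x):
    return int(w @ x + b > 0)

# FGSM-style attack: take a small, bounded step in whichever direction
# moves the score toward the opposite class.
def fgsm(x, epsilon=0.5):
    sign = 1 if predict(x) == 0 else -1     # raise or lower the score
    return x + sign * epsilon * np.sign(w)  # sign(w) is the score's gradient direction

x = rng.normal(size=8)
print(predict(x), predict(fgsm(x)))  # the perturbed input often flips the label
```

The point of the toy: a perturbation too small to matter to a human can still steer the model's output, which is the same failure mode the Go attacks exploit at the level of whole board positions.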
In Go, two players take turns placing black and white stones on a grid, aiming to surround and capture the other player's stones. In 2022, researchers reported training adversarial AI bots to defeat KataGo 2, the best open-source Go-playing AI system, which typically beats the best humans handily (and handlessly). The bots discovered vulnerabilities that let them defeat KataGo regularly, even though they were otherwise not very good - human amateurs could beat them. What's more, humans were able to understand the bots' tricks and use them to defeat KataGo themselves.
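For readers unfamiliar with the capture rule, the sketch below - illustrative Python, not code from KataGo or the paper - counts a group's liberties, the empty points adjacent to a connected group of same-colored stones, using a flood fill; a group whose liberties drop to zero is captured.

```python
# Minimal Go liberty count (illustrative, not from KataGo): a group of
# connected same-colored stones is captured when it has no liberties,
# i.e. no empty points adjacent to any stone in the group.
def liberties(board, row, col):
    color = board[row][col]
    size = len(board)
    group, libs, stack = set(), set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == '.':
                    libs.add((nr, nc))       # empty neighbor: a liberty
                elif board[nr][nc] == color:
                    stack.append((nr, nc))   # same color: part of the group
    return len(libs)

board = [list(row) for row in ["....",
                               ".BW.",
                               ".BW.",
                               "...."]]
print(liberties(board, 1, 2))  # the two-stone white group has 4 liberties
```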
Exploitation of KataGo
Was this a one-off, or did the work point to a fundamental weakness in KataGo - and, by extension, other AI systems with seemingly superhuman abilities? To investigate, researchers led by Adam Gleave, executive director of FAR AI, a nonprofit research organization in Berkeley, California, and a co-author of the 2022 paper 2, used adversarial bots to test three ways of defending Go AIs against such attacks 1.
The first defense was one that the KataGo developers had already deployed after the 2022 attacks: giving KataGo examples of board positions involved in the attacks and letting it play through them to learn how to counter them. That is similar to how it is taught to play Go more generally. But the authors of the latest paper found that an adversarial bot could learn to beat even this updated version of KataGo, winning 91% of the time.
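In miniature, this first defense is adversarial training: fold the attack examples back into the training data and retrain. The Python sketch below does that for a toy logistic-regression classifier - purely illustrative, since KataGo's actual fine-tuning works on Go positions and search-derived targets, and every name here is ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: two Gaussian blobs, one per class.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

def train(X, y, epochs=500, lr=0.5):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w + b)))   # logistic regression
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def attack(X, y, w, epsilon=0.5):
    # Shift each point in the direction that increases its loss (FGSM-style).
    return X + epsilon * np.sign(np.outer(2 * y - 1, -w))

w, b = train(X, y)
X_adv = attack(X, y, w)

# Defense 1, in miniature: retrain on clean plus adversarial examples.
w2, b2 = train(np.vstack([X, X_adv]), np.concatenate([y, y]))
acc = lambda w, b, X: ((X @ w + b > 0) == y).mean()
print(acc(w, b, X_adv), acc(w2, b2, attack(X, y, w2)))
```

Note the parallel with the article's finding: the retrained model handles the old attacks better, but a fresh attack aimed at the updated model can still succeed.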
The second defensive strategy Gleave's team tried was iterative: training a version of KataGo against adversarial bots, then training the attackers against the updated KataGo, and so on, for nine rounds. But even that did not yield an unbeatable version of KataGo. The attackers kept finding vulnerabilities, with the final attacker defeating KataGo 81% of the time.
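The same toy can illustrate the alternating scheme: nine rounds in which the current model is attacked and then retrained on everything gathered so far. In the study both sides are full Go agents; here the 'attacker' is just a gradient-sign perturbation, so this is only a sketch of the loop's shape.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

def train(X, y, epochs=500, lr=0.5):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

w, b = train(X, y)
X_pool, y_pool = X, y
for _ in range(9):  # nine attack/defense rounds, as in the study
    X_adv = X + 0.5 * np.sign(np.outer(2 * y - 1, -w))  # attack the *current* model
    X_pool = np.vstack([X_pool, X_adv])                 # keep all past attacks
    y_pool = np.concatenate([y_pool, y])
    w, b = train(X_pool, y_pool)                        # harden against the pool
```

Even in this miniature, each round of retraining moves the decision boundary, and the next attack simply targets the moved boundary - the same moving-target dynamic that let the final attacker beat KataGo 81% of the time.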
As a third defense strategy, the researchers trained a new Go-playing AI system from scratch. KataGo is based on a computational model known as a convolutional neural network (CNN). The researchers suspected that CNNs might focus too much on local details and miss global patterns, so they built a Go player using an alternative neural network called a vision transformer (ViT). But their adversarial bot found a new attack that let it win against the ViT system 78% of the time.
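The architectural contrast can be seen in miniature in the hedged Python sketch below (toy shapes only - a 20 x 20 grid is used so it divides evenly into patches, whereas real Go boards are 19 x 19, and this is not the paper's architecture): a CNN filter sees a small neighborhood at a time, while a ViT first cuts the input into patches that attention can then relate globally.

```python
import numpy as np

board = np.random.default_rng(3).integers(-1, 2, (20, 20))  # toy board plane

# CNN view: a 3x3 filter only ever sees a local neighborhood at a time.
def conv3x3(x, kernel):
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * kernel)
    return out

# ViT view: split the board into 4x4 patches; attention then relates
# every patch token to every other in a single step (global context).
def patchify(x, p=4):
    h, w = x.shape
    return x.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p * p)

local = conv3x3(board, np.ones((3, 3)))  # shape (18, 18), built from local sums
patches = patchify(board)                # shape (25, 16), tokens for attention
```

The conjecture being probed was that this global view would be harder to fool with locally plausible but globally losing patterns; the 78% attack success rate suggests the vulnerability is not specific to CNNs.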
Weak opponents
In all of these cases, the adversarial bots - although capable of beating KataGo and other leading Go-playing systems - were trained to discover hidden vulnerabilities in other AIs, rather than to be well-rounded strategists. “The opponents are still pretty weak – we beat them pretty easily,” says Gleave.
And since humans are able to use the tactics of adversarial bots to defeat leading Go AIs, does it still make sense to call these systems superhuman? “That’s a great question and one that I’ve definitely wrestled with,” Gleave says. “We started saying, ‘typically superhuman’.” David Wu, a computer scientist in New York who first developed KataGo, says strong Go AIs are "superhuman on average," but not "in the worst cases."
Gleave says the findings could have far-reaching implications for AI systems, including the large language models that underlie chatbots like ChatGPT. “The key takeaway for AI is that these vulnerabilities will be difficult to address,” says Gleave. "If we can't solve the problem in a simple area like Go, then there seems to be little prospect of fixing similar problems like jailbreaks in ChatGPT in the near future."
What the results mean for the prospect of building AI that comprehensively surpasses human capabilities is less clear, Zhang says. “Although on the surface this suggests that humans may retain important cognitive advantages over AI for some time,” he says, “I believe the key insight is that we do not yet fully understand the AI systems we are building today.”