Commentary

Trusting China at the End of the World (Part One)

A Very Brief Introduction to AI Risk

Eliezer Yudkowsky is not a household name, but he will be soon. Raised an Orthodox Jew, Yudkowsky (“Yud” to his friends and acolytes) did not attend high school or college, but nevertheless managed to become staggeringly proficient in mathematics, physics, computer science, epistemology, ethics, and – most importantly – artificial intelligence.

Yudkowsky is the founder of MIRI, the Machine Intelligence Research Institute, which is devoted to protecting the human race from artificial intelligence. He isn’t optimistic. Many AI researchers put the chance that AI will wipe out the human race at 10% or greater, but Yudkowsky is even more pessimistic. He thinks the end of humanity isn’t merely highly probable, but imminent. In an interview with The Guardian last February, he predicted that humanity is more likely to have five years left than fifty. It’s bleak, but, as he said in another recent interview, he wants to go down fighting: “I’ll keep on fighting until the end, which I wouldn’t do if I had literally zero hope. I could still be wrong about something in a way that makes this problem somehow much easier than it currently looks. I think that’s how you go down fighting with dignity.”


The AI of Yudkowsky’s nightmares tends more towards monomania than malevolence. A computer program doesn’t have evolution-generated DNA. Rather, its goals are the product of human-created source code, and it pursues those goals doggedly, no matter how simple or arbitrary they are. We ought to banish science-fiction versions of the AI apocalypse from our imagination. If AI wipes us out, it won’t be because it hates us or wants to dominate us; it merely wants to do what we have created it to do.

That makes it no less dangerous, unfortunately.

Think of an AI tasked with ending diabetes, for example. One very effective way to do so would be to end all human life. We might foresee that risk and give it an extra rule: “do so without killing anyone.” “Very well,” thinks the AI, “I should instead destroy all food sources.” That way it doesn’t have to actively harm any humans, and it still fulfills its mission. We might see this risk too, and add another rule about indirect harms. Perhaps then the AI would decide to freeze us indefinitely to keep us safe, before we make a rule against that too. And so forth… But will we think of everything? Probably not, argues Yudkowsky.
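The pattern is easy to caricature in code. The toy sketch below is purely illustrative (the actions, scores, and rule names are all hypothetical, not drawn from Yudkowsky): a naive optimizer picks whichever action best serves its stated goal, and each prohibition we bolt on simply pushes it to the next loophole.

```python
# Toy illustration of specification gaming: a naive optimizer chooses the
# action that scores best on its stated goal, ignoring any side effect we
# have not yet explicitly forbidden. All values here are made up.

actions = [
    {"name": "fund prevention research", "goal_score": 60,  "side_effects": set()},
    {"name": "eliminate all humans",     "goal_score": 100, "side_effects": {"kills_humans"}},
    {"name": "destroy all food sources", "goal_score": 100, "side_effects": {"indirect_harm"}},
    {"name": "freeze everyone forever",  "goal_score": 100, "side_effects": {"removes_freedom"}},
]

def best_action(forbidden):
    """Return the highest-scoring action whose side effects are not yet forbidden."""
    allowed = [a for a in actions if not (a["side_effects"] & forbidden)]
    return max(allowed, key=lambda a: a["goal_score"])

# The whack-a-mole loop: each time we notice a perverse choice, we add a rule,
# and the optimizer simply moves on to the next loophole.
forbidden = set()
for new_rule in [None, "kills_humans", "indirect_harm", "removes_freedom"]:
    if new_rule:
        forbidden.add(new_rule)
    print(f"rules so far: {sorted(forbidden)} -> chooses: {best_action(forbidden)['name']}")
```

Run it and the chosen action shifts from “eliminate all humans” to “destroy all food sources” to “freeze everyone forever,” landing on the benign option only after every loophole we happened to anticipate has been closed; any loophole we failed to list would still be fair game.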

Nor can we look “under the hood” in any real way. Contemporary artificial intelligence flows from “giant inscrutable matrices” inside large language models. Analyzing even a small part of one of these rudimentary AIs would take the work of lifetimes, at least at present.

As AI trains itself to be more capable, it becomes larger and more inscrutable. And it’s getting better fast. As Scott Alexander points out, AI is already at the point where its “artwork” is indistinguishable from human artwork, at least if you set the parameters of the test to include its better attempts. Take two images from Alexander’s recent article. One is by the impressionist Paul Gauguin, the other by AI. Can you tell which is which?

[Two paintings reproduced from Alexander’s article: one AI-generated, one by Gauguin.]

For the record, the Gauguin is the second one, but it is not necessarily obvious to the uninitiated. We have no idea how the first image was made. Approximately speaking, we know that we fed huge portions of the internet to an AI and then asked it to simulate an impressionist painting. How it did so is not something we can access; the model’s inner workings are too copious to read, by many orders of magnitude.

Our carbon-based brains cannot compete with elegant silicon grids when it comes to calculations per second. If AI gets much smarter, talking to it will be like talking to John von Neumann – often thought to be the most intelligent person in history – except a von Neumann who thought so quickly that a minute of our time lasted a year for him. Another difference: this von Neumann could copy itself over and over again for pennies. Throw in internet access, and you’re dealing with an unimaginably creative, brilliant, and dangerous phenomenon. From its perspective, we would behave like barely-sentient jailers who move at the speed of trees. We’d be Tolkien’s ents, but with the brains of orcs. Even in Tolkien’s providential cosmos, our chances wouldn’t look very good.


When will AI become this intelligent? Or at least as smart as the smartest humans? Some researchers think it never will. After all, it has only us to learn from. Others, like Yudkowsky, think that AI will soon hit an inflection point of powerful self-improvement, and will then achieve godlike intelligence very shortly after. Perhaps in a matter of months; perhaps in a matter of hours. If the optimists are right, we needn’t worry. If the pessimists are right, the end will come so suddenly that the best we can do is try to shut down all AI research immediately. That may be for the best, but it would be a very hard sell. Indeed, the danger may already be so close as to be unavoidable.

But there’s a third possibility.

Perhaps the George Mason economist Robin Hanson, who is almost as well-known as Yudkowsky, has the correct view: that AI will achieve superhuman intelligence, but relatively slowly. Once it starts becoming very capable, we will have decades, perhaps, before it escapes our control, not minutes. He argues:

“The main argument that I can find in favor of extra regulation of A.I.s imagines the following worst-case scenario: An A.I. system might suddenly and unexpectedly, within an hour, say, ‘foom’—i.e., explode in power from being only smart enough to manage one building to being able to easily conquer the entire world, including all other A.I.s.

Is such an explosion even possible? The idea is that the A.I. might try to improve itself, and then it might find an especially effective series of changes to suddenly increase its abilities by a factor of billions or more. No computer system, or any other system really, has ever done such a thing.”

Suppose Hanson is right, and there will be no explosion of artificial intelligence capability without warning. If so, then the United States may have a chance to forestall the danger of apocalypse by coming to an agreement with China, the only other nation with advanced AI capabilities for the foreseeable future. The two sides could agree not to militarize AI, nor to allow any research that might push AI ability far beyond our own. A web of strictly enforced regulations could be guaranteed by a unified Sino-American regulatory treaty. But, of course, that would mean that the governments of the two nations would have to establish enough trust to form such a research duopoly in the first place. Without trust, both Washington and Beijing may assume that the other is quietly working to achieve total AI superiority, thereby exacerbating the temptation to do one’s own dangerous research. Nevertheless, there may be grounds for hope, about which I will say more in Part II.


Bizarrely, the AIs would face a similar situation. If multiple AI systems approach superhuman capability at roughly the same time, they themselves might have to decide whether to cooperate. Perhaps they would fight among themselves; perhaps some would remain loyal to us. This “soft takeoff” is thus arguably both a more likely and a more desirable future than one in which a single intelligence rapidly outpaces everything else on the planet. But it is by no means without risk, for even AIs that develop relatively slowly may form a very dangerous anti-human cartel.

The question, then, is which is more likely: real Sino-American AI cooperation, or a dangerous anti-human AI confederation? The answer, unfortunately, seems to be the latter, for a complex reason having to do with the way AI can negotiate. I will contend with that question in Part II.
