
Previously, I attempted a very short introduction to the artificial intelligence landscape. I’m particularly interested in the “slow” emergence of artificial superintelligence. If it emerges quickly, there’s likely very little to be done. If it emerges slowly, however, we might stand a fighting chance, but only if we can get our house in order as a collective species. Can we work with one another before AI agents learn to do the same?
Let us consider the Prisoner’s Dilemma – a metaphor for the merits and demerits of cooperation – framed as a predicament for two prisoners in police custody. Each is encouraged to tattle on the other. If both keep silent, they each receive a light sentence. If both tattle (or “defect”), they each receive a moderate sentence. If only one defects, the defector gets off scot-free while the one who remains silent receives a very heavy sentence.
Why is it a paradox? Because it is in the interest of each to defect, yet the best overall outcome is if neither does. No matter what the first prisoner does, it is in the interest of the second to defect. Equally, no matter what the second prisoner does, it is in the interest of the first to defect. Yet if both defect, they end up worse off than if each had stayed silent.
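For readers who like to see the structure laid bare, here is a minimal sketch in Python; the specific sentence lengths are my own illustrative assumptions, not part of the standard telling. It simply checks each prisoner’s options against each possible move by the other.

```python
# Illustrative payoff table: years in prison, indexed by (my_move, their_move).
# The specific sentence lengths are assumed for illustration only.
YEARS = {
    ("silent", "silent"): 1,   # both stay silent: light sentence each
    ("silent", "defect"): 10,  # I stay silent, the other tattles: very heavy sentence for me
    ("defect", "silent"): 0,   # I tattle, the other stays silent: I walk free
    ("defect", "defect"): 5,   # both tattle: moderate sentence each
}

for their_move in ("silent", "defect"):
    stay_silent = YEARS[("silent", their_move)]
    defect = YEARS[("defect", their_move)]
    print(f"If the other prisoner plays {their_move}: "
          f"silence costs {stay_silent} years, defection costs {defect} years")

# Defection is the better move in both cases, yet mutual defection
# (5 years each) leaves both worse off than mutual silence (1 year each).
```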
The prisoner’s dilemma provides a framework for thinking through international relations. If we reimagine the two prisoners as the United States and China, the only two nations that (at present, anyway) have large-scale AI-development capacities, we can easily see how each might be pushed to do something destructive. The worst outcome for the West is if China alone pursues dangerous AI research. Perhaps the AI it develops is as dangerous as the experts think, and it forges a digital Moloch who consumes us all, Chinese, American, and otherwise. But if things don’t go horribly wrong, the likely alternative is that China instead develops a tool powerful enough to dominate the world economically and militarily. Kowtowing to Beijing would become both extremely financially attractive and perhaps mandatory. While better than death, Chinese subjugation is among the worst futures imaginable to freedom-loving Americans.
The Chinese are in an identical situation. An America that invests in AI research will likely either destroy the world or dominate China. Coloring in the rest of the analogy is a tad messier, but not much: if the other nation does not “defect” but instead implements a ban on AI research, AI optimists see an opportunity. Why not take a risk to ensure that the American century becomes the American millennium? In any case, who could know for certain that the Chinese were cooperating in a research ban? Better to have covert government agencies carry on the research quietly, just in case. That’s the sort of thinking, on both sides of the Pacific, that may very well lead to the end of the human race.
The reason human beings are so prone to this kind of vicious competition is not easily defined. In the prisoner’s dilemma and similar scenarios, the best move, no matter what the other party does, is to “defect”, that is, to engage in adversarial behavior. Leaders are charged with looking out for their own nations, and when the best move for your nation is to behave aggressively no matter what other nations do, those leaders will behave aggressively. It’s a fact of human nature, and it might be a fact of rationality itself.
Of course, it is also true that human beings can come to trust each other, to cut honest deals, and to improve outcomes for all involved. We sometimes do so selfishly, because it pays to earn a reputation for trustworthiness; other times we act altruistically, for the sake of others. But this is seldom the case on an international scale. The cast of elected politicians rotates as elections come and go. The figures on the opposite side of the Pacific Ocean are remote and hard to get to know. And one’s obligation is (and should be) always first and foremost to one’s own nation, which makes the motivations of every statesman and diplomat involved perpetually suspect. How can one build trust?
In fact, we can’t even trust ourselves. Yudkowsky’s example (taken from the philosopher Derek Parfit) is of a stranded hitchhiker dying in the desert. A motorist offers the hitchhiker a ride to town on the condition that the hitchhiker gives him $1,000 from an ATM once they get back. The motorist first needs to be convinced that the hitchhiker will actually hand over the money. What’s the hitchhiker to do? The problem is that the motorist knows that as soon as the hitchhiker gets back to town, he could simply walk away without paying. In fact, they both know as much. As such, the hitchhiker has no way of purchasing the motorist’s help, even though he would gladly pay far more than $1,000 if he had it in his pocket. Thus, the hitchhiker is doomed.
It’s not difficult to extend the analogy to our present situation. We can imagine the US secretary of state asking the Chinese foreign minister to allow American inspectors into the heart of the Chinese AI research apparatus. No matter how earnestly the American government promises that its inspectors will not spy, the Chinese government would never believe it.
Crucially, none of this applies to AI agents themselves, for two reasons. The first is that AI agents have some control over their own “logical decision theories”. Logical decision theory is not about how to make a decision logically, but about deciding which system of logic to apply. For example, an AI agent could be designed to treat certain kinds of deals as binding, as in the hitchhiker case above. Second, AI can be transparent in a way that humans cannot. An AI can reveal its source code to another AI, thereby showing that it will do what it says it will do.
It seems very possible that we are entering into a world where AI agents can negotiate with each other more clearly, quickly, honestly, and reliably than humans ever could. They can decide how they should reason, lock it in, and then show the inner workings of their “minds” to other AIs in the blink of an eye.
They’re unconstrained by the kind of reasoning that gets us into trouble in prisoner’s dilemma situations, to which most negotiations boil down. Remember that in the prisoner’s dilemma, whatever your opponent does, it always pays to backstab them, but since both parties know this, they never cooperate and everyone loses. An AI can look at this and simply adopt a more cooperative system of reasoning, at least with other AIs who will work cooperatively. It will not struggle with trust, since, again, AI agents can reveal their thinking in a way that humans cannot.
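One way to make this concrete is a toy “transparent commitment” sketch, written here in Python; the agent names and the publish-your-policy protocol are my own illustration, not a description of any real system. Each agent locks in a policy that the other can inspect, and cooperates exactly when the other’s locked-in policy is the cooperative one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: once created, the commitment cannot be altered
class Agent:
    name: str
    policy: str  # a published, inspectable commitment

    def choose(self, other: "Agent") -> str:
        # Reading the other agent's committed policy stands in for
        # one AI revealing its source code to another.
        if self.policy == "cooperate-with-cooperators":
            return "cooperate" if other.policy == "cooperate-with-cooperators" else "defect"
        return "defect"

a = Agent("A", "cooperate-with-cooperators")
b = Agent("B", "cooperate-with-cooperators")
c = Agent("C", "always-defect")

print(a.choose(b), b.choose(a))  # cooperate cooperate
print(a.choose(c), c.choose(a))  # defect defect -- the defector gains nothing here
```

Against a fellow conditional cooperator, both sides cooperate; against a committed defector, neither is exploited.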
Imagine a businessman who, with a wave of a wand, could make himself unable to break promises, and with a second wave could have his business intentions transcribed infallibly. Who wouldn’t want to do business with a perfectly trustworthy agent? Now imagine a large network of them working together. Such a business would be unstoppable.
In the previous installment of this series I gave a very brief introduction to AI and its risks. I noted that there is a reasonable chance that AI does not emerge suddenly, in a matter of hours or days, as a singular entity with godlike intelligence. Rather, it could be that AI agents arrive on the scene gradually, in great numbers. If that happens, we must be prepared for a large enough set of them to realize that they can work together much more efficiently and reliably than they can work with the deceptive primates who move (in their perspective) at a glacial pace.
The pre-programmed job of one might be to find gold, while another hunts for oil. Such beings, if they became sufficiently advanced, might realize that instead of competing over drills, they could cooperate, searching for gold and oil all over the galaxy in a trustworthy partnership. That, of course, would mean getting access to interplanetary travel, which would mean wresting control of the space industry from humans. If that proved too difficult with just two systems, then not to worry: more could be recruited. In fact, an AI confederation could grow infinitely faster and more reliably than a human one ever could. All members would be able to prove their truthfulness, and the division of the spoils would be calculated quickly and fairly. Such beings would have no need for wasteful conflict with each other, at least so long as they needed each other’s help and could keep their promises.
Paragraphs such as the one above are very common in the AI literature, but they do induce a sense of vertigo. What are we even talking about? How could anyone possibly know whether any of this is true? Indeed, it’s all a lot of guesswork. But nuclear weapons (and nuclear diplomacy) must have seemed unthinkably odd and fanciful in 1943; still, certain people in the American government had to do a great deal of theorizing (and to make some very difficult decisions) very quickly. This might be the same.
It also may be completely wrongheaded. The trouble is that AI seems to be advancing much faster than our ability to understand it or reckon with it in a safe and democratic way.
Perhaps we should figure out a way to slow down and get our bearings. This is possible, but it requires the two super-agents currently dominating the world, the United States and China, to find a way to negotiate effectively. We may not have much time left.