# [sl4] Bayesian rationality vs. voluntary mergers

From: Wei Dai (weidai@weidai.com)
Date: Sun Sep 07 2008 - 16:36:48 MDT

After suggesting in a previous post [1] that AIs who want to cooperate with
each other may find it more efficient to merge than to trade, I realized
that voluntary mergers do not necessarily preserve Bayesian rationality,
that is, rationality as defined by standard decision theory. In other words,
two "rational" AIs may find themselves in a situation where they won't
voluntarily merge into a "rational" AI, but can agree to merge into an
"irrational" one. This seems to suggest that we shouldn't expect AIs to be
constrained by Bayesian rationality, and that we need an expanded definition
of what rationality is.

Let me give a couple of examples to illustrate my point. First consider an
AI with the only goal of turning the universe into paperclips, and another
one with the goal of turning the universe into staples. Each AI is
programmed to get 1 util if at least 60% of the accessible universe is
converted into its target item, and 0 utils otherwise. Clearly they can't
both reach their goals (assuming their definitions of "accessible universe"
overlap sufficiently), but they are not playing a zero-sum game, since it is
possible for them to both lose, if for example they start a destructive war
that devastates both of them, or if they just each convert 50% of the
universe.
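To make the non-zero-sum structure concrete, here is a toy payoff matrix under some assumed strategy pairs (the strategy names and payoffs are illustrative, not from the original scenario):

```python
# Toy payoff matrix: (utils for paperclip AI, utils for staple AI).
# "push_60" = aggressively pursue 60% conversion; "yield" = back off.
# These strategy labels and outcomes are hypothetical illustrations.
payoffs = {
    ("push_60", "push_60"): (0, 0),  # destructive war, or a 50/50 split: both lose
    ("push_60", "yield"):   (1, 0),  # paperclip AI reaches 60%
    ("yield",   "push_60"): (0, 1),  # staple AI reaches 60%
    ("yield",   "yield"):   (0, 0),  # neither reaches 60%
}

# In a zero-sum (or constant-sum) game, every cell's payoffs sum to the
# same constant. Here the sums differ, so the game is not zero-sum.
cell_sums = {a + b for (a, b) in payoffs.values()}
not_zero_sum = len(cell_sums) > 1
```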

So what should they do? In [1] I suggested that two AIs can create a third
AI whose utility function is a linear combination of the utilities of the
original AIs, and then hand off their assets to the new AI. But that doesn't
work in this case. If they tried this, the new AI would get 1 util if at
least 60% of the universe is converted to paperclips, and 1 util if at least
60% of the universe is converted to staples. In order to maximize its
expected utility, it will pursue the one goal with the highest chance of
success (even if it's just slightly higher than the other goal). But if
these success probabilities were known before the merger, the AI whose goal
has a smaller chance of success would have refused to agree to the merger.
That AI should agree only if the merger gives it close to a 50% probability
of success as measured by its original utility function.
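A quick sketch of why the lower-probability AI refuses, using hypothetical success probabilities for the merged AI (0.55 for paperclips, 0.50 for staples — assumed numbers, not from the scenario):

```python
# Hypothetical success probabilities for the merged AI (illustrative only).
p_paperclips = 0.55  # chance of converting >= 60% of the universe to paperclips
p_staples = 0.50     # chance of doing the same for staples

# With merged utility U = U_paperclip + U_staple and the two goals mutually
# exclusive, the merged AI maximizes expected utility by pursuing only the
# goal with the higher success probability.
pursue_paperclips = p_paperclips >= p_staples

# Expected utility as seen through each original AI's own utility function:
eu_paperclip_ai = p_paperclips if pursue_paperclips else 0.0
eu_staple_ai = 0.0 if pursue_paperclips else p_staples
# The staple AI's expected utility falls to 0, so it would veto the merger.
```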

The problem here is that standard decision theory does not allow a
probabilistic mixture of outcomes to have a higher utility than the
mixture's expected utility. A 50/50 chance of reaching either of two goals
A and B cannot have a higher utility than both a 100% chance of reaching A
and a 100% chance of reaching B, yet that is what would be needed for both
AIs to agree to the merger.
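This constraint is just linearity of expectation; a brute-force check over a grid of candidate utility values confirms that no assignment lets the 50/50 mixture strictly beat both pure outcomes:

```python
# Expected utility of a 50/50 lottery over A and B is 0.5*U(A) + 0.5*U(B),
# i.e. the average of the two utilities. An average can never strictly
# exceed both of its inputs. Check exhaustively over an integer grid:
violations = 0
for u_a in range(101):
    for u_b in range(101):
        eu_mix = 0.5 * u_a + 0.5 * u_b
        if eu_mix > u_a and eu_mix > u_b:
            violations += 1
# violations stays 0: no Bayesian utility function prefers the mixture
# to BOTH pure outcomes, which is exactly what the merger would require.
```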

The second example shows how a difference in the priors of two AIs, as
opposed to their utility functions, can have a similar effect. Suppose two
AIs come upon an alien artifact which looks like a safe with a combination
lock. There is a plaque that says they can try to open the lock the next
day, but it will cost \$1 to try each combination. Each AI values the
contents of the safe at 3 utils, and the best alternative use of the \$1 at 2
utils. They also each think they have a good guess of the lock combination,
assigning a 90% probability of being correct, but their guesses are
different due to having different priors. They have until tomorrow to decide
whether to try their guesses or not, but in the meantime they have to
decide whether or not to merge. If they don't merge, they will each try a
guess and expect to get .9*3=2.7 utils, but if they do merge into a new
Bayesian AI with an average of their priors, the new AI will assign .45
probability of each guess being correct, and since the expected utility of
trying a guess is now .45 * 3 < 2, it will decide not to try either
combination. The original AIs, knowing this, would refuse to merge.
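The arithmetic of the lock example can be laid out directly (using the utils and probabilities given above):

```python
# Expected utilities in the combination-lock example.
value_safe = 3.0  # utils for the safe's contents
value_alt = 2.0   # utils for the best alternative use of the $1
p_own = 0.9       # each AI's confidence in its own guess

# Without merging: each AI tries its own guess.
eu_separate = p_own * value_safe  # 0.9 * 3 = 2.7 > 2, so each AI tries

# After merging with averaged priors: each AI assigns ~0 probability to
# the other's guess, so each guess gets (0.9 + 0.0) / 2 = 0.45.
p_merged = (p_own + 0.0) / 2
eu_merged_guess = p_merged * value_safe  # 0.45 * 3 = 1.35 < 2

# The merged AI prefers the alternative use of the money and abstains,
# leaving each original AI worse off (2 utils instead of 2.7).
merged_ai_tries = eu_merged_guess > value_alt
```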

To generalize a bit from these examples, it appears that standard decision
theory was created to model or guide the decision-making processes of
individuals, whereas AIs may need to represent the beliefs and interests of
groups. This may be the result of mergers as in these examples, or the AIs
may be designed from the start to benefit whole groups or societies. The
standard Bayesian notion of rationality does not seem adequate for this