UCaRtMaAI paper (was Re: Building a friendly AI from a "just do what I tell you" AI)

From: Tim Freeman (tim@fungible.com)
Date: Wed Nov 21 2007 - 20:48:47 MST

UCaRtMaAI stands for "Using Compassion and Respect to Motivate an
Artificial Intelligence". Isn't that a horrible title?

From: "Wei Dai" <weidai@weidai.com>
>However I see two important issues that are not
>mentioned in [http://www.fungible.com/respect/index.html].

But you understood it well enough to see the issues. Maybe it's more
comprehensible than I thought. Thanks for taking a look.

>1. Suppose a human says to the AI, "please get an apple for me." In your
>scheme, how does the AI know what he really wants the AI to do? (Buy or
>pick, which store, etc.)

The answer to that question depends on the AI's simplest explanations of
the human's goals. Those explanations depend on the AI's observations of
the human's past behavior (and the behavior of the other humans,
since the AI explains them all at once). You didn't specify the past
behaviors, and even if you did I don't have enough brainpower to run
the AI's algorithm, so I don't know the answer to your question.

>What utility function the human is trying to
>maximize by saying that sentence depends on the human's expectation of the
>consequences of saying that sentence, which depends on what he thinks the AI
>will do upon hearing that sentence, which in turn depends on the AI's
>beliefs about the human's expectations of the consequences of saying that
>sentence. How do you break this cycle?

Good example. I'll have to talk about that more explicitly in the paper.

The algorithm in the paper does give a well-defined result in this
case. The cycle doesn't go around as much as you say. (I have the
diagram at http://www.fungible.com/respect/paper.html#beliefs open in
the other window, if you want to follow along.) The AI comes up with
many possible explanations for "Beliefs" (along with all of the other
ovals in the diagram), and each aggregate explanation is given an
a-priori probability according to the speed prior. The simplest
explanations of the human's beliefs will tend to dominate. Each
explanation will take the human's estimated Mind-state as input and
will output, among other things, what the AI thinks the human thinks
the AI is going to do. That's as far as it goes around in the
algorithm I gave. Any further reflection will happen inside that
little "Beliefs" oval, or perhaps inside the "Mind-Physics" oval.
Because the speed prior prefers explanations that are short and fast,
any cycling around inside those ovals has to be bounded.
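To make the bounded cycle concrete, here is a minimal Python sketch
(not the paper's code) of weighting candidate "Beliefs" explanations by
a simplified speed-prior-style weight, 2^-length divided by running
time. The Explanation class, the function names, and the apple numbers
are all hypothetical, and only the prior is shown: in the real scheme
each explanation also has to be consistent with the observed behavior.
What it illustrates is that each explanation runs once on the human's
estimated mind-state and returns what the human expects the AI to do,
with no call back into the AI's own decision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Explanation:
    """A hypothetical candidate explanation of one human's beliefs."""

    name: str
    description_bits: int  # length of the explanation's program, in bits
    steps: int  # steps the program took to produce its output
    # Maps the human's estimated mind-state to what the human expects the AI to do.
    predict_ai_action: Callable[[str], str]


def speed_prior_weight(e: Explanation) -> float:
    """Simplified speed-prior-style weight: 2**-length divided by runtime.

    Longer programs and slower programs both get less prior weight.
    """
    return 2.0 ** (-e.description_bits) / max(e.steps, 1)


def predicted_ai_action(explanations: List[Explanation], mind_state: str) -> Dict[str, float]:
    """Prior-weighted distribution over what the human thinks the AI will do.

    Each explanation is run once on the estimated mind-state; nothing here
    calls back into the AI's own planner, so the belief cycle stops here.
    """
    weights = [speed_prior_weight(e) for e in explanations]
    total = sum(weights)
    dist: Dict[str, float] = {}
    for e, w in zip(explanations, weights):
        action = e.predict_ai_action(mind_state)
        dist[action] = dist.get(action, 0.0) + w / total
    return dist


# Hypothetical example for "please get an apple for me".
buy = Explanation("buys", 10, 100, lambda m: "buy an apple at the nearest store")
pick = Explanation("picks", 14, 100, lambda m: "pick an apple from the tree out back")
print(predicted_ai_action([buy, pick], "hungry, asked for an apple"))
```

With equal running times, the explanation that is 4 bits shorter gets
16 times the weight, so about 16/17 of the mass goes to "buy".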

>2. If you take an EU-maximizing agent's utility function and add or multiply
>it by a constant, you wouldn't change the agent's behavior at all, because
>whatever choices maximized EU for the old utility function would also
>maximize EU for the new utility function.

That's a reasonable reaction to the paper, because the paper doesn't say
prominently enough how that issue is handled. The utility is an
integer with a bounded number of bits, so many of the constants you
might want to multiply by will cause overflow and break the
explanation. Similarly, there are only finitely many constants you
can add, depending on the range of values. The broken explanations
don't contribute to the final result. I hint at the bounded utility
values at
http://www.fungible.com/respect/paper.html#guessed-parameters and
http://www.fungible.com/respect/paper.html#pascals-wager. The number
of bits in the utility is the "utility_bits" parameter to
Infer_utility_problem.__init__ at
http://www.fungible.com/respect/code/infer_utility.py.txt, and that
variable is used in check_reasonable_utilities in planner.py. (Now
I'm glad I thoroughly tested that code.)
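A small sketch of why the overflow rule pins down the scale: assuming
utilities are integers in [0, 2**utility_bits - 1], and considering only
positive integer scales for simplicity, it lists every rescaling
u -> scale*u + offset that keeps an explanation in range. The function
names are hypothetical; this is not check_reasonable_utilities from
planner.py, only an illustration of the same idea.

```python
from typing import List, Tuple


def in_range(utilities: List[int], utility_bits: int) -> bool:
    """True if every utility is an integer in [0, 2**utility_bits - 1]."""
    limit = 2 ** utility_bits - 1
    return all(isinstance(u, int) and 0 <= u <= limit for u in utilities)


def surviving_rescalings(utilities: List[int], utility_bits: int) -> List[Tuple[int, int]]:
    """All (scale, offset) pairs, with integer scale >= 1, such that
    scale * u + offset stays in range for every u.

    Among positive integer scales, any pair not returned overflows or
    underflows somewhere, so that variant of the explanation is broken.
    """
    if not in_range(utilities, utility_bits):
        raise ValueError("utilities already out of range")
    lo, hi = min(utilities), max(utilities)
    if lo == hi:
        raise ValueError("a constant utility function explains no choices")
    limit = 2 ** utility_bits - 1
    pairs = []
    scale = 1
    # Once scale * (hi - lo) exceeds limit, no offset can fit the whole range.
    while scale * (hi - lo) <= limit:
        # The smallest value must stay >= 0 and the largest <= limit.
        for offset in range(-scale * lo, limit - scale * hi + 1):
            pairs.append((scale, offset))
        scale += 1
    return pairs


print(surviving_rescalings([0, 100, 1023], 10))  # [(1, 0)]
print(len(surviving_rescalings([0, 100, 300], 10)))  # 1272
```

When an explanation already spans the full 0..1023 range, only the
identity survives; a narrower explanation admits more variants, but
still only finitely many.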

>There is no obvious way to combine these families together into an
>average social utility function. This is a well known problem called
>"interpersonal comparison of utilities".

The arithmetic is clear once you assume the utility is a nonnegative
integer with a maximum value. The AI's estimated utility for me in
situation X runs from (say) 0 to 1023, and the AI's estimated utility
for you in situation X also runs from 0 to 1023, so we can take a
weighted average of our utilities to get the AI's utility.
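The arithmetic, sketched in Python under the assumption above of
10-bit utilities (0 to 1023). The function name and the numbers are
hypothetical, and the post doesn't say how the weights are chosen, so
here they are simply inputs.

```python
from typing import Dict


def social_utility(
    utilities: Dict[str, int], weights: Dict[str, float], utility_bits: int = 10
) -> float:
    """Weighted average of each person's estimated utility for one situation.

    Every utility must be an integer in [0, 2**utility_bits - 1] (0..1023 for
    the default 10 bits), so all people share the same bounded scale.
    """
    limit = 2 ** utility_bits - 1
    for person, u in utilities.items():
        if not 0 <= u <= limit:
            raise ValueError(f"utility for {person!r} out of range: {u}")
    total_weight = sum(weights[p] for p in utilities)
    return sum(weights[p] * u for p, u in utilities.items()) / total_weight


# Hypothetical numbers for situation X.
x = {"tim": 1000, "wei": 200}
print(social_utility(x, {"tim": 1.0, "wei": 1.0}))  # 600.0
print(social_utility(x, {"tim": 3.0, "wei": 1.0}))  # 800.0
```

Because both scales share the same bounds, the average is well defined;
the open question moves to how the weights are set.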

Humans do interpersonal comparison of utilities routinely. I want to
stay alive tomorrow, and you probably want to eat dinner tomorrow. I
think that in the simplest circumstances consistent with what has been
said so far, we'll all agree that my desire to stay alive tomorrow is
greater than your desire to eat dinner tomorrow. Since humans can do
it, if you buy into the Church-Turing thesis you have to conclude that
there is an algorithm for it. If you don't buy into the
Church-Turing thesis, you're reading the wrong mailing list. :-)

If someone has a better idea, please speak up. We really do have to
compare utilities between people, since the FAI will routinely have to
choose between helping one person and helping another. (For a
suitably twisted definition of "Friendly", one can argue that the
previous sentence is false. I'd be interested in seeing details if
anyone can fill them in in a way they honestly think makes sense.)

Tim Freeman               http://www.fungible.com           tim@fungible.com
