What is stability in a FAI? (was Re: UCaRtMaAI paper)

From: Tim Freeman (tim@fungible.com)
Date: Sat Nov 24 2007 - 15:46:29 MST

Next message: Eliezer S. Yudkowsky: "Re: UCaRtMaAI paper"
Previous message: David Picón Álvarez: "Re: How to make a slave"
In reply to: Wei Dai: "Re: UCaRtMaAI paper"
Next in thread: Stefan Pernar: "Re: What is stability in a FAI? (was Re: UCaRtMaAI paper)"
Reply: Stefan Pernar: "Re: What is stability in a FAI? (was Re: UCaRtMaAI paper)"

From: "Wei Dai" <weidai@weidai.com>
>To take the simplest example, suppose I get a group of friends together and
>we all tell the AI, "at the end of this planning period please replace
>yourself with an AI that serves only us." The rest of humanity does not know
>about this, so they don't do anything that would let the AI infer that they
>would assign this outcome a low utility.

Good example. It points to the main flaw in the scheme -- I can't
prove it's stable, and a solution to the Friendly AI problem has to be
stable. Here "stability" roughly means that our Friendly AI isn't
going to construct an unfriendly AI and then allow the new one to take
over. However, if I look more closely, I don't know what "stable"
means.

Here's an example. I can't prove that real humans won't spontaneously
decide they want to destroy the universe next Tuesday. If the AI
stably does what people want, and the real humans decide they want to
destroy the universe next Tuesday, and the AI therefore irrevocably
starts the process of destroying the universe next Wednesday, was the
AI friendly and stable? If the answer is "yes", then we have to admit
that a stable friendly AI might destroy the universe, which seems bad.
If the answer is "no", then we seem to be demanding that a friendly AI
sometimes refuse to perform an action within its power that everyone
wants, which also seems bad. The fact that we can't prove any particular
property about real humans, and the algorithm tries to do what real
humans want, seems to make it impossible to prove that running the
algorithm will have any particular good consequences.

What does it mean to say that our Friendly AI has a stable
relationship with humans when we don't know that the humans are
stable?

Things might work out well for your particular example anyway. You
and I know that the rest of humanity won't appreciate some small junta
having total control after the current planning cycle. We know it
because we know that humans generally don't want to be dominated by
other humans. The AI can make that general observation too, since each
explanation of the world that it considers contains a single
explanation of all human nature, rather than an independent explanation
for each human. That single explanation is the "Compute-Utility" oval at
http://www.fungible.com/respect/paper.html#beliefs, and it takes a
person id and a state-of-the-world-outside-everyones-mind as input.
So, after watching children playing in playgrounds and drug warfare in
the streets, it might understand human dominance well enough to guess
that ceding control to a small junta won't be desired by the vast
majority.
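
To make the "one explanation of all human nature" point concrete, here
is a rough Python sketch. It is not the algorithm from the paper, only
an illustration of the shape of the idea: a single hypothesis object
whose compute_utility takes a person id and a world state, so whatever
it has learned about dominance from the people it observed carries over
to people it has never observed. The names WorldState, SharedHypothesis,
dominance_aversion, and fraction_opposed are all hypothetical, and the
utility rule is a deliberately crude assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class WorldState:
    """Hypothetical world state: just the set of people who hold control."""
    rulers: FrozenSet[str]  # empty set means nobody dominates anybody


@dataclass(frozen=True)
class SharedHypothesis:
    """One explanation of human nature, shared by every person id."""
    dominance_aversion: float  # assumed to be learned from observation

    def compute_utility(self, person_id: str, state: WorldState) -> float:
        # Crude assumption: a person loses utility when someone rules
        # and that person is not among the rulers.
        if state.rulers and person_id not in state.rulers:
            return -self.dominance_aversion
        return 0.0


def fraction_opposed(hypothesis: SharedHypothesis,
                     population: Iterable[str],
                     proposed: WorldState,
                     status_quo: WorldState) -> float:
    """Fraction of people the hypothesis predicts prefer the status quo."""
    people = list(population)
    opposed = sum(
        1 for p in people
        if hypothesis.compute_utility(p, proposed)
        < hypothesis.compute_utility(p, status_quo)
    )
    return opposed / len(people)


if __name__ == "__main__":
    population = [f"person{i}" for i in range(100)]
    junta = WorldState(rulers=frozenset(population[:3]))
    nobody = WorldState(rulers=frozenset())
    hypothesis = SharedHypothesis(dominance_aversion=1.0)
    print(fraction_opposed(hypothesis, population, junta, nobody))
```

The 97 people outside the junta never told the AI anything, but because
the same compute_utility is applied to every person id, the hypothesis
still predicts that they would assign the junta outcome a lower utility
than the status quo. With independent per-person explanations there
would be nothing to generalize from.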

-- 
Tim Freeman               http://www.fungible.com           tim@fungible.com

