Re: [sl4] Friendly AIs vs Friendly Humans

From: Philip Goetz (philgoetz@gmail.com)
Date: Mon Oct 31 2011 - 13:36:18 MDT


On Tue, Jun 21, 2011 at 2:36 AM, DataPacRat <datapacrat@gmail.com> wrote:
> My understanding of the Friendly AI problem is, roughly, that AIs
> could have all sorts of goal systems, many of which are rather
> unhealthy for humanity as we know it; and, due to the potential for
> rapid self-improvement, once any AI exists, it is highly likely to
> rapidly gain the power required to implement its goals whether we want
> it to or not. Thus certain people are trying to develop the parameters
> for a Friendly AI, one that will allow us humans to continue doing our
> own things (or some approximation thereof), or at least for avoiding
> the development of an Unfriendly AI.
>
> From what I've overheard, one of the biggest difficulties with FAI is
> that there are a wide variety of possible forms of AI, making it
> difficult to determine what it would take to ensure Friendliness for
> any potential AI design.

There are four chief difficulties with FAI, and the one that is
most important and most difficult is the one that people
in the SIAI say is not a problem.

The fourth-most-difficult and fourth-most-important problem is
how to design an AI so that you can define a set of goals,
and ensure that it will always and forever seek to satisfy
those goals and no others.

The third-most-difficult problem is that the ideas, assumptions,
probability estimates, and rejection of exponential time-discounting
put forth by Eliezer justify killing billions of people in order to slightly
reduce the probability of a "UFAI". Since the SIAI definition
of UFAI is in practice "anything not designed by Eliezer Yudkowsky",
it is not improbable that people on this mailing list will eventually
be assassinated by people influenced by SIAI.
I know that various people have denied this is the case;
but that's like Euclid putting forth his five postulates and
then denying that they imply that three line segments uniquely
determine a triangle.
See http://lesswrong.com/lw/3xg/put_all_your_eggs_in_one_basket
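
To make the arithmetic behind that claim concrete, here is a toy
expected-value comparison. The population and probability figures
below are illustrative assumptions of mine, not numbers from Eliezer
or the SIAI:

    import math

    # Toy comparison; every figure below is an illustrative assumption.
    present_deaths = 3e9    # hypothetical lives sacrificed now
    future_lives   = 1e30   # assumed potential future population over deep time
    risk_reduction = 1e-9   # assumed tiny drop in the chance of losing that future

    # Without time-discounting, the tiny risk reduction dominates:
    undiscounted_gain = risk_reduction * future_lives  # 1e21 expected lives
    print(undiscounted_gain > present_deaths)          # True: the sacrifice "pays"

    # With exponential discounting (1% per year over 10,000 years), the far
    # future is weighted by exp(-100), about 4e-44, and the verdict flips:
    discounted_gain = undiscounted_gain * math.exp(-0.01 * 10000)
    print(discounted_gain > present_deaths)            # False

The particular numbers don't matter; the point is that once
discounting is dropped, any sufficiently astronomical future makes
the first inequality come out the same way.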

The second-most-difficult problem is how to figure out
what our goals are. The proposed solution, Coherent
Extrapolated Volition (CEV), has a number of problems
that I have explained at length to people in SIAI,
largely at these links, without having any impact as far as I can
tell (a toy illustration of the averaging point follows the links):

http://lesswrong.com/lw/256/only_humans_can_have_human_values/
http://lesswrong.com/lw/262/averaging_value_systems_is_worse_than_choosing_one/
http://lesswrong.com/lw/1xa/human_values_differ_as_much_as_values_can_differ/
http://lesswrong.com/lw/55n/human_errors_human_values/
http://lesswrong.com/lw/256/biases_are_values/
http://lesswrong.com/lw/5q9/values_vs_parameters/
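
As a concrete illustration of the second link's point, that averaging
value systems is worse than choosing one, here is a toy sketch; the
peaked preference functions and numbers are my own illustrative
assumptions, not taken from the essay:

    import math

    # Two sharply peaked value systems over one policy parameter x.
    # System A most prefers x = 0.0; system B most prefers x = 1.0.
    def value_a(x):
        return math.exp(-(x - 0.0) ** 2 / 0.01)

    def value_b(x):
        return math.exp(-(x - 1.0) ** 2 / 0.01)

    def combined(x):
        return value_a(x) + value_b(x)

    print(combined(0.0))   # ~1.0: adopt system A outright
    print(combined(1.0))   # ~1.0: adopt system B outright
    print(combined(0.5))   # ~3e-11: the averaged policy satisfies neither

With sharply peaked preferences, the averaged parameter lands in a
valley that both systems score near zero; that is the shape of the
objection.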

One problem not touched on in those essays is that CEV would cause our
AI to kill gays and do other things that most of us think would be
horrible, but that reflect values held by most people in the world.
The usual reply is that people wouldn't value these things if they
were smarter, because such values are instrumental rather than
terminal. I disagree; hating gays does not appear to be instrumental to
any goal. It's just a value some of us don't share. Values are
irrational. That's a prerequisite for being a value.

The most difficult, most important, and least-thought-about
problem is that the SIAI's approach is to build an AI that will
take over the entire universe and use it to optimize whatever
utility function is built into it, forever and ever;
and this might be bad even if we have a single utility
function, figure out what it is, and the AI succeeds at optimizing it. See

http://lesswrong.com/lw/20x/the_human_problem

It doesn't get into some important philosophical roots of the problem,
which are also not acknowledged to be real problems by anyone I have
spoken to in SIAI:
- Why should I, a mortal being with a utility function, feel
obligated to create something that will keep optimizing that utility
function after I am dead?
- Is it possible to optimize "my" utility function after "I" am gone
if the utility function is indexical (refers to "the owner of this
utility function" in its goals)?
- If we should logically strive to create something that does not
change its utility function over time, why do we change our own
utility functions over time? Shouldn't we prevent that, too?

The answer most SIAI people will give, eventually, after we figure out
what each of us means, is that it does make sense if we define the
utility function at the highest level of abstraction. Some problems
with that answer are:
- I doubt that the "real human values", supposing they exist, are ones
that are so abstract that most people alive today are incapable of
conceiving of them.
- In every discussion I have read about CEV, people discuss values two
to four levels of abstraction too low; and nobody from SIAI jumps in
and says, "But of course really we would use more abstract values."
- The high-level abstract values useful for running a universe throw
out most of the low-level, concrete, true human values, like the
values of winning football games and beating up gays.

The biggest meta-problem with the SIAI's FAI project is that there is
a large set of difficult, barely-studied problems, for each of which
most people in the SIAI have already adopted Eliezer's answer as
proven. Getting everybody working on the problem to live together in
a communal-living situation has worsened the pre-existing premature
convergence of ideas. What the SIAI needs most is for people not in
the SIAI to think about the problem.

- Phil Goetz


