Re: Building a friendly AI from a "just do what I tell you" AI

From: Stathis Papaioannou (stathisp@gmail.com)
Date: Sun Nov 18 2007 - 23:54:31 MST


On 19/11/2007, Thomas McCabe <pphysics141@gmail.com> wrote:

> > If the goal is just the disinterested solution of an intellectual
> > problem it won't do that. Imagine a scientist given some data and
> > asked to come up with a theory to explain it. Do you assume that,
> > being really smart, he will spend his time not directly working on the
> > problem, but lobbying politicians etc. in order to increase funding for
> > further experiments, on the grounds that that is in the long run more
> > likely to yield results? And if a human can understand the intended
> > meaning behind "just work on the problem", why wouldn't a
> > superintelligent AI be able to do the same?
>
> Wasn't this already covered several years ago, by Eli & Co.? This
> exact scenario, where the AGI is asked to solve a technical problem
> and converts the world to computronium to get the results faster, was
> covered in CFAI
> (http://www.intelligence.org/upload/CFAI.html#design_generic_stomp). To
> quote:
>
> "Scenario: The Riemann Hypothesis Catastrophe
> You ask an AI to solve the Riemann Hypothesis. As a subgoal of solving
> the problem, the AI turns all the matter in the solar system into
> computronium, exterminating humanity along the way."

You missed the point I was making. A human scientist asked to solve a
technical problem would understand that this means working within
certain constraints (or, if he didn't understand that, the constraints
could be made explicit). The scientist might see that the problem
could best be tackled by spending his time lobbying for increased
research funding rather than doing mathematics, but that doesn't mean
that is what he will do. He is able to understand the difference
between directly working on the problem and other, indirect means of
working on it. An AI should be smart enough to understand this
distinction too. Therefore, an obedient AI asked to work on a problem
should understand what that means, and do it. There is no problem if
its goal is simply to be obedient and it understands human language at
least as well as a human does.

Of course, there is still the possibility that, through human malice
or stupidity, the obedient AI will attempt to destroy the world
anyway, but that is a risk we have always had to deal with in any
technology.

-- 
Stathis Papaioannou

