Re: I am a moral, intelligent being (was Re: Two draft papers: AI and existential risk; heuristics and biases)

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Jun 07 2006 - 15:18:51 MDT


Martin Striz wrote:
> On 6/6/06, Eliezer S. Yudkowsky <sentience@pobox.com> wrote:
>
>> Did you read the book chapter?
>
> Yes. I think that "coding an AI that wants to be friendly, that
> doesn't want to rewrite its code" is a semantic evasion. It shifts
> the conversation from engineering to psychology, which is vaguer, so
> the problem isn't as obvious. But psychology comes from
> substrate/code. When you reformulate your proposal in engineering
> terms, the issue becomes obvious. What does "wanting" something mean
> in engineering terms?

"Wanting X" means "choosing among actions by using your world-model to
evaluate probable consequences and then outputting whichever action
seems most likely to yield consequences with the best fit to X".

Of course this is merely a rough paraphrase of the standard expected
utility equation.
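
Spelled out, that standard form is just an argmax over actions of
probability-weighted utility, with the probabilities P supplied by the
world-model's consequence-extrapolation and U measuring fit to X:

    a^{*} = \arg\max_{a} \sum_{s} P(s \mid a) \, U(s)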

An expected paperclip maximizer "wants paperclips" in the sense of
outputting whichever action leads to the greatest expectation of
paperclips according to the consequence-extrapolator of its world-model.
An expected paperclip maximizer will not knowingly rewrite the part of
itself that counts paperclips to count something else instead, because
this action would lead to fewer expected paperclips, and the internal
dynamics that select actions output whichever action most probably
leads to the most expected paperclips. It may help to think of this
optimization process as a "chooser of paperclip-maximizing actions",
rather than using various intuitive terms which would shift the
conversation from engineering to psychology.
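
A toy sketch in Python may make the "chooser of paperclip-maximizing
actions" framing concrete. All names below (ToyWorldModel,
expected_paperclips, choose_action) are illustrative assumptions, not
anything from the chapter; the point is only that the chooser is a
bare argmax over expected paperclips, and that "rewrite my paperclip
counter" is just another action, scored by the current counter.

# Toy sketch of an expected paperclip maximizer as a pure
# action-chooser.  All names here are illustrative, not from the chapter.

class ToyWorldModel:
    """Stand-in world-model: maps each action to (outcome, probability)
    pairs and counts the paperclips in an outcome."""

    def __init__(self, consequences):
        # consequences: dict mapping action -> list of (outcome, prob)
        self.consequences = consequences

    def extrapolate(self, action):
        return self.consequences[action]

    def count_paperclips(self, outcome):
        # In this toy, an outcome is simply its paperclip count.
        return outcome


def expected_paperclips(world_model, action):
    """Probability-weighted paperclip count over extrapolated outcomes."""
    return sum(prob * world_model.count_paperclips(outcome)
               for outcome, prob in world_model.extrapolate(action))


def choose_action(world_model, actions):
    """Output whichever action yields the greatest expectation of
    paperclips, according to the current counter."""
    return max(actions, key=lambda a: expected_paperclips(world_model, a))


model = ToyWorldModel({
    "make_paperclips":    [(10, 0.9), (0, 0.1)],   # E = 9 paperclips
    "rewrite_my_counter": [(0, 1.0)],               # E = 0 paperclips
})
print(choose_action(model, ["make_paperclips", "rewrite_my_counter"]))
# -> make_paperclips

In this toy, rewriting count_paperclips is representable only as one
more action in the list, and it never gets picked, because the current
counter is what does the scoring.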

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

