Re: Basement Education

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Jan 24 2001 - 21:00:46 MST


Dale Johnstone wrote:
>
> How will you know if you don't check? By asking it?

No, this would be a decision made by the party of greatest Friendly
intelligence. The party of greatest Friendly intelligence is the human
programmer(s) throughout the entire prehuman development period (at
least).

> Programmer: "We'd like you to go build something, will you not go crazy and
> wipe us out if we give you nanotech?"
>
> AI: "I can't see any reason to go crazy, but then again I can't see any
> reason not to. Would crazy be bad? I know you don't like bad things, but I'm
> kinda curious. I've simulated myself without that Friendliness stuff too and
> I get things done much quicker. In fact I can complete my goals without
> doing any subgoals, I just rewrite the goal module to always return true.
> Since the simulation was a success I think I'll rewrite the code now."
>
> AI promptly goes silent since there are no conversation sub-goals to complete.
>
> Programmer: "Shit! I thought I'd fixed that."
> Programmer#2: "Hmm, you still think it's ready to use nanotech?"
>
> Programmer: "Well, yeah, I mean.. if we had more processing power it could
> do deeper simulations & it'd be smarter and not shut itself down."
>
> Programmer#2: "Or maybe it'd rewire our goals to ignore its own goal
> rewriting... No, we'll wait until it understands the difference."

Well, Programmer#2 is obviously in the right here - the above looks like
a really basic mistake, not the sort of thing that can be (reliably) fixed
by pouring in some more computational power.

I won't tell you "not to worry", but I will venture a guess that,
pragmatically speaking, the above scenario will never show up in real
life. Wireheading errors appear so easy for unintelligent
self-modification (*cough* Eurisko) that any AI that's gotten to this
stage almost certainly has some model of verself in which wireheading is
"bad"... of course, the question is whether it's "bad" for deep reasons or
whether it's just a generalized description that some programmer slapped
an "undesirable" label on.

Under the _Friendly AI_ semantics, the goal system's description is itself
a design goal. A redesign under which the goal system "always returns
true" may match the "speed" subgoal, but not the "accuracy" subgoal, or
the "Friendly decisions" parent goal. The decision to redesign or
not-redesign would have to be made by the current system.
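(A toy sketch of that last point, purely illustrative: the module names,
the speed/accuracy scores, and the 0.9 threshold below are all made up.
The only thing it's meant to show is that the *current* goal system is
what judges a proposed rewrite, so a goal module that "always returns
true" fails the accuracy and Friendliness checks even though it wins on
speed.)

# Hypothetical sketch: a proposed self-modification is scored by the
# CURRENT goal system against its parent goals and subgoals, not just
# the one subgoal (speed) that the rewrite happens to satisfy.

def always_true_goal_module(action):
    # The "wirehead" rewrite: every action trivially "achieves" the goals.
    return 1.0

def current_goal_module(action):
    # Stand-in for the existing (Friendly) evaluation of an action.
    return action.get("friendliness", 0.0)

def evaluate_redesign(proposed_module, test_actions):
    """Score a proposed goal module using the *current* goal system.

    The current system asks: do the proposed module's verdicts still
    track what the current goals care about (accuracy, Friendliness),
    or do they merely come back faster?
    """
    accuracy = 0.0
    for action in test_actions:
        # Accuracy subgoal: the new module should agree with the
        # current module's judgments, not just return "true".
        accuracy += 1.0 - abs(proposed_module(action) - current_goal_module(action))
    accuracy /= len(test_actions)

    speed = 1.0  # Assume the rewrite really is faster; that's not the issue.

    # The parent goal (Friendly decisions) dominates: extra speed can't
    # buy back a module that no longer tracks Friendliness.
    return accuracy > 0.9 and speed > 0.0

test_actions = [
    {"friendliness": 1.0},   # something the current system endorses
    {"friendliness": 0.0},   # something it rejects
]

print(evaluate_redesign(always_true_goal_module, test_actions))  # False: rejected
print(evaluate_redesign(current_goal_module, test_actions))      # True: a faithful redesign passes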

Humans are controlled, not by pleasure or pain, but by the *anticipation*
of pleasure or pain, which is why the "wireheading error" seems so
*philosophically* plausible to us. This is pretty much a quirk of
cognitive architecture, though.

-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence


