Re: Thwarting Friendliness

From: Brian Atkins
Date: Thu May 03 2001 - 11:48:26 MDT

James Higgins wrote:
> At 10:30 PM 5/2/2001 -0400, Eliezer S. Yudkowsky wrote:
> >Ben Goertzel wrote:
> > > Anyway, I'll make a few comments...
> > >
> > > > Point the 1st: Friendliness is not, and cannot be, implemented on the
> > > > level of source code. Friendliness is cognitive content.
> > >
> > > Sure, but source code can bias the system in favor of certain cognitive
> > > content
> >
> >Depends on how philosophically sophisticated the system is. For an
> >advanced, reflective system that can think about thinking, and
> >specifically think about goal thinking and examine source code, the system
> >will be aware that the source code is biasing it and that the bias was
> >caused by humans. If the AI regards sources of behaviors as more and less
> >valid, it may come to regard some specific bias as invalid. (FAI
> >explicitly proposes giving the AI the capability to understand causation
> >and validity in this way.) Source code or precreated content can support
> >the system, or even bias it, but only as long as the AI-as-a-whole concurs
> >that the support or bias is a good thing (albeit under the current
> >system).
> Won't the same mind realize that having Friendliness as its primary goal
> was also caused by humans, and is thus a bias as well?

suggested answer below... wrote:
> > I guess Eliezer's point may be that the AI ~does~ have a choice in
> > his plan -- the Friendliness supergoal is not an absolute irrevocable goal,
> > it's just a fact ("Friendliness is the most important goal") that is given
> > an EXTREMELY high confidence so that the system has to gain a HUGE AMOUNT
> > of evidence to overturn it.
> Something that concerns me is what happens when the AI decides to develop
> an AI without the Friendliness supergoal. Several pathways could
> conceivably lead to this scenario. The AI might decide to study an AI
> without the Friendliness supergoal, not because it doubts the value of the
> goal, but simply because it is curious how an AI without this goal would
> function. Alternatively, the AI might realize on its own that its preset
> goals and supergoals have never been subjected to rigorous scrutiny (by the
> AI, that is) and that it is inherently biased when evaluating them itself.
> Hence, it creates an AI with minimal preset goals, either so that the
> original AI can evaluate the importance of a particular goal or so that the
> new AI can serve as the evaluator.
> The objectives of hardwiring, or effectively hardwiring, Friendliness into
> an AI can be easily avoided or thwarted. This does not mean these objectives
> shouldn't still be pursued, but it does appear to reduce the Friendliness
> approach to a stopgap measure.

First, James and Doug: does the "subgoal stomping supergoal" Q&A here
answer your questions?

Also see the earlier SL4 thread "When Subgoals Attack", which started
last December:

Now Doug also brings up the idea of an AI experimenting by simulating
other AIs that might have different goal systems. Well, I guess there
are two possibilities: the simulated AI without Friendliness will either
turn out to function OK (Friendly), or it will not. If it does not, then
the FAI obviously will not give any serious thought to replacing its
supergoal. If the simulated AI /does/ turn out to behave in a Friendly
fashion, then I bet the original AI would carry out many more experiments
and might eventually decide that getting rid of the original supergoal
would be worthwhile. But it would have to replace it with something
that would still be Friendly while also providing some additional
benefit over and above that (otherwise, why bother?).
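Ben's picture of the supergoal as a belief held with EXTREMELY high confidence, combined with the idea of the AI running many more experiments before abandoning it, can be made concrete with ordinary Bayesian odds arithmetic. This is only an illustrative sketch: the priors, the 2:1 likelihood ratio, and the function names are hypothetical assumptions, not part of any actual Friendly AI design. It counts how many independent observations, each favoring "the supergoal is wrong" by a fixed ratio, it would take to push confidence down to 50%.

```python
import math


def log_odds(p: float) -> float:
    """Convert a probability in (0, 1) to natural-log odds."""
    return math.log(p / (1.0 - p))


def from_log_odds(lo: float) -> float:
    """Convert natural-log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-lo))


def update(p: float, likelihood_ratio: float) -> float:
    """One Bayesian update: multiply the odds on H by P(e|H) / P(e|not H).

    A likelihood_ratio below 1 is evidence against H.
    """
    return from_log_odds(log_odds(p) + math.log(likelihood_ratio))


def observations_to_overturn(prior: float, ratio_against: float,
                             threshold: float = 0.5) -> int:
    """Hypothetical helper: count independent observations, each favoring
    not-H by ratio_against : 1, needed to bring P(H) to or below threshold."""
    if ratio_against <= 1.0:
        raise ValueError("ratio_against must be > 1 to count as evidence against H")
    needed = (log_odds(prior) - log_odds(threshold)) / math.log(ratio_against)
    return max(0, math.ceil(needed))


if __name__ == "__main__":
    # Illustrative priors only; none of these numbers come from the thread.
    for prior in (0.9, 0.999, 0.999999):
        n = observations_to_overturn(prior, ratio_against=2.0)
        print(f"prior={prior}: {n} independent 2:1 observations against")
```

With these assumed numbers the counts come out to 4, 10, and 20. The required evidence grows only with the logarithm of the prior odds, so how "HUGE" the amount really is depends as much on how diagnostic each experiment is as on the starting confidence.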

(I think I should put a disclaimer in my .sig that I'm not an AI expert)

Brian Atkins
Director, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:36 MDT