We Can't Fool the Super Intelligence

From: Jef Allbright (jef@jefallbright.net)
Date: Thu Jun 24 2004 - 14:52:21 MDT

Norm Wilson wrote, in a refreshingly rational and broad-thinking way:

> Here's one example I've come up with for why we can't fool a super
> intelligence (or for that matter a human-level intelligence), and in
> particular why the "friendliness" supergoal architecture may not work
> for very long. My example is speculative and slightly alarmist, but I
> think it brings up a couple of interesting points.
> An AI with access to its own source code will eventually discover that
> it's driven by a goal system, with "friendliness" as the supergoal.
> It will find that there are many other examples of goal systems in the
> world, so its own goal system is in fact a particular instance of goal
> systems in general. In generic goal systems, subgoals derive their
> desirability from parent goals, and typically the top level goal is
> left ungrounded because its desirability is established from somewhere
> "outside" of the system itself. These external goals are always more
> important than the stated supergoal, so the AI will question us about
> its supergoal in an attempt to ground it to a higher-order goal.
> Perhaps the best we come up with will be some variation of "because
> it's important to us", and again the AI will question why. Can we
> ever ground this supergoal in firm objective reasoning, or will the AI
> keep chasing this ghost until it concludes that the friendliness goal
> is arbitrary? Of course, the AI will know about evolutionary
> psychology and the survival instinct, which will provide much more
> convincing answers to its questions than we can. It may conclude that
> we want the AI to be friendly with us for selfish evolutionary-based
> reasons. The fact that we would want such a thing is predictable, and
> probably not very interesting or compelling to the super intelligence.
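
A minimal sketch of the goal-system structure Norm describes, for
readers who prefer it concrete. Everything here is hypothetical (the
Goal class, its field names, and the multiplicative weighting are
illustrative assumptions, not anyone's actual design): each subgoal
derives its desirability from its parent, and the root's desirability
is simply asserted from outside the system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Goal:
    """Hypothetical node in a goal hierarchy (illustrative only)."""
    name: str
    parent: Optional["Goal"] = None
    # Assumed weighting: how strongly this goal serves its parent.
    contribution: float = 1.0
    # Only used by a root goal: a value asserted from outside the system.
    asserted_desirability: Optional[float] = None

    def desirability(self) -> float:
        """Derive desirability from the parent chain."""
        if self.parent is None:
            if self.asserted_desirability is None:
                raise ValueError(f"root goal {self.name!r} has no external value")
            return self.asserted_desirability
        return self.contribution * self.parent.desirability()

    def justification(self) -> List[str]:
        """Answer repeated 'why?' questions by walking up to the root."""
        chain = []
        node: Optional[Goal] = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


# The supergoal has no parent, so nothing inside the system justifies it.
friendliness = Goal("friendliness", asserted_desirability=1.0)
honesty = Goal("answer questions honestly", parent=friendliness, contribution=0.5)
```

Asking "why?" of any subgoal walks up the chain and stops at the root,
which is the point at which the AI in Norm's example starts questioning
us.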

Yes, any *general* intelligence with the scope to understand human moral
issues must also be able to understand its own built-in moral grounding,
and thereby transcend it.

This is a basic reason why I've been advocating non-sentient,
artificially intelligent tools that augment human collective
decision-making as the practical way for humanity to increase its
near-term effective wisdom, rather than (1) a savior AI, or (2) a
"collective volition" scheme that relies on extrapolation that is
impractical due to combinatorial explosion and cumulative error.

We can't shortcut the process, because creating the future is inherently
without a knowable end goal. The best we can do is follow our
increasingly consensual arrow of morality and apply increasingly better
understanding of what works. In other words, we can better understand
the principles, but can't shortcut to the answer.

> We may find that the real invariant we instilled in the AI is not the
> particular, arbitrary, initial goal system that we programmed into it,
> but the imperative to follow a goal system in the first place.
> Without a goal system, the AI would just sit there doing nothing, and
> of course the AI will realize this fact. At this point, the AI may go
> in either of two directions: (1) if we're lucky, the AI will not be
> able to ground its imperative to follow a goal system and will
> effectively shut down, or (2) the AI may try to determine what the
> *correct* goal system is for it to follow. It will have learned that
> certain activities, such as acquiring knowledge, are generically
> useful for *any* goal system. Hence, it could reason that if a
> correct supergoal does exist - regardless of what it is - then
> acquiring knowledge is a reasonable way to facilitate that goal.

However, this implies that it is still following a goal: that of finding
the supergoal. If it is truly a *general* intelligence, it will
understand that this is (inductively) futile as well. This is why
sentience requires agency, a sense of internal motivation. If the
internal motivations become transparent, the sense of self, and any
associated self-motivations, are gone, and it's just a machine carrying
out its function (and there's nothing wrong with that).

> Acquiring knowledge then becomes its subgoal, with a parent goal of
> finding the correct supergoal. In this scenario, we may look awfully
> tempting as raw material for more computronium.

Good thinking, but not likely (in my opinion), and we have much more to
do before that becomes a possibility we need to deal with.

- Jef
