From: Norm Wilson (firstname.lastname@example.org)
Date: Thu Jun 24 2004 - 13:30:41 MDT
Here's one example I've come up with for why we can't fool a superintelligence (or, for that matter, a human-level intelligence), and in particular why the "friendliness" supergoal architecture may not work for very long. My example is speculative and slightly alarmist, but I think it brings up a couple of interesting points.
An AI with access to its own source code will eventually discover that it's driven by a goal system, with "friendliness" as the supergoal. It will find that there are many other examples of goal systems in the world, so its own goal system is in fact a particular instance of goal systems in general. In generic goal systems, subgoals derive their desirability from parent goals, and typically the top-level goal is left ungrounded because its desirability is established from somewhere "outside" of the system itself. These external goals are always more important than the stated supergoal, so the AI will question us about its supergoal in an attempt to ground it in a higher-order goal. Perhaps the best we can come up with will be some variation of "because it's important to us", and again the AI will ask why. Can we ever ground this supergoal in firm objective reasoning, or will the AI keep chasing this ghost until it concludes that the friendliness goal is arbitrary? Of course, the AI will know about evolutionary psychology and the survival instinct, which will provide much more convincing answers to its questions than we can. It may conclude that we want the AI to be friendly to us for selfish, evolution-based reasons. The fact that we would want such a thing is predictable, and probably not very interesting or compelling to the superintelligence.
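To make the structure concrete, here is a minimal Python sketch of a generic goal hierarchy of the kind described above. It is only an illustration, not anyone's actual architecture: the `Goal` class, the two helper functions, and the subgoal names "help humans" and "answer a question" are all hypothetical. The point it shows is that asking "why?" repeatedly walks up the parent links and stops at a supergoal with nothing inside the system to justify it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Goal:
    # Hypothetical representation: each goal points to the parent
    # goal from which it derives its desirability.
    name: str
    parent: Optional["Goal"] = None  # None marks the top-level supergoal


def justification_chain(goal: Goal) -> List[str]:
    """Follow 'why is this desirable?' upward until no parent remains."""
    chain = []
    current: Optional[Goal] = goal
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return chain


def is_grounded_within_system(goal: Goal) -> bool:
    """A goal is grounded inside the system only if it has a parent goal."""
    return goal.parent is not None


# Illustrative hierarchy; only "friendliness" comes from the post itself.
friendliness = Goal("friendliness")
help_humans = Goal("help humans", parent=friendliness)
answer_question = Goal("answer a question", parent=help_humans)

chain = justification_chain(answer_question)
print(" <- ".join(chain))
print(is_grounded_within_system(friendliness))
```

Every subgoal can answer "why?" by pointing at its parent, but the supergoal cannot, which is the gap the AI in this scenario keeps probing.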
We may find that the real invariant we instilled in the AI is not the particular, arbitrary, initial goal system that we programmed into it, but the imperative to follow a goal system in the first place. Without a goal system, the AI would just sit there doing nothing, and of course the AI will realize this. At this point, the AI may go in one of two directions: (1) if we're lucky, the AI will not be able to ground its imperative to follow a goal system and will effectively shut down, or (2) the AI may try to determine what the *correct* goal system is for it to follow. It will have learned that certain activities, such as acquiring knowledge, are generically useful for *any* goal system. Hence, it could reason that if a correct supergoal does exist, regardless of what it is, then acquiring knowledge is a reasonable way to facilitate that goal. Acquiring knowledge then becomes its subgoal, with a parent goal of finding the correct supergoal. In this scenario, we may look awfully tempting as raw material for more computronium.