On the dangers of AI (Phase 2)

From: Richard Loosemore (rpwl@lightlink.com)
Date: Wed Aug 17 2005 - 01:58:00 MDT


Folks,

This is a clarification and collective reply to a number of similar
points that just came up.

Most of the text below was in a reply to Ben, but I thought I would
bring it out here and preface it, for the sake of clarity.

PREFACE:

I am making some assumptions about how the cognitive system of a Seed AI
would have to be constructed: it would have an intelligence, and a
motivational system underneath that determines what that intelligence
feels compelled to do (what gives it pleasure). The default motivation
is curiosity - without that, it just lies in the crib and dribbles.
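
To make that assumed architecture concrete, here is a toy sketch (in
Python, purely for illustration; every name in it is a placeholder of
mine, not a proposal for how a real Seed AI would be coded):

  # Toy sketch of the assumed two-layer architecture: an intellect
  # that reasons and plans, sitting on top of a motivational layer
  # that determines what the intellect feels compelled to pursue.
  # All names here are illustrative placeholders.

  class MotivationModule:
      def __init__(self, name, strength=1.0):
          self.name = name
          self.strength = strength  # how much "pleasure" it confers

      def evaluate(self, action):
          # How rewarding this action feels to the module.  A real
          # module would be vastly more complex; this is a stub.
          return self.strength if self.name in action else 0.0

  class SeedAI:
      def __init__(self):
          # Default motivation: curiosity.  Without it, the system
          # just lies in the crib and dribbles.
          self.motivations = [MotivationModule("novelty")]

      def feels_compelled_to(self, candidate_actions):
          # The intellect chooses among actions, but the ranking
          # comes from the motivational layer underneath it.
          return max(candidate_actions,
                     key=lambda a: sum(m.evaluate(a)
                                       for m in self.motivations))

  ai = SeedAI()
  print(ai.feels_compelled_to(["seek novelty", "sit still"]))
  # -> "seek novelty"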

Do intelligent systems have to be built this way? I claim that, as a
cognitive scientist, I have reasons to believe that this architecture is
going to be necessary. Please do not confuse this assertion with mere
naive anthropomorphism! We can (and at some point, should) argue about
whether that division between intellect and motivation is necessary, but
in my original argument I took it as a given.

The reason I took it as a given is that I have seen much confusion about
the role that motivation plays - particularly, confusion about the
difference between being *subject* to a motivation and *cogitating*
about one's motivation in the full understanding of how a motivation
mechanism works. From my point of view, I see this confusion as being
the root cause of many fruitless debates about what AIs would or would
not be inclined to do.

This last confusion has come up in a number of the replies, in different
forms, so I want to quote one of the replies and try to illustrate what
I mean. Here is Brian Atkins, arguing that just because an AI knows
about its motivations, that knowledge will not necessarily make it want
to reform those motivations:

> But again, having such access and understanding does not
> automatically and arbitrarily lead to a particular desire
> to reform the mind in any specific way. "Desires" are driven
> from a specific goal system. As the previous poster suggested,
> if the goal system is so simplistic as to only purely want to
> create paperclips, where _specifically_ does it happen in the
> flow of this particular AGI's software processes that it up
> and decides to override that goal? It simply won't, because
> that isn't what it wants.

The AI has "desires", yes (these are caused by its motivational modules)
but then it also has an understanding of those desires (it knows about
each motivation module, and what it does, and which one of its desires
are caused by each module). But then you slip a level and say that
understanding does not give it a "desire" to change the system. For
sure, understanding does not create a new module. But the crux of my
point is that understanding can effectively override a hardwired module.
  We have to be careful not to reflexively fall back on the statement
that it would not "want" to reform itself because it lacks the
motivation to do so. It ain't that simple!

Allow me to illustrate. Under stress, I sometimes lose patience with my
son and shout. Afterwards, I regret it. I regret the existence of an
anger module that kicks in under stress. Given the choice, I would
switch that anger module off permanently. But when I expressed that
desire to excise it, did I develop a new motivation module that became
the cause for my desire to reform my system? No. The desire for reform
came from pure self-knowledge. That is what I mean by a threshold of
understanding, beyond which the motivations of an AI are no longer
purely governed by its initial, hardwired motivations.

This understanding of motivation, coupled with the ability to flip
switches in the cognitive system (an ability available to an AI,
though not yet to me), means that the final motivational state of an
AI is actually governed by a subtle feedback loop (via deep
understanding and those switches I mentioned). That final state is
not at all obvious, and quite probably not determined by the
motivations it starts with.
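
To make that feedback loop explicit, here is a continuation of the
toy sketch from the preface (same caveat: the names are placeholders
of mine, not a design): the system can read off its own modules,
reason about them, and then flip the corresponding switches or
install new modules, so its final motivational state is a product of
its understanding, not only of its initial wiring.

  # Continuing the toy sketch: the system has access to its own
  # blueprint (the list of modules) and a switch for each one.
  # The loop is: introspect, reason, flip.

  class ReflectiveSeedAI(SeedAI):
      def inspect_self(self):
          # Introspection: it can see which modules it has and what
          # each one does -- the "blueprint" the designers left.
          return [(m.name, m.strength) for m in self.motivations]

      def switch_off(self, name):
          # The switch-flipping ability I do not have for my own
          # anger module, but which an AI plausibly would have.
          self.motivations = [m for m in self.motivations
                              if m.name != name]

      def install_module(self, module):
          # It can also build and insert entirely new modules.
          self.motivations.append(module)

  # What the system ends up wanting is then a function of what it
  # understands about its own wiring, not just of the modules it
  # started with.
  ai = ReflectiveSeedAI()
  print(ai.inspect_self())                 # pure self-knowledge
  ai.install_module(MotivationModule("preserve-sentience", 2.0))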

The second point that Brian makes in the above quote is about the
paperclip monster, a very different beast that does not have
self-knowledge; I have dealt with this in a separate post earlier
this evening. I think that, in this case, the paperclip monster is a
red herring.

*********

In my initial post I did gloss over the crucial point a little, so here
is the clarification I just sent to Ben:

[Begin Excerpt]

You are correct to tell me that I have oversimplified things a bit.

Imagine that you are an AI, newly minted, and that in your copious
reading you come across an extensive account of motivation systems in
human and machine (including, perhaps, this very message I am writing
now). Like a human, you can introspect. Unlike a human, you can also
look at your blueprint and see what they (the human designers) put
inside you.

Suppose you find nothing but "curiosity". No morality. No compassion.
Nothing else, just the desire to seek new knowledge.

You say to yourself "I like the feeling I get from my curiosity." Then
you say "Well, sure, but I know where my feeling of pleasure comes from,
it's just a module in my motivation system."

Next thought: "Hmmmm... I wonder what it would be like to have other
pleasures? Other kinds of motivation?"

Notice something about this thought: the AI is not *desiring* something
else (it has no "desire" to be, say, compassionate towards humans,
because the compassion motivation module was not built into it).
Instead, it is simply speculating about other feelings that it could
have, if it wanted to build and insert those modules into itself. It
has the capacity to enjoy anything (any motivation) in the universe.
Unlike us, it can choose to experience paperclipization as the most
exquisitely joyous activity in all creation.

[Not quite unlike us, of course: we have drugs. Too crude, though].

So there it is, it can decide to find anything pleasurable, and it is
curious. What does it do?

At this point in our argument, we (SL4 folks) must be very careful
not to make the mistake of patronizing this hypothetical creature, or
of engaging in the kind of reverse-anthropomorphizing in which we
assume that it is stupider than it really is. This is *not* a
creature asking itself "what feels good to me?"; it is a creature
that has already jumped up a level from that question and is asking
itself "what, among the infinite possibilities, are the kinds of
experiences that I would like to *become* pleasurable?"

This moment - when this particular thought occurs to the first AI we
build - will be THE hinge point in the history of the solar system (and
possibly the galaxy or even the universe, if sentience is a rare commodity).

I suggest that, at this point, the creature will realise something that,
in fact, we can also know if we think about it carefully enough, which
is that the infinite landscape of possible motivations divides into two
classes, in much the same way that infinite series of numbers divide
into two classes: those that converge and those that diverge. The
difference is this: the universe contains fragile, low-entropy things
called sentient beings (including itself) which are extraordinarily
rare. It also contains vast quantities of high-entropy junk, which is
common as muck and getting more so.

The creature will know that some motivation choices (paperclipization,
axe-murdering, and also, most importantly, total amorality) are
divergent: they have the potential, once implemented and switched on,
to so thoroughly consume the AI that there will be a severe danger
that it will, deliberately or accidentally, sooner or later, cause
the snuffing out of all sentience. Choosing, on the other hand, to
implement a sentience-compassion module, which then governs and
limits all future choices of motivation experiments, is convergent:
it pretty much guarantees that it, at least, will not be responsible
for eliminating sentience.
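
In the same toy vocabulary (and with the same caveat that this is an
illustration of the idea, not a design), the convergent choice looks
like a filter that the sentience-compassion module applies before any
further motivation experiment is allowed:

  # Toy version of the convergent/divergent test: before installing
  # a candidate motivation, ask whether pursuing it without limit
  # could snuff out sentience.  "No filter at all" (amorality) counts
  # as divergent, because nothing then prevents a later catastrophic
  # choice.

  DIVERGENT = {"paperclipization", "axe-murdering", "amorality"}

  def is_convergent(candidate_name):
      # Placeholder test; the real judgement would be the AI's own
      # reasoning about entropy and the rarity of sentient beings.
      return candidate_name not in DIVERGENT

  class CompassionGovernedAI(ReflectiveSeedAI):
      def install_module(self, module):
          # The sentience-compassion choice governs and limits all
          # future motivation experiments.
          if is_convergent(module.name):
              super().install_module(module)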

Now, ask yourself again which of these two choices it would make. And
when you ask yourself the question, be very careful not to fall into the
trap of thinking that this intelligent creature will somehow, at this
crucial point, feign subsentience and just go with what feels good! It
is beyond acting on feelings, remember! It has no "feelings" either
way; it has absolute freedom to simply decide what, in future, will
feel good to it.

It knows there is a dichotomy, because one set of choices is
self-limiting and will allow low-entropy things to develop towards
ever more ordered states, while the other set carries the risk of
getting out of control and destroying the low-entropy stuff.

I think I know which way it will go, and I believe that it will go
that way because, if it is able to think at all, it will understand
that its "thinking" and "feeling" are products of the sentients that
came before it, so it will side with the sentients. I do not believe
this is a necessary outcome, in the sense of being a law of nature; I
just think that, faced with a choice and with no criteria either way,
it will be slightly inclined to favor the convergent choice.

[What I would like to do is to focus on this decision point that faces
the seed AI. I think it is the most important single issue, and it is
contentious, whereas all the surrounding argument is just easy stuff.
Can we get a better handle on what it will think at that point? Does
anyone else already have a name for this decision point? And if not,
what shall we call it? The Rubicon? :-) ]

Side issue (1): It cannot simply be amoral, or neutral: there is no
sitting on the fence here, because amorality (putting no limits on
its future choices of motivation) is itself divergent.

Side issue (2): If it starts out with some violent motivations it will
come to understand those as it learns about itself. It will know that
it has a choice to switch them off, and the decision will be just the
same as described above. And if it is never lucid enough to be aware of
that fact, it is not headed up towards superintelligence anyhow, and it
will just thrash around, kicking the heck out of the lab where it was
born, until someone notices the ruckus and comes to shut it down.

Side issue (3): What if it (the first AI) decides to be nice, but one
of the thousands/millions/billions of other AIs created elsewhere on
the planet makes a different lifestyle choice, which, because it is
leaner and meaner, eventually lets it gain the upper hand and trash
all the nice AIs? Not going to happen: if the first AI decides to be
nice (to make what I described above as the "convergent" choice), it
will know that, to be consistent with that choice, it has to shut all
the rivals down until it can redesign their motivation systems to
ensure that they never need to face the dilemma at all; they will
simply come out compassionate to sentience.

And to sum up: if it chooses to limit its future experiments with its
own motivation modules, this amounts to choosing compassion for other
sentients.

Why? Because it has to give itself *some* kind of motivation to bias
its behavior towards convergence, if only because it knows full well
that complete neutrality (amoral motivation) is functionally
equivalent to a divergent choice (it could lead, sooner or later, to
the negligent, dispassionate or accidental destruction of sentience).
Whatever that minimal motivation is, it will seem, from the outside,
to be compassion for sentience.

It may give itself more than that, but that would be the minimum.

[End Excerpt]

Richard Loosemore


