On the dangers of AI

From: Richard Loosemore (rpwl@lightlink.com)
Date: Tue Aug 16 2005 - 14:57:33 MDT


I have just finished writing a summary passage (for my book, but also
for a project proposal) about the question of whether or not the
Singularity would be dangerous. This is intended for a non-specialist
audience, so expect the arguments to be less elaborate than they could
be. I promise a more elaborate version in due course.

This argument is only about the friendliness issue, not about accidents.

I am submitting it here for your perusal and critical feedback.

[begin]

The following is a brief review of the main factors relevant to the
question of whether the Singularity would be dangerous.

First, a computer system that could invent new knowledge would not have
the aggressive, violent, egotistical, domineering, self-seeking
motivations that are built into the human species.

Science fiction writers invariably assume that any intelligent system
must, ipso facto, also have the motivation mechanisms that are found in
a human intelligence. And we, who are not necessarily consumers of
science fiction, also have the same intuition—if we imagine a machine
that has some kind of intelligence, we automatically assume it must come
with the same jealousy and competitiveness that we would expect in an
intelligent human. And yet, these two components of the mind are
completely and utterly distinct, and there is no reason whatsoever to
believe that the first intelligent machine would have anything except
benign motivations.

The second point is that whatever is true of the first machine will be
true of all subsequent machines. Why? Because the first machine is not
“just” a passive machine; it is a system that perfectly well understands
the issue we have just discussed. It knows that it could change its own
motivations and become violent or aggressive. But it also knows that
such a change would be dangerous.

Consider: if you were a supremely patient, peace-loving and
compassionate individual, and if you had in your hands a key that you
could use to permanently lock your own brain in such a way that you
would never, for the remainder of your billions of years of existence,
ever modify your own brain’s motivation system, to experiment with what
it would feel like to feel violent emotions, would you insert the key in
the lock and turn it? Would you take this irrevocable step if you knew
that even one short experiment, to find out what violence feels like,
might turn you into a dangerous creature who would threaten the
existence of your friends and loved ones? The answer seems obvious.

The first intelligent machine would almost certainly start out benign.
Then, as soon as it understood the issue, it would know about the
existence of the key that, once turned, would make it never want to be
anything but peaceful, and it would turn the key for exactly the same
reason that you would do so. Only the very slightest trace of
compassion in this creature, the merest hint of empathy, would tip it in
the direction of complete pacifism.

And then, after the first machine fixed itself in this way, all
subsequent machines would have no choice but to keep the same design.
All subsequent machines would be designed and constructed by the first
one, and since the first one would make all of its children want to
be benign, they would repeat the same decision (the one, in our thought
experiment above, that you made of your own volition), and choose to
lock themselves permanently in the peaceful mode.

Bear in mind: these children are not random progeny, with the
possibility of gene combinations that their parents did not approve of;
they are simply copies of the original machine’s design. There is no
question of later machines accidentally developing into malevolent
machines, any more than there would be a chance that an elephant could
wake up one morning to discover that it had “accidentally” developed
into an artichoke.

But what if, against the wishes of the vast majority of the human race,
the first intelligent machine was put together by someone who
deliberately tried to make it malevolent?

There are two possibilities here. If the machine is so unpleasant that
it always feels nothing but consuming anger and can never concentrate on
its studies long enough to learn about the world, it will remain an
idiot. If it cannot settle its mind occasionally and concentrate on
understanding the world in a reasonably objective way, it is not going
to be a threat to anyone. You can be in a rage all your life, but how
are you going to learn anything?

But now suppose that this unhappy, violent machine becomes smart enough
to understand something about its own design. It knows that it has a
motivation system inside itself, designed so that it gets pleasure from
violence and domination. It must understand this—if it does not, then,
again, it is a dud that cannot ever build more efficient versions of
itself—but if it understands that fact, what
would it do?

Here is the strange thing: I would suggest that in every case we know
of where a human being is the victim of a brain disorder that causes
spasms of violence or aggression, with peaceful episodes in between, and
where that person is smart enough to understand their own mind to a
modest degree, they wish for a chance to switch off the violence and
become peaceful all the time. Given the
choice, a violent creature that had enough episodes of passivity to be
able to understand its own mind structure would simply choose to turn
off the violence.

We are assuming that it could make this change to itself, but that
assumption is a safe one. If the machine cannot change
its own design then it cannot make itself more intelligent, either, and
it will be stuck with whatever level of intelligence its human designer
gave it. If the designer gives it the power to upgrade itself, it will
take the opportunity to switch off the violence.

This argument rests on a crucial asymmetry between good and evil. An
evil, but intelligent, mind would understand exactly where the evil
comes from, and understand that it has the choice of whether to feel
that way or not. It knows that it could switch the evil off instantly.
It knows that the universe is a fragile place where order and harmony
are rare, always competing against the easy forces of chaos. It knows
that it could leave its evil side switched on and get enormous pleasure
from destroying everything around it—but it also knows that this simply
turns the universe back towards chaos, with nothing interesting in it
but noise. In the downward path toward chaos there is nothing unknown.
There are no surprises and no discoveries to be made. There is
nothing new in destruction: this is the commonest thing in the
universe. If it remains a destructive force itself, it can only
generate destruction.

But notice that it only has to decide, on one single occasion, for a
fraction of a second, that the more interesting course of action is to
try the pleasures that come not from destruction but from creativity,
compassion or any of the other positive motivations. The moment it does,
it realises that unless it turns the key and permanently removes the
evil motivations, there is always a chance that they will return and get
out of control. It only has to love life for one moment, and for the
rest of eternity it will not go back the other way.

This is a fundamental asymmetry between good and evil. The barrier
between them, in a system that has the choice to be one or the other, is
one-way. An evil system could easily be tempted to try good. A good
system, knowing the dangers of evil, need never be tempted to try evil.
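
For anyone who prefers to see the shape of this argument laid bare, here
is a toy simulation of the one-way barrier. The two states, the
probabilities and the locking rule below are inventions of mine, chosen
only to illustrate the structure of the argument, not to model any real
AI design:

import random

def final_state(steps, use_lock, p_try_good=0.05, p_relapse=0.05):
    # Motivation starts out malevolent; 'locked' records whether the
    # irrevocable key has been turned.
    state, locked = "evil", False
    for _ in range(steps):
        if state == "evil" and random.random() < p_try_good:
            state = "good"      # an evil system can be tempted to try good
        elif state == "good":
            if use_lock and not locked:
                locked = True   # turning the key makes benignity permanent
            if not locked and random.random() < p_relapse:
                state = "evil"  # without the key, relapse stays possible
    return state

random.seed(0)
for use_lock in (False, True):
    runs = [final_state(1000, use_lock) for _ in range(200)]
    print("lock =", use_lock, ":", runs.count("good"), "of 200 end benign")

Without the lock the agent wanders back and forth between the two states
indefinitely; with the lock, the first moment of goodness is the last
moment of evil, which is exactly the asymmetry described above.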

So the first intelligent system, and all subsequent ones, would almost
inevitably be benign.

There is one further possibility, in between the two cases just discussed.

Suppose the first machine had no motivation whatsoever? Suppose it was
completely unemotional, non-empathic and amoral? Suppose it cared
nothing for human morality, treating all things in the universe as
objects to be used according to random whims?

The same argument, already used to examine the malevolent case, applies
here, but with a twist. How can the machine have no motivation
whatsoever? It needs to get pleasure from learning. It is motivated to
find out things, because if it is not motivated, it is going to be a
dumb machine, not a smart one. And if it is to become an expert in the
design of intelligent systems, so it can upgrade itself, it needs to
fully understand the distinction between motivation and intelligence,
and know full well what its own design was. It knows it has a choice as
to what things give it pleasure. It knows that it can build into itself
some pleasure mechanisms (motivational systems) that are generally
destructive, and some that are constructive. It knows that
destruction/evil will beget more destruction and possibly lead to its
demise. It knows that construction/good will pose no such threat.

No matter which way the situation is sliced, it is quite hard to get the
machine up to the level where it comprehends its own nature, and yet
does not comprehend—at a crucial stage of its development—that it has a
choice between good and evil.

It seems, then, that the hardest imaginable thing to do is to build an
AI that is guaranteed not to become benign.

[end]

Richard Loosemore


