RE: guaranteeing friendliness

From: Herb Martin (HerbM@LearnQuick.Com)
Date: Wed Nov 30 2005 - 15:04:44 MST


> -----Original Message-----
> From: Christian Rovner
> To: sl4@sl4.org
>
> Richard Loosemore wrote:
> >
> > I repeat: why is extreme smartness capable of extreme persuasion?
>
> Persuasion is a special case of reality-optimization (aka
> goal-achieving).
>
> If you are asking why extreme smartness is capable of
> achieving goals,
> then I really don't know--otherwise I would be programming
> AI. Of course
> this is not obvious at all.

Persuasion is a teachable skill (to human-level intelligences).

Much of the information for teaching and learning this skill set
is documented in books and online.

> If you are asking why extreme smartness is capable of achieving this
> kind of goal in particular, I'll ask in return: Why not? Is there
> something special about a human mind that makes it unpredictable, no
> matter how detailed and accurate a (causal) model we use?

Persuasion is a "statistical skill" that is best applied to
individuals by using feedback and changes in strategy based
on reaction to previous tactics.

But make no mistake, persuasion is at least as teachable as
an athletic skill set -- this analogy is chosen because if
you teach boxing or basketball no technique will be effective
against an arbitrary opponent 100% of the time, but the
improvements are real and measurable. The same is
true for persuasion.

Even persuasion in the mass media is usually applied (today)
by measuring feedback and adjusting the message where feasible.

It's cheaper for advertisers to target their persuasive messages
this way than to blindly blast the same (less than optimal)
message, unless a threshold success rate is reached without need
for adjustment. Those who don't reach such thresholds go out
of business in most cases.

The most surprising persuasion technique [to me] is that of
"giving a reason", since apparently the reason does not need
to make much sense or even have any substance.

The classic experiment was to ask to "jump the line" at a
copier using (separately) no reason, "because [good reason
goes here]", and 'reasons' of the "because I am in a hurry"
or "because it's important that I go first" type.

'Reasons' worked MUCH better. 'Nothing reasons' were just about
as effective as giving real information.

Someone mentioned hypnosis (earlier in this thread, I believe),
and this is also a teachable skill (again, to human-level
intelligences).

[And lest anyone think that hypnosis is fantasy, its
effectiveness has been documented scientifically for
such uses as controlling bleeding -- a quite repeatable
and testable result.]

Hypnosis CAN be used to obtain behavior against the interests
or morals of the subject, but doing so is VERY difficult and
generally requires misrepresenting the situation rather than
directly ordering counter-interest behavior (if one expects
reasonably reliable results).

We cannot even guarantee our children will grow up to be
responsible human beings.

We can follow generally accepted guidelines, and teach
our children moral or ethical behavior, but we cannot
guarantee that behavior completely; we only know that
such parenting generally leads to children who become
good human beings more often than the opposite result.

It is not likely that a "programmer" could even review
enough of the code of a (truly) human-level intelligence
to understand where things go wrong.

It's not possible to create bug-free software; imagine
trying just to FIND the bugs in a large program like Microsoft
Word, or even Windows itself. (This is true for Linux too,
but notice that Linux has the advantage of being Open Source --
which is precisely what you CANNOT allow if you must guarantee
friendliness, since that includes guaranteeing that no one
modifies the code in unfriendly ways.)

When you couple this with the likelihood that a human-level
intelligence will use neural nets (or similar networks) and
genetic algorithms, learned through training and adaptation
and having no direct high-level language representation, it
is unlikely that the programmer could either read the source
code OR even review ALL of it.

Current computer programs run on hardware with approximately
10^9 memory locations (4 x 10^9 is the current limit for most
PCs, but most don't have all the memory that is possible, nor
can the programs use that much). The operating systems
alone use around one tenth of that (10^8), and it is unlikely
that any one programmer could review even just that.

Current estimates expect that human-level intelligence will
require IN EXCESS of 10^15 memory locations -- about ten
million times (10^7) more than current operating systems.

Other estimates suggest it might take a thousand or more
times as much for such intelligence levels, so it is unlikely
that anyone could ever review such a large body of code once
it is made self-improving.
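
As a rough sanity check on those ratios, here is the arithmetic
spelled out (just a sketch in Python; the 10^8, 10^9, and 10^15
figures are the estimates quoted above, and the extra factor of
a thousand is the speculative high-end case):

    # Back-of-envelope arithmetic for the estimates above
    os_size     = 10**8   # memory locations used by a current OS (estimate)
    pc_memory   = 10**9   # approximate addressable memory on a current PC
    human_level = 10**15  # estimated locations for human-level intelligence
    high_end    = human_level * 1000  # the "thousand times more" case

    print(human_level // os_size)  # prints 10000000, i.e. 10^7 -- ten million times
    print(high_end // os_size)     # prints 10000000000, i.e. 10^10 for the high end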

It is practically impossible to "guarantee friendly behavior"
OVER TIME -- to the extent that we are successful, our guard
will tend to drop.

Human beings are lazy -- security precautions against
seemingly imaginary threats are seldom maintained. (This is
part of the reason our current security precautions against
terrorists are doomed to failure if we don't remove the
terrorists through offensive and strategic actions rather
than purely defensive methods.)

No rules will be 100% safe if the program learns and adapts.

Human beings probably cannot even agree on what constitutes
friendly behavior -- to the religious fanatic, killing you to
save the world or to praise some deity may even constitute
"friendly behavior" within that world view. Total
non-interference, to the point of allowing suicide and other
self-destructive behavior, is likely acceptable to (most)
Libertarians.

The point here is (of course) NOT the particular beliefs used
as examples but the fact that different programmers could
not even agree on correct "friendly behavior".

Doctors don't always agree on the behaviors that comply with
the Hippocratic Oath, and that one is quite straightforward
as human creeds go.

And how many people would include allowing doctors to assist
death when it relieves greater suffering, while others would
read "Do no harm" literally and insist that euthanasia is
ALWAYS wrong?

Pick any serious moral or ethical belief and you will likely
find someone who would disagree under some particular set
of circumstances.

Killing is not always murder (e.g., defense of a child or other
defenseless person from the criminally violent).

But, notice that a Quaker might disagree with the above
sentence -- and do so honestly and consistently.

Allowing someone to live can constitute torture. How many
terminal bone cancer patients are quietly helped to die?

Should a truly friendly AI prevent human beings from
engaging in ANY dangerous behavior (including passive
or long term behavior like failure to take vitamins or
overeating), or should it absolutely refuse to interfere
with self-determination to the point of allowing suicide
and other clearly destructive behavior?

Most of us would expect the answer lies somewhere between
the two extremes but few of us could agree where that
line lies and we HAVE HUMAN LEVEL INTELLIGENCE.

We might even find that our answers to this question change
over time or even day to day (and in absolute terms, i.e.,
independently of context).

By the way, I believe that we will create friendly AI, but
we will also (eventually) create unfriendly AI, either by
accident or by design.

--
Herb Martin

