RE: guaranteeing friendliness

From: Herb Martin (HerbM@LearnQuick.Com)
Date: Sat Dec 03 2005 - 23:27:32 MST


From: Eliezer S. Yudkowsky
>
> Herb Martin wrote:
> > From: Michael Wilson
> >>
> >>You're making a beginner mistake: you're confusing the ability
> >>to predict what an intelligence will /do/, with the ability
> >>to predict what it will /desire/. If we could predict exactly
> >>what an AGI will actually do then it wouldn't have transhuman
> >>intelligence. Fortunately predicting what the goals of an
> >>AGI system will be, including the effects of self-modification,
> >>is a much more tractable (though very hard) endeavour.

<snip my own paragraph in reply>

> I agree that Wilson's paragraph is not a sufficient explanation.
>
> Consider the concept of an optimization process: a system which hits
> small targets in large search spaces to produce coherent
> real-world effects.

> An optimization process steers the future into particular regions of
> the possible. I am visiting a distant city, and a local friend
> volunteers to drive me to the airport. I do not know the
> neighborhood. When my friend comes to a street intersection, I am at
> a loss to predict my friend's turns, either individually or in
> sequence. Yet I can predict the result of my friend's unpredictable
> actions: we will arrive at the airport.

And here you have already assumed that your "friend" is
indeed friendly, and (implicitly) that your friend is
capable of finding the airport. Perhaps more practically,
you have assumed that your friend understands the traffic
patterns and your scheduled flight time well enough to
deliver you in time for some particular flight.

Were you to see a road sign, "Airport Next Right," and
your friend were to bear left, you might begin to wonder
about, and even question, your friend's abilities.

You presume your friend will deliver you on time for some
flight, but you are willing to revise this belief as
additional evidence arrives.

I have literally had "friends" who felt that any trip
involved stopping somewhere along the route to "have
a beer," and riding with them to the airport was not
without trepidation (on several counts).

> Even if my friend's house were located elsewhere in the city, so that
> my friend made a wholly different sequence of turns, I would just as
> confidently predict our destination. Is this not a strange situation
> to be in, scientifically speaking? I can predict the outcome of a
> process, without being able to predict any of the intermediate steps
> in the process.

Yes, but do notice that your prediction is not guaranteed;
it is only a statistical prediction based on your degree of
belief in your friend's "friendliness" and, more practically,
in your friend's ability to reach the airport on time.

We do this a thousand times a day: we predict the outcome
of our own trips, to the mailbox, to the grocery store, even
to the kitchen, without knowing ahead of time whether some
event will cause us to stop, detour, or engage in some
intermediate behavior (e.g., feeding the cat, or going around
a real roadway detour or traffic accident).

But we can NEVER KNOW for certain, a priori, that we will
arrive at that location.

We can make our best effort, but the farther the goal, and
the fewer times we have "made the trip," the more doubt will
be included in our prediction.

I am pretty sure I can make it to the kitchen, even the
local grocery store, but last week I took a wrong turn
(literally) on the way to the airport and got caught in a
local "maximum":

I could see the San Diego Airport control tower but had
arrived at the north end of the airport, from which there is
no clear path to the passenger terminal.

Of course, I KNEW that I would get to the airport, but my
conviction that it would be in time for my flight was
seriously diminished until I was able to return to the
(main) path, highlighted by signs such as "Rental Car Return."

[I made the airport and the flight in time, but I would not
have bet $1,000,000 on success during a period of several
minutes.]

We must be very careful about betting our lives, our futures,
and those of our loved ones on a "guarantee" of success.

Again, don't confuse me with someone who says "FAI cannot be
done." My point, rather, is that it is going to be very nearly
impossible to know ahead of time whether we will succeed at
the friendliness portion on our first (few) rampant attempts.

> Consider a car, say a Toyota Corolla. Of all possible configurations
> for the atoms making up the Corolla, only an infinitesimal fraction
> qualify as a useful working car. If you assembled molecules at
> random, many many ages of the universe would pass before you hit on
> a car. A tiny fraction of the design space does describe vehicles
> that we would recognize as faster, more efficient, and safer than the
> Corolla. Thus the Corolla is not optimal under the designer's goals.
> The Corolla is, however, optimized, because the designer had to hit a
> comparatively infinitesimal target in design space just to create a
> working car, let alone a car of the Corolla's quality. You cannot
> build so much as an effective wagon by sawing boards randomly and
> nailing according to coinflips. To hit such a tiny target in
> configuration space requires a powerful optimization process.

Yes, and we are reasonably certain that future optimization
efforts will be able either to run direct aerodynamic
simulations (to the extent this has not already been done)
that improve the Corolla, or to construct replacements using
nano (or other) technology that greatly improve on this
current local maximum.

Once the car can be optimized, as it were, in 'software,'
then all (or almost all) bets are off about the final
form of the 'Corolla'.

It may fly, it may not have a human driver, it may use
carbon fiber structural components, a new propulsion
system, or a host of other (within the design space)
improvements.

Note: Some of these, e.g., flying car, might be considered
"unfriendly" to the current FAA flight control system.
 

> This strange balance sometimes confuses people, so that they conflate
> creativity with randomness and begin to praise entropy.

Just as an aside (not relevant to this discussion): I have
seen methods of TEACHING creativity that explicitly show
real human beings how to use (semi-)random processes to
generate new ideas for evaluation.

These methods are highly effective, and some people known
for their creativity were actually modeled to produce the
technique (i.e., some creative people use this technique,
both implicitly and explicitly).

--
Herb Martin

