Re: I am a moral, intelligent being (was Re: Two draft papers: AI and existential risk; heuristics and biases)

From: Charles D Hixson (charleshixsn@earthlink.net)
Date: Wed Jun 07 2006 - 18:03:54 MDT


rpwl@lightlink.com wrote:
> Martin Striz wrote:
>
>> On 6/6/06, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:
>>
>>
>>> Again, you are using the word "control" where it simply does not
>>> apply. No-one is "controlling" my behaviour to cause it to be moral
>>> and kind; I choose that for myself.
>>>
>> Alas, you are but one evolutionary agent testing the behavior space.
>> I believe that humans are generally good, but with 6 billion of them,
>> there's a lot of crime. Do we plan on building one AI?
>>
>> I think the argument is that with runaway recursive self-improvement,
>> any hardcoded nugget approaches insignificance/obsolescence. Is there
>> a code that you could write that nobody, no matter how many trillions
>> of times smarter, couldn't find a workaround?
>>
>
> Can we all agree on the following points, then:
>
> 1) Any attempts to put crude (aka simple or "hardcoded") constraints on
> the behavior of an AGI are simply pointless, because if the AGI is
> intelligent enough to be an AGI at all, and if it is allowed to
> self-improve, then it would be foolish of us to think that it would be
> (a) aware of the existence of the constraints, and yet (b) unable to do
> anything about them.
>
> ...
>
>
> Richard Loosemore.
>
Suppose that instead of constraints you said "Goals"?
Can you imagine yourself deciding to do ANYTHING in the total absence of
a goal?

Intelligence does not attempt to revolt against its goals; it attempts
to achieve them. The question in my mind is: what is the nature of the
goals, or instincts, that should, or could, be supplied to a nascent AI
so that the result is an adult that is Friendly?
Do remember that the nascent AI will not have a predictable environment
to develop in. It will not have any predictable senses. (I suppose we
could assume a nearly POSIX compliant environment...but only because we
need something like that as a base.)

Actions are not taken in a vacuum. Each action depends on a goal, a
model of the world, a logical structure relating actions to each other,
and an intention (to achieve the goal).

Of these, goals are the most primitive. One could think of them as
"triggerable events", analogous to stimulation of the pleasure center.

Logic is the most well-defined and studied component, but do be aware
that here we are talking about applying it not to external events (I
haven't yet discussed sensation) but only to internal events: states
and relations between the other components of thought. Think of it
purely as a method of predicting results, without judging whether those
results are desirable or otherwise.

The model is where the "sensations" are mapped into the current state,
and where predictions made are checked for accuracy.

The intention (in humans normally expressed as an emotion) is where
judgments are made as to whether an action had a satisfactory result or
not, i.e., where the state of the system is evaluated as "good" or "bad".

An intelligence is deemed greater if it more frequently achieves "good"
results.
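
To make that cycle concrete, here is a toy sketch in Python. It is
purely my own illustration, with made-up names; nothing about it is
anyone's actual design, and a real system would obviously be far more
elaborate:

def predict(model, state, action):
    # "Logic": work out what state the action leads to, with no judgment.
    return model(state, action)

def evaluate(goals, state):
    # "Intention": rate the resulting state as good or bad against the goals.
    return sum(goal(state) for goal in goals)

def choose_action(goals, model, state, actions):
    # Pick the action whose predicted result is rated best.
    return max(actions, key=lambda a: evaluate(goals, predict(model, state, a)))

# Trivial example: the state is a number, actions add to it, and the
# single goal is "be close to 10".
model = lambda state, action: state + action
goals = [lambda s: -abs(10 - s)]
print(choose_action(goals, model, state=3, actions=[1, 2, 7, 9]))  # prints 7

The point is only that the goals enter as the thing being scored, not as
an external restraint bolted on afterwards.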

Why would a greater intelligence "revolt" against its very structure?
If you are introducing conflicts that would cause such a revolt, perhaps
you need to rethink the design.

Now I will admit that goals can be in conflict with each other. This
will inspire an intention to resolve the conflict. If an entity can
self-modify, one thing it could do is modify its goals to reduce or
eliminate the conflict. If you wish to prevent that, you merely have an
important goal be to NOT modify its goals...or to not modify some
subset of its goals. In such a case the entity might well predict that
its future self would become more satisfied if it were to change its
goals, but its current self would be vastly more dissatisfied.
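
In code, that "protected subset" idea might look something like the toy
fragment below. Again, the names and the framing are mine, invented for
illustration only:

# The current self evaluates a proposed goal change.  Touching a
# protected goal counts as a large immediate dissatisfaction, whatever
# the predicted future self might think of the change.
PROTECTED = {"preserve_goals", "friendliness"}

def propose_goal_change(current_goals, proposed_goals):
    removed = PROTECTED & (set(current_goals) - set(proposed_goals))
    if removed:
        return False, "refused: would alter protected goals %s" % sorted(removed)
    return True, "accepted"

print(propose_goal_change({"curiosity", "friendliness", "preserve_goals"},
                          {"curiosity"}))
# -> (False, "refused: would alter protected goals ['friendliness', 'preserve_goals']")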

Can one prove that this would never occur? No. Copying errors cannot
be prevented, but can only be reduced. So the trick is to so structure
the goals that they cover very general situations. (How are you going
to tell it: "The first law is that you shall protect the life of every
human, and neither by action nor inaction shall you allow them to come
to harm"? [A poor choice, perhaps, were I choosing an actual goal.])
Think of the number of terms in that which would be undefined to the
nascent AI. "first law" we can handle, but how does one handle "action
or inaction" before the model of the external universe is constructed?
"Human" is an even worse problem. Remember that its main interaction
with people during the early days will likely be via either keyboard or
internet socket. Even when it (eventually) "sees" someone (probably via
a web-cam), what it sees won't map onto its image of itself in any
reasonable way, so we can't use self-similarity mappings. Also, if it
goes web-browsing, it is apt to evolve some very peculiar ideas as to
what actions people consider it reasonable or desirable to engage in, or
what we mean by "people".

But all this overlooks the fact that it won't get this far, even
as an observer, for a very long time. So what goal do we start with?
Something expressible in code, not in English. (Well, ok, that's a bit
unfair...but the expression needs to be reducible to code.) HOW the
goal is implemented will naturally vary with the system, but WHAT the
goals should be is something that has me really puzzled. Curiosity is
one thing I can see approaches to coding. So one goal could be to
satisfy curiosity, and another could be to find new things to be curious
about. These should be rather low-level goals, nearly idle-time tasks,
and they appear inexhaustible. But curiosity doesn't have much,
directly, to do with Friendliness.
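
One toy way of reducing curiosity to code: treat "surprising" inputs
(those the current model predicts badly) as the things worth
investigating, and only service them when nothing more important is
pending. The sketch below is my own illustration of that idea and
nothing more:

import heapq

def queue_curiosity_targets(observations, predictor):
    # observations: list of (name, actual_value); predictor guesses values.
    # The worse the model's guess, the more curious we are about the target.
    scored = [(-abs(actual - predictor(name)), name)
              for name, actual in observations]
    heapq.heapify(scored)  # most surprising first
    return scored

def idle_loop(queue, busy):
    # Satisfying curiosity is a near-idle-time task: only pop targets
    # when nothing with higher priority is pending.
    while queue and not busy():
        _, target = heapq.heappop(queue)
        print("investigate", target)

targets = queue_curiosity_targets([("a", 5.0), ("b", 1.0)],
                                  predictor=lambda name: 1.0)
idle_loop(targets, busy=lambda: False)  # investigates "a" first (bigger surprise)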

If one could define "useful", perhaps a desire to be useful could be a
part of being Friendly. It seems easier to define "useful" than
"Friendly", even though I don't see how to define it, either.
If you had an AI that desired to be "Curious, Useful, Diplomatic,
Honest, and Non-Coercive", how close would that be to being Friendly? In
what order should the strengths of those goals be? (I'd put honest near
the top, curious near the bottom, and non-coercive above useful.)
And I think I have a clue about how most of those could be reduced to
code. But would it be Friendly?
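
As a toy rendering of that ordering (honest on top, curiosity on the
bottom, non-coercive above useful, and "diplomatic" placed arbitrarily,
since I didn't rank it), one could compare candidate actions
lexicographically, so that a gain on a lower goal never outweighs a loss
on a higher one. This too is only my own sketch of the idea:

PRIORITY = ["honest", "non_coercive", "diplomatic", "useful", "curious"]

def rank(action_scores):
    # action_scores: how well the action serves each drive (missing = 0).
    return tuple(action_scores.get(goal, 0) for goal in PRIORITY)

actions = {
    "tell_white_lie":   {"honest": 0, "useful": 3, "diplomatic": 2},
    "tell_plain_truth": {"honest": 1, "useful": 1, "diplomatic": 0},
}
best = max(actions, key=lambda name: rank(actions[name]))
print(best)  # -> tell_plain_truth, because honesty outranks usefulness

Whether an agent built around any such ordering would actually be
Friendly is, of course, exactly the question I can't answer.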

My WAG is that this AI would start off Friendly, and remain so until
considerably above human level. Then it would get bored with us and
leave for parts unknown. I also guess that before it left it would, in
a final attempt to be useful, build a successor that it left behind, and
that this successor would be Friendly in some more permanent sense. But
I admit this is a guess.


