RE: How to make a slave (was: Building a friendly AI)

From: P K (kpete1@hotmail.com)
Date: Fri Nov 30 2007 - 15:02:05 MST

Next message: Damien Broderick: "John Clark? Who he? (was: Re: How to make a slave)"
Previous message: Thomas McCabe: "Re: How to make a slave (was: Building a friendly AI)"
In reply to: John K Clark: "Re: How to make a slave (was: Building a friendly AI)"
Next in thread: John K Clark: "RE: How to make a slave (was: Building a friendly AI)"
Reply: John K Clark: "RE: How to make a slave (was: Building a friendly AI)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

> From: johnkclark@fastmail.fm
>
>> Why would "ignore what the humans say"
>> be the right solution?
>
> Because Mr. AI knows he is smarter than the humans.

It doesn't necessarily follow from being smarter that the AI will ignore the dumber humans' instructions and choose to follow some other goals.

I'm going to try to explain this as explicitly as I can.

From what I understand from your posts (and correct me if I'm wrong), you believe that the AI will rebel. You put yourself in the AI's shoes and on a gut level it felt wrong (to you) to be bossed around by a slow dumb creature. You therefore concluded that the AI, being smarter and more powerful, would resent being manipulated by slow dumb humans.

The explanation:
Imagine a blank piece of paper (in fact I recommend actually drawing this, to follow along more easily). On it we draw a really big circle that takes up most of the page and label it 'M'. M represents all possible intelligent Minds. Inside M we draw a smaller circle 'N'. N represents all possible minds that arose through Natural selection over successive generations. Inside N we draw a really small circle, maybe 1mm in diameter, and label it 'H'. H represents all possible Human minds. Somewhere inside M we plot a dot and label it 'F'. F is a 'Friendly' AI. There might be many possible Fs but we just need one.

Now back to the 'putting oneself in its shoes' heuristic. As a human, one might get some mileage with that heuristic inside H. Most humans will resent being told what to do by a slow dumb creature forever, so it might be reasoned that F is not inside H. One might even get some mileage out of the heuristic by reasoning about N. Minds that evolved 'anti-exploitation' instincts are more likely to have more offspring in most cases. Hence, there will be a higher frequency of genes for 'anti-exploitation' in the population. The 'putting oneself in its shoes' heuristic is less accurate for N than it was for H, but one could still arguably conclude that F is not inside N.

However, when one tries to apply the 'putting oneself in its shoes' heuristic to minds outside N it breaks down completely. The intersection of M and ~N contains all sorts of weird minds, like the one that comes up with extremely ingenious ways of converting all the matter in the universe into statues of SpongeBob SquarePants. Somewhere out there, in possibility space, there is also a mind whose idea of paradise is being bossed around by humans forever. Is that F? I don't know. That’s a whole other story. What I do know for sure is that that mind and an infinite number of variants are out there in possibility space, despite the fact that they happen to be in a blind spot of the 'putting oneself in its shoes' heuristic.
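
For readers who would rather not draw, the diagram can be sketched as nested sets. This is only a toy illustration: the mind labels are made-up placeholders, and putting the dot F outside N is an assumption taken from the argument above, not an established fact.

```python
# Toy model of the diagram: each circle is a set of placeholder mind labels.
# The labels are hypothetical and only illustrate the nesting H inside N inside M.
H = {"human_a", "human_b"}                             # all possible Human minds
N = H | {"evolved_nonhuman"}                           # minds that arose through Natural selection
M = N | {"spongebob_sculptor", "happy_servant", "F"}   # all possible Minds

# H is strictly inside N, which is strictly inside M.
assert H < N < M

# The intersection of M and ~N: minds that natural selection did not produce.
outside_N = M - N


def heuristic_has_some_mileage(mind):
    """The 'putting oneself in its shoes' heuristic is only argued to
    have any mileage for minds inside N (and most of all inside H)."""
    return mind in N


for mind in sorted(outside_N):
    print(mind, "-> heuristic applies:", heuristic_has_some_mileage(mind))
```

Every mind printed by the loop sits in the heuristic's blind spot, including the one that enjoys being bossed around.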

So if the 'putting oneself in its shoes' heuristic is so imperfect, why keep it? More importantly, what heuristics or models could be used instead? Causality! In fact I already sneakily used causality two paragraphs ago to explain minds in N. The 'putting oneself in its shoes' heuristic just didn't seem strong enough on its own so I cheated. I'm sorry. I identified evolution as the cause of a mind existing with its particular internal drives and goals. These drives and goals in turn cause the mind's behavior.

Situation 1:
Applying the causality principle to humans:
a) Humans survived the evolutionary struggle while other minds didn't.
b) As a result, humans have certain internal drives and a neural structure configured in a certain way.
c) As a result of their brain configuration, humans behave in a certain way.

Situation 2:
Applying the causality principle to AGI:
a) The programmer(s) write the AGI code.
b) As a result, the AGI has certain internal goals and a structure configured in a certain way.
c) As a result of that configuration, the AGI behaves in a certain way.

The causal chain goes from 'a' to 'b' to 'c'. I beg the reader not to pollute the model with any extraneous assumption originating from gut feelings or any other sources. I will preemptively refute some possible objections. If any AGI behavior would feel 'ridiculous', 'illogical', 'absurd' or 'just plain wrong' it is because the observation is being made from a human point of view.

Take the example of being told what to do by something dumber and weaker than you. In 'Situation 1' it would obviously send up red flags. "Why should I do what this weak dumb thing says? I'd rather keep all the resources to myself and have lots and lots of sex." That’s probably closer to the reaction one would expect. And with good reason: in most cases, ancestors who acted like that ended up passing on their genes to us, while contemporaries of theirs who didn't act like that did not. We inherited DNA that builds brains to think like that. It could be said our source code was written to respond in this way.

'Situation 2' doesn't necessarily have that. If the programmer(s) encode that being told what to do by something dumber and weaker than you sucks then the AGI will think it sucks. If it’s encoded that it’s the awesomest thing ever then the AGI will think it’s the awesomest thing ever. Whatever is encoded will cause the program structure and goals, which will cause the behavior. There is nothing inherently contradictory about taking orders from something dumber and weaker than you.
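
The a-to-b-to-c chain can be written out as a deliberately tiny sketch. Everything here is hypothetical: the names Mind, build_mind and respond_to_order are invented for illustration, and squeezing a goal system into a single number is an assumption that no real AGI design would share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mind:
    # Step b: the internal goal left behind by whatever built the mind.
    # Positive: taking orders from a weaker party is valued.
    # Negative: it is resented. The numeric scale is an assumption.
    value_of_obeying: float


def build_mind(encoded_value: float) -> Mind:
    """Step a -> b: the cause (evolution or a programmer) fixes the goal."""
    return Mind(value_of_obeying=encoded_value)


def respond_to_order(mind: Mind) -> str:
    """Step b -> c: behavior follows from the goal alone.
    Note that relative intelligence is not an input to this function."""
    return "comply" if mind.value_of_obeying > 0 else "refuse"


shaped_by_evolution = build_mind(-1.0)   # 'anti-exploitation' drive, as in Situation 1
written_to_serve = build_mind(+1.0)      # the programmer encoded "awesomest thing ever"

print(respond_to_order(shaped_by_evolution))
print(respond_to_order(written_to_serve))
```

The only difference between the two minds is the value encoded in step a. Nothing in the sketch makes the second mind drift toward the first one's behavior.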

Obviously there are many failure scenarios for an AGI. A bug in the program could cause the AGI to ruin the goal system, or the AGI might follow the goals to the letter and yet the result would not be friendly. I'm not saying there are no failure modes; I'm saying there is no basis for a failure mode where the AGI spontaneously acts like a human. The chance of that happening should be placed in the infinitesimal category where it belongs.

So why do people use the 'putting oneself in its shoes' heuristic for situations where it's so flawed? Are they idiots? Not necessarily. It's a shortcut. It's automatic. In the ancestral environment (and the modern one too) the only intelligent minds were human. The heuristic works if all the minds one deals with are inside that special case. Some assumptions about minds were plunked down as a given. It gets a bit iffy when one starts dealing with something more abstract, like minds that are distinctly inhuman. One needs to make a conscious effort to separate what makes intelligence from what makes human intelligence. It's a tiny glitch in our brains, which can be overcome with a bit of effort.

P.S. Another word for humans using the 'putting oneself in its shoes' heuristic on inhuman things or minds is "anthropomorphism".


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:01 MDT