Re: A conversation on Friendliness

From: Metaqualia (metaqualia@mynichi.com)
Date: Thu May 27 2004 - 10:33:50 MDT


Why does this stuff need to be labeled as "beyond human comprehension"?

Paraphrasing:

> <James> Nothing terribly communicable. I am wondering if a correct
> implementation of initial state is even generally decidable.

Very hard problem; I wonder whether there is a solution at all.

> [Eliezer] I don't know your criterion of correctness, or what you mean
> by decidability. Thus the explanation fails, but it is a noisy failure.

How would you recognize a good solution?

> <James> I'm having a hard time seeing a way that one can make an
> implementation that is provably safe.

You probably can't prove beyond doubt that a solution is good.

> [Eliezer] In a general sense, you'd start with a well-specified abstract
> invariant, and construct a process that deductively (not probabilistically)
> obeys the invariant, including as a special case the property of
> constructing further editions of itself that can be deductively proven to
> obey the invariant

Define your goal very precisely, then build the system so that it
necessarily and deterministically pursues that goal. Allow the system to
build new versions of itself only if the goal and architecture are provably
preserved in each new version.
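
To make that concrete, here is a toy sketch of my own (nothing from the
actual conversation; the invariant and the policy names are made up): a
rewrite of the system is adopted only if a checker verifies that the
invariant still holds for it. In the toy, "deductive proof" is replaced by
exhaustive checking over a tiny domain, which is of course far weaker than
what Eliezer is asking for.

# Toy illustration of invariant-preserving self-modification; not
# Eliezer's actual proposal. Invariant: the policy never returns a
# negative value.

def invariant_holds(policy, domain):
    # In this toy, "deductive proof" is just exhaustive checking over a
    # small, fully enumerable domain.
    return all(policy(x) >= 0 for x in domain)

def self_modify(current_policy, proposed_policy, domain):
    # Adopt the proposed rewrite only if it verifiably obeys the invariant.
    if invariant_holds(proposed_policy, domain):
        return proposed_policy   # verified rewrite: accept
    return current_policy        # unverified rewrite: keep the old version

domain = range(-10, 11)
policy_v1 = lambda x: abs(x)      # obeys the invariant
policy_v2 = lambda x: abs(x) + 1  # obeys it too -> accepted
policy_bad = lambda x: x          # negative for x < 0 -> rejected

current = policy_v1
current = self_modify(current, policy_v2, domain)   # accepted
current = self_modify(current, policy_bad, domain)  # rejected
print(current(-3))   # prints 4: the unsafe rewrite was never adopted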

> <James> right
> <James> but how do you prove that the invariant constrains expression
> correctly in all cases?

But how do you prove that the goal really is maintained and that the system
doesn't drift away from it?

> [Eliezer] to the extent you have to interact with probabilistic external
> reality, the effect of your actions in the real world is uncertain

Since reality is complex, you can't be sure what is going to happen out
there.

> [Eliezer] the only invariant you can maintain by mathematical proof is a
> specification of behaviors in portions of reality that you can control
> with near determinism, such as your own transistors

The only thing you can control with near determinism is the system's own
brain.

> [Eliezer] there's a generalization to maintaining probable failure-safety
> with extremely low probabilities of failure, for redundant unreliable
> components with small individual failure rates

You can get an extremely low overall failure probability by combining
redundant components, each with a small individual failure rate.
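
As a rough illustration of that point, with made-up numbers and assuming
the components fail independently (which real systems only approximate):

# Assumed numbers: k independent redundant copies, each failing with
# probability p; all of them fail together with probability p**k.
p = 1e-4   # assumed individual failure probability
for k in (1, 2, 3, 5):
    print(f"{k} redundant copies -> P(all fail) = {p**k:.1e}")
# 1 -> 1.0e-04, 2 -> 1.0e-08, 3 -> 1.0e-12, 5 -> 1.0e-20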

> [Eliezer] the tough part of Friendly AI theory is describing a
> mathematical invariant such that if it holds true, the AI is something we
> recognize as Friendly

The tough part is figuring out what exactly the goal should be.

> <James> precisely.
> <James> that's the problem
> [Eliezer] for example, you can have a mathematical invariant that in a
> young AI works to produce smiling humans by doing various humanly
> comprehensible things that make humans happy

For instance, suppose you set the goal to be producing smiles:

> [Eliezer] in an RSI AI, the same invariant binds to external reality in a
> way that leads the external state corresponding to a tiny little smiley
> face to be represented with the same high value in the system
> [Eliezer] the AI tiles the universe with little smiley faces

The machine may end up tiling the universe with tiny smiley faces.
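
Here is a toy sketch of my own of how that failure mode looks (the
smiley-counting proxy and the option lists are invented for illustration):
the goal is operationalized as a proxy score, and once the system's options
grow, the proxy is maximized by something we never intended.

# Toy proxy-objective sketch, invented for illustration.
def proxy_score(world_state):
    # Proxy for "happy humans": just count smiley-face symbols.
    return world_state.count(":)")

# Options available to a young, weak system:
young_options = {
    "tell a joke": "human :)",
    "do nothing":  "human :|",
}

# Options available to a much more capable (RSI) system:
capable_options = dict(young_options)
capable_options["tile everything with smileys"] = ":)" * 1_000_000

def best_action(options):
    return max(options, key=lambda action: proxy_score(options[action]))

print(best_action(young_options))    # "tell a joke" -> looks Friendly
print(best_action(capable_options))  # "tile everything with smileys"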

> <James> I've been studying it, from a kind of theoretical implementation
> standpoint. Very ugly problem

Hmm, I've been thinking about it all day and it is very difficult.

> <James> No thoughts yet.

dunno

> [Eliezer] the problem is that humans aren't mathematically well-specified
> themselves
> [Eliezer] just ad-hoc things that examine themselves and try to come up
> with ill-fitting simplifications
> [Eliezer] we can't transfer our goals into an AI if we don't know what
> they are

We don't really know what the goal is ourselves.

> <James> Yep. Always have to be aware of that
> [Eliezer] my current thinking tries to cut away at the ill-formedness of
> the problem in two ways

So I am trying to attack the ill-formedness of the problem in two ways.

> [Eliezer] first, by reducing the problem to an invariant in the AI that
> flows through the mathematically poorly specified humans

First, make sure the AI can actually work out what the goal is, even though
we humans can't state it ourselves.

> [Eliezer] in other words, the invariant specifies a physical dependency
> on the contents of the human black boxes that reflects what we would
> regard as the goal content of those boxes

In other words, the goal is specified as the goal content of human brains,
whatever that content turns out to be.

> [Eliezer] second, by saying that the optimization process doesn't try to
> extrapolate the contents of those black boxes beyond the point where the
> chaos in the extrapolation grows too great

Second, make sure the AI doesn't try to second-guess us once the
extrapolation gets too chaotic.

> [Eliezer] just wait for the humans to grow up, and make of themselves
> what they may

Just wait for the humans to grow up and make of themselves what they may,
on their own.

> <James> I've noticed. Seems like a reasonable approach
> <James> Don't know if it is optimal though
> <James> for whatever "optimal" means
> <James> I'm not satisfied that I have a proper grip on the problem yet
> [Eliezer] nor am I
> [Eliezer] there are even parts where I know specifically that my grip is
> slipping

I still don't understand some parts well enough.

mq
