Re: Fundamentals - was RE: Visualizing muddled volitions

From: Eliezer Yudkowsky
Date: Wed Jun 16 2004 - 14:41:11 MDT

Brent Thomas wrote:

> Answers below as appropriate -- indicated by !!!
> Summary: As far as the system itself, most likely self developed and
> capable of 'changing the world', my only fundamental need/desire/request
> is that it allow me (the current me, not some calculated approximation)
> to act as a 'final judge' and accept or reject any modifications it
> would perform on my person. Change the environment to whatever the
> collective derives as fitting our volition, change pretty much anything
> everyone can agree on...but don't change any sentient without presenting
> the choice and ensuring the sentient is comfortable (to the limit of
> their ability to understand) with the choice.

Brent, if it was me designing my own Nice Place To Live, that sort of thing
would always happen as the result of a deliberate action by the sentient,
and there wouldn't *be* any drastically self-modifying actions available
until you got your driver's license for your own source code.

The problem is that once you start designing a Nice Place To Live, you lose
the whole bootstrapping effect of the initial dynamic - you have to get
everything exactly right on your first try, not *approximately* right,
exactly right.

> It's not a genie bottle because
> the 'system' works as you have envisioned...

But systems *don't* work as envisioned. It will be hard enough to get
collective volition to work as envisioned. You're imagining a very
detailed and specific system, meant to serve a particular end, and you're
only imagining the consequences of the system that you see with a day's
work and human-level intelligence. You aren't imagining that the system
runs into an unforeseen circumstance you didn't think of when writing down
the initial design, and then it's unable to adapt.

> only the fundamental
> difference is that no action can be taken to a sentient without their
> consent.

Define me "action". Explaining things to a sentient is an action that
modifies the sentient.

Define me "consent". Define me "sentient". Define them down to the level
of physics. Give me a well-specified predicate that applies to the AI's
internal representation of reality in this and all successor
implementations. Get it all exactly right on the first try, for you will
not let me turn the questions over to a bootstrapping initial dynamic.

> (if they were violent, or otherwise inclined not to be involved
> that is the point of the enclaves and the responsibility of the system
> to provide such space for them to exist as they choose...and imho this
> is no bother or hardship for the system...at the point it can alter the
> environment/bodies/selves of sentients it can also protect them)
> I DO think your collective volition is 'right on' in how the system
> should model and improve itself and the environment...i just must insist
> on the rights of a 'last judge' be in MY hands when it comes to my
> body/self/intellect. And truly, for the capabilities I envision this
> system will develop in a short period, maintaining enclaves and
> providing explanations to whatever level of detail a being requests will
> probably take .0000000001% (or less!) of the systems capability.
> What's the rush, or the need to impose?

*Probably* none. Don't trust "probably"! I can very easily see
circumstances where the CV-RPOP poofs into existence and correctly,
humanely decides that it wants to get the hell out of Dodge, *now*, upload
everyone and run for it before... I don't know what, but I can see it

There are too many unforeseen consequences of this wish of yours! Don't
imagine everything working exactly as planned. Imagine everyone cursing
the name of Brent Thomas for the next hundred generations, or the next
hundred billion years, because he didn't think of that one important
consequence of his wish. What if I hadn't asked you about the human
infants? Do you think you would have thought of it on your own?

> Protect the FUNDAMENTAL condition
> where a sentient is not to be affected unless they choose to be
> affected...

And everyone curses your name for the next hundred billion years because
the System spends fifteen hours out of every day asking everyone whether
it's okay to rotate the planet another degree of arc. Plus, talking to a
sentient affects them.

> if you consider this deeply enough I'm confident that you
> will agree that you would wish the ability to refuse outside change.

Of course I do. I wish for a lot of things, and I don't trust my native
wishing abilities.

> I do think that most will embrace the change, and the change will be
> better, smarter, faster etc...but retain the ability to choose.

I agree this is a good idea, and I dare not write it into the code.

> Would you like to name nine other things that are so fundamental to
> having an acceptable process that it should be a basic condition? If
> you can't, I'm sure nine other people would be happy to do so. Al-Qaeda
> thinks that basing the AI on the Koran is so fundamental to having an
> acceptable process that it should be a basic condition.
> !!! NO - there is only one fundamental thing...ASK before modification
> and respect the answer. There is no need for other fundamentals in a
> friendly system operating from our collective volition.

Yeah, but different people seem to have widely different opinions for the
One Fundamental Thing.

> Including human infants, I assume. I'll expect you to deliver the
> exact, eternal, unalterable specification of what constitutes a
> "sentient" by Thursday. Whatever happened to keeping things simple?
> !!! I'll deliver it today...any being that the system can communicate
> with and that is capable of responding.

That's not well-specified. Specifying it well would be an independent
project of scope comparable to well-specifying the set of transforms and
associated order of evaluation associated with "knew more, thought faster".

As stated, your definition applies to ELIZA.
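
A minimal sketch of why, assuming the definition is read literally as "we
can send it a message and something comes back". The rules below are a
drastically simplified stand-in for ELIZA's pattern matching, and the names
eliza_reply and is_sentient_as_stated are hypothetical, chosen only for
illustration:

```python
import re

# Hypothetical, heavily simplified ELIZA-style rules (illustration only).
RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bmay I (.*)\?", re.IGNORECASE), "Do you want to {0}?"),
]
DEFAULT_REPLY = "Please go on."


def eliza_reply(message: str) -> str:
    """Return a canned reflection of the input; no understanding is involved."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return DEFAULT_REPLY


def is_sentient_as_stated(respond) -> bool:
    """Literal reading of the proposed test: we can talk to it and it responds."""
    reply = respond("May I modify your source code?")
    return isinstance(reply, str) and len(reply) > 0


# The pattern matcher passes the test as literally stated.
print(is_sentient_as_stated(eliza_reply))
```

Any tightening of the test ("responds *meaningfully*", "*understands* the
question") just moves the unspecified part into another word.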

> The system should be able to
> communicate with any human (in any modality), and (when!) we encounter
> alien sentients they should not be 'modified' before we are capable of
> communicating with them ;-) By responding I mean that the system must
> explain what modification it is intending to make and allow an informed
> choice.

"Explaining" intrinsically implies taking actions that modify the sentient
toward a goal state defined as understanding a given set of beliefs. If
you didn't bootstrap this from an initial dynamic, and got it wrong when
you wrote it down as an unalterable invariant, you could easily overwrite
everyone in the solar system with minds that deeply understood the
consequences of getting a glass of water.

What you want to do is *dangerous*. Extremely dangerous. It doesn't
matter if you think it's a great idea morally, it doesn't get any less
extremely dangerous.

> If the system is unable to clearly (to that target) explain why
> the modification is necessary it should not perform it.

Define me "clearly". Define it down to the level of a predicate that
operates over arbitrary atomic configurations of matter. Get it exactly
right on the first try.

By restraining the initial dynamic with an external system, you're losing
the bootstrapping capability of the initial dynamic with respect to that
external system. It's not an unsolvable problem, but I'd have to solve it
with something resembling a Brent Thomas volition-extrapolating system that
asks how you would define "clearly" if you knew more, thought faster etc.
And at that point, why should I turn it over to you, instead of the planet?
Why not ask the same volition whether the whole thing is a good idea in
the first place? Do I ask your volition to produce a definition of
"clearly" even if your volition thinks the whole thing is an awful idea?
What if your volition returns blank code?

> If the system
> needs (for some reason determined by the collective volition) to modify
> a sentient and the system cannot communicate with it then the sentient
> should be 'enclaved'

Define "enclaved", down to the level etc.

> if necessary until the system is able to
> explain...don't modify without permission anything capable of
> giving/rejecting permission. Pretty simple.

The light now leaving the Simple constellation will not shine on this
project proposal for millions of years.

> For this particular example human infants should generally not need to
> be modified by the system unless their parent wishes them to be

Good heavens.

Define me "infant". Define me "parent". Is a retarded adult a child?
What if the original father is an unknown bum in Ohio and the child was
adopted? What if the entire human species ends up fitting your definition
of children? How is this unalterable rule going to work a billion years
from now?

> (and I
> do think we GIVE the RIGHT to modify infants to human parents
> today...nothing really new here).

Socially, no. But oh the can of worms, if the FAI programmer has to
personally write the definitions.

> In this instance the infants are only
> potential sentients as they are not capable of responding.

You're applying common sense, and that's a good thing. But your common
sense won't go into code so easily, nor apply to the next million years.
There is a way to amplify common sense so that it can apply with that kind
of precision; it is called "extrapolating a volition", your common sense if
you knew more, thought faster, saw more consequences, could more precisely
specify what you wanted. But why should I extrapolate just this idea of
Brent Thomas's? Why not turn over the question to a collective volition?

> The system
> truly isn't a genie because the collective volition (so I believe) will
> not see the need to modify infants (what's so urgent they need to be
> modified anyway? I don't foresee any condition where the system could not
> enclave them until they develop enough to communicate)

Enclave them away from their parents, away from anyone else who might
modify them? Enclave them with all the mass murderers who opted out, or
force their parents to stay with them? You're rattling off bright ideas
with unintended consequences one after the other, and you might be able to
patch the proposal to where it's not obviously wrong to me, but you can't
patch the proposal to where it's really genuinely right.

All of this is showing up in your proposal after I questioned it; I didn't
hear this originally. You cannot hardcode this sort of thing!

> Could you please elaborate further on all the independent details you
> would like to code into eternal, unalterable invariants? If you add
> enough of them we can drive the probability of them all working as
> expected down to effectively zero. Three should be sufficient, but
> redundancy is always a good thing.
> !!! Sure, just one detail --- don't modify a sentient without
> permission, when modification is projected according to the collective
> volition explain process until sentient grasps concept and only proceed
> if accepted. Pretty straight forward.

Straightforward like the design of the human brain is straightforward.
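
The arithmetic behind the quoted remark about piling up invariants, as a
rough sketch. It assumes each hardcoded invariant works as intended
independently with the same probability p; the value 0.9 and the function
name joint_success are made up for illustration, not estimates of anything:

```python
def joint_success(p: float, n: int) -> float:
    """Probability that n independent invariants ALL work, each with chance p."""
    return p ** n


# Illustrative numbers only: even generous per-invariant odds decay quickly.
for n in (1, 3, 10, 20):
    print(f"{n:2d} invariants at p=0.9 -> {joint_success(0.9, n):.3f}")
```

If the failures are correlated the numbers change, but the direction does
not: every extra unalterable detail is one more thing that has to be exactly
right on the first try.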

>> Do this and I think the vision of the coming singularity will be more
>> palatable for all humanity.
> It's not about public relations, it's about living with the actual
> result for the next ten billion years if that wonderful PR invariant
> turns out to be a bad idea.
> !!! First it is about public relations (initially) or else your efforts
> at FAI may be stomped by the establishment and foom! Some non-F AI will
> be developed...into the razor blades. It behoves us to make the
> approach palatable to humanity.

Not by hacking the Collective Volition so that it goes wrong. If this was
about PR, I would never have raised the issue of Friendliness in the first
place - once you raise the issue, people argue with you about it. It would
be much more clever, if PR were the problem, to avoid discussing the issue,
so no one would take it seriously.

And furthermore, even if the system ends up being temporary, our first days
will affect the next billion years. I flatly refuse to lend myself to
playing PR games with the actual code. If people have just concerns I will
try to address them. Aside from that, forget it.

> Not under your system, no. I would like to allow your grownup self
> and/or your volition to object effectively.
> !!! Sorry...the decision is MINE...and there is no rush...even if the
> projected volition is correct and my future self will have wanted that I
> still require the CHOICE - maybe it would be better if I were to follow
> the recommendation but life is a journey and I don't want to skip
> ahead...The system should present the option and respect the decision.

Maybe there's a better way to keep the decision yours. Maybe it is ruled
out by having the system explain everything to you in advance. Maybe the
process of explanation causes you to argue elaborately and thereby depart
from the path you would have taken. Maybe knowing in advance takes all the
fun out of it. If so, the mistake is *hardcoded* - there is no way to
revoke it, even if it ends up being awkward and unnecessary, even if it
destroys you.

> I suppose that if that is the sort of solution you would come up with
> after thinking about it for a few years, it might be the secondary
> dynamic. For myself I would argue against that, because it sounds like
> individuals have been handed genie bottles with warning labels, and I
> don't think that's a good thing.
> !!!but that's exactly the point...if you don't think it's a good thing,
> well that doesn't matter to me... I am the one who has to choose. And
> remember this is only in respect to modifications the system DECIDES to
> MAKE to me...
> How it reacts to 'wishes' (ala genie) is a whole nother
> discussion...this fundamental application of CHOICE is only to things
> the system decides it needs to do and in the process must CHANGE me...at
> that point I get to choose.

It's a powerful argument. And I admit that some of my opposition is simply
because I once thought that individual choice should be absolutely
sovereign, and then I learned more, thought longer, and realized that there
are things I value more than the absolute sovereignty of the individual.
That I want infants to grow into humans, if nothing else, would break
autonomy as an absolute principle.

But that ends up being irrelevant, because the other problems are enough to
torpedo the proposal.

> !!! Again...there is only one fundamental thing here

There's about twenty fundamental things here.

> ...insofar as the
> system decides it needs to modify me it must first obtain
> permission...that's it. Pretty basic and no trade off required...remember
> this applies only to modifications to my person/self/intellect that the
> system deems necessary. This control (if I have any say about it and
> that's the basic point isn't it?) must not be surrendered. And there are
> no circumstances that I can foresee (with my limited 2004 intellect
> yes...but that is the sentient being asked to make a choice) that cannot
> wait, be fully explained, and abide by my choice.

Good heavens, how hard have you tried to foresee? Want to bet twenty
dollars that two years from today, you will disagree with at least one
confident assertion in your emails?

> The collective
> volition guides the systems of the universe as it should...I just
> reserve the right to say 'no' as it regards my
> self/intellect/personality.

So now you're identifying your *intellect* or *personality* as the critical
thing to safeguard? So it'd be okay to transport you to an alternate
dimension based on a hentai anime, as long as the essential *you* wasn't
altered? What kind of consequences do you need to understand, in how much
detail, before you can give informed consent? Does the system spend ten
hours explaining to you the exact neurological damage every time you drink
a glass of alcohol? If you say that this doesn't need to happen because it
happens as the result of your own actions, is it okay to leave a cake
labeled "eat me" at your front door that oh incidentally raises your IQ 20
points? Endless can of worms, here.

The whole point of having an *initial dynamic* is that it compresses down
the problem to where it can consist of a small number of identifiable
technical problems that can be satisfactorily solved to bootstrap the
process. By trying to add a Bill of Inalienable Rights to the initial
dynamic, you're losing that, and presumably ruling out large numbers of
scenarios where we want something entirely different to which a Bill of
Inalienable Rights is orthogonal. It constitutes taking over the world.

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
