Friendly AI in "Positive Transcension"

From: Eliezer S. Yudkowsky
Date: Sat Feb 14 2004 - 19:27:23 MST

Ben Goertzel wrote:
> Here is a revised version of my recent futurist essay, improved
> significantly in parts due to the feedback of several readers

Ben, looking over this paper, I would like you to attach a disclaimer that
runs something like this:

"Note: Eliezer Yudkowsky has looked over the sections of this paper
dealing with Friendly AI, and says that not only did I not understand FAI,
but I wasn't even in the remote ballpark - Yudkowsky says he couldn't even
have figured out I was talking about his theory if I hadn't mentioned it
by name, and that all of my representations are absolutely antithetical to
the spirit of Friendly AI as he meant it."

No offense is meant. Okay, so you utterly failed to get FAI; so did
everyone else. Anyway, here are some examples of specific problems:

> For Eliezer Yudkowsky, the preservation of human life and human will is
> of almost supreme importance.

What I wish to preserve is sentient life and humane morality. Presently,
humans are the only sentients (I don't think it includes chimps, but I'm
not sure), and it is *only* humans that embody humane morality. Lose the
humans and you lose everything that you value. I know you don't believe
this, but it's why I place the emphasis.

You know how economists are always talking about utility functions? If
you wanted to take a very very VERY rough stab at FAI, it would be
something like this:

1) This is a human.

2) The human being embodies a utility function, preferences,
things-that-output-choices. (No, not really, but we're taking rough stabs.)

3) This is what the human being's preferences would be, taking the limit
as knowledge approaches a perfect model of reality, and computing power
available goes to infinity. Call this a "volition". The limit is not
even remotely well-defined. On to step 4.

4) Let the volition examine itself. This is the renormalized volition,
or the volition under reflection. Take the limit to logical omniscience
on this too. Now it's even less well-defined than before.

5) Instead of using the renormalized volition from a specific human, use
a "typical" human starting point derived from the evolutionary psychology
specific to the species. This is a "humane" morality. Oh, and the limit?
It ain't getting any better-defined.

6) Take all the confusion involved in taking the limit of humaneness and
call it "entropy". This is much more impressive.

7) Actually calculate the amount of entropy in the system. Be more
reluctant to guess when the entropy is high.

8) Jump back to systems that are not logically omniscient by
reintroducing probabilism into the calculations. This kind of
probabilistic uncertainty also adds to the entropy. We are now
approximating humaneness.

9) Let an ideally Friendly AI be the AI that would be constructed by a
humane morality.

10) Build an approximately Friendly AI with an architecture that
explicitly treats itself as an approximation to the AI that would be
constructed by a humane morality. If it were me doing the first
approximation, I'd start by guessing that this involved embodying a humane
morality within the FAI itself, i.e., a humane FAI.

I wish these were the old days, so I could look over what I just wrote
with satisfaction, rather than the dreadful sinking knowledge that
everything I just said sounded like complete gibberish to anyone honest
enough to admit it.

Some key derived concepts are these:

a) A Friendly AI improving itself approaches as a limit the AI you'd have
built if you knew what you were doing, *provided that* you *did* know what
you were doing when you defined the limiting process.

b) A Friendly AI does not look old and busted when the civilization that
created it has grown up a few million years. FAIs grow up too - very
rapidly, where the course is obvious (entropy low), in other areas waiting
for the civilization to actually make its choices.

c) If you are in the middle of constructing an FAI, and you make a
mistake about what you really wanted, but you got the fundamental
architecture right, you can say "Oops" and the FAI listens. This is
really really REALLY nontrivial. It requires practically the entire
discipline of FAI to do this one thing.

d) Friendly AI doesn't run on verbal principles or moral philosophies.
If you said to an FAI, "Joyous Growth", its architecture would attempt to
suck out the warm fuzzy feeling that "Joyous Growth" gives you and is the
actual de facto reason you feel fond of "Joyous Growth".

e) The architecture that does this cool stuff is where the technical meat
comes in, and is the interesting part of Friendly AI that goes beyond bull
sessions about what kind of perfect world we'd like. The bull sessions
are useless. You'll know what you want when you know how to do it. All
the really *good* options are phrased in the deep language of FAI.

f) Friendly AI is frickin' complicated, so please stop summarizing it as
"hardwiring benevolence to humans". I ain't bloody Asimov. This isn't
even near the galaxy of the solar system that has the planet where the
ballpark is located.

> He goes even further than most Singularity believers, postulating a
> “hard takeoff” in which a self-modifying AI program moves from
> near-human to superhuman intelligence within hours or minutes – instant
> Singularity!

Ben, you *know* I've read my Tversky and Kahneman. You *know* I'm not
stupid enough to say that as a specific prediction of real-world events.

What I do say is that a hard takeoff looks to be theoretically possible
and is a real practical possibility as well. Moreover, I say that it is
both reasonable, and a necessary exercise in AI craftsmanship, to work as
if a hard takeoff was an immediate possibility at all times - it forces
you to do things that are the right things in any case, and to address
necessary theoretical issues.

> With this in mind, he prioritizes the creation of “Friendly AI’s” –
> artificial intelligence programs with “beneficence to human life”
> programmed in or otherwise inculcated as a primary value.

Absolutely wrong. Ben, you never understood Friendly AI from the
beginning. I say it without malice, because I never understood Novamente
from the beginning, and it took me too many interchanges with you before I
realized it. Communicating about AI is fscking hard. I don't understand
Novamente's parts, design philosophy, or where the work gets done; as far
as I can tell the parts should sit there and brood and not do a damn
thing, which is presumably not the case. If you actually wanted to
communicate to me how Novamente works, it would probably take hands-on
experience with the system, or maybe a university course with textbooks,
experiments, teaching assistants, and experienced professors. That's what
it takes to teach most subjects, and AI is fscking hard. Friendly AI is
two orders of magnitude fscking harder. If there were college courses on
the subject, people still wouldn't get it, because FAI is too fscking
hard. So I mean no malice by saying that you didn't get Friendly AI at all.

The idea of programming the *output* of a human's moral philosophy into an
FAI as a "primary value" is absolutely antithetical to the spirit of
Friendly AI, because you lose all the information and dynamics the human
used to decide that "beneficence to human life" (for example) was even a
good idea to begin with. You therefore lose the content of "beneficence",
the dynamics that would decide what constituted "life", and so on.

> The creation of Friendly AI, he proposes, is the path most likely to
> lead to a human-friendly post-Singularity world.[11]

Smells vaguely of tautology, given definition (9) above.

> This perspective raises a major issue regarding the notion of AI
> Friendliness. Perhaps “Be nice to humans” or “Obey your human masters”
> are simply too concrete and low-level ethical prescriptions to be
> expected to survive the Transcension. Perhaps it’s more reasonable to
> expect highly abstract ethical principles to survive. Perhaps it’s
> more sensible to focus on ensuring the Principle of Voluntary Joyous
> Growth to survive the Transcension, than to focus on specific ethical
> rules (which have meaning only within specific ethical systems, which
> are highly context and culture bound).

FAI does not make specific ethical rules the basis of hardwired
programming. I seem to recall addressing you on this specific point
long since, actually... so for heaven's sake, if you stop nothing else,
please stop this particular misrepresentation.

The problem with expecting "highly abstract ethical principles" to survive
is that even if they only contain 30 bits of Kolmogorov complexity, which
is far too little, the odds of them arising by chance are still a billion
to one. Your instincts give you a totally off estimate of the actual
complexity of your "highly abstract" ethical principles. You say
something like "be beneficent to human life", and your mind recalls Buddha
and Gandhi, and all the cognitive complexity you've ever developed
associated with "beneficence" over a lifetime of living with human
emotions, a human limbic system, a human empathic architecture for
predicting your conspecifics by using yourself as a model, a human
sympathetic architecture for evaluating fairness and other
brainware-supported emotional concepts by putting yourself in other
people's shoes.
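
A quick arithmetic check of the billion-to-one figure, as a minimal Python
sketch. It assumes the principle is one specific 30-bit string and that
"chance" means drawing each bit uniformly at random; the function name is
hypothetical and exists only for this illustration.

```python
def chance_of_random_match(bits: int) -> float:
    """Probability that a uniformly random bit string of the given
    length equals one fixed target string (illustrative assumption)."""
    return 1.0 / (2 ** bits)


# Number of distinct 30-bit strings: roughly a billion.
print(2 ** 30)  # 1073741824

# Chance of hitting one particular 30-bit target by luck.
print(chance_of_random_match(30))
```

Every additional bit halves the chance again, so the estimate gets worse
quickly as the real complexity of the principle grows.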

> So, my essential complaint against Yudkowsky’s Friendly AI notion is
> that – quite apart from ethical issues regarding the wisdom of using
> mass-energy on humans rather than some other form of existence -- I
> strongly suspect that it’s impossible to create AGI’s that will
> progressively radically self-improve and yet retain belief in the “Be
> nice to and preserve humans” maxim. I think this “Friendly AI”
> principle is just too concrete and too non-universal to survive the
> successive radical-self-improvement process and the Transcension. On
> the other hand, I think a more abstract and universally-attractive
> principle like Voluntary Joyous Growth might well make it.

The core of Friendly AI is the sort of thing you seem to call "abstract",
indeed, far more abstract than "Joyous Growth" (though FAI must become
entirely concrete in my thinking, if it is ever to come into existence).
Yudkowsky's Friendly AI notion does indeed say that you have to understand
how to transfer large amounts of moral complexity from point A to point B.
If you do not understand how to transfer "concrete" and "non-universal"
complexity, you will fail, because absolutely everything you want to do
has concrete non-universal complexity in it. "Voluntary Joyous Growth",
as you understand that and would apply it, has kilobits and kilobits of
complexity bound up in it. You're evaluating the simplicity of this
concept using a brain that makes things like empathy and sympathy and
benevolence into emotional primitives that can be chunked and manipulated
as if they were ontologically basic, and they're not. You're estimating
the complexity of things like "Voluntary Joyous Growth" as if they were
three words long, when actually they're words that call up complexity-rich
concepts that key into your entire existing emotional architecture and
have implications "obvious" under that emotional architecture and that
emotional architecture only. Try explaining your Voluntary Joyous Growth
to a !Kung tribesman, who's *got* all your brainware already, and you'll
get a better picture of the complexity inherent in it. Then try
explaining it to a chimpanzee. Then try explaining it to a Motie. And
then maybe you'll be ready to explain it to silicon.

Friendly AI is farging difficult! If you cannot do farging difficult
things, you cannot do Friendly AI! Why is this so bloody hard to explain
to people? Why does everyone expect this to be some kind of cakewalk?
Why does everyone turn around and flee at the merest hint that any kind of
real effort might be involved?

I was guilty of this too, by the way, which is why I'm now so intolerant
of it.

> On the other hand, the Friendly AI principle does not seem to harmonize
> naturally with the evolutionary nature of the universe at all.
> Rather, it seems to contradict a key aspect of the nature of the
> universe -- which is that the old gives way to the new when the time
> has come for this to occur.

And all societies inevitably progress toward Communism.

This sort of thinking is guaranteed to fail. Everyone would like to think
the universe is on their side. It is the naturalistic fallacy committed
with respect to mystical gibberish.

The universe I live in is neither for me nor against me. When I am lucky,
it presents to me an acceptable outcome as an accessible option.

> It’s an interesting question whether speciecide contradicts the
> universal-attractor nature of Compassion. Under the Voluntary Joyous
> Growth principle, it’s not favored to extinguish beings without their
> permission. But if a species wants to annihilate itself, because it
> feels its mass-energy can be used for something better, then it’s
> perfectly Compassionate to allow it to do so.

Yeah, yeah, been there, done that, wrote the damn book, tried to have all
the copies of the damn book burned after I actually figured out what the
hell I was talking about. See for an
example of who Eliezer used to be.

There is no light in this world except that embodied in humanity. Even my
old thoughts of species self-sacrifice were things that only a human would
ever have thought. If you lose the information bound up in humanity, you
lose everything of any value, and it won't ever come back. Everything we
care about is specific to humanity, even as our (rather odd) moral
instincts drive us to argue that it is universal. When I understood that,
I shut up about Shiva-Singularities. See, I even gave it a name, back in
my wild and reckless youth. You've sometimes presumed to behave toward me
in a sage and elderly fashion, Ben, so allow me to share one of the
critical lessons from my own childhood: No, you do not want humanity to
go extinct. Trust me on this, because I've been there, and I know from
experience that it isn't obvious.

> In either case: Hope for the best!

I think I covered this at length a few weeks ago.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
