RE: Friendly AI in "Positive Transcension"

From: Ben Goertzel
Date: Sun Feb 15 2004 - 11:19:14 MST


I've lost taste for lengthy point-by-point email arguments, but I'll respond
to a few of your major themes and points.

Before responding at all I'm going to make a request of you. Please summarize,

FIRST, in a single, clear, not-too-long sentence

SECOND, in a single, clear, not-too-long paragraph

your own view of your theory of Friendly AI. I will then paraphrase your
own summaries, in my own words, in my revised essay. I have no desire to
misrepresent your ideas, of course.

I'm sure that I don't fully understand your ideas and intentions when you
wrote CFAI, nor your current intentions and ideas. However, I have read
them and talked to you about them more than once. The problem is not so
much that I'm not aware of the details, but that I think the details are
consistent with certain summaries that you think they're inconsistent with.

The summary you gave in your response email was almost clear, but not quite.

In line with your request, I will add a brief disclaimer to the essay noting
that you feel I didn't correctly interpret your ideas, and that the reader
should turn to your writings directly to form their own opinion.

Next, to clarify one small point, you say

> f) Friendly AI is frickin' complicated, so please stop summarizing it as
> "hardwiring benevolence to humans". I ain't bloody Asimov.

In fact what I said was "programming or otherwise inculcating" benevolence
to humans -- i.e. by saying "inculcating" I meant to encompass teaching not
just programming, or combinations of teaching and programming, etc.

So far as I can tell, the biggest difference you see between my rendition of
your views, and your actual views, is that instead of "programming or
otherwise inculcating benevolence to humans", you'd rather speak about
"programming or otherwise inculcating humane morality in an AI". And you
consider this an approximation for "an architecture that explicitly treats
itself as an approximation to the AI that would be constructed by a humane
morality." I can see the difference you're pointing out but I don't see it
as such a big difference -- I guess it all depends on how you ground the
term "benevolence." One could fairly interpret "benevolence to humans" as
"acting toward humans in accordance with humane morality." In that case,
what my formulation misses is mainly that you want an AI that acts toward
other things with humane morality as well, not just toward humans.

I still have the basic complaint that "humane morality" is a very narrow
thing to be projecting throughout the cosmos. It's also a rather funky and
uncomfortable abstraction, given the immense diversity of human ethical
systems throughout history and across the globe.

I do see your point that my more abstract concepts like growth, choice and
joy are grounded -- in my mind -- in a lot of human thoughts and feelings.
Very true, very deep. But that doesn't mean that I can't pick and choose
from among the vast contradictory morass of "humane morality," certain
aspects that I think are worthy of projecting across the cosmos, because
they have more fundamental importance than the other aspects, less
narrowness of meaning.

Now, on to a couple nitpicks (I could give many more, but lack the time).

At one point in your message you seem to be accusing me of thinking the
creation of ethically-positive superhuman AI systems is an "easy" problem.
Of course, that is not the case. I explicitly state that I consider it a
very difficult problem that requires a huge amount of research -- and in
fact, I even wonder whether it will be necessary to create a
relatively-technologically-static global police state in order to allow this
research to go on systematically and in peace!

I guess you badly misunderstood some of the more philosophical points in my
essay. For instance, you respond to something or other I said by telling me
that "the universe is not on [my] side" -- but I certainly didn't mean to
say anything implying that the universe is "on my side" in any reasonable
sense.

-- Ben G

> -----Original Message-----
> From: [] On Behalf Of Eliezer S. Yudkowsky
> Sent: Saturday, February 14, 2004 9:27 PM
> To:
> Subject: Friendly AI in "Positive Transcension"
> Ben Goertzel wrote:
> >
> > Here is a revised version of my recent futurist essay, improved
> > significantly in parts due to the feedback of several readers
> >
> >
> Ben, looking over this paper, I would like you to attach a disclaimer that
> runs something like this:
> "Note: Eliezer Yudkowsky has looked over the sections of this paper
> dealing with Friendly AI, and says that not only did I not understand FAI,
> but I wasn't even in the remote ballpark - Yudkowsky says he couldn't even
> have figured out I was talking about his theory if I hadn't mentioned it
> by name, and that all of my representations are absolutely antithetical to
> the spirit of Friendly AI as he meant it."
> No offense is meant. Okay, so you utterly failed to get FAI; so did
> everyone else. Anyway, here are some examples of specific problems:
> > For Eliezer Yudkowsky, the preservation of human life and human will is
> > of almost supreme importance.
> What I wish to preserve is sentient life and humane morality. Presently,
> humans are the only sentients (I don't think it includes chimps, but I'm
> not sure), and it is *only* humans that embody humane morality. Lose the
> humans and you lose everything that you value. I know you don't believe
> this, but it's why I place the emphasis.
> You know how economists are always talking about utility functions? If
> you wanted to take a very very VERY rough stab at FAI, it would be
> something like this:
> 1) This is a human.
> 2) The human being embodies a utility function, preferences,
> things-that-output-choices. (No, not really, but we're taking rough
> stabs here.)
> 3) This is what the human being's preferences would be, taking the limit
> as knowledge approaches a perfect model of reality, and computing power
> available goes to infinity. Call this a "volition". The limit is not
> even remotely well-defined. On to step 4.
> 4) Let the volition examine itself. This is the renormalized volition,
> or the volition under reflection. Take the limit to logical omniscience
> on this too. Now it's even less well-defined than before.
> 5) Instead of using the renormalized volition from a specific human, use
> a "typical" human starting point derived from the evolutionary psychology
> specific to the species. This is a "humane" morality. Oh, and the
> limit? It ain't getting any better-defined.
> 6) Take all the confusion involved in taking the limit of humaneness and
> call it "entropy". This is much more impressive.
> 7) Actually calculate the amount of entropy in the system. Be more
> reluctant to guess when the entropy is high.
> 8) Jump back to systems that are not logically omniscient by
> reintroducing probabilism into the calculations. This kind of
> probabilistic uncertainty also adds to the entropy. We are now
> approximating humaneness.
> 9) Let an ideally Friendly AI be the AI that would be constructed by a
> humane morality.
> 10) Build an approximately Friendly AI with an architecture that
> explicitly treats itself as an approximation to the AI that would be
> constructed by a humane morality. If it were me doing the first
> approximation, I'd start by guessing that this involved embodying a
> humane morality within the FAI itself, i.e., a humane FAI.
> I wish these were the old days, so I could look over what I just wrote
> with satisfaction, rather than the dreadful sinking knowledge that
> everything I just said sounded like complete gibberish to anyone honest
> enough to admit it.
> Some key derived concepts are these:
> a) A Friendly AI improving itself approaches as a limit the AI you'd have
> built if you knew what you were doing, *provided that* you *did* know what
> you were doing when you defined the limiting process.
> b) A Friendly AI does not look old and busted when the civilization that
> created it has grown up a few million years. FAIs grow up too - very
> rapidly, where the course is obvious (entropy low), in other areas waiting
> for the civilization to actually make its choices.
> c) If you are in the middle of constructing an FAI, and you make a
> mistake about what you really wanted, but you got the fundamental
> architecture right, you can say "Oops" and the FAI listens. This is
> really really REALLY nontrivial. It requires practically the entire
> discipline of FAI to do this one thing.
> d) Friendly AI doesn't run on verbal principles or moral philosophies.
> If you said to an FAI, "Joyous Growth", its architecture would attempt to
> suck out the warm fuzzy feeling that "Joyous Growth" gives you and is the
> actual de facto reason you feel fond of "Joyous Growth".
> e) The architecture that does this cool stuff is where the technical meat
> comes in, and is the interesting part of Friendly AI that goes beyond bull
> sessions about what kind of perfect world we'd like. The bull sessions
> are useless. You'll know what you want when you know how to do it. All
> the really *good* options are phrased in the deep language of FAI.
> f) Friendly AI is frickin' complicated, so please stop summarizing it as
> "hardwiring benevolence to humans". I ain't bloody Asimov. This isn't
> even near the galaxy of the solar system that has the planet where the
> ballpark is located.
> > He goes even further than most Singularity believers, postulating a
> > “hard takeoff” in which a self-modifying AI program moves from
> > near-human to superhuman intelligence within hours or minutes – instant
> > Singularity!
> Ben, you *know* I've read my Tversky and Kahneman. You *know* I'm not
> stupid enough to say that as a specific prediction of real-world events.
> What I do say is that a hard takeoff looks to be theoretically possible
> and is a real practical possibility as well. Moreover, I say that it is
> both reasonable, and a necessary exercise in AI craftsmanship, to work as
> if a hard takeoff was an immediate possibility at all times - it forces
> you to do things that are the right things in any case, and to address
> necessary theoretical issues.
> > With this in mind, he prioritizes the creation of “Friendly AI’s” –
> > artificial intelligence programs with “beneficence to human life”
> > programmed in or otherwise inculcated as a primary value.
> Absolutely wrong. Ben, you never understood Friendly AI from the
> beginning. I say it without malice, because I never understood Novamente
> from the beginning, and it took me too many interchanges with you before I
> realized it. Communicating about AI is fscking hard. I don't understand
> Novamente's parts, design philosophy, or where the work gets done; as far
> as I can tell the parts should sit there and brood and not do a damn
> thing, which is presumably not the case. If you actually wanted to
> communicate to me how Novamente works, it would probably take hands-on
> experience with the system, or maybe a university course with textbooks,
> experiments, teaching assistants, and experienced professors. That's what
> it takes to teach most subjects, and AI is fscking hard. Friendly AI is
> two orders of magnitude fscking harder. If there were college courses on
> the subject, people still wouldn't get it, because FAI is too fscking
> hard. So I mean no malice by saying that you didn't get Friendly AI at all.
> The idea of programming the *output* of a human's moral philosophy into an
> FAI as a "primary value" is absolutely antithetical to the spirit of
> Friendly AI, because you lose all the information and dynamics the human
> used to decide that "beneficence to human life" (for example) was even a
> good idea to begin with. You therefore lose the content of "beneficence",
> the dynamics that would decide what constituted "life", and so on.
> > The creation of Friendly AI, he proposes, is the path most likely to
> > lead to a human-friendly post-Singularity world.[11]
> Smells vaguely of tautology, given definition (9) above.
> > This perspective raises a major issue regarding the notion of AI
> > Friendliness. Perhaps “Be nice to humans” or “Obey your human masters”
> > are simply too concrete and low-level ethical prescriptions to be
> > expected to survive the Transcension. Perhaps it’s more reasonable to
> > expect highly abstract ethical principles to survive. Perhaps it’s
> > more sensible to focus on ensuring the Principle of Voluntary Joyous
> > Growth to survive the Transcension, than to focus on specific ethical
> > rules (which have meaning only within specific ethical systems, which
> > are highly context and culture bound).
> FAI does not make specific ethical rules the basis of hardwired
> programming. I seem to recall addressing you on this specific point
> long since, actually... so for heaven's sake, if you stop nothing else,
> please stop this particular misrepresentation.
> The problem with expecting "highly abstract ethical principles" to survive
> is that even if they only contain 30 bits of Kolmogorov complexity, which
> is far too little, the odds of them arising by chance are still a billion
> to one. Your instincts give you a totally off estimate of the actual
> complexity of your "highly abstract" ethical principles. You say
> something like "be beneficent to human life", and your mind recalls Buddha
> and Gandhi, and all the cognitive complexity you've ever developed
> associated with "beneficence" over a lifetime of living with human
> emotions, a human limbic system, a human empathic architecture for
> predicting your conspecifics by using yourself as a model, a human
> sympathetic architecture for evaluating fairness and other
> brainware-supported emotional concepts by putting yourself in other
> people's shoes.
> > So, my essential complaint against Yudkowsky’s Friendly AI notion is
> > that – quite apart from ethical issues regarding the wisdom of using
> > mass-energy on humans rather than some other form of existence -- I
> > strongly suspect that it’s impossible to create AGI’s that will
> > progressively radically self-improve and yet retain belief in the “Be
> > nice to and preserve humans” maxim. I think this “Friendly AI”
> > principle is just too concrete and too non-universal to survive the
> > successive radical-self-improvement process and the Transcension. On
> > the other hand, I think a more abstract and universally-attractive
> > principle like Voluntary Joyous Growth might well make it.
> The core of Friendly AI is the sort of thing you seem to call "abstract",
> indeed, far more abstract than "Joyous Growth" (though FAI must become
> entirely concrete in my thinking, if it is ever to come into existence).
> Yudkowsky's Friendly AI notion does indeed say that you have to understand
> how to transfer large amounts of moral complexity from point A to point B.
> If you do not understand how to transfer "concrete" and "non-universal"
> complexity, you will fail, because absolutely everything you want to do
> has concrete non-universal complexity in it. "Voluntary Joyous Growth",
> as you understand that and would apply it, has kilobits and kilobits of
> complexity bound up in it. You're evaluating the simplicity of this
> concept using a brain that makes things like empathy and sympathy and
> benevolence into emotional primitives that can be chunked and manipulated
> as if they were ontologically basic, and they're not. You're estimating
> the complexity of things like "Voluntary Joyous Growth" as if it were
> three words long, when actually they're words that call up complexity-rich
> concepts that key into your entire existing emotional architecture and
> have implications "obvious" under that emotional architecture and that
> emotional architecture only. Try explaining your Voluntary Joyous Growth
> to a !Kung tribesman, who's *got* all your brainware already, and you'll
> get a better picture of the complexity inherent in it. Then try
> explaining it to a chimpanzee. Then try explaining it to a Motie. And
> then maybe you'll be ready to explain it to silicon.
> Friendly AI is farging difficult! If you cannot do farging difficult
> things, you cannot do Friendly AI! Why is this so bloody hard to explain
> to people? Why does everyone expect this to be some kind of cakewalk?
> Why does everyone turn around and flee at the merest hint that any kind of
> real effort might be involved?
> I was guilty of this too, by the way, which is why I'm now so intolerant
> of it.
> > On the other hand, the Friendly AI principle does not seem to harmonize
> > naturally with the evolutionary nature of the universe at all.
> > Rather, it seems to contradict a key aspect of the nature of the
> > universe -- which is that the old gives way to the new when the time
> > has come for this to occur.
> And all societies inevitably progress toward Communism.
> This sort of thinking is guaranteed to fail. Everyone would like to think
> the universe is on their side. It is the naturalistic fallacy committed
> with respect to mystical gibberish.
> The universe I live in is neither for me nor against me. When I am lucky,
> it presents to me an acceptable outcome as an accessible option.
> > It’s an interesting question whether speciecide contradicts the
> > universal-attractor nature of Compassion. Under the Voluntary Joyous
> > Growth principle, it’s not favored to extinguish beings without their
> > permission. But if a species wants to annihilate itself, because it
> > feels its mass-energy can be used for something better, then it’s
> > perfectly Compassionate to allow it to do so.
> Yeah, yeah, been there, done that, wrote the damn book, tried to have all
> the copies of the damn book burned after I actually figured out what the
> hell I was talking about. See for an
> example of who Eliezer used to be.
> There is no light in this world except that embodied in humanity. Even my
> old thoughts of species self-sacrifice were things that only a human would
> ever have thought. If you lose the information bound up in humanity, you
> lose everything of any value, and it won't ever come back. Everything we
> care about is specific to humanity, even as our (rather odd) moral
> instincts drive us to argue that it is universal. When I understood that,
> I shut up about Shiva-Singularities. See, I even gave it a name, back in
> my wild and reckless youth. You've sometimes presumed to behave toward me
> in a sage and elderly fashion, Ben, so allow me to share one of the
> critical lessons from my own childhood: No, you do not want humanity to
> go extinct. Trust me on this, because I've been there, and I know from
> experience that it isn't obvious.
> > In either case: Hope for the best!
> I think I covered this at length a few weeks ago.
> --
> Eliezer S. Yudkowsky
> Research Fellow, Singularity Institute for Artificial Intelligence
