Re: Friendly AI in "Positive Transcension"

From: Eliezer S. Yudkowsky
Date: Sun Feb 15 2004 - 12:32:22 MST

Ben Goertzel wrote:
> Before responding at all I'm going to make a request of you. Please
> summarize,
> FIRST, in a single, clear, not-too-long sentence
> SECOND, in a single, clear, not-too-long paragraph
> your own view of your theory of Friendly AI. I will then paraphrase your
> own summaries, in my own words, in my revised essay. I have no desire to
> misrepresent your ideas, of course.

This cannot possibly be done. What you're asking is undoable.

We've already established that I don't understand Novamente. This is a
minor problem. Let's say that instead there was a major problem, which is
that, rather than knowingly failing to comprehend Novamente, I had looked
at Novamente and thought, "Oh, I understand this! This is Cyc, only all
the LISP atoms have activation levels attached to them." Let's moreover
suppose that I have never read "Godel Escher Bach", studied biology, or in
any other way acquired a rich concept for levels of organization, which
you could invoke to explain the point of "maps" or "emergent dynamics".
Instead I write a lengthy paper and many times refer to "Ben Goertzel's
Novamente concept, which is Cyc with activation levels added." In short,
I not only completely fail to get Novamente, I substitute a wildly
different model built out of concepts I already have, which is not even
recognizable as Novamente-related except by a stretch of the imagination.
Of course, since (in the words of Robyn Dawes) the problem with rotten
chains of reasoning is that they don't stink, I think I've understood
Novamente perfectly, except for some nagging implementational details.

Now, please first, in a single, clear, not-too-long sentence, and second,
in a single, clear, not-too-long paragraph, summarize, to me, your view of
all the fundamental concepts that went into Novamente - not just for me,
mind you, but for readers completely unfamiliar with your ideas who will
read my interpretation of your ideas.

Friendly AI does not fit on a T-Shirt!

> I'm sure that I don't fully understand your ideas and intentions when you
> wrote CFAI, nor your current intentions and ideas. However, I have read
> them and talked to you about them more than once. The problem is not so
> much that I'm not aware of the details, but that I think the details are
> consistent with certain summaries that you think they're inconsistent with
> ;-)

You are saying things that are not simply wrong, but absolutely
antithetical to basic principles of FAI. I don't think you'd be doing
that if you were just missing a few details.

> The summary you gave in your response email was almost clear, but not
> quite..
> In line with your request, I will add a brief disclaimer to the essay noting
> that you feel I didn't correctly interpret your ideas, and that the reader
> should turn to your writings directly to form their own opinion.

I'll be sure to add a similar disclaimer to my forthcoming essay about
"Novamente: Cyc with activation levels added".

> Next, to clarify one small point, you say
>>f) Friendly AI is frickin' complicated, so please stop summarizing it as
>>"hardwiring benevolence to humans". I ain't bloody Asimov.
> In fact what I said was "programming or otherwise inculcating" benevolence
> to humans -- i.e. by saying "inculcating" I meant to encompass teaching not
> just programming, or combinations of teaching and programming, etc.
> So far as I can tell, the biggest difference you see between my rendition of
> your views, and your actual views, is that instead of "programming or
> otherwise inculcating benevolence to humans", you'd rather speak about
> "programming or otherwise inculcating humane morality in an AI".

You're missing upward of a dozen fundamental concepts here. One or two
examples follow.

First, let's delete "programming or otherwise inculcating" and replace it
with "choosing", which is the correct formulation under the basic theory
of FAI, which makes extensive use of the expected utility principle.
Choice subsumes choice over programming, choice over environmental
information, and any other design options of which we might prefer one to
another.

Next, more importantly, "humane" is not being given its intuitive sense
here! Humane is here a highly technical concept, "renormalized humanity".
If you talk about "benevolence" you're talking about something that
seems, at least to humans, simple and intuitive. If you use "humane" in
the technical sense I gave it, you are describing a deep technical thing
that is not at all obvious. Furthermore, you are describing a deep
technical thing that incorporates fundamental FAI concepts. "Benevolence"
can be summed up by a constant utility function. "Humaneness" cannot. We
could conceivably speak of "hardwiring" benevolence, or substitute
"programming or otherwise inculcating", without loss of generality. This
is not an idiom that even makes sense for "humaneness", in its newly
acquired sense of dynamic renormalization.

Furthermore, if you say "humaneness" without giving it a technical
definition, it absolutely doesn't help the readers - it simply reads as
equivalent to benevolence.

I doubt that anything I've said about humaneness conveys even a distant
flavor of what it's about, actually - not enough exposition, not enough

That's one of the new principles involved. There are more.

> And you
> consider this an approximation for "an architecture that explicitly treats
> itself as an approximation to the AI that would be constructed by a humane
> morality." I can see the difference you're pointing out but I don't see it
> as such a big difference -- I guess it all depends on how you ground the
> term "benevolence." One could fairly interpret "benevolence to humans" as
> "acting toward humans in accordance with humane morality." In that case,
> what my formulation misses is mainly that you want an AI that acts toward
> other things with humane morality as well, not just toward humans.

These are totally incommensurate formulations - like comparing the answer
"15!" with the question "3 * 5 = ?" or the set theory of fields.

It's not that you're misunderstanding *what specifically* I'm saying, but
that you're misunderstanding the *sort of thing* I'm attempting to
describe. Not apples versus oranges, more like apples versus the equation
x'' = -kx.

> I still have the basic complaint that "humane morality" is a very narrow
> thing to be projecting throughout the cosmos. It's also a rather funky and
> uncomfortable abstraction, given the immense diversity of human ethical
> systems throughout history and across the globe.

I don't think you got "humane" at all, but it happens to be a thing that
(should) include your dynamic reaction of discomfort at the perceived
narrowness in your odd mental model of "Eliezer Yudkowsky's FAI theory".

"Humaneness" sneaks under the immense diversity problem by avoiding the
specific content of said human ethical systems and going after the
species-universal evolved dynamics underlying them.

Your most serious obstacle here is your inability to see anything except
the specific content of an ethical system - you see "Joyous Growth" as a
specific ethical system, you see "benevolence" as specific content, your
mental model of "humaneness" is something-or-other with specific ethical
content. "Humaneness" as I'm describing it *produces* specific ethical
content but *is not composed of* specific ethical content. Imagine the
warm fuzzy feeling that you get when considering "Joyous Growth". Now,
throughout history and across the globe, do you think that only
21st-century Americans get warm fuzzy feelings when considering their
personal moral philosophies?

There actually is a strong analogy here between attempts to infuse lists
of domain-specific knowledge into AIs, a la Cyc, and attempts to produce
AIs that have specific cognitive dynamics which can output what we would
regard as general reasoning.

> I do see your point that my more abstract concepts like growth, choice and
> joy are grounded -- in my mind -- in a lot of human thoughts and feelings.
> Very true, very deep. But that doesn't mean that I can't pick and choose
> from among the vast contradictory morass of "humane morality," certain
> aspects that I think are worthy of projecting across the cosmos, because
> they have more fundamental importance than the other aspects, less
> narrowness of meaning.

The dynamics of the thinking you do when you consider that question would
form part of the "renormalization" step, step 4, the volition examining
itself under reflection. It is improper to speak of a vast morass of
"humane morality" which needs to be renormalized, because the word
"humane" was not introduced until after step 4. You could speak of a vast
contradictory morass of the summated outputs of human moralities, but if
you add the "e" on the end, then in FAI theory it has the connotation of
something already renormalized. Furthermore, it is improper to speak of
renormalizing the vast contradictory morass as such, because it's a
superposition of outputs, not a dynamic process capable of renormalizing
itself. You can speak of renormalizing a given individual, or
renormalizing a model based on a typical individual.

This is all already taken into account in FAI theory. At length.

(PS: Please stop quoting the entire message below your replies! This is
explicitly not-a-good-thing according to the SL4 list rules.)

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
