RE: Goertzel's _PtS_

From: Ben Goertzel
Date: Wed May 02 2001 - 19:52:53 MDT


> > the notion of Friendly AI: the creation of AI systems that, as they
> > rewrite their own source code, achieving progressively greater and
> > greater intelligence, leave invariant the portion of their code
> > requiring them to be friendly to human beings
> No offense - but no, *no*, NO, *NO*!

Clearly we still have a lot to talk about here. Sorry if I misrepresented
your views, it wasn't intentional.

I don't have time to carry out this argument in 100% adequate detail right
now, because I'm going to Norway tomorrow to beg for $$ from some VCs there
;> But hopefully over the weekend I'll find time to write that essay on
the logic of Friendliness that I keep wanting to write, which will explain
what I mean by "invariants" and so forth...

Anyway, I'll make a few comments...

> Point the 1st: Friendliness is not, and cannot, be implemented on the
> level of source code. Friendliness is cognitive content.

Sure, but source code can bias the system in favor of certain cognitive
content.

> Point the 2nd: Friendliness is not a "portion" which "requires" an AI to
> be friendly to humans. Friendliness is not an add-on or a plug-in.
> Friendliness is the whole of the goal system. It is what the AI wants to
> do.

I continue not to believe that Friendliness can viably be made "the whole of
the goal system." I'll clarify this point in my systematic write-up when I
get to it. Logically, sure, you CAN view any other worthy goal as a subgoal
of Friendliness, but I continue to believe this is a sufficiently awkward
way to manage other goals that it's not a workable way for a mind to

> Point the 3rd: Friendliness is not "invariant" - a strange term to use
> for a system one of whose first and foremost recommendations is that
> supergoals should be probabilistic!

What I meant is, as the system rewrites its own code, the fact of its
Friendliness is supposed to remain unchanged. The specific content
underlying this Friendliness may of course change. Mathematically, one
might say that the class of Friendly mind-states is supposed to be an
probabilistically almost-invariant subset of the class of all mind-states.
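
A rough way to write that down, purely as a sketch: the symbols M, F, T and
epsilon are illustrative assumptions, not anything taken from the Friendly
AI writeup. Let M be the class of all mind-states, F the Friendly ones, and
T one (random) self-rewriting step. "Almost-invariant" then means a Friendly
state stays Friendly after a rewrite with probability close to 1:

```latex
% Assumed notation: M = all mind-states, F \subseteq M = Friendly mind-states,
% T = one stochastic self-modification step, \epsilon = small failure probability.
\[
  \Pr\bigl[\, T(m) \in F \,\bigr] \;\ge\; 1 - \epsilon
  \qquad \text{for every } m \in F .
\]
% If each rewrite depends only on the current state (a Markov assumption),
% then after n rewrites starting from m_0 \in F:
\[
  \Pr\bigl[\, m_n \in F \,\bigr] \;\ge\; (1 - \epsilon)^n \;\ge\; 1 - n\epsilon .
\]
```

On this reading the bound weakens as the number of rewrites n grows, so
epsilon would have to be very small, or shrink as the system improves, for
Friendliness to persist over a long chain of self-modifications.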

> Friendliness can't be ensured by creating an enslaved AI that lacks the
> capability to alter the goal system; Friendliness is ensured by creating a
> Friendly AI that doesn't *want* to stop being Friendly, just as I don't
> want to stop being a nice person.

OK, we agree there. I guess we just disagree on how to build the goal
system.

> Point the 4th: Friendliness is not "hardwired", a term which I've seen
> you use several times.

What I mean by "hard-wiring Friendliness" is placing Friendliness at the top
of the initial goal system and making the system express all other goals as
subgoals of this. Is this not what you propose? I thought that's what you
described to me in New York...

> The main part of the model where I disagree with you is that it'll take a
> lot more than a Java supercompiler description to give a general
> intelligence humanlike understanding of source code. The Java
> supercompiler description is only the very first step.

I agree there. But I tend to think that if you put that first step together
with WM's higher-order inference engine, the second step will come all by
itself.

> What I'm saying is that *when the system reaches human intelligence*, it
> will probably be *in the middle of a hard takeoff*

And this is another point on which our intuitions differ. I think that
human-level intelligence will probably be achieved significantly **before**
a hard takeoff. I think that optimizing your own mind processes requires
human-level intelligence or maybe a little more.

We don't really disagree very profoundly; most of our disagreements are just
different intuitions about timings of things that none of us really has data
about. The most significant difference I see is as to whether, initially,
one wants to rig a goal system with Friendliness at the top....

