Re: supergoal stability

From: Wei Dai
Date: Sat May 04 2002 - 01:51:06 MDT

On Fri, May 03, 2002 at 06:38:46PM -0400, Eliezer S. Yudkowsky wrote:
> It currently looks to me like any mind-in-general that is *not* Friendly
> will automatically resist all modifications of the goal system, to the limit
> of its ability to detect modifications.

Thanks, that does make a great deal of sense. I thought that the
difficulty with creating an AI/SI that would implement goals as originally
understood by the programmers was that once the AI became sufficiently
intelligent, it would somehow decide that the goals are too trivial and
not worthy of its attention. But I guess there is really no reason for
that to happen, and the danger is actually in the earlier less intelligent
stages, where it may make mistakes in deciding whether a candidate
self-modification is overall a positive or negative contribution to its

> The inventor of CFAI won't even tell you the reasons why this would be
> difficult, just that it is.

Why not? If someone were to naively try to use the CFAI approach to create
an AI that serves some goal other than Friendliness, what is the likely
outcome? Would it be catastrophic or just fruitless?

> Well, today I would say it differently: Today I would say that you have to
> do a "port" rather than a "copy and paste", and that an AI can be *more*
> stable under changes of cognitive architecture or drastic power imbalances
> than a human would be, unless the human had the will and the knowledge to
> make those cognitive changes that would be required to match a Friendly AI
> in this area.

Since you're planning to port your own personal philosophy to the AI, do
you have a document that explains in detail what your personal philosophy
is? I'm particularly interested in the following question. If two groups
of people want access to the same resource for incompatible purposes, and
no alternatives are available, how would you decide which group to grant
the resource to? In other words, what philosophical principles will guide
the Sysop in designing its equivalent of the CPU scheduling algorithm?
