Self-modifying FAI (was: How hard a Singularity?)

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Jun 26 2002 - 05:29:14 MDT


Stephen Reed wrote:
> On Tue, 25 Jun 2002, Ben Goertzel wrote:
>
>>The hard part is: if one creates a system that is able to change its concept
>>of Friendliness over time, and is able to change the way it governs its
>>behavior based on "goals" over time, then how does one guarantee (with high
>>probability) that Friendliness (in the designer's sense) persists through
>>these changes?
>
> I understand from CFAI that one grounds the concept of Friendliness in
> external referents that the Seed AI attempts to model with increasing
> fidelity. So the evolving Seed AI becomes friendlier as it reads more,
> experiments more, and discovers more about what friendliness actually is.
> For Cyc, friendliness would not be an implementation term (e.g., some piece
> of code that can be replaced), but rather a rich symbolic representation of
> something in the real world to be sensed directly or indirectly.
>
> So I regard the issue as one of properly educating the Seed AI, via
> external referents, as to what constitutes unfriendly behavior and why
> not to do it.

I agree with your response to Ben. We don't expect an AI's belief that the
sky is blue to drift over successive rounds of self-modification. Beliefs
with an external referent should not "drift" under self-modification except
insofar as they "drift" into correspondence with reality. Write a
definition of Friendliness made up of references to things that exist
outside the AI, and the content has no reason to "drift". If the content
does drift, it will begin making incorrect predictions and will be
corrected by further learning.
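To make that mechanism concrete, here is a minimal sketch in Python. All
names are hypothetical illustrations, not anything from CFAI: a belief
anchored to an external referent is perturbed by arbitrary
self-modification, but each round of learning pulls the stored content back
toward what the sensor reports, so drift cannot accumulate the way it would
in a free-floating constant.

    import random

    def observe_sky():
        """Stand-in sensor reading of the external referent (the sky)."""
        TRUE_WAVELENGTH = 475  # nm; a fact about the world, outside the AI
        return TRUE_WAVELENGTH + random.gauss(0, 5)  # noisy observation

    class ReferentGroundedBelief:
        """A belief whose content is checked against an external referent."""
        def __init__(self, estimate, sensor, learning_rate=0.2):
            self.estimate = estimate
            self.sensor = sensor
            self.learning_rate = learning_rate

        def self_modify(self):
            # An arbitrary rewrite that happens to perturb stored content.
            self.estimate += random.gauss(0, 20)

        def learn(self):
            # Drift shows up as prediction error and gets corrected.
            error = self.sensor() - self.estimate
            self.estimate += self.learning_rate * error

    belief = ReferentGroundedBelief(estimate=400, sensor=observe_sky)
    for _ in range(200):
        belief.self_modify()  # modification introduces drift
        belief.learn()        # the referent pulls the content back
    print(round(belief.estimate))  # hovers near 475 instead of wandering off

A constant with no sensor attached would execute the same random walk and
end up anywhere; the only asymmetry here is the external check.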

Furthermore, programmers are physical objects, and the intentions of
programmers are real properties of those physical objects. "The intention
that was in the mind of the programmer when writing this line of code" is a
real, external referent; a human can understand it, and an AI that models
causal systems and other agents should be able to understand it as well.
Not just the image of Friendliness itself, but the entire philosophical
model underlying the goal system, can be defined in terms of things that
exist outside the AI and are subject to discovery.
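One way to picture the distinction Steve draws, as a hypothetical sketch
rather than anything taken from CFAI or Cyc: an implementation-term goal is
exhausted by its stored definition, so rewriting the definition rewrites
the goal, while a referent-grounded goal keeps a fixed pointer to something
outside the AI, and only its revisable model of that referent changes.

    from dataclasses import dataclass, field

    @dataclass
    class ImplementationGoal:
        # The content IS the goal; replace this string, replace the goal.
        definition: str = "maximize reward signal"

    @dataclass
    class ReferentGoal:
        # Fixed by what it points at; the pointer is not revisable content.
        referent: str = "what the programmers intended by 'Friendliness'"
        model: dict = field(default_factory=dict)  # current best picture

        def refine(self, evidence: str, inference: str):
            # Evidence about the referent improves the model; the
            # referent itself stays put, like any other external fact.
            self.model[evidence] = inference

    goal = ReferentGoal()
    goal.refine("programmers flagged action X as harmful",
                "X conflicts with the intended sense of Friendliness")
    # A self-modification that damages `model` makes the AI predict
    # programmer reactions wrongly, and further learning corrects it;
    # there is no analogous check on ImplementationGoal.definition.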

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

