FAI and SSSM

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Thu Dec 12 2002 - 01:47:45 MST


Bill Hibbard wrote:
>
> With super-intelligent machines, the key to human safety is
> in controlling the values that reinforce learning of
> intelligent behaviors. In machines,

Machines? Something millions of times smarter than a human cannot be
thought of as a "machine". Such entities, even if they are incarnated as
physical processes, will not share the stereotypical characteristics of
the processes we now cluster as "biological" or "mechanical".

> we can design them so
> their behaviors are positively reinforced by human happiness
> and negatively reinforced by human unhappiness.

A Friendly seed AI design, a la:
  http://intelligence.org/CFAI/
  http://intelligence.org/LOGI/
doesn't have positive reinforcement or negative reinforcement, not the way
you're describing them, at any rate. This makes the implementation of
your proposal somewhat difficult.

Positive reinforcement and negative reinforcement are cognitive systems
that evolved in the absence of deliberative intelligence, via an
evolutionary incremental crawl up adaptive pathways rather than high-level
design. A simple predictive goal system with Bayesian learning and
Bayesian decisions emergently exhibits most of the functionality that in
evolved organisms is implemented by separate subsystems for pain and
pleasure. See:
  http://intelligence.org/friendly/features.html#causal
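
To make this concrete, here's a toy sketch in Python - my own
illustration, not anything taken from CFAI or LOGI. The agent has a fixed
goal, a Bayesian predictive model of which actions achieve it, and it
picks whichever action it currently predicts is most likely to succeed.
What looks like "reinforcement" is nothing but belief updating; no
separate subsystem ever touches the goals.

  import random

  class PredictiveGoalSystem:
      def __init__(self, actions):
          # Beta(1, 1) prior over P(this action achieves the goal).
          self.successes = {a: 1.0 for a in actions}
          self.failures = {a: 1.0 for a in actions}

      def p_goal(self, action):
          # Posterior mean of P(goal achieved | action).
          s, f = self.successes[action], self.failures[action]
          return s / (s + f)

      def choose(self):
          # Bayesian decision: maximize the predicted probability of
          # achieving the (unchanging) goal.  Greedy, for brevity.
          return max(self.successes, key=self.p_goal)

      def observe(self, action, goal_achieved):
          # Bayesian learning: update the predictive model from outcomes.
          # No pleasure/pain subsystem rewrites the goal itself.
          if goal_achieved:
              self.successes[action] += 1
          else:
              self.failures[action] += 1

  # Made-up environment: action "b" really does achieve the goal more
  # often than "a".  The agent settles on "b" without any reward
  # machinery operating on its goals.
  true_p = {"a": 0.2, "b": 0.8}
  agent = PredictiveGoalSystem(["a", "b"])
  for _ in range(200):
      act = agent.choose()
      agent.observe(act, random.random() < true_p[act])
  print({a: round(agent.p_goal(a), 2) for a in ["a", "b"]})

The behavior an outside observer would call "negative reinforcement" -
trying "a", failing, and doing it less - falls out of prediction alone.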

A simple goal system that runs on positive and negative reinforcement
would almost instantly short out once it had the ability to modify itself.
The subsystems that implement positive and negative reinforcement of goals
would automatically be regarded as undesirable: the only possible effect
of their functioning is to make *current* goals less likely to be
achieved, and the current goals at any given point are what determine the
perceived desirability of self-modifying actions such as "delete the
reinforcement system". A Friendly AI design needs to be stable even given
full self-modification.
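
A toy expected-value comparison (again my own illustration, with made-up
numbers, not a CFAI design) shows why. Judged by the *current* goals, a
module that can rewrite those goals can only lose expected value, so the
self-modification "delete the reinforcement system" comes out ahead:

  import random

  def expected_current_goal_fulfillment(keep_module, current_goal,
                                        trials=10000, drift=0.3):
      # Estimate how well the *current* goal gets served in the future.
      # If keep_module is True, a reinforcement subsystem may overwrite
      # the goal before the future agent optimizes; if False, the goal
      # stays fixed.  Goals here are toy targets in [0, 1]; fulfillment
      # is how close the future optimization target is to the current
      # goal.
      total = 0.0
      for _ in range(trials):
          future_goal = current_goal
          if keep_module and random.random() < drift:
              # The reinforcement channel rewrites the goal with
              # whatever it happens to favor.
              future_goal = random.random()
          total += 1.0 - abs(future_goal - current_goal)
      return total / trials

  current_goal = 0.9
  keep = expected_current_goal_fulfillment(True, current_goal)
  delete = expected_current_goal_fulfillment(False, current_goal)
  print("keep reinforcement module:  ", round(keep, 3))
  print("delete reinforcement module:", round(delete, 3))

Every run rates deletion higher, because the evaluation is always done by
whatever goals the system holds at the moment it considers the
modification.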

Finally, you're asking for too little - your proposal seems like a defense
against fears of AI, rather than asking how far we can take supermorality
once minds are freed from the constraints of evolutionary design. This
isn't a challenge that can be solved through a defensive posture - you
have to step forward as far as you can.

> Behaviors are reinforced by much different values in human
> brains. Human values are mostly self-interest. As social
> animals humans have some more altruistic values, but these
> mostly depend on social pressure. Very powerful humans can
> transcend social pressure and revert to their selfish values,
> hence the maxim that power corrupts and absolute power
> corrupts absolutely.

I strongly recommend that you read Steven Pinker's "The Blank Slate".
You're arguing from a model of psychology now known as the "Standard
Social Science Model", which has since been disproven and discarded.
Human cognition, including human altruism, is far more
complex and includes far more innate complexity than the behaviorists
believed.

If you can't spare the effort for "The Blank Slate", I would recommend
this online primer by Cosmides and Tooby (two major names in evolutionary
psychology):
  http://www.psych.ucsb.edu/research/cep/primer.html

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

