Re: "Supergoal" considered harmful

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Sat Jul 16 2005 - 20:18:02 MDT


Chris Capel wrote:
> On 7/16/05, Eliezer S. Yudkowsky <sentience@pobox.com> wrote:
>
>>The term "supergoal" is a word I used in the early days of writing "Creating
>>Friendly AI" because I was literate enough to have heard of goals and
>>subgoals, but not quite literate enough to know that I should be saying
>>"utility function".
>
> Granted all this, surely the term "supergoal" is still useful. I think
> it has a meaning distinct from "utility function", even when it's not
> misused. (Has it been misused in any recent posts? I haven't noticed.)
> Whereas the utility function is the process or algorithm by which the
> AI decides the expected utility of any given action, the supergoal is
> the overall direction that the AI's actions tend to bring the world
> to be like. In a real AI system, it's possible that the utility
> function could be coded without ever referencing any sort of
> supergoal (indeed, bringing terminology as vague and metaphorical as
> "supergoal" into discussions about the low-level architecture of an AI
> is probably meaningless in most designs), but "supergoal" can still be
> used to describe the real-world effects of the utility function.

No. "Decision process" is the process or algorithm by which the AI ranks
actions according to preference; or just the process that selects an action,
if they possess no well-defined preference ordering.

A "utility function" plays a particular role in a particular kind of algorithm
for ordering actions.

The role of a utility function is that it assigns real numbers to *outcomes*.
Then a predictive mechanism associates probability distributions over
outcomes with particular actions. (Note that this predictive mechanism is
purely factual, purely testable and falsifiable, since it only predicts the
probability of an outcome conditional on an action being performed.) A real
number is then associated with the action, the action's expectation of
utility - "expectation" here having its customary mathematical sense of a
weighted average. This latter real number we might call "expected utility";
it is quite distinct from a "utility" assigned by the utility function to a
fixed outcome. The decision system then orders its preferences over actions
using the standard ordering over the real numbers with respect to their
associated expected utilities. Then the decision system selects a greatest
action (usually there will be only one greatest action).

So an "expected utility maximizer" is an AI that predicts the probable
consequences of actions, assigns utilities over outcomes via a utility
function, and then chooses an action with maximal expectation of utility, that
is, a maximal weighted average of utilities over probable outcomes.
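
In illustrative Python (the function names here are mine, nothing canonical;
the point is only the shape of the computation, EU(action) = sum over
outcomes of P(outcome | action) * U(outcome)):

    # Sketch of an expected utility maximizer.  All names are illustrative;
    # a real system need not be organized this way.

    def expected_utility(action, predict, utility):
        # 'predict(action)' is the purely factual part: a mapping
        # {outcome: probability}, conditional on the action being performed.
        # 'utility(outcome)' assigns a real number to each outcome.
        # The result is the weighted average of utilities over outcomes.
        return sum(p * utility(outcome)
                   for outcome, p in predict(action).items())

    def choose_action(actions, predict, utility):
        # Order actions by expected utility and select a greatest one.
        return max(actions,
                   key=lambda a: expected_utility(a, predict, utility))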

Similarly a "paperclip maximizer" is an AI that estimates the probable number
of paperclips associated with different outcomes, assigns probabilities to
outcomes given actions, and chooses an action with maximal expectation of
paperclips.
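
Reusing the sketch above with a toy predictive model and a utility function
that merely counts paperclips - the outcomes and probabilities are made up
purely for illustration:

    # A toy "paperclip maximizer" built from the sketch above.
    PAPERCLIPS = {"factory_built": 1000, "factory_fails": 0, "wire_bent": 1}

    def paperclip_utility(outcome):
        return PAPERCLIPS[outcome]

    def toy_predict(action):
        # Made-up probabilities of outcomes conditional on each action.
        if action == "build_factory":
            return {"factory_built": 0.9, "factory_fails": 0.1}
        return {"wire_bent": 1.0}

    best = choose_action(["build_factory", "bend_wire"], toy_predict,
                         paperclip_utility)
    # Expected paperclips: 0.9*1000 + 0.1*0 = 900 versus 1.0*1 = 1,
    # so 'best' is "build_factory".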

> So is this usage of supergoal valid? And since you dislike the word,
> is there a less misleading word that captures this sense distinct from
> "utility function"? It seems that most of the words we use to describe
> the same orientation in humans (things like the "purpose", "meaning",
> or "direction" of one's life) are too anthropocentric to be more
> useful. "Supergoal" certainly avoids this problem, and the
> misconception it does lead to--the idea that it makes sense to say
> this or that goal "overrides" one or another goal, or that the
> supergoal somehow represents a constraint on the behavior of the AI
> instead of a simple specification of its behavior--might be less
> pernicious than the mistakes associated with more anthropocentric
> terminology. But if you have an even better term, I, for one, will
> certainly use it.

Should the AI be a good predictor, it will systematically steer reality into
regions to which its utility function assigns high utilities. Thus, the term
"supergoal" that I used in CFAI means simply "utility function". And if the
AI is a good predictor, its utility function also serves as a good description
of the target of the optimization process, the regions of reality into which
the AI does in fact steer the future.

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

