# Optimality of using probability

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Fri Feb 02 2007 - 21:23:54 MST

Ben Goertzel wrote:
>
> Cox's axioms and de Finetti's subjective probability approach,
> developed in the first part of the last century, give mathematical
> arguments as to why probability theory is the optimal way to reason
> under conditions of uncertainty. However, given limited computational
> resources, AI systems cannot always afford to reason optimally. It is
> thus interesting to ask how Cox's or deFinetti's ideas can be extended
> to the situation of limited computational resources. Can one show
> that, among all systems with a certain amount of resources, the most
> intelligent one will be the one whose reasoning most closely
> approximates probability theory?

I don't think a mind that evaluates probabilities *is* automatically the
best way to make use of limited computing resources. That is: if you
have limited computing resources, and you want to write a computer
program that makes the best use of those resources to solve a problem
you're facing, then only under very *rare* circumstances does it make
sense to write a program consisting of an intelligent mind that thinks
in probabilities. In fact, these rare circumstances are what define AGI
work.

If you know in advance that the task is to solve a Sudoku puzzle, then
you'll be better off writing a specialized Sudoku solver. If you know
the exact Sudoku puzzle you face, and its solution, you can write an
even more specialized program: one that just spits out that solution. A
rational use of computing power, like any rational plan, is "rational"
relative to the subjective uncertainty of the programmer about the
environment. If you already knew the exact solution, you could write
down that solution instead of writing a computer program to compute it.

If I know my program will face a problem with known statistical
structure, then I will write a program that processes probabilities
using predefined calculations. That's one circumstance under which you
would want to use a program that processes probabilities - when you see
a specific probabilistic calculation that optimizes the problem.
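
As a hypothetical illustration of such a predefined calculation, suppose the programmer already knows a detector's hit rate, its false-alarm rate, and the base rate of the event. None of these numbers come from this post; they are assumptions chosen only to make the sketch concrete. In that state of knowledge the program needs nothing more general than one hard-coded application of Bayes' rule:

```python
def posterior_given_alarm(prior, hit_rate, false_alarm_rate):
    """Bayes' rule for one binary hypothesis and one binary observation."""
    # Total probability of seeing the alarm at all.
    evidence = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    return hit_rate * prior / evidence


# Assumed, purely illustrative numbers (not taken from the post).
p = posterior_given_alarm(prior=0.01, hit_rate=0.9, false_alarm_rate=0.05)
print(round(p, 4))
```

The program "processes probabilities", but only along a path the programmer fixed in advance; it cannot discover a new kind of event to attach a probability to.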

But what if you don't even know whether your program will encounter a
Sudoku puzzle, or something else entirely? What if you don't know all
the environmental entities your program might interact with, or what
might be a good way to model them? Then you must somehow write a more
general program. Should this program process probabilities, even though
we don't know all the kinds of events it might discover and attach
probabilities to?

What state of subjective uncertainty must you, the programmer, be in
with respect to the environment, before coding a probability-processing
mind is a rational use of your limited computing resources? This is how
I would state the question.

Intuitively, I answer: When you, the programmer, can identify parts of
the environmental structure; but you are extremely uncertain about other
parts of the environment; and yet you do believe there's structure (the
unknown parts are not believed by you to be pure random noise). In this
case, it makes sense to write a Probabilistic Structure Identifier and
Exploiter, aka, a rational mind.

Note that I specify you must understand *part of* the structure of the
environment. You, as the programmer, have some kind of goal you are
trying to achieve by rationally using your computing power; it is
difficult to have a utility function over random noise. Your program
must *use* the unknown parts of the environmental structure to achieve
that which you started out to accomplish. You have to tie in the
discovered structures to the utility differences you care about. This
requires that you understand explicitly how your own utility function
relates to the environment, so you can reproduce that relation in a
program; and this requires that you start out with some knowledge of the
environment.

I.e.: If you don't know at least some identifying characteristics of
starving African children, your state of knowledge does not let you
write a program that has feeding starving African children as a "goal".
In fact, there's no sense in which you yourself can be said to know
that starving African children exist; and no way you could identify them
as important if you saw them; and no way you could realize that
*feeding* them might increase expected utility, once you discovered the
previously unsuspected existence of food.

So that's the intuitive statement. I can't state this precisely as yet.
It's relatively simple to make a subjective probabilistic state of
uncertainty reproduce itself in an exact corresponding calculation - to
show that if you think a particular specific event is 90% probable, then
you want your computer program to represent it as 90% probable, given
that it uses probabilities at all. As for justifying a generic
probability-processing system - I won't say that it's a lot harder,
because I don't actually *know* that it's a lot harder, because I don't
know exactly how to do it, and therefore I don't know yet how hard or
easy it will be. I suspect it's more complicated than the simple case,
at least.
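
The simple case can be sketched numerically. Assume, purely for illustration, that the programmer's utility is the logarithmic score of the probability the program states (one standard proper scoring rule; this post does not name a utility function). If the programmer believes the event is 90% probable, then a program that reports probability q has expected score 0.9 log q + 0.1 log(1 - q), and a grid search over q finds where that expectation peaks:

```python
import math


def expected_log_score(p, q):
    """Expected log score of reporting q when the programmer believes p."""
    return p * math.log(q) + (1.0 - p) * math.log(1.0 - q)


p = 0.9  # the programmer's own subjective probability
candidates = [i / 100 for i in range(1, 100)]  # reports 0.01 .. 0.99
best_q = max(candidates, key=lambda q: expected_log_score(p, q))
print(best_q)
```

Under this assumed scoring rule the best report coincides with the programmer's own 90%. The sketch says nothing about the harder question of justifying a generic probability-processing system.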

I tried to solve this problem in 2006, just in case it was easier than
it looked (it wasn't). I concluded that the problem required a fairly
sophisticated mind-system to carry out the reasoning that would justify
probabilities, so I was blocking on subparts of this mind-system that I
didn't know how to specify yet. Thus I put the problem on hold and
decided to come back to it later.

As a research program, the difficulty would be getting a researcher to
see that a nontrivial problem exists, and come up with some
non-totally-ad-hoc interesting solution, without their taking on a
problem so large that they can't solve it.

One decent-sized research problem would be scenarios in which you the
programmer could expect utility from a program that used probabilities,
in a state of programmer knowledge that *didn't* let you calculate those
probabilities yourself. One conceptually simple problem, that would
still be well worth a publication if no one has done it yet, would be
calculating the expected utilities of using well-known uninformative
priors in plausible problems. But the real goal would be to justify
using probability in cases of structural uncertainty. A simple case of
this more difficult problem would be calculating the expected utility of
inducting a Bayesian network with unknown latent structure, known node
behaviors (like noisy-or), known priors for network structures, and
uninformative priors for the parameters. One might in this way work up
to Boolean formulas, and maybe even some classes of arbitrary machines,
that might be in the environment. I don't think you can do a similar
calculation for Solomonoff induction, even in principle, because
Solomonoff is uncomputable and therefore ill-defined. For, say, Levin
search, it might be doable; but I would be VERY impressed if anyone
could actually pull off a calculation of expected utility.

In general, I would suggest starting with the expected utility of simple
uninformative priors, and working up to more structural forms of
uncertainty. Thus, strictly justifying more and more abstract uses of