**From:** Ben Goertzel (*ben@goertzel.org*)

**Date:** Fri Aug 30 2002 - 09:43:38 MDT

**Next message:** Ben Goertzel: "RE: Metarationality (was: JOIN: Alden Streeter)"
**Previous message:** Christian L.: "Autistic savants (was: Metarationality)"
**In reply to:** Christian L.: "RE: Bayesian Pop Quiz"
**Messages sorted by:** [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Christian wrote:

> Bayes' Theorem supposes that we have a universal set U which is
> subdivided into disjoint subsets H_1, ..., H_n. Then, given an event A,
> the probability of H_i when A has happened, P(H_i | A), can be
> calculated as
>
> P(H_i | A) = P(H_i)*P(A | H_i) / (\sum_j P(H_j)*P(A | H_j))
>
> From what I have heard, the controversy that sometimes arises out of
> the use of this theorem is due to the fact that the probabilities P(H_j)
> are often very difficult to calculate, so you can distort your data by
> setting the probabilities P(H_j) in a sloppy fashion.
>
> Am I correct in saying that the different Bayesian philosophies are
> concerned with methods of setting these probabilities (are these the
> "priors" you discuss?) in a careful way? Or is this too simplistic?

From http://ic.arc.nasa.gov/ic/projects/bayes-group/html/bayes-theorem-long.html :

"Bayes' theorem gives the rule for updating belief in a Hypothesis H (i.e.
the probability of H) given additional evidence E, and background
information (context) I:

    p(H|E,I) = p(H|I)*p(E|H,I)/p(E|I)    [Bayes Rule]

The left-hand term, p(H|E,I), is called the posterior probability, and it
gives the probability of the hypothesis H after considering the effect of
evidence E in context I. The p(H|I) term is just the prior probability of H
given I alone; that is, the belief in H before the evidence E is considered.
The term p(E|H,I) is called the likelihood, and it gives the probability of
the evidence assuming the hypothesis H and background information I is true.
The last term, 1/p(E|I), is independent of H, and can be regarded as a
normalizing or scaling constant. The information I is a conjunction of (at
least) all of the other statements relevant to determining p(H|I) and
p(E|I)."
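To make the roles of prior, likelihood, and normalizing constant concrete, here is a minimal numeric sketch of the rule above. The disease-test numbers are invented for illustration; they are not from the quoted page.

```python
# A numeric instance of Bayes' rule, p(H|E) = p(H)*p(E|H)/p(E),
# for a binary hypothesis. All numbers below are hypothetical.

def posterior(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Return p(H|E) via Bayes' rule for a yes/no hypothesis H."""
    # p(E) expands over the two disjoint cases H and not-H
    p_e = prior_h * lik_e_given_h + (1 - prior_h) * lik_e_given_not_h
    return prior_h * lik_e_given_h / p_e

# A condition with 1% prevalence, a test with 95% sensitivity and a
# 5% false-positive rate: one positive test yields a modest posterior.
p = posterior(0.01, 0.95, 0.05)
print(round(p, 3))  # → 0.161
```

The small posterior despite a "95% accurate" test is exactly the kind of result that makes the prior p(H|I) so consequential.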

So, yeah, it's often the setting of the priors P(H_i) [in your multivariate
example] that is controversial. MaxEnt (maximum entropy) is one way of doing
this.
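For a finite set of disjoint hypotheses and no other constraints, the MaxEnt prior is simply the uniform distribution. A small sketch of the multivariate formula from the quote, with invented likelihoods P(A|H_i):

```python
# The quoted formula: P(H_i|A) = P(H_i)*P(A|H_i) / sum_j P(H_j)*P(A|H_j),
# with the uniform (MaxEnt-under-no-constraints) prior. Likelihood
# values are hypothetical.

def posteriors(priors, likelihoods):
    """Posterior over disjoint hypotheses H_1..H_n given an event A."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)  # the normalizing sum over j
    return [j / z for j in joint]

n = 4
uniform_prior = [1.0 / n] * n        # MaxEnt prior over n hypotheses
likelihoods = [0.8, 0.4, 0.2, 0.1]   # invented P(A|H_i) values

post = posteriors(uniform_prior, likelihoods)
print([round(p, 3) for p in post])  # → [0.533, 0.267, 0.133, 0.067]
```

With a flat prior the posterior is just the normalized likelihood; a different (sloppy) choice of priors would tilt these numbers, which is the controversy Christian describes.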

Choice of Bayesian versus parametric stats methods often comes down to a
matter of taste: does one make heuristic assumptions about the priors
(MaxEnt, invariance-principle-based assumptions, etc.), or does one make a
heuristic assumption regarding what pdf one is dealing with (Gaussian,
hypergeometric, whatever...)?

Another controversial point is the making of conditional independence
assumptions. For instance, it's handy to simplify

    p(H|E1,E2,E3,I) = p(H|I)*p(E1,E2,E3|H,I) / p(E1,E2,E3|I)

                      p(H|I)*p(E1|H,I)*p(E2|E1,H,I)*p(E3|E2,E1,H,I)
                    = ---------------------------------------------
                             p(E1|I)*p(E2|E1,I)*p(E3|E2,E1,I)

by assuming the E_i are conditionally independent, both given I alone and
given H together with I; e.g.

    p(E2|E1,I) = p(E2|I) and p(E2|E1,H,I) = p(E2|H,I)

and similarly for E3. But it's not always correct...
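Under those independence assumptions the chained likelihoods collapse into per-item factors, which is the "naive Bayes" form. A minimal sketch for a binary hypothesis (all numbers invented):

```python
# p(H|E1,...,En) under conditional independence of the E_i given H
# and given not-H. The normalization over the two cases plays the
# role of dividing by p(E1,E2,E3|I). All numbers are hypothetical.

def naive_bayes_posterior(prior_h, lik_given_h, lik_given_not_h):
    """Posterior for binary H given evidence items assumed independent."""
    p_h, p_not = prior_h, 1 - prior_h
    for lh, ln in zip(lik_given_h, lik_given_not_h):
        p_h *= lh      # accumulate the p(E_i|H,I) factors
        p_not *= ln    # accumulate the p(E_i|not-H,I) factors
    return p_h / (p_h + p_not)

# Three evidence items, each individually favoring H:
p = naive_bayes_posterior(0.5, [0.9, 0.8, 0.7], [0.2, 0.3, 0.4])
print(round(p, 3))  # → 0.955
```

When the evidence items are in fact correlated, this factorization over-counts them, which is why the assumption is "not always correct."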

This leads one into Bayesian networks, a popular AI technique in which one
constructs a directed acyclic graph (DAG) of events, so that any two events
in the graph are independent conditional on their ancestors in the graph.
See

http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html

for basic Bayes nets info. Cyc, for example, uses Bayes nets ideas to make
parts of its knowledge base probabilistic.
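To make the DAG factorization concrete, here is a toy chain-structured net, Cloudy -> Rain -> WetGrass, where the joint decomposes along the graph as p(c,r,w) = p(c)*p(r|c)*p(w|r). The conditional probability tables are invented:

```python
from itertools import product

# A three-node Bayes net as a DAG factorization. CPT values are
# hypothetical, chosen only to illustrate the mechanics.
p_cloudy = {True: 0.5, False: 0.5}
p_rain_given_cloudy = {True: 0.8, False: 0.1}  # p(rain=True | cloudy)
p_wet_given_rain = {True: 0.9, False: 0.2}     # p(wet=True | rain)

def joint(c, r, w):
    """p(c, r, w) = p(c) * p(r|c) * p(w|r), read off the DAG."""
    pr = p_rain_given_cloudy[c] if r else 1 - p_rain_given_cloudy[c]
    pw = p_wet_given_rain[r] if w else 1 - p_wet_given_rain[r]
    return p_cloudy[c] * pr * pw

# Marginal p(wet=True): sum the joint over the unobserved nodes.
p_wet = sum(joint(c, r, True) for c, r in product([True, False], repeat=2))
print(round(p_wet, 3))  # → 0.515
```

The point of the DAG is exactly that the joint never has to be stored whole: each node carries only a table conditioned on its parents.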

A problem with Bayes nets is that real knowledge bases often aren't easily
decomposable into DAG hierarchies. Thus, there have arisen things like
"loopy Bayes nets." My own AI system Novamente uses a variant of
probabilistic inference called Probabilistic Term Logic (PTL), which we
created ourselves, and which is vaguely along the lines of loopy Bayes nets,
but fits better into an integrative AI framework.

Specifically, whereas Bayes nets (even loopy ones) assume all inference
occurs within a single universal set U, PTL allows for a distributed network
of inferences, each of which may occur within a different U. So it doesn't
assume a single consistent probability model, but rather a family of
overlapping probability models.

In Novamente, some probabilities are detected by "direct evaluation of
evidence" (which includes the results of some nonprobabilistic cognitive
methods). Then other probabilities are extrapolated from these using
probability-theoretic rules (which incorporate Bayes' rule among other
algebraic identities...). The "nonprobabilistic cognitive methods," from a
Bayesian perspective, could be interpreted as setting prior probabilities.
This is not how we usually think about the system's operations, though...
We usually think of it as a family of cognitive, perceptual and action
processes going on in the system, cooperating in revising the same pool of
procedural and declarative knowledge, with explicitly probabilistic methods
being just one member of the family. Eliezer points out that all the members
of the family can in principle be viewed in probabilistic terms, and it's
true, but I don't find this observation all that useful.

-- Ben G


*This archive was generated by hypermail 2.1.5: Wed Jul 17 2013 - 04:00:40 MDT*