Patterns and intelligence (was: DGI paper)

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Sun Apr 14 2002 - 21:18:36 MDT


Ben Goertzel wrote:
>
> > Because there is a difference between genericity and generality.
> >
> > Evolution, like search trees and artificial neural networks, is a fully
> > generic process. But it works much better for some things than
> > others, and
> > some things it can't handle at all. It can't do many of the things that
> > human intelligence does. You can apply a generic process to
> > anything but it
> > won't necessarily work. Usually it only solves a tiny fraction of special
> > cases of the problem (which AI projects usually go on to mistake
> > for having
> > solved the general case of the problem; this is one of the Deadly Sins).
> > Evolution uses an unreasonable amount of computational power to overcome
> > this handicap.
>
> I think that any black-box global optimization algorithm -- including
> evolution, and some NN and search-tree based algorithms -- has a kind of
> "general intelligence." The problem is that it uses too many resources.
> Human brains achieve far more general intelligence per unit of space and
> time resources than evolutionary systems.
>
> What I mean by "general intelligence" is roughly "the ability to solve a
> variety of complex problems in a variety of complex environments." As you
> know I've tried with moderate success to quantify and formalize this
> definition. I'm not sure exactly what you mean by "general intelligence",
> maybe it's something different.

Our different intuitions on this matter stem from our different definitions
of intelligence and pattern.

You (Ben) define a pattern as "a representation as something simpler". I'd
better review this for the sake of the innocent onlookers... if X is a
string of 1s and 0s, then the complexity of X, c(X), is the size of the
smallest Turing machine that produces X. Suppose you have a string Y that
is similar to X, but Y has less complexity than X - it is simpler. In this
case, according to Ben, Y is a pattern in X; it is a representation of X as
something simpler. Or rather, to be precise, the Turing machine that
produces Y as output is a pattern in the Turing machine that produces X as
output, because Y is similar to X but has less computational complexity.
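
For concreteness, a minimal sketch of that definition in Python. Since
the size of the smallest Turing machine is uncomputable, the sketch
substitutes zlib-compressed length as a crude proxy for complexity;
the proxy, and the caller-supplied "similar" predicate, are
illustrative assumptions rather than part of Ben's formalism.

    import zlib

    def c(x: bytes) -> int:
        """Approximate complexity of x: length of its compressed form,
        standing in for the size of the smallest Turing machine."""
        return len(zlib.compress(x))

    def is_pattern_in(y: bytes, x: bytes, similar) -> bool:
        """Y is a pattern in X if Y is similar to X but simpler.
        'similar' is a caller-supplied predicate; defining similarity
        well is the hard part, as discussed below."""
        return similar(y, x) and c(y) < c(x)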

Ben goes on to define several useful extensions of this principle. For
example, he defines the structure of X, St(X), as the set of all patterns in
X. He also defines a measure for the complexity of X relative to an
existing knowledge/program base K. Roughly: if you assume the Turing
machine K, how many *additional* states does it need in order to produce X
as an output? The amount of extra information needed is the complexity of X
given K, c(X|K).
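
Continuing the compression proxy from the sketch above, conditional
complexity can be approximated as the extra compressed length X costs
once the compressor has already seen K - again an illustrative
stand-in, not true conditional Kolmogorov complexity.

    import zlib

    def c_given(x: bytes, k: bytes) -> int:
        """Approximate c(X|K): the additional compressed bytes needed
        for X once K is available as a prefix."""
        return len(zlib.compress(k + x)) - len(zlib.compress(k))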

I've left out a lot of stuff here, which Ben should feel free to fill in,
but I think the guts are there. According to Ben, the purpose of Novamente
is to increase the amount of pattern it contains, thus increasing the amount
of mind.

I independently arrived at broadly similar computational-theoretic
definitions, with some key differences because I was thinking about seed AI,
which may account for some of the differences in our respective intuitions
about AI. First, though, I want to point out some obvious extensions
that work with either of our definitions. If a pattern G decreases the
complexity of a very broad range of patterns X, so that c(X|G) is
substantially less than c(X), then G is a pattern that contributes to
"general intelligence". Similarly, if you take the structure of X,
St(X), sort the patterns in St(X) from the simplest pattern A to the
most complex pattern Z, and then ask about c(B|A), c(C|A+B),
c(D|A+B+C), you have a measure of the "manageability" of
St(X) - the degree to which the hard problems in X become more tractable
given the simpler problems in X. (I'm calling this "manageability" because
"tractability" is already in use; see below.) If you've already said this
somewhere, I apologize, but I didn't see it in the materials given.
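
A sketch of that "manageability" measure, using the same zlib proxy
for c and c(.|.) as above (again, the proxy is an illustrative
assumption):

    import zlib

    def c(x: bytes) -> int:
        return len(zlib.compress(x))

    def c_given(x: bytes, k: bytes) -> int:
        return len(zlib.compress(k + x)) - len(zlib.compress(k))

    def manageability_profile(patterns):
        """Sort the patterns from simplest to most complex, then
        report c(B|A), c(C|A+B), c(D|A+B+C), ... Low numbers late in
        the list mean the hard patterns come cheap once the easy ones
        are in hand - a manageable structure."""
        ordered = sorted(patterns, key=c)
        base, profile = b"", []
        for p in ordered:
            profile.append(c_given(p, base))
            base += p
        return profile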

My definition of pattern differs in that it is based around tractability
rather than simplicity. If A is a pattern in B, then the question is not
whether A is simpler than B, but rather whether A is, on average, less
expensive to compute. Suppose that the AI faces a context-sensitivity
problem, where some set of complex conditions X within a problem P means that
it would be a good time to test concept C against current imagery. Suppose
that the AI or the programmer writes a codelet Q that programmatically
checks for conditions X within current imagery, and if X obtains, Q will
designate concept C as relevant. For a seed AI, the question is whether
codelet Q contributes usefully to intelligence in that repeatedly executing
this codelet against current imagery is expected to be computationally less
expensive than associating to C by other means (including blind search),
bearing in mind that X may be true for only a tiny fraction of the occasions
the codelet runs. This is the kind of problem a seed AI runs up against if
it wants to optimize itself - it's not as easy as it looks.
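
The back-of-envelope version of that question: a codelet pays its way
only if its per-cycle cost is under its expected savings. The numbers
below are hypothetical; a real seed AI would have to estimate all
three empirically, which is exactly what makes self-optimization hard.

    def codelet_pays_its_way(cost_q: float, p_x: float,
                             cost_blind: float) -> bool:
        """Q costs cost_q every cycle; conditions X obtain with
        probability p_x; when they do, Q saves a blind-search cost of
        cost_blind. Keep Q only if cost < expected savings."""
        return cost_q < p_x * cost_blind

    # A codelet costing 5 microseconds per cycle, firing one cycle in
    # ten thousand, replacing a 10-millisecond blind search:
    # codelet_pays_its_way(5e-6, 1e-4, 1e-2) -> False; the codelet
    # costs more than it saves, despite looking cheap in isolation.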

My measure of the distance between a string X and a string Y is not based on
"similarity" because I can think of no good way to define "similarity" in a
way that is useful for intelligence; whether something that is similar to X
is a useful approximation of X is ultimately an AI-complete problem. So,
since I've never been one for nice clean definitions, I'm assuming that X
and Y are patterned variables inside a seed AI that is trying to solve a
prediction or manipulation problem. This gives me enough of a context to
say something like "Y is usefully similar to X", in the sense that a pattern
which has a surface similarity to X but is totally useless for intelligence
will be weeded out when the current manipulation problem fails. For
example, this catches the difference between two bytecode listings A and B
that compute the Fibonacci sequence in different ways, and a bytecode
listing C that computes the Fibonacci sequence versus D, the same bytecode
listing with one assembly-language instruction zeroed out. The surface
similarities between C and D are very great, and yet a pattern which
produces D as an approximation of C is not contributing usefully to
intelligence, where a pattern that produces B as an approximation to A might
be.
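
The same point in runnable form (Python source rather than bytecode,
purely as an illustration): A and B share almost no surface text but
compute the same function, while D differs from C by a single deleted
operation and computes garbage.

    def fib_a(n):          # iterative Fibonacci
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    def fib_b(n):          # recursive Fibonacci; very different text
        return n if n < 2 else fib_b(n - 1) + fib_b(n - 2)

    def fib_c(n):          # identical to fib_a...
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    def fib_d(n):          # ...with one operation "zeroed out"
        a, b = 0, 1
        for _ in range(n):
            b = a + b      # the swap into a is gone
        return a

    assert [fib_a(n) for n in range(8)] == [fib_b(n) for n in range(8)]
    assert [fib_c(n) for n in range(8)] != [fib_d(n) for n in range(8)]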

So under my definition, Y is a pattern in X if Y produces a useful
approximation of X, and Y is more tractable than X - that is, less expensive
computationally. From here you can build some of the same definitions that
can be constructed with Ben's building blocks; for example, the complexity
of Y given K as a base becomes the tractability of Y given K as a process
already computed. Similarly, the processes or learned complexity that
contribute strongly to general intelligence are those which render a wide
range of problems more tractable. (Because DGI is a supersystem theory of
AI, it should be noted that in many cases you need to fit a lot of processes
P together into one big K before K will contribute strongly to general
intelligence.) The manageability of X is the degree to which hard problems
in X become solvable after you've solved the easy problems in X; that is,
the manageability of X is the degree to which patterns that achieve very
useful approximations of X become easier to compute and/or easier to find,
once the system contains patterns that achieve minimally useful
approximations to X.
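
To make the contrast with the compression-based sketch explicit, here
is the tractability version of the pattern test. Wall-clock time
stands in for computational expense, and the caller-supplied "useful"
predicate stands in for the hard, context-bound judgment described
above; both are illustrative assumptions.

    import timeit

    def is_tractability_pattern(y_fn, x_fn, useful, trials=1000):
        """y_fn is a pattern in x_fn if its output is a useful
        approximation of x_fn's output AND it is cheaper to compute."""
        cost_y = timeit.timeit(y_fn, number=trials)
        cost_x = timeit.timeit(x_fn, number=trials)
        return useful(y_fn(), x_fn()) and cost_y < cost_x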

This is why black-box processes such as neural nets and evolutionary
computations may strike Ben as "general" while striking me as "generic".
Ben's definition is based on simplicity; the black boxes are relatively
simple from the standpoint of complexity theory and hence appear to be
strong patterns. My definition is based on tractability; neural nets and
evolutionary computations may apply to any problem, but they don't make the
problem much more tractable, so I see them as weak patterns. Furthermore,
for NN or EC to have "general" intelligence would mean that taking NN or EC
as a knowledge base would make a wide range of problems much more tractable
- not just render a wide range of problems theoretically solvable given
unlimited computing power. This is why NN and EC are "generic" rather than
"general".

Another example: Who has the stronger patterns, Kasparov or Deep Blue?
Under the simplicity definition, Deep Blue's computational processes are
vastly simpler than Kasparov's, and so Deep Blue has a much stronger chess
pattern. Under the tractability definition, Kasparov and Deep Blue were
roughly evenly matched in skill when Kasparov was searching 1.8 positions
per second and Deep Blue was searching 200 million positions per second,
making
Kasparov's chess patterns enormously stronger than Deep Blue's. Deep Blue
plays the Game of Actual Chess, searching through the tree structure of
actual chess positions. Kasparov plays the Game of Regularities in Chess,
using an enormously more complex algorithm to execute a "search" that
roughly matched the gameplaying power of Deep Blue even when considering
only 1.8 positions per second - if such a measure can be said to apply at
all to Kasparov. Kasparov's chess patterns are vastly more complex than
Deep Blue's chess patterns, and they render the game far more tractable for
Kasparov than for Deep Blue.
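
The arithmetic behind "enormously stronger", at rough parity of skill:

    # positions searched per second, 1997 match figures
    deep_blue_pps = 200_000_000
    kasparov_pps = 1.8
    print(deep_blue_pps / kasparov_pps)   # ~1.1e8

Each position Kasparov considers does the work of roughly 10^8 of Deep
Blue's, which is the strength ratio of their respective chess patterns
under the tractability definition.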

-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence


