RE: Review of Novamente

From: Ben Goertzel (ben@goertzel.org)
Date: Sat May 11 2002 - 09:24:40 MDT


> > In bio & finance we got really awesome results in terms of being able to
> > recognize fancier patterns than anyone else.
>
> What results? Why are they awesome? According to my already-given estimate
> of the system, Novamente does have a genuine AI capability in the domain of
> pattern recognition and may be able to achieve a genuine AI capability in
> solving some goal-oriented problems in the patterns it can recognize, so
> recognizing fancier patterns than anyone else is exactly what I *think*
> Novamente ought to be able to do, but it would still help to have a specific
> example of a novel pattern that Novamente recognized. I could, after all,
> be too optimistic.

Here is a simple example of a pattern recognized by the system quite
recently, using some *very* simple methods compared to the full Novamente
capability...

*******

LOW, MOD_LOW, MOD_HIGH, HIGH, EXTR_HIGH, DECREASE and INCREASE are
predicates applied to numbers with respect to a given set of numbers
(establishing a Context). Here the context is a dataset of gene expression
values for the yeast cell cycle.
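
The post does not say how these level predicates are computed. A minimal
sketch, assuming a value's level is read off its percentile rank within the
context dataset; the band edges and the extra MEDIUM band are hypothetical
choices for illustration, not Novamente's:

```python
from bisect import bisect_left
from typing import List

# Hypothetical band edges on the percentile-rank scale (0.0 to 1.0).
# The post does not say how Novamente sets these thresholds.
BANDS = [
    (0.20, "LOW"),
    (0.40, "MOD_LOW"),
    (0.60, "MEDIUM"),      # hypothetical middle band, not named in the post
    (0.80, "MOD_HIGH"),
    (0.95, "HIGH"),
    (float("inf"), "EXTR_HIGH"),
]


class Context:
    """A reference set of numbers against which levels are judged."""

    def __init__(self, values: List[float]):
        if not values:
            raise ValueError("context must contain at least one value")
        self.sorted_values = sorted(values)

    def rank(self, x: float) -> float:
        """Fraction of context values strictly below x."""
        return bisect_left(self.sorted_values, x) / len(self.sorted_values)

    def level(self, x: float) -> str:
        """Name of the qualitative band that x falls into."""
        r = self.rank(x)
        for upper, name in BANDS:
            if r < upper:
                return name
        return BANDS[-1][1]  # not reached: the last edge is infinite


def holds(predicate: str, x: float, context: Context) -> bool:
    """Evaluate a level predicate, e.g. holds("LOW", 0.3, ctx)."""
    return context.level(x) == predicate


ctx = Context(list(range(1, 101)))
```

Because the bands are defined by rank within the context, the same raw value
can be LOW in one dataset and HIGH in another, which is why the predicates
are tied to a Context.
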

SIC1, PCL2, CLN3, SWI5 are shorthands for gene expression values.

For instance, SIC1 is shorthand for expression(GeneNode SIC1), which is a
function from time values to numbers.
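
To make the shorthand concrete, here is a sketch in which
expression(GeneNode g) is represented as a function of time sampled at
discrete time points, with INCREASE and DECREASE read as "the value goes up
(down) at the next sample". The class name, this next-sample reading and the
numbers are all assumptions for illustration:

```python
from typing import Dict, Optional


class Expression:
    """Hypothetical stand-in for expression(GeneNode g): time -> number.

    Values are known only at the sampled time points of the experiment.
    """

    def __init__(self, gene: str, samples: Dict[float, float]):
        self.gene = gene
        self.samples = dict(samples)
        self.times = sorted(self.samples)

    def __call__(self, t: float) -> float:
        # Evaluate at a sampled time (raises KeyError for unsampled times).
        return self.samples[t]

    def next_time(self, t: float) -> Optional[float]:
        # The sampled time immediately after t, or None at the last sample.
        i = self.times.index(t)
        return self.times[i + 1] if i + 1 < len(self.times) else None


def increase(expr: Expression, t: float) -> bool:
    """INCREASE(expr) at time t: the next sampled value is larger."""
    nxt = expr.next_time(t)
    return nxt is not None and expr(nxt) > expr(t)


def decrease(expr: Expression, t: float) -> bool:
    """DECREASE(expr) at time t: the next sampled value is smaller."""
    nxt = expr.next_time(t)
    return nxt is not None and expr(nxt) < expr(t)


# Illustrative numbers only, not data from the yeast study.
SWI5 = Expression("SWI5", {0: 1.0, 10: 2.5, 20: 0.8})
```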

The variable D is a category learned from the Stanford Genome Database.

A relatively simple pattern in a yeast gene expression dataset, with a
fairly clear human meaning, is then:

----
C = ( LOW(SIC1) OR MOD_LOW(SIC1) ) AND LOW(PCL2) AND ( LOW(CLN3) OR MOD_LOW(CLN3) )
D = "involved in transcriptional regulation of CUP1"
C AND EXTR_HIGH(SWI5) -->
DECREASE(SWI5) AND INCREASE(D)
C AND (MOD_HIGH(SWI5) OR HIGH(SWI5)) -->
INCREASE(SWI5) AND INCREASE(D)
----
**********
I've omitted the quantitative truth values of the patterns...
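
The omitted truth values are not specified in the post. As one simple
possibility, assumed here purely for illustration, a rule's strength can be
taken as the fraction of time points satisfying the left-hand side that also
satisfy the right-hand side, reported together with the number of such time
points as a rough measure of evidence. The snapshots below are invented:

```python
from typing import Dict, List, Sequence, Tuple

# One snapshot per time point: which qualitative facts hold then.
# Facts absent from a snapshot are treated as false.
Snapshot = Dict[str, bool]


def holds_all(snapshot: Snapshot, facts: Sequence[str]) -> bool:
    """True if every named fact holds in the snapshot."""
    return all(snapshot.get(f, False) for f in facts)


def rule_truth(series: List[Snapshot],
               antecedent: Sequence[str],
               consequent: Sequence[str]) -> Tuple[float, int]:
    """Return (strength, support) for antecedent --> consequent.

    strength: fraction of matching snapshots in which the consequent holds.
    support:  number of snapshots in which the antecedent holds.
    """
    matching = [s for s in series if holds_all(s, antecedent)]
    if not matching:
        return 0.0, 0
    confirmed = sum(1 for s in matching if holds_all(s, consequent))
    return confirmed / len(matching), len(matching)


# Invented snapshots for the first rule of the example pattern.
series = [
    {"C": True, "EXTR_HIGH(SWI5)": True, "DECREASE(SWI5)": True, "INCREASE(D)": True},
    {"C": True, "EXTR_HIGH(SWI5)": True, "DECREASE(SWI5)": True, "INCREASE(D)": True},
    {"C": True, "EXTR_HIGH(SWI5)": True, "DECREASE(SWI5)": True, "INCREASE(D)": False},
    {"C": True, "EXTR_HIGH(SWI5)": True, "DECREASE(SWI5)": True, "INCREASE(D)": True},
    {"C": False, "EXTR_HIGH(SWI5)": True},
    {"C": True, "INCREASE(D)": True},
]
strength, support = rule_truth(
    series,
    antecedent=["C", "EXTR_HIGH(SWI5)"],
    consequent=["DECREASE(SWI5)", "INCREASE(D)"],
)
```
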

Many patterns found are far more complicated, but in this application
simple patterns are favored, because the patterns are meant to be read by
human biologists and to stimulate them toward further hypotheses.

No one told the system to look at these particular 4 genes (there are 6100
yeast genes), or to look at transcriptional regulation as opposed to the
hundreds of other gene properties in the databases the system was given.
We did, though, tell the system to look for patterns involving combinations
of certain predicates.
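
A brute-force version of "look for patterns involving combinations of certain
predicates", with the preference for simple patterns expressed as a small
penalty per extra condition, might look like the following. This is a sketch
under stated assumptions (exhaustive enumeration, a fixed size penalty, a
minimum support, invented data), not a description of how Novamente actually
searches:

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# One snapshot per time point; missing facts count as false.
Snapshot = Dict[str, bool]
# (score, conditions, strength, support)
Result = Tuple[float, Tuple[str, ...], float, int]


def search_rules(snapshots: List[Snapshot], literals: Sequence[str],
                 target: str, max_size: int = 2, min_support: int = 2,
                 penalty: float = 0.05) -> List[Result]:
    """Score every conjunction of up to `max_size` literals as a predictor
    of `target`, best first. Larger conjunctions pay `penalty` per extra
    literal, so a simpler rule wins when strengths are close."""
    results: List[Result] = []
    for size in range(1, max_size + 1):
        for combo in combinations(literals, size):
            matching = [s for s in snapshots
                        if all(s.get(lit, False) for lit in combo)]
            if len(matching) < min_support:
                continue  # too little evidence to report
            hits = sum(1 for s in matching if s.get(target, False))
            strength = hits / len(matching)
            score = strength - penalty * (size - 1)
            results.append((score, combo, strength, len(matching)))
    # Highest score first; ties go to fewer literals, then alphabetical order.
    results.sort(key=lambda r: (-r[0], len(r[1]), r[1]))
    return results


# Invented data: five time points, three candidate literals.
snapshots = [
    {"LOW(SIC1)": True, "LOW(PCL2)": True, "EXTR_HIGH(SWI5)": True, "INCREASE(D)": True},
    {"LOW(SIC1)": True, "EXTR_HIGH(SWI5)": True, "INCREASE(D)": True},
    {"LOW(SIC1)": True, "LOW(PCL2)": True},
    {"LOW(PCL2)": True, "EXTR_HIGH(SWI5)": True},
    {},
]
results = search_rules(snapshots,
                       ["LOW(SIC1)", "LOW(PCL2)", "EXTR_HIGH(SWI5)"],
                       "INCREASE(D)")
best = results[0]
```

Exhaustive enumeration grows combinatorially with the number of literals,
so over thousands of genes a real system would need a far more selective
search than this loop.
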

This is classic "data mining": the system is thrown a bunch of data and
looks for interesting patterns. It happens to be a datamining problem that
has proved unsolvable for traditional datamining methods, however, in spite
of several years' effort. (Only several years, because gene expression data
became available only with the recent invention of gene chips and spotted
microarrays.)

It's far from AGI, of course ... the system is not posing its own problems
based on its own experience; it's being given a particular data analysis
problem by a script and then spitting out answers after hours of processing.

It does involve two levels of analysis, a breakdown that's somewhat
interesting from a cognitive perspective. Level 1 is finding some part of
the perceptual data that's worth paying attention to (e.g. a "coherent" set
of genes); Level 2 is finding the best possible patterns in this coherent,
attention-worthy part.
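
The two-level breakdown can be illustrated with a sketch of level 1 alone:
choosing a "coherent" set of genes before any pattern search. Here coherence
is assumed to mean strong pairwise Pearson correlation (positive or negative)
of expression profiles, grown greedily from a seed gene; the threshold, the
greedy rule and the profile values are illustrative assumptions, and YFG1 is
a hypothetical placeholder gene:

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def coherent_set(profiles: Dict[str, List[float]], seed: str,
                 threshold: float = 0.8) -> List[str]:
    """Level 1 (sketch): grow a gene set from `seed`, adding each gene whose
    |correlation| with every gene already chosen is at least `threshold`."""
    chosen = [seed]
    for gene in sorted(profiles):
        if gene == seed:
            continue
        if all(abs(pearson(profiles[gene], profiles[m])) >= threshold
               for m in chosen):
            chosen.append(gene)
    return chosen


# Invented profiles over four time points.
profiles = {
    "SIC1": [1.0, 2.0, 3.0, 4.0],
    "PCL2": [2.0, 4.0, 6.0, 8.5],
    "CLN3": [4.0, 3.0, 2.0, 1.0],
    "YFG1": [1.0, 3.0, 1.0, 3.0],
}
focus = coherent_set(profiles, "SIC1")
# Level 2 would then search for patterns only among the genes in `focus`.
level2_input = {g: profiles[g] for g in focus}
```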

Of course I am leaving out all the details of how this app was set up, which
involves loads of tricks. There is a paper to be submitted to the Journal of
Computational Biology that tells more, but it deliberately doesn't go into
the details of Novamente AI; instead it tries to explain the
application-specific behavior of Novamente in more conventional terms, so
that what's going on can be explained within an isolated research paper.

Previous approaches to the gene expression data analysis problem are much
simpler and less useful, involving either clustering, recognition of simple
pairwise patterns between genes, or linear time dependencies. These previous
approaches do not reveal much of the genetic regulatory structure underlying
the dataset, but Novamente-based analysis does.
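
For contrast, a "linear time dependency" between a pair of genes, the kind
of pattern attributed above to earlier methods, can be sketched as a lagged
Pearson correlation: does gene A's expression at one time point linearly
track gene B's at the next? The function names and data are illustrative:

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def lagged_correlation(a: Sequence[float], b: Sequence[float],
                       lag: int = 1) -> float:
    """Correlation between a[t] and b[t + lag] over the overlapping times."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    if not 0 <= lag < len(a) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(a[:len(a) - lag], b[lag:])


# Invented profiles: B repeats A's values one time step later.
A = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [9.0, 1.0, 2.0, 3.0, 4.0]
```

A score like this relates only two genes at a time and only linearly,
whereas the example pattern conditions on the qualitative levels of several
genes at once.
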

Again, please do not accuse me of confusing this work with AGI. Of course,
it's "just" datamining. We find that doing this sort of application, in
addition to its intrinsic value (better understanding of gene interactions
helps lead to disease cures etc.), helps us to tune the various mechanisms
involved. There is a risk that tuning a cognitive mechanism for datamining
is actually tuning it to perform BADLY on more closely AGI-related tasks,
but we have a theoretical perspective that at least helps us to ward off
this problem.

Also, please don't think that all our work is narrowly focused on such
datamining apps. Most of it now is focused on building basic stuff that's
useful both for datamining and for AGI, but there's also some work going on
that is not useful for datamining in the short term, only for AGI.

Finally, this particular R&D is at an early stage. In 6 months we should
have a *really solid* system for gene expression data analysis; we're just
beginning with this application area at the moment.
-- Ben

