Re: Moravec's estimates?

From: James Rogers (jamesr@best.com)
Date: Mon Apr 09 2001 - 19:54:36 MDT


At 08:52 PM 4/9/2001 -0400, Brian Atkins wrote:
>The Blue Gene chips will have 16MB RAM on-die last I heard. Now that is
>going to be shared I guess by the 32 CPUs that will be on each die, so it
>isn't a lot. But it should have fantastic stream numbers. The total machine
>should have around 500GB of RAM from what I calculate.

The stream numbers will be about what you would expect on an average system
today judging by the memory architecture (which isn't particularly
remarkable speed-wise). The Blue Gene architecture is highly optimized for
extremely parallelizable and computationally intense problems with small
data sets. The 512kByte RAM per processor runs slower than the core, and
the core caches are pretty small compared to those of most ordinary CPUs
and even many similar DSP offerings from other companies. The bus is optimized for
the architecture but isn't particularly fast; the communication between
processors is a good replacement for communication channels like Myrinet,
not for true shared memory. Basically this is a massively parallel number
cruncher, and not useful for most classes of problem (such as those that
operate on large data sets), which is not surprising since it is intended
for protein folding computation. The aggregate memory doesn't count for
much because the communication speed between the million little bits of RAM
is simply too slow. But for protein folding it kicks ass.
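
As a sanity check on the memory figures in this thread, here is a rough
sketch of the arithmetic in Python. The 16MB on-die and 32 CPUs per die come
from the quoted message; the total processor count is an assumption (about a
million, rounded to 2^20 to keep the numbers clean), and the function names
are purely illustrative.

```python
# Back-of-the-envelope memory arithmetic for the figures quoted in this thread.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

ON_DIE_RAM_BYTES = 16 * MIB   # from the thread: 16MB of RAM on each die
CPUS_PER_DIE = 32             # from the thread: 32 CPUs share each die
ASSUMED_TOTAL_CPUS = 2**20    # ASSUMPTION: "a million" processors, rounded to 2^20


def ram_per_cpu(on_die_bytes, cpus_per_die):
    """Bytes of on-die RAM available to each CPU if the die is split evenly."""
    return on_die_bytes // cpus_per_die


def aggregate_ram(total_cpus, cpus_per_die, on_die_bytes):
    """Total bytes of on-die RAM across every die in the machine."""
    dies = total_cpus // cpus_per_die
    return dies * on_die_bytes


per_cpu = ram_per_cpu(ON_DIE_RAM_BYTES, CPUS_PER_DIE)
total = aggregate_ram(ASSUMED_TOTAL_CPUS, CPUS_PER_DIE, ON_DIE_RAM_BYTES)

print(f"RAM per CPU:   {per_cpu // KIB} KiB")
print(f"Aggregate RAM: {total // GIB} GiB")
```

With those inputs this works out to 512 KiB per CPU and 512 GiB in total,
which lines up with the 512kByte and roughly 500GB figures mentioned above.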

In short, the Blue Gene chip is a fast DSP core that has been optimized for
extreme parallel usage. If you look at the product offerings of DSP
leaders such as Texas Instruments and Analog Devices, you will find that
they are selling high performance devices today that are strikingly
similar, though IBM is trying to take parallelization to a new level. I
would venture a guess that by the time IBM finishes their Blue Gene chip,
TI will have a chip that gives very similar results (the TMS320C64x series
offers 3.2-4.8 GigaOps and 8Mb per processor currently, with an expectation
that this will increase substantially over the next year or two). Analog
Devices supports massively parallel communication interconnects that are
*very* similar to what IBM is suggesting. In any case, DSPs are generally
unsuitable for general computation work on large data sets, as that is not
what they are optimized for; otherwise we'd be using these wickedly fast
number crunchers on our desktops.

Cheers,

-James Rogers
  jamesr@best.com


