From: Eugen Leitl (firstname.lastname@example.org)
Date: Thu Jul 08 2004 - 04:55:47 MDT
These are stream-of-consciousness comments on good features of AI codes.
There are reasons for them, but they're usually omitted here. It's a curious
mix of current good practice and future practice (currently lunatic fringe,
but it will make sense then, trust me).
* code for today's/next year's commodity iron, but think about portability to
massively parallel async molecular-electronics systems with mole amounts of
switches. They're not so far away as you think. You might actually need them.
* your code should scale O(1) whether for 10^0 or 10^9 nodes (yes, it can be
done with the right topology and communication pattern) -- if it doesn't
you're doing something really wrong. Screw Amdahl: no sequential sections
* at very large (WAN and Internet2) scale, use GRID
* use clusters, do not assume global connectivity (cubic primitive lattice is
a safe assumption to make, 3d is safe, higher dimensionality is iffy)
* long-range is an iteration of short-range. This isn't inefficient if your
time of flight is ~relativistic, and there are a couple gate delays in
between -- this is for the future.
* if you signal, try streaming jumbo packets, and do crunch while you stream
(if you can't, at least try to load all interfaces at the same time, then
crunch a block, stream, crunch, etc.)
* this is especially true if your interconnect mesh isn't unobtainium
(Infiniband & Co)
* use multiple NICs if you've got them, look at http://aggregate.org/ for
inspiration (try to aim for 6 local crossbar links per node, though)
* use a binary protocol, specifically a minimal subset of MPI (do NOT use XML
over the wire in local clusters; global networks are different)
* avoid writing to disk frequently, try to do without swap. There's a special
case if you have a large library of patterns, and can stream sequentially,
or tolerate 10 ms fetches sometimes. There won't be any disk in the future.
* try to go without local disk altogether but for checkpoints (not strictly
necessary for future nonvolatile core)
* use a spatial system composition, communicate interface state to adjacent
nodes (see cubic primitive lattice, usually 3d) -- see array
* use arrays of small data types, avoid absolute pointers (use relative
addressing as offsets to current position)
* only access memory in stream mode, try to limit unpredictable accesses to
1st/2nd level cache (here it actually pays to use assembly for prefetch;
otherwise don't bother with assembly, except maybe for the inner loop when
you're done). This will cut your hardware budget by some 300%, ditto operation
costs. A single guy for the hardware is enough, unless yours is a two-digit
megabuck budget for iron alone.
* align objects to long and cache line boundaries
* make sure your objects could signal locally, and asynchronously (no global
clock, or a clock with jitter) -- this is different from streaming over an
array with a hotspot loop. This is future, not today. Make sure your code will
survive the future.
* don't use floats, stick to integers
* if you use integers, try using short integers (4-8 bit is great)
* stick to integers and codes you can crunch in parallel with MMX/SSE* type
of SWAR SIMD
* get ready for very wide words, low latency, small cores and FPGA (think
inner loop in FPGA)
* make sure your inner loop would translate into logic some 10^3 gates simple
for each individual small integer
* put the complexity into data pattern, not code pattern. Code is not that
important, state is.
* if you think you can't do it in C, or don't need to, you're doing it wrong
* if you can't describe the algorithm and the data flow on a large piece of
paper, you're doing it wrong
* in the long run, get ready for crystalline hardware (moles of switches,
local connectivity, nonvolatile state, relativistically constrained signalling)
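To make the lattice and relative-addressing bullets concrete, here is a minimal C sketch, not anything from a real code base. It assumes a small 8x8x8 cubic primitive lattice stored as one flat array of 8-bit cells, and it reaches the six face neighbours through offsets relative to the current cell instead of absolute pointers. The names (NEIGHBOUR, cell_index, lattice_step) and the update rule (integer mean of the neighbours) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lattice size; any small 3d box works the same way. */
enum { NX = 8, NY = 8, NZ = 8, NCELLS = NX * NY * NZ };

/* Relative offsets (in array elements) from a cell to its six face
   neighbours in a row-major cubic primitive lattice: -x +x -y +y -z +z. */
static const ptrdiff_t NEIGHBOUR[6] = {
    -1, +1, -NX, +NX, -(NX * NY), +(NX * NY)
};

/* Flat index of cell (x, y, z). */
static size_t cell_index(int x, int y, int z)
{
    return (size_t)x + (size_t)NX * ((size_t)y + (size_t)NY * (size_t)z);
}

/* One update step, streaming over the array in memory order. Every interior
   cell becomes the integer mean of its six face neighbours (an arbitrary
   rule chosen for the sketch); boundary cells are copied unchanged. */
static void lattice_step(const uint8_t *src, uint8_t *dst)
{
    for (int z = 0; z < NZ; z++) {
        for (int y = 0; y < NY; y++) {
            for (int x = 0; x < NX; x++) {
                size_t i = cell_index(x, y, z);
                int on_boundary = x == 0 || x == NX - 1 ||
                                  y == 0 || y == NY - 1 ||
                                  z == 0 || z == NZ - 1;
                if (on_boundary) {
                    dst[i] = src[i];
                    continue;
                }
                const uint8_t *p = src + i;   /* current position */
                unsigned sum = 0;
                for (int k = 0; k < 6; k++)
                    sum += p[NEIGHBOUR[k]];   /* relative addressing */
                dst[i] = (uint8_t)(sum / 6);
            }
        }
    }
}
```

Call lattice_step(src, dst) with two NCELLS-sized buffers and swap them between steps. On a cluster, each node would own one such box and exchange only its boundary faces with adjacent nodes.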
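The alignment bullet can be sketched in C11 like this. The 64-byte cache line is an assumption (check your own iron), and the struct layout and names (CACHE_LINE, node_state, is_cache_aligned) are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, common on commodity x86 hardware. */
#define CACHE_LINE 64

/* A hypothetical per-node record. Aligning the first member to the cache
   line raises the alignment of the whole struct, and the compiler pads the
   size up to a multiple of it, so array elements never straddle a line. */
struct node_state {
    alignas(CACHE_LINE) uint8_t cells[48];  /* small-integer state */
    uint16_t step;
    uint16_t flags;
};

/* Returns nonzero if p sits on a cache line boundary. */
static int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

With this layout sizeof(struct node_state) comes out as exactly one assumed cache line, so a static or stack array of them streams line by line.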
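As an illustration of the short-integer and SWAR bullets, here is a sketch in plain portable C rather than actual MMX/SSE intrinsics. It packs eight 8-bit integers into one 64-bit word and adds all eight lanes with a single word-wide addition, wrapping modulo 256 within each lane and never carrying into the neighbouring lane. The function names are made up for the example.

```c
#include <stdint.h>

/* Pack eight 8-bit lanes into one 64-bit word, lane 0 in the low byte. */
static uint64_t pack_u8(const uint8_t v[8])
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)v[i] << (8 * i);
    return w;
}

/* Unpack a 64-bit word back into eight 8-bit lanes. */
static void unpack_u8(uint64_t w, uint8_t v[8])
{
    for (int i = 0; i < 8; i++)
        v[i] = (uint8_t)(w >> (8 * i));
}

/* Lane-wise addition of eight unsigned 8-bit integers (mod 256 per lane).
   The low 7 bits of every lane are added in one go; with the top bits
   masked off, a carry can reach bit 7 of its own lane but never the next
   lane. The top bit of each lane is then fixed up with XOR. */
static uint64_t swar_add_u8(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of every lane */
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}
```

The same masking trick carries over to narrower lanes (4-bit, say) by changing the mask, and the expression is simple enough to map onto a few gates per lane.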
If you can't state your problem in the above framework, you're doing it the hard
way. If none of it appears relevant, and you in fact think I'm nuts then you won't
-- 
Eugen* Leitl leitl
______________________________________________________________
ICBM: 48.07078, 11.61144 http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
http://moleculardevices.org http://nanomachines.net