This is a sample implementation of applying the 4d EO preconditioned DWF
matrix. Only the eo part is implemented. The oe part is trivial, but for
better cache reuse it pays to combine it with the eo part into the full
(1+{oe}{eo}) operator and to compute the norm of the result as well.
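
The combination described above can be illustrated with a toy sketch.
It is not taken from kostas.nw: the 1d periodic lattice, the scalar
site values, the names Field, hop_eo and apply_full_norm2, and the
convention that {eo} maps odd sites to even sites are all assumptions
made for illustration. The point is only that the oe hop, the added
identity term, and the norm can share a single loop over the output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 1d periodic lattice with 2*n sites, split into two checkerboards
// of n values each. Even site 2i touches odd indices i-1 and i;
// odd site 2i+1 touches even indices i and i+1.
using Field = std::vector<double>;

// Hypothetical stand-in for the {eo} block: odd-site input, even-site output.
Field hop_eo(const Field& odd) {
    const std::size_t n = odd.size();
    assert(n > 0);  // the modulo below needs a non-empty field
    Field even(n);
    for (std::size_t i = 0; i < n; ++i)
        even[i] = odd[(i + n - 1) % n] + odd[i];
    return even;
}

// Fused second stage: r = psi + {oe}({eo} psi), returning |r|^2.
// The oe hop, the identity term, and the norm accumulation all happen
// while r[i] is still at hand, so r is not traversed a second time.
double apply_full_norm2(const Field& psi, Field& r) {
    const std::size_t n = psi.size();
    const Field even = hop_eo(psi);
    r.resize(n);
    double norm2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double v = psi[i] + even[i] + even[(i + 1) % n];
        r[i] = v;
        norm2 += v * v;
    }
    return norm2;
}
```

In the real operator each site carries a spinor rather than a double,
but the loop structure of the fused stage would be analogous.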

The present code follows the logic of Kostas's notes, but has not been
carefully checked for bugs. It does run, though, and its data flow and
cache use pattern are correct, which is enough to check the approach
and to build a networking version on top of QMP.

Files in this distribution:

README          -- this file
Makefile        -- GNU make file for building the SSE GCC version
sse.nw          -- Noweb sources of low level routines
kostas.nw       -- Noweb sources of the DWF operator
foo.cc          -- SSE shuffling operations test code (not needed to compile)

sse.pdf         -- (generated) gcc sse implementation in a readable form
kostas.pdf      -- (generated) high level DWF routines in a readable form

k-main.cc       -- (generated) main driver for the timing tests
kostas.cc       -- (generated) implementation file for Lattice5d
kostas.hh       -- (generated) header file for Lattice5d and related stuff
qcd-c.hh        -- (generated) plain C++ header for vector class and operations
qcd-gcc-sse.hh  -- (generated) GCC SSE implementation of vector operations
qcd.hh          -- (generated) generic vector header

To compile the gcc sse version, say:

% make gcc.compile

