\documentclass{article}
\usepackage{noweb}
\usepackage[dvipdfm,colorlinks=true,linkcolor=cyan]{hyperref}
\setlength{\parindent}{0pt}
\setlength{\topmargin}{-80pt}
%\setlength{\oddsidemargin}{-28pt}
%\setlength{\evensidemargin}{-28pt}
%\setlength{\textwidth}{546pt}
\setlength{\textheight}{720pt}
\title{DWF SSE Interface\\Version 1.1.1}
\date{August 5, 2004}
\author{Andrew Pochinsky}
\begin{document}
\maketitle
\begin{abstract}
This document defines the interface between the SSE DWF CG solver and a
Chroma-like environment. The CG engine requires gcc version 3.3.x or higher
and must be compiled as C code to achieve good performance. The interface
targets both C and C++ external environments with calling conventions
compatible with the gcc compiler on x86.
\end{abstract}
\section{NOTATION}
It is convenient to introduce some notation and to specify the restrictions
that the inverter imposes on its input parameters and the environment.

We assume that the lattice is a 5-d torus with periodic boundary conditions
in four directions and a domain wall in the fifth direction. Other boundary
conditions in the four periodic directions may be implemented by appropriate
modifications of the gauge field.

Lattice sizes are $L_0\times L_1\times L_2\times L_3\times L_4$. The CG uses
red-black preconditioning and therefore requires that $L_0,\ldots,L_3$ be
even. Because of the way SSE instructions are used by the CG code, $L_4$ must
be a multiple of 4.

We assume also that the cluster has logical geometry
$N_0\times N_1\times N_2\times N_3$ (some of the $N_i$ may be 1). The cluster
network is a torus in all non-trivial extents, and we require $N_i \le L_i$
for $i=0,\ldots,3$. Otherwise there are no restrictions on $N_i$. However,
communications will be overlapped with computations only if $L_i/N_i\ge3$ for
all $i$. The code still works correctly, albeit slowly, for smaller values of
$L_i/N_i$.
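
The restrictions on $L_i$ and $N_i$ can be collected into a small check that
an outer layer might run before handing a lattice to the CG. The following is
a minimal sketch in C. The names \verb|geometry_status| and
\verb|check_geometry()| are hypothetical and are not part of the interface.

\begin{verbatim}
#include <stdbool.h>

/* Hypothetical helper, not part of the SSE DWF interface. */
enum geometry_status {
    GEOMETRY_INVALID,     /* a hard restriction is violated        */
    GEOMETRY_NO_OVERLAP,  /* valid, but L_i/N_i < 3 for some i     */
    GEOMETRY_OVERLAP      /* valid, and L_i/N_i >= 3 for all i     */
};

/* L holds the five lattice sizes, N the four cluster extents. */
enum geometry_status check_geometry(const int L[5], const int N[4])
{
    bool overlap = true;
    int i;

    /* The fifth dimension must be a positive multiple of 4. */
    if (L[4] <= 0 || L[4] % 4 != 0)
        return GEOMETRY_INVALID;

    for (i = 0; i < 4; i++) {
        /* Red-black preconditioning needs even L_0..L_3. */
        if (L[i] <= 0 || L[i] % 2 != 0)
            return GEOMETRY_INVALID;
        /* Each node must host at least one site: N_i <= L_i. */
        if (N[i] < 1 || N[i] > L[i])
            return GEOMETRY_INVALID;
        /* L_i/N_i >= 3 is the same as L_i >= 3*N_i. */
        if (L[i] < 3 * N[i])
            overlap = false;
    }
    return overlap ? GEOMETRY_OVERLAP : GEOMETRY_NO_OVERLAP;
}
\end{verbatim}

For example, a $16\times16\times16\times32\times16$ lattice on a
$2\times2\times2\times4$ cluster passes every check with $L_i/N_i=8$, while a
$4\times4\times4\times4\times8$ lattice on a $2\times2\times2\times2$ cluster
is accepted but has $L_i/N_i=2$, so no overlap of communications and
computations should be expected.
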

Before embarking upon memory layout details, let us introduce
\begin{eqnarray*}
a_{ij}&=&\left\lfloor\frac{jL_i}{N_i}\right\rfloor,\\
b_{ij}&=&\left\lfloor\frac{(j+1)L_i}{N_i}\right\rfloor=a_{i,j+1}.
\end{eqnarray*}
Then a node with logical coordinates $(n_0,n_1,n_2,n_3)$ hosts a sublattice
with coordinates $(x_0,x_1,x_2,x_3,x_4)$, where $a_{in_i}\le x_i<b_{in_i}$
for $i=0,\ldots,3$. For example, with $L_i=16$ and $N_i=3$ the three nodes
along direction $i$ host $0\le x_i<5$, $5\le x_i<10$, and $10\le x_i<16$.

<>=
#ifndef SSE_DWF_CG
#define SSE_DWF_CG
<>
<>
<>
<>
#endif
@
Since the interface header file may be included from C++ source, we need to
tell the compiler that external symbols have C bindings:
<>=
#if defined (__cplusplus)
extern "C" {
#endif
@
<>=
#if defined (__cplusplus)
}
#endif
@
\pagebreak
\subsection{Opaque types}
Here are the opaque datatypes used by the interface:
<>=
typedef struct SSE_DWF_Fermion SSE_DWF_Fermion;
typedef struct SSE_DWF_Gauge SSE_DWF_Gauge;
@
Access to outer layer fields is done via accessor functions. Each of them
takes a field to access (as \verb|void *| for writers and
\verb|const void *| for readers), \emph{global} lattice coordinates,
component indices, and a real/imaginary part selector. In addition, there is
a \verb|void *env| parameter that may be used to pass extra information to
the accessor. This parameter is passed by the outer layer to the
export/import interface functions and is used by the CG only to hand it to
the call-back functions. Otherwise the CG completely ignores this argument:
it does not try to read or write the memory pointed to, the pointer is never
stored in internal structures, etc.
<>=
typedef double (*SSE_DWF_gauge_reader)(const void *OuterGauge,
                                       void *env,
                                       int lattice_addr[4],
                                       int dim,
                                       int a,
                                       int b,
                                       int re_im);
@
This is the type of access functions used by the CG to read gauge field
components. The CG calls \verb|gauge_reader(U, env, x, dim, a, b, 0)| to read
$\Re U_{ab}(x)$. To access the imaginary part, \verb|re_im| is set to
\verb|1|. Arguments \verb|a| and \verb|b| vary from \verb|0| to \verb|2|
inclusive.
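
As an illustration, a gauge reader for one possible outer layer format could
look like the sketch below. Everything about the storage format is an
assumption made for this example only: the node is taken to store its local
sublattice as a flat array of \verb|double| indexed as
\verb|[site][dim][a][b][re_im]| with the first coordinate running fastest,
and the hypothetical \verb|demo_env| structure, passed through \verb|env|,
carries the origin and extents of that sublattice. The function
\verb|demo_gauge_reader()| is likewise hypothetical. It counts requests that
fall outside the stored region instead of reading out of bounds.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical description of the outer layer storage (via env). */
struct demo_env {
    int origin[4];  /* global coordinates of the first local site */
    int size[4];    /* local extents in each direction            */
    int errors;     /* count of out-of-range requests             */
};

/* Has the signature required by SSE_DWF_gauge_reader. */
double demo_gauge_reader(const void *OuterGauge, void *env,
                         int lattice_addr[4],
                         int dim, int a, int b, int re_im)
{
    const double *U = OuterGauge;
    struct demo_env *e = env;
    size_t site = 0;
    int i;

    /* Convert global coordinates to a local site index,
       with direction 0 running fastest. */
    for (i = 3; i >= 0; i--) {
        int local = lattice_addr[i] - e->origin[i];
        if (local < 0 || local >= e->size[i]) {
            e->errors++;  /* data layout mismatch */
            return 0.0;
        }
        site = site * (size_t)e->size[i] + (size_t)local;
    }
    /* [site][dim][a][b][re_im]: 4 directions, 3x3 complex. */
    return U[(((site * 4 + (size_t)dim) * 3 + (size_t)a) * 3
              + (size_t)b) * 2 + (size_t)re_im];
}
\end{verbatim}

Since the reader has no error return, the sketch records problems in
\verb|demo_env| so that the caller can inspect \verb|errors| once loading has
finished.
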

It is guaranteed that the CG will only pass coordinates of the local
sublattice in \verb|lattice_addr[]|. Since this call-back is used only to set
up the gauge field, the upper level environment is encouraged to do
out-of-range checks on \verb|lattice_addr|: they add only a small overhead
and help to catch data layout mismatches.
\pagebreak
<>=
typedef double (*SSE_DWF_fermion_reader)(const void *OuterFermion,
                                         void *env,
                                         int lattice_addr[5],
                                         int color,
                                         int dirac,
                                         int re_im);
@
This is the type of access functions used by the CG to read input fermion
field components. Argument \verb|color| varies from \verb|0| to \verb|2|
inclusive, and argument \verb|dirac| varies from \verb|0| to \verb|3|
inclusive. Argument \verb|re_im| is \verb|0| for the real part and \verb|1|
for the imaginary part. Notice that \verb|lattice_addr| has five components.
<>=
typedef void (*SSE_DWF_fermion_writer)(void *OuterFermion,
                                       void *env,
                                       int lattice_addr[5],
                                       int color,
                                       int dirac,
                                       int re_im,
                                       double value);
@
This is the type of writer functions used to convert data from the CG format
back to the outer layer data format.
\pagebreak
\subsection{CG initialization}
The first function of the CG interface called by the upper level environment
must be
<>=
int SSE_DWF_init(const int lattice[5],
                 SSE_DWF_FP_SIZE fp_size,
                 void *(*allocator)(size_t size),
                 void (*deallocator)(void *));
@
Here, \verb|lattice| is the size of the full lattice (\emph{not the local
sublattice}), and \verb|allocator| is a pointer to the function the CG should
use to allocate dynamic memory (if it is \verb|NULL|, the standard library's
\verb|malloc()| is used). Likewise, \verb|deallocator| is a pointer to the
function used to free dynamic memory (if it is \verb|NULL|, the standard
library's \verb|free()| is used). These function pointers are stored by
\verb|SSE_DWF_init()| in internal structures and may be called \emph{after}
it returns.

This function does all initialization needed for the CG to run.
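
A pair of call-backs with the signatures expected by \verb|SSE_DWF_init()|
might look like the sketch below. The names \verb|demo_alloc()|,
\verb|demo_free()|, and \verb|demo_live_blocks| are hypothetical and are not
part of the interface. The call-backs take no \verb|env| argument, so the
sketch keeps its bookkeeping in a file-scope variable.

\begin{verbatim}
#include <stdlib.h>

/* Hypothetical bookkeeping: blocks currently handed out. */
static long demo_live_blocks = 0;

/* Suitable as the allocator argument of SSE_DWF_init(). */
void *demo_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        demo_live_blocks++;
    return p;
}

/* Suitable as the deallocator argument of SSE_DWF_init(). */
void demo_free(void *p)
{
    if (p != NULL) {
        demo_live_blocks--;
        free(p);
    }
}
\end{verbatim}

The pair would be passed as
\verb|SSE_DWF_init(lattice, SSE_DWF_DOUBLE, demo_alloc, demo_free)|. Because
the CG may call the functions after \verb|SSE_DWF_init()| returns, they have
to stay valid until the CG is shut down. Assuming every CG allocation goes
through these call-backs, a non-zero \verb|demo_live_blocks| after cleanup
would point to a field that was never deleted.
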

Among other things, \verb|SSE_DWF_init()| allocates and initializes
communication channels and constructs the index tables needed for computing
the Dirac operator. Argument \verb|fp_size| is \verb|SSE_DWF_FLOAT| for a
single precision solver and \verb|SSE_DWF_DOUBLE| if double precision should
be used. \verb|SSE_DWF_init()| returns \verb|0| on success and a non-zero
value otherwise.
<>=
typedef enum {
   SSE_DWF_FLOAT,
   SSE_DWF_DOUBLE
} SSE_DWF_FP_SIZE;
@
The upper level environment should complete all QMP communications before
calling \verb|SSE_DWF_init()|. This includes not only data arrays involved in
the inverter, but all communications in the cluster. In addition, it is
expected that QMP has been initialized as outlined above.
\subsection{CG cleanup}
The very last CG function to be called by the upper level environment is
<>=
void SSE_DWF_fini(void);
@
It deallocates all memory owned by the CG and returns QMP to a known state.
Upon return from \verb|SSE_DWF_fini()| all CG communication operations are
finished and there are no QMP channels owned by the CG. The upper level
environment should wait until \verb|SSE_DWF_fini()| returns on \emph{all
nodes of the cluster} before calling any QMP function.
\pagebreak
\subsection{Exporting gauge fields}
The following function is used to convert an outer layer gauge field into a
format suitable for the CG. To simplify the non-critical parts of the CG, we
require two gauge field parameters: assuming that \verb|U[mu]| is the gauge
field in the canonical form (the link in the \verb|mu| direction at each
lattice site), let \verb|V[mu]| be its cyclic shift, namely
\verb|V[i] = cshift(U[i], i, UP)|. With these conventions, the prototype of
the gauge field loader is
<>=
SSE_DWF_Gauge *SSE_DWF_load_gauge(const void *OuterGauge_U,
                                  const void *OuterGauge_V,
                                  void *env,
                                  SSE_DWF_gauge_reader reader);
@
While in the loader, \verb|reader| will be called to access the outer layer
data.
On return, \verb|NULL| indicates that the load operation failed. Otherwise,
the returned value is suitable for \verb|SSE_DWF_cg_solver()|.

Gauge fields loaded into the CG should be freed by calling the following
function:
<>=
void SSE_DWF_delete_gauge(SSE_DWF_Gauge *);
@
\subsection{Exporting fermion fields}
For domain wall fermions, we start with the function used to load the right
hand side and the initial guess of the Dirac equation. The conversion is done
by the following function:
<>=
SSE_DWF_Fermion *SSE_DWF_load_fermion(const void *OuterFermion,
                                      void *env,
                                      SSE_DWF_fermion_reader reader);
@
This function allocates and initializes 5-d fermion fields that are suitable
as arguments for the solver proper.

To allocate an uninitialized fermion field for the CG, one can use the
following function:
<>=
SSE_DWF_Fermion *SSE_DWF_allocate_fermion(void);
@
Whether allocated or loaded, the CG's fermion fields should be freed after
use to reclaim memory by calling
<>=
void SSE_DWF_delete_fermion(SSE_DWF_Fermion *);
@
\subsection{Importing the result}
We also need a way to convert solutions of the domain wall Dirac equation to
the upper level format. Here is the function to do that:
<>=
void SSE_DWF_save_fermion(void *OuterFermion,
                          void *env,
                          SSE_DWF_fermion_writer writer,
                          SSE_DWF_Fermion *CGfermion);
@
It iterates through the local subvolume on each node and calls
\verb|writer()| with appropriate arguments to convert the data into the outer
layer format.
\subsection{Solver engine}
The solver proper takes fields converted into the CG's format and a few extra
parameters:
<>=
int SSE_DWF_cg_solver(SSE_DWF_Fermion *result,
                      double *out_eps,
                      int *out_iter,
                      const SSE_DWF_Gauge *gauge,
                      double M_0,
                      double m_f,
                      const SSE_DWF_Fermion *guess,
                      const SSE_DWF_Fermion *rhs,
                      double eps,
                      int max_iter);
@
It returns \verb|0| if it believes that a reasonable approximation to the
solution was found and a non-zero value otherwise.
The number of conjugate gradient iterations used is returned in
\verb|out_iter|, and an estimate of the residual after the last iteration is
returned in \verb|out_eps|. The solver uses the operator and preconditioner
described in \verb|dwf.pdf|.
\pagebreak
\section{SAMPLE USAGE PSEUDOCODE}
Here is pseudo-code showing a possible use of the CG by the upper level
environment. It is possible to use the CG interface in different ways, e.g.,
to solve many equations with the same gauge field without going through the
full initialization dance. The changes needed to accomplish that should be
obvious to the reader by now.
\begin{verbatim}
OuterSolver(U, eta, guess)
{
   OuterGauge V;
   OuterFermion solution;

   for (int i = 0; i < 4; i++)
      V[i] = cshift(U[i], i, UP);

   // Finalize all outer layer QMP operations

   SSE_DWF_init(lattice, SSE_DWF_FLOAT, NULL, NULL); // single precision
   SSE_DWF_Gauge *g = SSE_DWF_load_gauge(U, V, NULL, gauge_reader);
   SSE_DWF_Fermion *rhs = SSE_DWF_load_fermion(eta, NULL, fermion_reader);
   SSE_DWF_Fermion *x0 = SSE_DWF_load_fermion(guess, NULL, fermion_reader);
   SSE_DWF_Fermion *x = SSE_DWF_allocate_fermion();
   double out_epsilon;
   int out_iterations;

   SSE_DWF_cg_solver(x, &out_epsilon, &out_iterations,
                     g, M_0, m_f, x0, rhs, 1e-14, 5000);
   // Convert the CG solution x back to the outer layer format
   SSE_DWF_save_fermion(solution, NULL, fermion_writer, x);

   SSE_DWF_delete_gauge(g);
   SSE_DWF_delete_fermion(rhs);
   SSE_DWF_delete_fermion(x0);
   SSE_DWF_delete_fermion(x);
   SSE_DWF_fini();

   return solution;
}
\end{verbatim}
\pagebreak
\section{CHUNKS}
\nowebchunks
\end{document}