This page summarizes the information I have on HPL. Note: the HPL tuning guide came from a source that I had forgotten by the time I archived the file. Thanks to whoever wrote it in the first place.
Introduction
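These are notes collected on HPL tuning. The last two lines of this page are the environment variables to export to change the socket buffer size and the global memory size. Other things to know: N is the order (side length) of the matrix, and P * Q equals the number of processes in the grid (the number of nodes when one process runs per node). Q should be set to be >= P; the sample below uses P = Q = 8.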
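A rough sketch of how one might pick P and Q for a given process count: take the largest divisor of the count that does not exceed its square root as P, and let Q be the rest. The function name `process_grid` is made up for this illustration, and "as square as possible" is a common rule of thumb rather than something the tuning guide above states.

```python
import math


def process_grid(nprocs: int) -> tuple[int, int]:
    """Return (P, Q) with P * Q == nprocs and P <= Q, as close to square as possible."""
    if nprocs < 1:
        raise ValueError("nprocs must be positive")
    p = math.isqrt(nprocs)      # start at floor(sqrt(nprocs))
    while nprocs % p != 0:      # walk down to the nearest divisor
        p -= 1
    return p, nprocs // p


if __name__ == "__main__":
    for n in (3, 32, 64):
        print(n, process_grid(n))
```

For 64 processes this gives the 8 x 8 grid used in the sample input; for a prime count such as 3 it falls back to 1 x 3.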
Sample Input file
This is a sample HPL input for a 32-node cluster. The HPL.dat file follows.
Note: for a 3-node cluster, start with N=5000.
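To get a feel for what a given N costs in memory, here is a small sketch. It assumes only that the N x N matrix is stored in double precision (8 bytes per element) and is spread evenly over the P x Q processes; HPL's extra workspace is ignored, so treat the numbers as a lower bound. The function names are illustrative.

```python
def matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Bytes needed to hold an n x n matrix of double-precision values."""
    return n * n * bytes_per_element


def per_process_gib(n: int, p: int, q: int) -> float:
    """Approximate share of the matrix held by each process, in GiB."""
    return matrix_bytes(n) / (p * q) / 2**30


if __name__ == "__main__":
    for n, p, q in ((5000, 1, 3), (44000, 8, 8)):
        print(f"N={n}: {matrix_bytes(n)} bytes total, "
              f"{per_process_gib(n, p, q):.3f} GiB per process on {p}x{q}")
```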
-- BEGIN HPL.dat --
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
44000        Ns
1            # of NBs
80           NBs
1            # of process grids (P x Q)
8            Ps
8            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
3            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
-- END HPL.dat --
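In HPL.dat each line starts with its value(s) and the rest of the line is descriptive text, with a count line preceding each list of values. The sketch below reads just the problem sizes, block sizes and process grids from the top of such a file, which is handy for checking that P * Q matches the number of processes you plan to launch. The helper `read_sizes` is made up for this page and is not part of HPL; it assumes the line order shown in the sample above.

```python
SAMPLE = """\
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
44000        Ns
1            # of NBs
80           NBs
1            # of process grids (P x Q)
8            Ps
8            Qs
"""


def read_sizes(text: str) -> dict:
    """Extract Ns, NBs and the (P, Q) grids from the head of an HPL.dat file."""
    # Skip the two header lines, the output file name and the device line.
    body = text.splitlines()[4:]

    def values(count_line: str, value_line: str) -> list[int]:
        count = int(count_line.split()[0])               # how many values to read
        return [int(tok) for tok in value_line.split()[:count]]

    ns = values(body[0], body[1])
    nbs = values(body[2], body[3])
    ps = values(body[4], body[5])                        # one count covers both Ps and Qs
    qs = values(body[4], body[6])
    return {"Ns": ns, "NBs": nbs, "grids": list(zip(ps, qs))}


if __name__ == "__main__":
    info = read_sizes(SAMPLE)
    print(info)
    for p, q in info["grids"]:
        print(f"grid {p} x {q} needs {p * q} processes")
```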
Sample HPL Output
============================================================================
HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 25000
NB     : 80
P      : 8
Q      : 8
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Right
BCAST  : 2ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
W13R2R4        25000    80     8     8             539.04          1.933e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0400536 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0098898 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0017031 ...... PASSED
============================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

End of Tests.
============================================================================
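The Gflops column can be cross-checked from N and Time. The sketch below assumes the usual LU operation count of 2/3 N^3 + 3/2 N^2 floating-point operations for solving the system; with N = 25000 and 539.04 seconds it lands on the figure in the result line above. The function name is just for illustration.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops for an order-n solve, assuming 2/3 n^3 + 3/2 n^2 operations."""
    flops = (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Values taken from the result line of the sample output.
    print(f"{hpl_gflops(25000, 539.04):.3e}")
```

Dividing the result by the theoretical peak of the cluster gives the efficiency, which is the number worth watching while changing N, NB and the grid shape.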
MPICH Tuning for HPL
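We need to configure MPICH so that it can use the maximum amount of memory available in the system. The two variables below set the socket buffer size and the global memory size; export them before launching HPL.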
export P4_SOCKBUFSIZE=0x40000
export P4_GLOBMEMSIZE=16777296