High Performance Linpack

This page summarizes the information I have on HPL.

Note: The HPL tuning guide was obtained from a source I forgot to record when I archived the file. Thanks to whoever wrote it in the first place.


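Some information obtained for HPL tuning. The two environment variables exported at the end of this page set the MPICH p4 socket buffer size and global memory size. In HPL.dat, N is the problem size (the order of the N x N matrix), and P x Q must equal the number of MPI processes. P and Q should be roughly equal, with Q equal to or slightly larger than P; the 32-node sample below uses P = Q = 8, i.e. 64 processes, presumably two per node.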
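
Because P x Q has to match the process count and the grid should be close to square with Q not smaller than P, the grid can be picked mechanically. A minimal sketch, assuming one MPI process per processor; `choose_grid` is a hypothetical helper, not part of HPL.

```python
import math


def choose_grid(nprocs: int) -> tuple[int, int]:
    """Return (P, Q) with P * Q == nprocs, P <= Q, and P as large as possible.

    Assumed rule of thumb: a near-square process grid with Q >= P.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be >= 1")
    # Walk down from floor(sqrt(nprocs)); the first divisor found is the
    # largest P that still keeps P <= Q.
    for p in range(math.isqrt(nprocs), 0, -1):
        if nprocs % p == 0:
            return p, nprocs // p
    return 1, nprocs  # not reached: p == 1 always divides nprocs


if __name__ == "__main__":
    for n in (64, 32, 12, 7):
        print(n, choose_grid(n))
```

For 64 processes this gives P = Q = 8, as in the sample; a prime process count degenerates to a 1 x n grid, which is usually a sign to use a different number of processes.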

Sample Input File

This is a sample HPL input file (HPL.dat) for a 32-node cluster.

Note: For a 3-node cluster, start with N = 5000.
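
N = 5000 is a deliberately small starting point. To scale N up, a commonly quoted rule of thumb (an assumption here, not from the guide) is to let the matrix fill about 80% of total memory; since each element is an 8-byte double, that gives N of roughly sqrt(0.8 x total bytes / 8). A minimal sketch; `suggest_n` and the example cluster size are hypothetical.

```python
import math


def suggest_n(total_mem_bytes: int, nb: int = 80, fraction: float = 0.8) -> int:
    """Suggest an HPL problem size N for a cluster with total_mem_bytes of RAM.

    Assumption (rule of thumb, not taken from this page): the N x N matrix of
    8-byte doubles should use about `fraction` of total memory, leaving the
    rest for the OS and MPI. N is rounded down to a multiple of NB.
    """
    if total_mem_bytes <= 0 or nb <= 0 or not 0.0 < fraction <= 1.0:
        raise ValueError("invalid arguments")
    n = math.isqrt(int(fraction * total_mem_bytes) // 8)  # 8 bytes per double
    return (n // nb) * nb


if __name__ == "__main__":
    # Hypothetical cluster: 32 nodes with 1 GiB each.
    print(suggest_n(32 * 2**30))
```

Running the small size first confirms the setup works before committing to a long run near the memory limit.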

-- BEGIN HPL.dat --
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
44000        Ns
1            # of NBs
80           NBs
1            # of process grids (P x Q)
8            Ps
8            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
3            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
-- END HPL.dat --
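
When trying several N, NB, P and Q values, it can be handy to regenerate the file rather than edit it by hand. A minimal sketch: `make_hpl_dat` is a hypothetical helper that varies only N, NB, P and Q and copies every other line from the sample above. HPL treats the first two lines as comments and uses the leading value on each later line; the trailing text is only a label.

```python
def make_hpl_dat(n: int, nb: int, p: int, q: int) -> str:
    """Build an HPL.dat with one N, one NB and one P x Q grid.

    All other settings are copied from the sample file on this page.
    """
    if min(n, nb, p, q) < 1:
        raise ValueError("N, NB, P and Q must all be >= 1")
    header = [
        "HPLinpack benchmark input file",
        "Innovative Computing Laboratory, University of Tennessee",
    ]
    rows = [
        ("HPL.out", "output file name (if any)"),
        ("6", "device out (6=stdout,7=stderr,file)"),
        ("1", "# of problems sizes (N)"),
        (str(n), "Ns"),
        ("1", "# of NBs"),
        (str(nb), "NBs"),
        ("1", "# of process grids (P x Q)"),
        (str(p), "Ps"),
        (str(q), "Qs"),
        ("16.0", "threshold"),
        ("1", "# of panel fact"),
        ("1", "PFACTs (0=left, 1=Crout, 2=Right)"),
        ("1", "# of recursive stopping criterium"),
        ("4", "NBMINs (>= 1)"),
        ("1", "# of panels in recursion"),
        ("2", "NDIVs"),
        ("1", "# of recursive panel fact."),
        ("2", "RFACTs (0=left, 1=Crout, 2=Right)"),
        ("1", "# of broadcast"),
        ("3", "BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)"),
        ("1", "# of lookahead depth"),
        ("1", "DEPTHs (>=0)"),
        ("2", "SWAP (0=bin-exch,1=long,2=mix)"),
        ("64", "swapping threshold"),
        ("0", "L1 in (0=transposed,1=no-transposed) form"),
        ("0", "U  in (0=transposed,1=no-transposed) form"),
        ("1", "Equilibration (0=no,1=yes)"),
        ("8", "memory alignment in double (> 0)"),
    ]
    # Pad each value to a 13-character column, matching the sample layout.
    body = [f"{value:<13}{label}" for value, label in rows]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    print(make_hpl_dat(44000, 80, 8, 8), end="")
```

Redirecting the output to HPL.dat in the directory where xhpl runs reproduces the 32-node sample.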

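Sample HPL Output

The output below is from a separate run rather than the HPL.dat above: it uses N = 25000 and PFACT = Right, whereas the input file specifies N = 44000 and PFACT = 1 (Crout). The remaining parameters match.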

HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   25000
NB     :      80
P      :       8
Q      :       8
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Right
BCAST  :  2ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words


- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
  1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
  2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
  3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than          16.0

T/V                N    NB     P     Q               Time             Gflops
W13R2R4        25000    80     8     8             539.04          1.933e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0400536 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0098898 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0017031 ...... PASSED

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.

End of Tests.
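
The result row packs the run's settings into the T/V code and reports Gflops from the wall time. A minimal sketch for parsing such a row, under stated assumptions: the layout of the HPL 1.0 code (W, DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN) is inferred by matching W13R2R4 against the parameter listing above; since RFACT and PFACT are both Right there, the sample alone does not fix their order, and the sketch assumes RFACT comes first. Newer HPL releases, as far as I know, add a grid-order letter after the W, which this decoder does not handle. The rate uses HPL's operation count of 2/3 N^3 + 3/2 N^2. All function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed HPL 1.0 T/V layout; broadcast names use the HPL.dat legend abbreviations.
FACT = {"L": "Left", "C": "Crout", "R": "Right"}
BCAST = {"0": "1rg", "1": "1rM", "2": "2rg", "3": "2rM", "4": "Lng", "5": "LnM"}


@dataclass
class HplResult:
    variant: dict
    n: int
    nb: int
    p: int
    q: int
    time_s: float
    gflops: float


def decode_variant(code: str) -> dict:
    """Decode e.g. 'W13R2R4' as W, DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN."""
    if len(code) < 7 or code[0] != "W":
        raise ValueError(f"unexpected variant code: {code!r}")
    return {
        "depth": int(code[1]),
        "bcast": BCAST[code[2]],
        "rfact": FACT[code[3]],
        "ndiv": int(code[4]),
        "pfact": FACT[code[5]],
        "nbmin": int(code[6:]),
    }


def hpl_gflops(n: int, seconds: float) -> float:
    """Rate for an N x N solve using HPL's count of 2/3 N^3 + 3/2 N^2 flops."""
    return (2.0 / 3.0 * n**3 + 1.5 * n**2) / seconds / 1e9


def parse_result_line(line: str) -> HplResult:
    """Split one result row (T/V N NB P Q Time Gflops) into typed fields."""
    code, n, nb, p, q, t, gf = line.split()
    return HplResult(decode_variant(code), int(n), int(nb), int(p), int(q),
                     float(t), float(gf))


if __name__ == "__main__":
    r = parse_result_line(
        "W13R2R4        25000    80     8     8             539.04          1.933e+01")
    print(r.variant)
    # Recompute the rate from N and the wall time as a sanity check.
    print(f"reported {r.gflops:.3f}, recomputed {hpl_gflops(r.n, r.time_s):.3f} Gflops")
```

The recomputed rate agrees with the reported 19.33 Gflops to within the rounding of the printed time, which is a quick check that a result row was copied correctly.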

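The three scaled residual checks in the output can be reproduced on a small system. A minimal pure-Python sketch, assuming plain Gaussian elimination with partial pivoting as a stand-in for HPL's distributed LU (it is not HPL's algorithm) and an arbitrarily chosen 50 x 50 random matrix. eps is taken as 2^-53, which matches the 1.110223e-16 printed above; `solve` and `scaled_residuals` are hypothetical names.

```python
import random
import sys

EPS = sys.float_info.epsilon / 2  # 2**-53, about 1.110223e-16 as HPL prints
THRESHOLD = 16.0                  # same threshold as in HPL.dat


def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def scaled_residuals(a, x, b):
    """Return the three checks HPL prints, in the same order."""
    n = len(a)
    r = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norm1_a = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norminf_a = max(sum(abs(v) for v in row) for row in a)               # max row sum
    norm1_x = sum(abs(v) for v in x)
    norminf_x = max(abs(v) for v in x)
    return (
        r / (EPS * norm1_a * n),
        r / (EPS * norm1_a * norm1_x),
        r / (EPS * norminf_a * norminf_x),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50
    a = [[rng.random() - 0.5 for _ in range(n)] for _ in range(n)]
    b = [rng.random() - 0.5 for _ in range(n)]
    for value in scaled_residuals(a, solve(a, b), b):
        print(f"{value:.7f}", "PASSED" if value < THRESHOLD else "FAILED")
```

Values far below 16.0, like those in the sample run, indicate a backward-stable solve; a failure usually points to a broken BLAS or hardware problem rather than a tuning issue.
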
MPICH Tuning for HPL

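We need to configure MPICH so that HPL can run with large problem sizes that use most of the memory in the system. P4_SOCKBUFSIZE sets the TCP socket buffer size used by the ch_p4 device (0x40000 = 262144 bytes, i.e. 256 KiB), and P4_GLOBMEMSIZE sets the size in bytes of the shared-memory region used for communication between processes on the same node when MPICH is built with shared-memory support (16777296 bytes, about 16 MB). Export both before launching HPL: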

export P4_SOCKBUFSIZE=0x40000
export P4_GLOBMEMSIZE=16777296
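
The same settings can be applied from a launch script. A minimal sketch, assuming MPICH's `mpirun -np` launcher and an `xhpl` binary in the current directory next to HPL.dat (xhpl reads HPL.dat from its working directory); `hpl_launch` is a hypothetical helper. It only sets the variables for the local `mpirun`; whether they reach processes on other nodes depends on how MPICH starts them.

```python
import os
import subprocess


def hpl_launch(nprocs: int, xhpl: str = "./xhpl",
               run: bool = False) -> tuple[list[str], dict[str, str]]:
    """Build (and optionally run) an mpirun command for xhpl with the p4 settings above."""
    env = dict(os.environ)
    env["P4_SOCKBUFSIZE"] = "0x40000"   # 262144 bytes (256 KiB)
    env["P4_GLOBMEMSIZE"] = "16777296"  # bytes, about 16 MB
    cmd = ["mpirun", "-np", str(nprocs), xhpl]
    if run:
        # Raises CalledProcessError if mpirun exits with a non-zero status.
        subprocess.run(cmd, env=env, check=True)
    return cmd, env


if __name__ == "__main__":
    # P x Q = 8 x 8 = 64 processes for the 32-node sample; set run=True to launch.
    cmd, _ = hpl_launch(64)
    print(" ".join(cmd))
```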


HPL Paper