HPL (High Performance Linpack): Benchmarking Raspberry Pis

Benchmarking is the process of running standard programs to evaluate the speed achieved by a system. There are a number of standard benchmarking programs, and in this tutorial we benchmark a Linux system using a well-known program called HPL, also known as High Performance Linpack.

Introduction

In this tutorial we cover how to go about benchmarking a single-processor system, the Raspberry Pi. First we will benchmark a single node, and then continue to benchmark multiple nodes, each node representing a Raspberry Pi. There are a few things to note here. Benchmarking a single node or multiple nodes has a few dependencies that have to be satisfied, and these are covered in this tutorial. On multiple nodes there are additional dependencies: an MPI implementation (such as MPICH or OpenMPI) has to be built and running for HPL to work. So for benchmarking multiple nodes, I assume that your nodes have MPICH installed and running.

What is HPL?

HPL is a software package that solves a (random) dense linear system in double precision (64 bit) arithmetic on distributed-memory computers. The HPL package provides a testing and timing program to quantify the accuracy of the obtained solution as well as the time it took to compute it. The best performance achievable by this software on your system depends on a large variety of factors. The implementation is scalable in the sense that its parallel efficiency is maintained constant with respect to the per-processor memory usage. Thus we can use it to benchmark a single processor or a series of distributed processors in parallel. So let's begin installing HPL.

1 Installing dependencies

HPL has a few software dependencies that have to be satisfied before it can be installed. They are:

  • gfortran - fortran program compiler
  • MPICH2 - an implementation of MPI
  • mpich2-dev - development tools
  • BLAS - Basic Linear Algebra Subprograms

Here we assume that you have MPICH2 installed. To install other dependencies and packages, use the following command:

sudo apt-get install libatlas-base-dev libmpich2-dev gfortran

This step has to be repeated on each of the nodes (Pis) in the cluster.
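Before moving on, it can save time to verify the toolchain on each node. Here is a minimal check, assuming the tool names the packages above provide (mpicc and mpiexec come with the MPICH packages; adjust if your MPI install differs):

```shell
# Check that the compiler and MPI launcher HPL needs are on the PATH.
# Prints one status line per tool rather than aborting, so it is safe
# to run on a partially configured node.
for tool in gfortran mpicc mpiexec; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING - install it before building HPL"
    fi
done
```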

2 Download HPL and set it up

Download the HPL package from here. Next, extract the tar file and create a makefile based on the given template. Open the terminal and change to the directory where the downloaded HPL tar file is stored. Execute the following commands one after another.

tar xf hpl-2.1.tar.gz
cd hpl-2.1/setup
sh make_generic
cd ..
cp setup/Make.UNKNOWN Make.rpi

The last command copies the contents of Make.UNKNOWN to Make.rpi. We do this because the make file contains all the configuration details of the system (the Raspberry Pi), as well as the details of various libraries such as MPICH2 and the ATLAS/BLAS packages, the home directory, etc. In the next step, we make changes to the Make.rpi file.

3 Adjust the Make.rpi file

This is an important step. The changes shown below vary according to your system; here I show them with respect to my system. Please note that the parameters below are spread throughout the Make.rpi file, so I suggest you find each parameter, replace or add the change, and only then continue to the next parameter.

Open the Make.rpi file in a text editor with the command:

nano Make.rpi

Make the following changes to the file.

ARCH         = rpi
TOPdir       = $(HOME)/hpl-2.1
MPdir        = /usr/local/mpich2
MPinc        = -I $(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
LAdir        = /usr/lib/atlas-base/
LAlib        = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a
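If you are unsure what to put for MPdir and LAdir on your system, searching for the libraries first can help. This is only a sketch; the layouts installed by libmpich2-dev and libatlas-base-dev vary between Raspbian releases, so treat the results as candidates to check:

```shell
# Search for the MPICH and ATLAS libraries so the Make.rpi paths can
# be filled in with values that actually exist on this machine.
mpi_lib=$(find /usr -name 'libmpich*.a' 2>/dev/null | head -n 1)
atlas_lib=$(find /usr -name 'libatlas.a' 2>/dev/null | head -n 1)
echo "MPlib candidate: ${mpi_lib:-not found}"
echo "LAlib candidate: ${atlas_lib:-not found}"
```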

4 Compiling the HPL

Once the make file is ready, we can start compiling HPL. Run the following command from within the hpl-2.1 folder. After compilation finishes, the "xhpl" binary will be present in the "bin/rpi" folder within the HPL folder.

make arch=rpi

5 Creating the HPL input file

The following is an example of the "HPL.dat" file. This is the input file for HPL when it is run; the values provided in it are used to generate and compute the problem. You can use this file directly to run tests for a single node. Create a file within the "bin/rpi" folder, name it "HPL.dat", and copy the contents below into that file.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
5040         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

The contents of this file have to be tuned by trial and error until you get satisfactory output. To learn what each parameter means and how to change it, refer to the paper here; to skip to the main point, start reading from page 6 of that document.
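A common starting point for choosing N, an assumption on my part rather than something prescribed in this tutorial, is to size the matrix to use roughly 80% of the node's RAM (each double-precision entry takes 8 bytes) and round down to a multiple of NB:

```shell
# Estimate a starting problem size N for HPL.dat from available memory:
# N ~ sqrt(0.8 * mem_bytes / 8), rounded down to a multiple of NB.
mem_bytes=$((256 * 1024 * 1024))   # e.g. a 256 MB Raspberry Pi; adjust to yours
nb=128
n=$(awk -v m="$mem_bytes" -v nb="$nb" \
    'BEGIN { n = int(sqrt(0.8 * m / 8)); print int(n / nb) * nb }')
echo "Suggested starting N for NB=$nb: $n"
```

From there, tune N up or down by trial and error as described above.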

6 Running HPL on single node

Once the HPL.dat file is ready, we can run HPL. The HPL.dat file above is for a single node or processor. The product P*Q gives the number of processors HPL is being tested on. In the above file P=1 and Q=1, and 1*1=1, so it is for a single processor. Now run it with the commands:

cd bin/rpi
./xhpl

The output looks something similar to what is shown below:

================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   5040 
NB     :     128 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :   Right 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :  1ringM 
DEPTH  :       1 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

Also, we have to concentrate on the final result. The final output on the terminal will look similar to what is shown below. The last value gives the speed, and the values before it show the different parameters provided. In the content below, the speed is shown in Gflops and its value is around 1.210e-01 Gflops, which converts to roughly 121 MFLOPS.

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       21400   128     3    11              537.10              1.210e-01
HPL_pdgesv() start time Mon Jun 23 17:29:42 2014

HPL_pdgesv() end time   Mon Jun 23 17:55:19 2014

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0020152 ...... PASSED
================================================================================

Please note that depending on your Raspberry Pi, the speed and the time taken might be significantly different. So please do not use these results as a comparison with your node or cluster.

7 Running HPL on multiple nodes

When we want to run HPL on multiple nodes, we have to change the HPL.dat file. Let's assume here that we have 32 nodes, so the product P*Q should be 32. I chose P=4 and Q=8, since 4*8=32. Apart from this change, we have to change the value of N; by trial and error, we got the maximum speed for N=17400. The final file content is shown below. Make those changes accordingly in your "HPL.dat" file.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
17400         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
8            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
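The choice P=4, Q=8 is one of several grids whose product is 32. A small sketch that enumerates the options (HPL's own tuning notes generally favor grids with P less than or equal to Q and as close to square as practical, though the best grid for your cluster is still found by testing):

```shell
# Enumerate the (P, Q) process grids whose product is 32, with P <= Q.
procs=32
grids=""
p=1
while [ "$p" -le "$procs" ]; do
    q=$((procs / p))
    if [ $((procs % p)) -eq 0 ] && [ "$p" -le "$q" ]; then
        grids="$grids ${p}x${q}"
        echo "P=$p Q=$q"
    fi
    p=$((p + 1))
done
```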

Once this is done, we have to run HPL again. Use the following commands, remembering to change the path in the command below to the path of the machine file on your system.

cd bin/rpi
mpiexec -f ~/mpi_testing/machinefile -n 32 ./xhpl
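The machine file referenced by -f is a plain text file listing the hosts MPICH should launch processes on, one per line (MPICH's Hydra launcher also accepts a per-host process count written as hostname:n). The hostnames below are placeholders for illustration, not from my cluster; an abbreviated example for 32 Pis might look like this, where the "..." stands for the remaining hostnames:

```
pi01
pi02
pi03
...
pi32
```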

The result will be similar to the single-node output shown above, but it should have a higher speed.

Changes like this can be made depending on the number of nodes or processors in the system, and the benchmark results found for each configuration. And as I mentioned earlier, to know more about how to set the values in the HPL.dat file, head over to the document here and give it a read.


Comments

From: puneet

I followed your instructions and used your HPL.dat file; I am getting the following error:

HPL ERROR from process # 0, on line 355 of function HPL_pdinfo:
>>> Number of values of NB is less than 1 or greater than 20 <<<
HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<
=================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 5686 RUNNING AT cdac-Lenovo-B590
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=================================================================

From: Lanze

I have been following your instruction and I am stuck on the 4th step. I have no bin/rpi and makeh gives me "command not found". Please help me if you can. My grade depends on this working.

Background: cluster of 4 RPI 2 Bs, trying to benchmark them with Linpack before I move on to my own testing. I should have all the required software installed

From: carma03

I installed libmpich2-dev and I get the error "gfortran: error: /usr/local/mpich2/lib/libmpich.a: No such file or directory" when executing the make arch=rpi command. What's wrong?

From: Burcheso

Thank you so much for the info. It was very useful to me. I am a Bachelor's student from Spain and I started a project from this manual.

PS: to Lanze, step 4 is a mistake. Just try "make arch=rpi".

From: wagakki

Hello sir, do you have a link or tutorial on how to install mpich2? I have followed your steps, but /usr/local/mpich2 is not found in my directory and I am stuck at step 3.

From: Rive

A better, easier way to install and run the Linpack benchmark on the Pi (Pi 3). Be sure to reboot after install and before running.

sudo apt-get install libmpich-dev
wget http://web.eece.maine.edu/~vweaver/junk/pi3_hpl.tar.gz
tar -xvzf pi3_hpl.tar.gz
chmod +x xhpl
./xhpl

Stock Pi 3: 6.1 Gflops

pi@raspberrypi:~ $ ./xhpl

================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    8000
NB     :     256
PMAP   : Row-major process mapping
P      :       1
Q      :       1
PFACT  :    Left
NBMIN  :       2
NDIV   :       2
RFACT  :   Right
BCAST  :   2ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              55.37              6.166e+00
HPL_pdgesv() start time Sat Apr 23 15:14:17 2016

HPL_pdgesv() end time   Sat Apr 23 15:15:12 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.

From: rive

Is there a formatting issue? (I didn't post it the way it is shown above.) Here are the Raspberry Pi Linpack instructions again:

sudo apt-get install libmpich-dev
wget http://web.eece.maine.edu/~vweaver/junk/pi3_hpl.tar.gz
tar -xvzf pi3_hpl.tar.gz
chmod +x xhpl
./xhpl

From: shareef

Hi, I am at step 4. When I execute make arch=rpi, the file starts executing but stops with the following error:

"Makefile:47: Make.inc: No such file or directory
make[2]: *** No rule to make target 'Make.inc'. Stop.
make[2]: Leaving directory '/home/pi/Desktop/hpl-2.1/src/auxil/rpi'
Make.top:54: recipe for target 'build_src' failed
make[1]: *** [build_src] Error 2
make[1]: Leaving directory '/home/pi/Desktop/hpl-2.1'
Makefile:72: recipe for target 'build' failed
make: *** [build] Error 2
Make.top:54: recipe for "

Can someone please help?
