An HPL benchmarking procedure that works for me.

Davids
       

Hi,

The good news is that PelicanHPC already ships with the HPL (Linpack) files in the user folder.

1. All we need to do is change into the HPL directory (for those who are new to Linux, this is the command):
     cd /home/user/hpl-2.0
2. We are now inside the HPL directory, so we compile HPL with the following command:
     sh SetupForPelican

Make sure you keep the upper and lower case exactly as shown; the command is case sensitive.

3. After compilation, change to the directory that holds the binary:
     cd ./bin/Pelican

4. Now we can run xhpl to test the performance of the cluster:
     mpirun --hostfile /home/user/tmp/bhosts -np X xhpl

Note:
The value of X depends on the number of nodes and on the process grid (P x Q) you plan to use on your cluster.

5. When I ran HPL with the untuned HPL.dat file on my cluster of 5 nodes, as below:
     mpirun --hostfile /home/user/tmp/bhosts -np 5 xhpl

I got 2.048 Gflops, with an average run time of 8.80 seconds.
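
As a sanity check on figures like these, the Gflops value can be recomputed from the problem size and the run time. The sketch below is only an illustration: it uses the operation count 2/3*N^3 + 3/2*N^2 that HPL uses for its reported rate (as far as I know), the function name hpl_gflops is my own, and N = 3000 is an assumption, since the post does not say which N the untuned run used.

```python
def hpl_gflops(n, seconds):
    """Gflop/s for an N x N solve, using the 2/3*N^3 + 3/2*N^2 operation count."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# Assumption: the untuned run used N = 3000 (not stated in the post).
print(round(hpl_gflops(3000, 8.80), 3))
```

Under that assumption the result lands within about 0.001 Gflops of the 2.048 quoted above, which is what you would expect if the 8.80 s timing is rounded.
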
6. Anyone would want to improve on that, so the next step is to tune HPL for the cluster. This requires some basic understanding of the HPL input file, HPL.dat.


For tuning purposes, we should focus on 3 major areas:
1. N, the problem size.
2. NB, the block size.
3. P x Q, the process grid.

7. To get at the HPL.dat file, I go back to the HPL folder:
     cd /home/user/hpl-2.0
   and then open HPL.dat in any editor of your choice; in my case I used vi:
     vi HPL.dat
8. At this point you have the HPL.dat input file open. Go through it with the 3 parameters in mind. We can start by tuning the process grid (P x Q), which depends on your hardware specifications.
My cluster has 1 frontend node and 4 compute nodes, all dual-core Intel machines at 2.33 GHz, each with 2 GB of physical memory (RAM).
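
For the problem size N, a widely quoted rule of thumb (not something from my own tests) is to pick N so that the N x N matrix of 8-byte doubles fills roughly 80% of the total memory, rounded down to a multiple of NB. A minimal Python sketch of that rule follows; the function name suggest_n and the 0.8 fraction are assumptions for illustration, and the memory figures are the 4 compute nodes with 2 GB each described above.

```python
import math


def suggest_n(total_mem_bytes, nb, mem_fraction=0.8):
    """Largest multiple of nb whose N x N matrix of doubles fits in mem_fraction of memory."""
    max_elements = int(mem_fraction * total_mem_bytes / 8)  # 8 bytes per double
    n = math.isqrt(max_elements)
    return (n // nb) * nb


compute_nodes = 4
mem_per_node = 2 * 1024**3  # 2 GB per node, in bytes
print(suggest_n(compute_nodes * mem_per_node, nb=128))
```

Treat the result as an upper bound to work towards rather than a starting value, since the operating system and MPI also need memory on each node.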

9. I have 10 cores available (5 nodes x 2 cores), but I want to use only the compute nodes for the computations, so I will use 8 cores.
Possible configurations:
     P x Q = 2 x 4                  # number of process grids = 1
     P x Q = 1 x 3 and 1 x 5        # number of process grids = 2
     P x Q = 1 x 4 and 1 x 4        # number of process grids = 2
     P x Q = 2 x 3 and 1 x 2        # number of process grids = 2
NOTE:
However you tune the grid, P x Q should not be more than the total number of cores available on your cluster.
This value becomes the new X for mpirun; in my case X = 8 cores.
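
If you want a single grid that uses all of the cores, the candidates are simply the factor pairs of the core count. Here is a small Python sketch that lists them; the function name process_grids is made up for illustration, and keeping P <= Q is a general HPL tuning convention rather than something I tested here.

```python
import math


def process_grids(cores):
    """Return all (P, Q) pairs with P * Q == cores and P <= Q."""
    return [(p, cores // p)
            for p in range(1, math.isqrt(cores) + 1)
            if cores % p == 0]


print(process_grids(8))  # [(1, 8), (2, 4)]
```

For 8 cores this gives 1 x 8 and 2 x 4, the second of which is the grid used in the configurations above.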

There are many ways to tackle the tuning. I chose to narrow my experiments down to 5 scenarios:
1. Fixed process grid, fixed block size, varying problem size N.
2. Fixed process grid, fixed problem size, varying block size.
3. Fixed problem size, fixed block size, varying process grid.
4. Fixed problem size, varying block size, varying process grid.
5. Fixed block size, varying problem size, varying process grid.
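
HPL.dat accepts several values for N, NB and the process grid, and HPL then runs every combination of them, so it helps to write the planned runs out before editing the file. The Python sketch below lists the runs for scenario 5; the helper name sweep is hypothetical, and the values are only examples taken from elsewhere in this post.

```python
from itertools import product


def sweep(ns, nbs, grids):
    """Every (N, NB, P, Q) combination of the listed values."""
    return [(n, nb, p, q) for n, nb, (p, q) in product(ns, nbs, grids)]


# Scenario 5: fixed block size, varying problem size and process grid.
runs = sweep(ns=[3000, 5000, 10000], nbs=[128], grids=[(2, 4), (1, 4)])
for run in runs:
    print(run)
```

Three problem sizes and two grids give six runs, so the list grows quickly once two parameters vary at the same time.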

Starting with scenario 1:
1. With N = 3000, NB = 128, P x Q = 2 x 4, I get 2.048 Gflops.
2. With N = 5000, NB = 128, P x Q = 2 x 4, I get 3.112 Gflops.
3. With N = 10000, NB = 128, P x Q = 2 x 4, I get 5.362 Gflops.

Run 2 compared to run 1 is a 51.953% increase in performance, and run 3 compared to run 2 is a 72.301% increase.
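
The percentages come from (new - old) / old * 100. A short Python sketch that reproduces them from the three Gflops figures, in case you want to apply the same comparison to your own runs (the function name percent_increase is mine):

```python
def percent_increase(old, new):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100.0


gflops = [2.048, 3.112, 5.362]  # scenario 1 results for N = 3000, 5000, 10000
for before, after in zip(gflops, gflops[1:]):
    print(f"{before} -> {after}: {percent_increase(before, after):.3f}% increase")
```
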
And so on; you can continue with the remaining tests as you see fit.
Hope this is helpful.
 
                   
       









Re: An HPL benchmarking procedure that works for me.

Michael Creel
Administrator
Thanks for the summary. The issue of HPL tuning has come up a few times. For the released images I don't really put any effort into tuning, because appropriate tuning depends upon the hardware that will be used. For people interested in getting good numbers from HPL, there is quite a bit of information on the net. There's even a page that will generate the .dat file for you: http://www.advancedclustering.com/faq/how-do-i-tune-my-hpldat-file.html I'm not sure how well that works, though.
 

Re: An HPL benchmarking procedure that works for me.

Davids
Hi,

I have been exhaustively testing my cluster's overall system performance using HPL Linpack. Now I want to narrow my benchmarking down to the subsystem level, concentrating on:
1. General network performance   2. Communication latency   3. Communication bandwidth.

I guess this could involve some fine tuning of Open MPI and other necessary steps.

However, I am somewhat confused about how to get started and what I should really concentrate on to achieve my targets, so I would appreciate some brief guidelines and hints.


Re: An HPL benchmarking procedure that works for me.

Michael Creel
Administrator
Sorry, I can't help on that. I've never done much benchmarking or tuning, I just start running my applications right away. Perhaps someone else will chime in.