Hi,

Pretty good news: PelicanHPC already ships the HPL (Linpack) benchmark in the user's home folder.

1. All we need to do is change into its directory with a basic Linux command (for those who may be new to Linux):
   cd /home/user/hpl-2.0
2. We are now in the HPL directory, so we compile it with the following command. Commands are case-sensitive, so type it exactly as shown:
   sh SetupForPelican
3. After compilation, change to the directory holding the executable:
   cd ./bin/Pelican
4. Now we can run xhpl to test the performance of our cluster:
   mpirun --hostfile /home/user/tmp/bhosts -np X xhpl
   Note: X, the number of MPI processes, depends on the number of nodes (cores) in your cluster and on the process grid size you choose in HPL.dat.
5. When I ran it with the untuned HPL.dat on my 5-node cluster, using
   mpirun --hostfile /home/user/tmp/bhosts -np 5 xhpl
   I got 2.048 Gflops, with an average time of 8.80 seconds.
6. Anyone would want to improve on that, so we have to tune the run, which requires some basic understanding of the HPL input file. For tuning purposes, focus on three major parameters: (1) N, the problem size; (2) NB, the block size; (3) the process grid P × Q.
7. To do that, go back to the HPL folder:
   cd /home/user/hpl-2.0
   and open HPL.dat in any editor of your choice; I used vi:
   vi HPL.dat
   (xhpl reads the HPL.dat in the directory it is started from, so if your changes don't seem to take effect, copy the edited file into bin/Pelican.)
8. Go through the HPL.dat input file with the three parameters in mind. We can start by tuning the process grid (P × Q), which depends on your hardware. I am using a cluster of 1 frontend node and 4 compute nodes, each with a dual-core 2.33 GHz Intel processor and 2 GB of physical memory (RAM).
9. Since I have 10 cores (5 nodes × 2 cores) but want to use only the compute nodes for the computation, I will use 8 cores. Possible configurations:
   P × Q = 2 × 4              (# of process grids = 1)
   P × Q = 1 × 3 and 1 × 5    (# of process grids = 2)
   P × Q = 1 × 4 and 1 × 4    (# of process grids = 2)
   P × Q = 2 × 3 and 1 × 2    (# of process grids = 2)
   NOTE: however you tune the grid, P × Q must not exceed the total number of cores available on your cluster, and the largest P × Q you use becomes the new X. In my case X = 8.

There are many ways to approach tuning. I chose to narrow my experiments down to 5 scenarios:
1. Fixed process grid, fixed block size, varying problem size N.
2. Fixed process grid, fixed problem size, varying block size.
3. Fixed problem size, fixed block size, varying process grid.
4. Fixed problem size, varying block size, varying process grid.
5. Fixed block size, varying problem size, varying process grid.

Starting with scenario 1:
1. N = 3000, NB = 128, P × Q = 2 × 4: 2.048 Gflops.
2. N = 5000, NB = 128, P × Q = 2 × 4: 3.112 Gflops.
3. N = 10000, NB = 128, P × Q = 2 × 4: 5.362 Gflops.

Run 2 is a 51.953% improvement over run 1, and run 3 is a 72.301% improvement over run 2. You can carry on testing the other scenarios in the same way, according to your own choices.

Hope this is helpful.
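A quick way to sanity-check the scenario 1 figures: HPL prints Gflops as its operation count divided by wall-clock time, and the percentage gains are just ratios of consecutive runs. The sketch below recomputes both. It assumes the operation count is 2/3·N³ + 3/2·N² (my reading of the HPL source; the N² term barely matters at these sizes), and the function names are made up for illustration.

```python
"""Sanity-check HPL results: Gflops vs. (N, time) and run-to-run gains.

Assumption: HPL's operation count is 2/3*N^3 + 3/2*N^2; the N^2 term
is negligible for the N values used here.
"""


def hpl_flops(n: int) -> float:
    """Approximate floating-point operations HPL counts for an N x N solve."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops for problem size n solved in `seconds` of wall-clock time."""
    return hpl_flops(n) / seconds / 1e9


def hpl_seconds(n: int, gflops: float) -> float:
    """Inverse: wall-clock time implied by a reported Gflops figure."""
    return hpl_flops(n) / (gflops * 1e9)


def percent_increase(old: float, new: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Scenario 1 results from the post: (N, Gflops) with NB = 128, P x Q = 2 x 4.
runs = [(3000, 2.048), (5000, 3.112), (10000, 5.362)]

# Gain of each run over the previous one.
for (n_prev, g_prev), (n_cur, g_cur) in zip(runs, runs[1:]):
    print(f"N={n_prev} -> N={n_cur}: {percent_increase(g_prev, g_cur):.3f}% increase")

# Wall-clock time each reported Gflops figure implies.
for n, g in runs:
    print(f"N={n}: {g} Gflops implies about {hpl_seconds(n, g):.2f} s")
```

The gains come out at 51.953% and 72.301%, matching the post. For N = 3000 at 2.048 Gflops the implied time is about 8.80 s, which is consistent with the average timing quoted in step 5.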
|
Administrator
|
Thanks for the summary. The issue of HPL tuning has come up a few times. For the released images I don't really put any effort into tuning, because appropriate tuning depends on the hardware that will be used. For people interested in getting good numbers from HPL, there is quite a bit of information on the net. There's even a page that will generate the .dat file for you: http://www.advancedclustering.com/faq/how-do-i-tune-my-hpldat-file.html (I'm not sure how well that works, though.)
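For anyone who would rather script this than use a generator page, here is a rough sketch of how the tuning lines of HPL.dat could be derived. It is not what that page does (I haven't checked its method). Its assumptions are:

- a common rule of thumb that the N × N matrix of 8-byte doubles should fill about 80% of total RAM on the nodes used;
- N is rounded down to a multiple of NB;
- grids use every core with P ≤ Q, in line with HPL's own tuning advice that P and Q be roughly equal with Q slightly larger.

The helper names and the 80% fraction are hypothetical. The example inputs (4 compute nodes × 2 GB, 8 cores, NB = 128) come from the first post. Only HPL.dat lines 5 to 12 are produced, following the hpl-2.0 layout as I recall it.

```python
"""Sketch: derive the N / NB / process-grid lines of HPL.dat.

Assumptions (hypothetical helper, not part of HPL or PelicanHPC):
- the N x N matrix of 8-byte doubles should use ~80% of total RAM;
- N is rounded down to a multiple of NB;
- grids use all `cores` processes with P <= Q.
"""
import math


def suggest_n(total_mem_bytes: int, nb: int, fraction: float = 0.8) -> int:
    """Largest multiple of nb whose N x N double matrix fits in fraction of RAM."""
    n = math.isqrt(int(total_mem_bytes * fraction / 8))  # 8 bytes per double
    return (n // nb) * nb


def grids(cores: int) -> list[tuple[int, int]]:
    """All (P, Q) with P * Q == cores and P <= Q, flattest first."""
    return [(p, cores // p) for p in range(1, math.isqrt(cores) + 1) if cores % p == 0]


def tuning_lines(ns: list[int], nbs: list[int], pq: list[tuple[int, int]]) -> str:
    """Lines 5-12 of HPL.dat; HPL only reads the leading values on each line."""
    def row(values, label):
        # Values left-aligned in a 12-character column, then the label text.
        return f"{' '.join(map(str, values)):<12} {label}"

    return "\n".join([
        row([len(ns)], "# of problems sizes (N)"),
        row(ns, "Ns"),
        row([len(nbs)], "# of NBs"),
        row(nbs, "NBs"),
        row([0], "PMAP process mapping (0=Row-,1=Column-major)"),
        row([len(pq)], "# of process grids (P x Q)"),
        row([p for p, _ in pq], "Ps"),
        row([q for _, q in pq], "Qs"),
    ])


if __name__ == "__main__":
    GIB = 1024**3
    nb = 128
    n = suggest_n(total_mem_bytes=4 * 2 * GIB, nb=nb)  # 4 compute nodes x 2 GB
    print(tuning_lines([n], [nb], grids(8)))  # 8 cores on the compute nodes
```

Listing several values on the Ns, NBs or grid lines makes HPL run every combination in one invocation. Scenario 1 from the first post (N = 3000, 5000, 10000 with NB = 128 and a 2 × 4 grid) could therefore be a single run built from tuning_lines([3000, 5000, 10000], [128], [(2, 4)]). For 4 × 2 GB the suggested N comes out near 29,000, well above the sizes tried in that post. Leave headroom for the OS, since swapping ruins HPL numbers.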
|
Hi
I have been exhaustively testing my cluster's overall system performance with HPL Linpack. Now I want to narrow my benchmarking down to the subsystem level, concentrating on: 1. general network performance, 2. communication latency, 3. communication bandwidth. I guess this could involve some fine tuning of Open MPI and other necessary work. However, I am somewhat confused about how to get started and what I should really concentrate on to reach these targets, so I would appreciate some brief guidelines and hints.
|
Administrator
|
Sorry, I can't help on that. I've never done much benchmarking or tuning, I just start running my applications right away. Perhaps someone else will chime in.
