HPL low performance result.

martin cech
Dear Everyone,
I used PelicanHPC to build my own cluster. It has 5 nodes including the front node, all identical PCs: Intel Pentium Core 2 Duo 1.8 GHz with 1024 MB RAM, connected through a 100 Mbps 5-port switch. I ran HPL on the first node with the default settings in HPL.dat, and the result was 2.476 Gflops (7.2 sec). Then I added the other nodes one at a time: two nodes gave 2.34 Gflops (7.71 sec), three nodes 1.893 Gflops (9.3 sec), four nodes 1.88 Gflops (9.56 sec), and finally the whole 5-node cluster gave 1.893 Gflops (9.52 sec). The performance decreases as more nodes are added :( Do you have any solution for this?
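
To put the numbers above in perspective, here is a minimal Python sketch that compares each run with the single-node result. The Gflops figures are the ones reported in the post (decimal commas read as decimal points); the helper name `relative_throughput` is made up for illustration.

```python
# Gflops reported in the post for 1..5 nodes (decimal commas read as points).
reported_gflops = {1: 2.476, 2: 2.34, 3: 1.893, 4: 1.88, 5: 1.893}


def relative_throughput(gflops_by_nodes):
    """Return {nodes: (ratio to single-node Gflops, ratio divided by node count)}."""
    base = gflops_by_nodes[1]
    result = {}
    for nodes, gflops in sorted(gflops_by_nodes.items()):
        ratio = gflops / base  # ideal scaling would give ratio == nodes
        result[nodes] = (ratio, ratio / nodes)
    return result


if __name__ == "__main__":
    for nodes, (ratio, eff) in relative_throughput(reported_gflops).items():
        print(f"{nodes} node(s): {ratio:.2f}x single node, {eff:.0%} scaling efficiency")
```

With ideal scaling the ratio would equal the node count; for these figures it stays below 1 for every multi-node run.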



Best Regards Martin Cech.

Re: HPL low performance result.

Michael Creel
Administrator
The benchmark requires tuning to get good numbers. I really don't recall what the default tuning is on the released versions, and I don't make any effort to ensure that the results will be good. Please see the forum post http://www.nabble.com/How-to-get-big-numbers-with-the-HPL-benchmark-td19685268.html for more information.
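
As a rough illustration of what "tuning" usually starts with, the sketch below applies two common HPL rules of thumb: pick the problem size N so that the N x N matrix of doubles fills about 80% of total RAM, and pick a process grid P x Q that is as close to square as possible with P <= Q. The function names, the 80% fraction and the block size of 128 are assumptions for illustration, not Pelican defaults; on a live-CD cluster the usable memory fraction may well be lower.

```python
import math


def suggest_problem_size(nodes, mem_mib_per_node, mem_fraction=0.8, block_size=128):
    """Largest N (a multiple of block_size) whose N x N matrix of doubles
    fits in mem_fraction of the cluster's total RAM."""
    total_bytes = nodes * mem_mib_per_node * 1024 * 1024
    n = int(math.sqrt(mem_fraction * total_bytes / 8))  # 8 bytes per double
    return (n // block_size) * block_size


def suggest_grid(processes):
    """Most nearly square P x Q factorisation of the process count, with P <= Q."""
    p = math.isqrt(processes)
    while processes % p != 0:
        p -= 1
    return p, processes // p


if __name__ == "__main__":
    # Hypothetical example: 5 nodes with 1024 MiB each, 2 MPI processes per node.
    print("N  =", suggest_problem_size(5, 1024))
    print("PxQ =", suggest_grid(10))
```

The resulting values would go into the Ns, NBs, Ps and Qs lines of HPL.dat; they are only a starting point and need to be checked against real runs.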

If you or anyone else comes up with a good tuning, I'd be happy to make it the default.

Cheers, M.

Re: HPL low performance result.

martin cech
Thank you for the answer, it's better now. I will try tuning HPL.dat and then send you some results. I get 12 Gflops on 8 machines.

One other question: is it possible to use Octave MPITB for benchmarking and run it on a cluster (multiple machines)?

Re: HPL low performance result.

Michael Creel
Administrator
Sure you can use mpitb for benchmarking. I have a few academic papers that do just this, see below. The MPITB site has additional references. On Pelican, after setting up the cluster,  if you open a terminal, enter octave, and type "parallel_performance" you'll get some results that could be used to make a simple benchmark.

I strongly encourage you to see the MPITB page for a broader perspective - my own work is biased towards certain types of models and is certainly not representative of the general nature of applications of MPITB for Octave. I only cite the papers as examples that can give clues about how things can be done.

M. Creel (2008), "Using Parallelization to Solve a Macroeconomic Model: A Parallel Parameterized Expectations Algorithm", Computational Economics, 32(4), pp. 343-352.

M. Creel (2007), "I ran four million probits last night: HPC clustering with ParallelKnoppix", Journal of Applied Econometrics, 22(1), pp. 215-223.

M. Creel (2005), "User-Friendly Parallel Computations with Econometric Examples", Computational Economics, 26(2), pp. 107-128.

Re: HPL low performance result.

martin cech
Dear Michael,
I tried to run parallel_performance in Octave on a single computer and then on a 3-node cluster, but the result was the same (about 44 s on 1 node and 22 s on 2 nodes). At first only one core was fully used, then both cores were computing, but no other cores in the cluster were used. Are there some important parameters to set for the run to get better results?

Re: HPL low performance result.

Michael Creel
Administrator
While in octave, type "edit parallel_performance". At the bottom of the file you'll see

# loop over several cluster sizes
printf("Sample size: %d burnin: %d  maxiters %d\n", T, burnin, maxiters);
for nodes = 0:1
        pea_args{6} = nodes;
        pea(model, model_params, exp_model, exp_params, pea_args);
endfor



Just edit the line "for nodes = 0:1" to increase the maximum number of nodes.



Re: HPL low performance result.

martin cech
Dear Michael,
here are some results of my measurements. Can you look at them and tell me why the total performance differs so much between 32-bit and 64-bit, especially with a low number of PCs (about 60%)?

Thanks Martin

Re: HPL low performance result.

martin cech
The results once again: HERE

Re: HPL low performance result.

Michael Creel
Administrator
Hi Martin,
Interesting results, I'm especially glad to see  the results for PEA using MPITB - that's a real world problem, and seeing a good speedup there using MPITB and GNU Octave is something that I believe will interest people.

Why is 64 bits faster than 32 bits? I don't worry about that, I just use 64 bit Linux for all my work. I'm sure that an explanation is somewhere out there on the Internet. With modern CPUs, almost everyone should be using the 64 bit version. I don't understand why the 32 bit version gets downloaded more than the 64 bit version - either people don't realize that they can use the 64 bit version, or there are a lot of people clustering old computers (which is a waste of money, except for possible educational benefits).

Are you going to publish this work somewhere? I'm sure that the developer of MPITB would like to know about your results.

Re: HPL low performance result.

martin cech
Hi Michael, I am writing a diploma thesis focused on computer clustering, and I will present these results there. If they are useful for the MPITB developers, of course you can send them these results.

Do you know anybody who does similar tests with HPL or MPITB? I would like to compare my results with somebody else's. I tried to contact mukarram (by email) but there was no answer :(

Martin

Re: HPL low performance result.

Michael Creel
Administrator
The MPITB page is at http://atc.ugr.es/javier-bin/mpitb. You'll also find references to other work that uses MPITB there (http://atc.ugr.es/~javier/investigacion/papers/mpitb_octave_papers.html).  One of the papers listed there is by myself, and benchmarks the PEA similarly to what you have done.

Please note "Please, use the ICCS'06 conference paper below to cite MPITB for Octave. Thanks!" at the top of the MPITB papers page. You should definitely cite that paper - keeping projects like MPITB (and PelicanHPC) going requires continued funding, and citations help a lot.

Re: HPL low performance result.

martin cech
Yes, I will cite the ICCS'06 conference paper and the others. Is it possible to download some of these documents for free for student purposes?

Re: HPL low performance result.

martin cech
In reply to this post by Michael Creel
Thank you for the material you sent me, Michael. Can I ask which versions of BLAS, ATLAS and MPI you are using for PelicanHPC? I found (you probably know it) an optimized BLAS library called GOTO BLAS. With this library you should get better performance in the HPL benchmark. http://www.tacc.utexas.edu/general/staff/goto/

I got an email from the HPL developers after I sent them my results. They say the decrease in performance is probably because a 100 Mbps Ethernet network does not have enough bandwidth (throughput) for these Intel C2D processors.

Re: HPL low performance result.

Michael Creel
Administrator
Hi Martin,
To get the versions of packages, type "dpkg -l atlas*" while running Pelican, and it will tell you the version. The package name for MPI is "openmpi". I'm not sure if BLAS is used, since ATLAS is installed. When making an image, I use whatever version is in Debian at the time.

I have heard of GOTO BLAS, but I believe that this has a non-free license.

I agree that network latency and bandwidth are probably the reason for the HPL results. HPL is used to test the performance of top level supercomputers, so it has to be sensitive to these things, and it's not surprising that lowly 100Mb/s ethernet drags things down. I really don't worry at all about HPL, because it tests things that one would not expect Pelican to do well on, while running on commodity hardware and run of the mill networking. I put a lot more weight on benchmarks like the one you did using parallel_performance.m. Those are a lot more representative of a real-world situation, and they show that good speedups are possible. HPL on Pelican provides a well-known example that can be used simply to show that the cluster is working, albeit not too well by Top500 standards.

Cheers, M.

Re: HPL low performance result.

martin cech
Hi Michael,
I tried HPL and Octave parallel_performance on a 1 Gbps Ethernet LAN. For HPL there is a change: the decrease in performance is not so critical. Efficiency goes from 72% (1 node = 2 CPUs) to 54% (25 nodes = 50 CPUs). But there is no difference for MPITB; the times are very similar. My explanation is that the parallel_performance program does not need to communicate much over the network, so 100 Mbps is good enough. Is that correct?
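
For readers unfamiliar with the term, HPL efficiency is usually quoted as measured Gflops divided by the theoretical peak of the hardware, and the percentages above are presumably of that kind. A minimal sketch of the arithmetic follows; the function names are made up, and the value of 4 double-precision flops per cycle per core is an assumption about the CPU, not something stated in the thread.

```python
def peak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak: every core retires flops_per_cycle flops on every cycle."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def hpl_efficiency(measured_gflops, nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Fraction of the theoretical peak that the benchmark actually reached."""
    return measured_gflops / peak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle)


if __name__ == "__main__":
    # Hypothetical node: 2 cores at 1.8 GHz, assumed 4 flops per cycle per core.
    peak = peak_gflops(1, 2, 1.8, 4)
    measured = 7.2  # made-up measurement, chosen only to show the calculation
    print(f"peak {peak:.1f} Gflops, efficiency {hpl_efficiency(measured, 1, 2, 1.8, 4):.0%}")
```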

Regards Martin

Re: HPL low performance result.

Michael Creel
Administrator
Hi Martin,
For HPL, good, that seems to be the expected behavior. For MPITB, I'm surprised that there is no improvement. That benchmark does include internode communication. I guess that with T=200000, internode communication is unimportant with respect to the pure number crunching, so there is little difference. For smaller values of T, I would guess that you would see a difference depending on the network bandwidth. Latency is also important. Possibly the latency of the 1 Gb/s network is high enough that the increased bandwidth gives little benefit.
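
The latency argument can be illustrated with the simplest possible cost model for one message: a fixed latency plus the payload size divided by the bandwidth. The sketch below uses that model; the 100 microsecond latency and the message sizes are invented for illustration and were not measured on any of the networks discussed here.

```python
def transfer_time_s(message_bytes, bandwidth_bits_per_s, latency_s):
    """Simple cost model for one message: fixed latency plus payload / bandwidth."""
    return latency_s + message_bytes * 8 / bandwidth_bits_per_s


if __name__ == "__main__":
    latency = 100e-6  # assumed to be the same on both networks (illustrative only)
    for size in (1_000, 10_000_000):  # a small and a large message, in bytes
        slow = transfer_time_s(size, 100e6, latency)  # 100 Mb/s link
        fast = transfer_time_s(size, 1e9, latency)    # 1 Gb/s link
        print(f"{size:>10} bytes: {slow / fast:.2f}x faster on the 1 Gb/s link")
```

With these made-up numbers, a 1 kB message gets less than 2x faster on the quicker link, while a 10 MB message gets close to the full 10x. A workload that sends few or small messages therefore sees little of the extra bandwidth.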

Thanks for the information, I will definitely want to read your paper when it's done.
Michael

Re: HPL low performance result.

Davids
In reply to this post by Michael Creel
Hello Michael,

I am trying to benchmark my Pelican cluster using the Linpack HPL. I was able to follow the instructions and run the xhpl example, and I got results of about 2.048 GFLOPS. Now I want to go to the next step of tuning my cluster. I have looked at the tuning manual, but I am having a problem accessing the HPL.dat file so that I can begin editing and tuning it.
How should I tackle this problem?

Re: HPL low performance result.

Michael Creel
Administrator
What's the problem with editing the file?

Re: HPL low performance result.

Davids
The problem is that I cannot open the file for editing the way we can open the other C or Octave code by using edit (name of file). For example, if I need to open the kernel_example code, all I do is type <edit kernel_example> at the Octave command prompt.

So I have tried to open the HPL.dat file by using:
cd  /home/user/hpl-2.0
Then, once I have changed to the HPL folder, as below
  ~/hpl-2.0$
I type ~/hpl-2.0$ edit HPL.dat

but I end up getting the following error:
no "edit" mailcap rules found for type "application/octet-stream"

Re: HPL low performance result.

Michael Creel
Administrator
Inside Octave, "edit" calls up whatever text editor Octave is configured to work with. When you're at the command prompt, you need to call your favorite editor yourself. This has to do with basic Linux skills, which you should invest a little time in learning before trying to do parallel computing with MPI. You'll end up learning MPI much more quickly if you work on the more simple stuff first.