|
Hi,
I am quite a newbie when it comes to computer stuff (compared to you) and I have a question, if you don't mind. Both my fiance and I are econ grad students and need to estimate structural equations (dynamic discrete choice) using Fortran. Last week I found a sale of used computers at school at 40 dollars each and, long story short, together with our existing computers we now have five desktops with a total of 20 cores, almost all running at around 2.5-3 GHz. Since what we are doing is quite computationally intensive, I thought it would be nice to squeeze out all the processing power we have. After a little bit of work, I found out about PelicanHPC and went through the tutorial with two of my computers quite easily. I got excited, because I might be able to build a home cluster with some cables and a router. Now my question is whether there is a way to use all the cores I have. I am dreaming of passing jobs via MPI (my fiance knows how to use it properly) and then using OpenMP to make the nodes use all their cores. I have little experience with either and no experience with PelicanHPC, so I am asking whether this is feasible and, if so, how hard it would be to accomplish. Also, what kind of performance do you think I can get from a cluster of, say, an AMD 6-core desktop (16 GB RAM) and three Intel Core 2 Quad machines (4 GB each, Qxxxx series), so I can weigh the benefit against the trouble I will get into? Thanks a lot. |
|
Administrator
|
I think that a little cluster should work just fine for what you plan to do, except the noise may drive you out of your apartment. I personally have not used OpenMP - I use MPI for everything, because I like the portability of the code. However, there's no reason why you can't mix MPI and OpenMP. Note that MPI works just fine with multi-core nodes - you just assign an appropriate number of MPI ranks to each node to balance the load, e.g., 2 ranks for a 2-core machine and 6 ranks for a 6-core machine. You can get fancier if the machines operate at different speeds. A PelicanHPC cluster is more or less like any Beowulf-style Linux cluster; it just gets launched from a live CD/USB image, and once it's running it's like what you might have at your university.
Regarding estimation, I have a paper in Computational Economics (2005) (http://econpapers.repec.org/article/kapcompec/v_3a26_3ay_3a2005_3ai_3a2_3ap_3a107-128.htm) that uses MLE and GMM as examples of problems that can be parallelized with MPI. That may be of interest. About performance, I would expect that you'd get at least a 10-12X speedup if you run on 20 cores, compared to running on 1 core. This depends a lot on the problem you're working with and on finding the best way to parallelize it. What sort of estimation method are you thinking of using? |
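To put the 10-12X figure in perspective, here is a rough back-of-the-envelope sketch. It uses Amdahl's law, which is an assumption brought in for illustration and not something the estimate above is stated to rest on; it also treats all 20 cores as identical, which the machines in this thread are not. The function names are made up for this example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for `workers` identical cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def parallel_fraction_for(speedup: float, workers: int) -> float:
    """Invert Amdahl's law: share of run time that must run in parallel
    to reach `speedup` on `workers` cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Illustrative targets taken from the 10-12X estimate in the thread.
    for target in (10.0, 12.0):
        p = parallel_fraction_for(target, 20)
        print(f"{target:.0f}x on 20 cores needs about {p:.1%} parallel work")
```

Under these assumptions, a 10-12X speedup on 20 cores corresponds to roughly 95 to 96 percent of the single-core run time being parallelizable, so even a small serial portion (setup, communication, gathering results) matters.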
|
Hi again,
Thanks for your response. I think I have understood the basic workings of PelicanHPC; I just need to learn a bit more about networking hardware to set up the cluster with more than two computers. Even with two computers I get twice the speed (2.8 vs 5.5 seconds) with your example program, so I am very hopeful. I am basically pursuing a simulated maximum likelihood estimation, where there is also an inner loop that solves a dynamic program for a given set of parameters via backward recursion. My understanding of MPI/OpenMP is as follows; would you please correct me if I am wrong: if somebody uses MPI on a cluster with multicore nodes, there is no way to tell MPI to treat cores as nodes and use all the cores for calculations (except maybe by creating lots of virtual machines on all the nodes and emulating a bigger cluster), and the only way to utilize all cores is to pass instructions to the nodes (via MPI) that contain multicore-aware code, such as OpenMP. (And this is doable but will require some thoughtful engineering.) Thanks |
|
Administrator
|
No, you can use all cores on each node easily with MPI. See the kernel_example.m example for Octave on PelicanHPC. Suppose you have one machine, the frontend, with 4 cores, and a second machine with 6 cores. To run on all cores, edit /home/user/tmp/bhosts to read
    10.11.12.1 slots=4
    10.11.12.X slots=6
where X is the last part of whatever IP got assigned to the compute node. Then at the command prompt type
    mpirun -np 11 -hostfile /home/user/tmp/bhosts octave -q --eval "kernel_example(2000, true, false)"
There is a lot to learn here; try "man mpirun" to get a quick idea of the possibilities. I really don't know about the possible performance benefits of using OpenMP. I have found pure MPI to work well for what I do.
Cheers, M. |
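With more than two machines, typing the bhosts entries by hand gets tedious, so here is a small sketch that writes them from a table of core counts. The helper names are hypothetical, and the addresses are just the two from the example above, with the X left as a placeholder for whatever address the compute node actually receives. The sketch only counts slots; how many ranks to request with -np is still up to you (the command above asks for 11).

```python
def build_hostfile(slots_by_host: dict) -> str:
    """Return hostfile text with one 'host slots=N' line per machine."""
    lines = []
    for host, slots in slots_by_host.items():
        if slots < 1:
            raise ValueError(f"{host}: slots must be at least 1, got {slots}")
        lines.append(f"{host} slots={slots}")
    return "\n".join(lines) + "\n"


def total_slots(slots_by_host: dict) -> int:
    """Total number of slots (cores) offered across all machines."""
    return sum(slots_by_host.values())


if __name__ == "__main__":
    # Core counts from the example in the thread; "10.11.12.X" is a
    # placeholder, not a real address.
    cluster = {"10.11.12.1": 4, "10.11.12.X": 6}
    print(build_hostfile(cluster), end="")
    print("total slots:", total_slots(cluster))
```

Redirecting the first part of the output into the bhosts file would reproduce the two lines shown above; adding another machine is one more dictionary entry.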
|
Oh, alright. I don't know why, but I thought MPI would not run the code in parallel on my cores and would only use one core from each node. This new info changes everything! I don't have to worry about OpenMP any more, then. Thanks a lot for your help. I am just doing a feasibility study for my estimation, and I think it is very doable thanks to you (and PelicanHPC). In the future, when I do the actual work, I might have more technical problems; I hope you don't mind if I bug you a few times about PelicanHPC.
cheers, utku |
