[Yt-dev] Fwd: [mpi4py] Python on 10K of cores on BG/P

Matthew Turk matthewturk at gmail.com
Wed Feb 10 11:40:11 PST 2010


Just as a note, moving forward.


---------- Forwarded message ----------
From: Brian Granger <ellisonbg at gmail.com>
Date: Wed, Feb 10, 2010 at 11:34 AM
Subject: Re: [mpi4py] Python on 10K of cores on BG/P
To: mpi4py <mpi4py at googlegroups.com>


> We have been developing the electronic structure simulation software
> GPAW (https://wiki.fysik.dtu.dk/gpaw/).
> The software is written mostly in Python, with the core computational
> routines in C extensions. For parallel
> calculations we use MPI, which is called both from C and from Python
> (through our own Python interfaces for the
> MPI calls we need).

Nice!

> We have run the code successfully on different supercomputing
> architectures such as Cray XT5 and Blue Gene. However, as we move to
> thousands or tens of thousands of processes, one limitation of the
> current approach has become evident: at start-up time, the imports of
> Python modules take an increasingly long time as a huge number of
> processes try to read the same .py/.pyc files, and the filesystem
> naturally cannot handle this efficiently.

Yes, I can imagine that if the .py files are on a shared filesystem,
things would grind to a halt.
The best way to fix this is to simply install all the .py files on the
local disks of the compute
nodes....assuming the compute nodes have local disks :-).

If they don't have local disks, you are in a really tough situation.
In some cases, it is feasible to think about saving the state of the
Python interpreter (along with imported modules), but in this case I am
doubtful that will work.  If you are importing Python modules that link
to C/C++/Fortran code, this will be very difficult.  Furthermore, if
your Python code is calling into MPI, you will also have to handle the
fact that you have a live MPI universe with open sockets and so on.
Separating out the parts that you can/want to send from the parts you
can't/don't want to send will be quite a mess.

AND, even if you are able to serialize the entire state of the Python
interpreter, you will still have to scatter it to all compute nodes
(and unserialize it), which is what the shared filesystem is doing to
begin with.  While this scatter may take place over a faster
interconnect, you won't be able to get rid of it.
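
To make the scatter step concrete, here is a minimal sketch using
mpi4py's pickle-based bcast (the payload below is made up for the
example; as noted above, real interpreter state containing C-extension
objects or a live MPI universe generally would not pickle at all):

from mpi4py import MPI

comm = MPI.COMM_WORLD

if comm.rank == 0:
    # Rank 0 builds (or unpickles) the state once...
    state = {'parameters': {'nbands': 64}, 'grid': list(range(10000))}
else:
    state = None

# ...and one collective call moves it over the interconnect instead of
# through the shared filesystem: rank 0 pickles the object, every other
# rank receives and unpickles it.  The data still has to reach every
# node; only the transport changes.
state = comm.bcast(state, root=0)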

Thus, in my mind, using a local disk is the only reasonable way to go.
 I realize it is likely that the local disk
solution is not an option for you.  In that case, I think you should
go back to Cray and ask for an upgrade ;-)

Cheers,

Brian



> Is it possible to modify the Python interpreter in order to have a
> single process do the import and then
> broadcast the data to the rest of the tasks?
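
One way to sketch that idea with mpi4py and an import hook (this is
only an illustration, not GPAW's actual solution; the class names are
made up, it uses the modern importlib machinery rather than the older
imp-based hooks, and it assumes every rank performs the same imports in
the same order, since each lookup below is a collective operation):

import sys
import importlib.abc
import importlib.machinery
import importlib.util
from mpi4py import MPI

comm = MPI.COMM_WORLD


class BcastLoader(importlib.abc.SourceLoader):
    """Load a module from source bytes that were broadcast over MPI."""

    def __init__(self, filename, source):
        self._filename = filename
        self._source = source

    def get_filename(self, fullname):
        return self._filename

    def get_data(self, path):
        # The source was already broadcast; never touch the filesystem here.
        return self._source


class BcastFinder(importlib.abc.MetaPathFinder):
    """Rank 0 reads .py files from the shared filesystem; every other
    rank receives the source over MPI instead of hitting the disk."""

    def find_spec(self, fullname, path, target=None):
        payload = None
        if comm.rank == 0:
            spec = importlib.machinery.PathFinder.find_spec(fullname, path)
            if spec is not None and spec.origin and spec.origin.endswith('.py'):
                with open(spec.origin, 'rb') as f:
                    payload = (spec.origin, f.read())
        # Collective call: every rank must reach this point for every
        # import, otherwise the program deadlocks.
        payload = comm.bcast(payload, root=0)
        if payload is None:
            # Built-ins, extension modules (.so), etc. fall through to
            # the regular finders and are still loaded per node.
            return None
        origin, source = payload
        return importlib.util.spec_from_loader(
            fullname, BcastLoader(origin, source), origin=origin)


sys.meta_path.insert(0, BcastFinder())

Note that compiled extension modules still have to be readable on every
node, so this only removes the .py/.pyc traffic from the shared
filesystem, not the shared libraries.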


> --
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne, IL 60490
> (630) 252-3441 (O)
> (630) 470-0462 (C)
>



--
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com



