[yt-dev] Fwd: [mpi4py] Fwd: [Numpy-discussion] Improving Python+MPI import performance

Matthew Turk matthewturk at gmail.com
Fri Jan 13 07:44:28 PST 2012


We have been running into this very problem for a long time.  I think
it would be appropriate to use this code on Kraken, Ranger, etc.
My suggestion for the implementation would be to put it in the new
startup_tasks, where we determine parallelism.  As noted in the
docstring, it will have to be modified to use mpi4py.

Britton or Stephen, this sounds like it's right up your alley, as
you run on Kraken most often.  Would one of you be willing to test
it out?  My feeling is that we could simply suggest that on these
systems people use this idiom at the top of their scripts (assuming
we distribute this script with yt):

from yt.mpi_importer import mpi_import
with mpi_import():
    from yt.mods import *
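
For illustration only, here is a rough sketch of how a context manager could watch every import made inside its block, nested ones included. This is not the MPI_Import code and not an existing yt module: `watch_imports` and `seen` are hypothetical names, it assumes Python 3, and it only records module names by temporarily wrapping `builtins.__import__`. It does no MPI communication.

```python
import builtins
from contextlib import contextmanager


@contextmanager
def watch_imports(seen):
    """Record the name of every import attempted inside the with-block.

    Hypothetical helper: it only logs names, it does not speed anything up.
    """
    original = builtins.__import__

    def hooked(name, globals=None, locals=None, fromlist=(), level=0):
        # Every import statement, including those executed inside modules
        # that are themselves being imported, passes through here.
        seen.append(name)
        return original(name, globals, locals, fromlist, level)

    builtins.__import__ = hooked
    try:
        yield seen
    finally:
        # Always restore the normal import machinery on exit.
        builtins.__import__ = original
```

A real importer would replace the logging line with its own lookup logic; the point here is only that a single hook sees the whole import tree triggered from the block.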

I think it should recursively watch all the imports.  An alternative
would be to insert some of its logic into yt.mods, or even to have
a second mods file that handles it seamlessly, like:

from yt.pmods import *

Ideas?

-Matt


---------- Forwarded message ----------
From: Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no>
Date: Fri, Jan 13, 2012 at 3:51 AM
Subject: [mpi4py] Fwd: [Numpy-discussion] Improving Python+MPI import performance
To: mpi4py at googlegroups.com
Cc: Chris Kees <cekees at gmail.com>


This looks very interesting,

Dag

-------- Original Message --------
Subject: [Numpy-discussion] Improving Python+MPI import performance
Date: Thu, 12 Jan 2012 17:13:41 -0800
From: Asher Langton <langton2 at llnl.gov>
Reply-To: Discussion of Numerical Python <numpy-discussion at scipy.org>
To: numpy-discussion at scipy.org

Hi all,

(I originally posted this to the BayPIGgies list, where Fernando Perez
suggested I send it to the NumPy list as well. My apologies if you're
receiving this email twice.)

I work on a Python/C++ scientific code that runs as a number of
independent Python processes communicating via MPI. Unfortunately, as
some of you may have experienced, module importing does not scale well
in Python/MPI applications. For 32k processes on BlueGene/P, importing
100 trivial C-extension modules takes 5.5 hours, compared to 35
minutes for all other interpreter loading and initialization. We
developed a simple pure-Python module (based on knee.py, a
hierarchical import example) that cuts the import time from 5.5 hours
to 6 minutes.

The code is available here:

https://github.com/langton/MPI_Import

Usage, implementation details, and limitations are described in a
docstring at the beginning of the file (just after the mandatory
legalese).

I've talked with a few people who've faced the same problem and heard
about a variety of approaches, which range from putting all necessary
files in one directory to hacking the interpreter itself so it
distributes the module-loading over MPI. Last summer, I had a student
intern try a few of these approaches. It turned out that the problem
wasn't so much the simultaneous module loads, but rather the huge
number of failed open() calls (ENOENT) as the interpreter tries to
find the module files. In the MPI_Import module, we have rank 0
perform the module lookups and then broadcast the locations to the
rest of the processes. For our real-world scientific applications
written in Python and C++, this has meant that we can start a problem
and actually make computational progress before the batch allocation
ends.
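
The rank-0 lookup and broadcast described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the MPI_Import implementation: `locate_modules` is a hypothetical name, it assumes Python 3 and uses `importlib.util.find_spec` for the search, and it falls back to a single-process run when mpi4py is not installed. Only top-level module names are handled.

```python
import importlib.util

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
except ImportError:
    # Assumption: without mpi4py, behave like a one-process job.
    comm = None


def locate_modules(names):
    """Return {module name: file location or None} on every rank.

    Only rank 0 searches the import path; the other ranks receive the
    result, so the failed filesystem probes happen once rather than
    once per process.
    """
    rank = comm.Get_rank() if comm is not None else 0
    locations = None
    if rank == 0:
        locations = {}
        for name in names:
            spec = importlib.util.find_spec(name)
            locations[name] = spec.origin if spec is not None else None
    if comm is not None:
        # Broadcast the pickled dictionary from rank 0 to all ranks.
        locations = comm.bcast(locations, root=0)
    return locations
```

Launched under an MPI launcher (for example `mpiexec -n 4 python script.py`), every rank ends up with the same dictionary while only rank 0 has touched the filesystem to find the modules.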

If you try out the code, I'd appreciate any feedback you have:
performance results, bugfixes/feature-additions, or alternate
approaches to solving this problem. Thanks!

-Asher


