[yt-dev] Fwd: [mpi4py] Fwd: [Numpy-discussion] Improving Python+MPI import performance

Britton Smith brittonsmith at gmail.com
Mon Jan 16 10:31:19 PST 2012


Matt is right about the perils of putting any MPI imports in a try block.
Systems like Ranger will fail in a way that Python cannot catch when trying
to import mpi4py while not running in parallel.  I think the yt.pmods
solution is the best for now.  Since the import problem really only gets
serious above roughly 100 cores, I think it's okay to expect some additional
understanding from users if they're going to run jobs that large.
Would it be possible to add some sort of helper function such that you
could do something like the following in a script:
from yt.pmods import *
parallel_import("from my_analysis import *")

That would be helpful.
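Something like this might do it, assuming pmods re-exported the mpi_import
context manager from MPI_Import.py (an untested sketch; the parallel_import
name is just a placeholder):

def parallel_import(import_statement, namespace=None):
    # Run an import statement under the MPI-aware import hook, so only
    # rank 0 touches the filesystem and the loaded modules get broadcast
    # to the other ranks by the MPI_Import machinery.
    from yt.pmods import mpi_import
    if namespace is None:
        import __main__
        namespace = __main__.__dict__
    with mpi_import():
        exec(import_statement, namespace)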
Britton

On Mon, Jan 16, 2012 at 1:20 PM, Matthew Turk <matthewturk at gmail.com> wrote:

> Hi all,
>
> Okay, seems like there is some confusion.  My response was in
> reference to Britton's question, which I thought was "are there any
> side effects of [your fix for recursive imports] on the operation [of
> the import hack for MPI]?"  There are not.
>
> My original statement, which Stephen disagrees with, is that we should
> require an explicit change on the part of the user before we (on their
> behalf) fundamentally modify the way the base functionality of
> 'import' works for all Python modules.  I have several motivations for
> this:
>
>  * The import problem is generic to shared filesystems accessed in
> parallel, but it is only crippling at relatively large core counts on
> particularly large Lustre systems, larger than most users typically
> utilize.  This is a -1 for global application of the MPI_Import fix.
>  * The change from yt.mods to yt.pmods is not an invasive change,
> although I too do not like having different behavior for running in
> parallel.  However, we do expect a number of things from users who run
> in parallel: an understanding of the resources they are allocating,
> setting up a queue script, and a recognition of which activities will
> parallelize and which will not.  I still think it is not the best
> solution, but I believe it is non-invasive.
>  * Every time we add another non-sanitized import, we take a big
> performance hit.  It is in our best interest for this to all occur
> at the outermost level.
>  * Detecting whether we are running in parallel is not trivial.
>  * On some machines, specifically SGI, if you run a script in parallel
> that contains a try/except block for importing MPI and it was not
> launched with MPI, it will die unceremoniously (a sketch of this
> pattern follows this list).  We cannot rely on a try/except around
> importing MPI.
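>
> (Concretely, the pattern we cannot rely on is something like the
> following sketch; on those machines the abort happens inside the import
> itself, before the except clause ever gets a chance to run:
>
> try:
>     from mpi4py import MPI
>     parallel = MPI.COMM_WORLD.size > 1
> except ImportError:
>     parallel = False
>
> so Python never sees a catchable exception.)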
>
> All these things combined lead me to believe we should not attempt to
> guess *for* the user.
>
> My proposed change, of adding yt.pmods, would consist of a new file
> (yt/pmods.py) that contains the full contents of MPI_Import.py and
> that, at the end, performs this operation:
>
> with mpi_import():
>    from yt.mods import *
>
> What this would result in is a nearly self-contained file that returns
> to the user the full contents of yt.mods; there'd be no duplication.
> An alternate solution, which I am not terribly keen on, would be to put
> manual context startup/shutdown inside yt.mods if
> startup_tasks.parallel_enabled is true.  I feel like this would result
> in a lot of unnecessary side effects.  However, with yt.pmods, while
> the yt imports would all be included, any additional imports would
> still need sanitation inside the user's script.
>
> The fallback would be to require users to use the with: statement
> themselves.
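>
> To make that concrete, a user's parallel script under this proposal
> might look something like the following (a sketch only; it assumes
> pmods also exposes mpi_import so users can sanitize their own imports,
> and my_analysis / h5py just stand in for whatever the user loads):
>
> from yt.pmods import *   # MPI-aware equivalent of "from yt.mods import *"
>
> # The user's own heavyweight imports still need the same treatment:
> with mpi_import():
>     import h5py
>     from my_analysis import *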
>
> -Matt
>
> On Mon, Jan 16, 2012 at 11:08 AM, j s oishi <jsoishi at gmail.com> wrote:
> >> I am of the opinion we can do this with an alternate, parallel import
> >> that would be compatible with yt.mods. Something like yt.pmods. What
> >> do you think? What should usage of this be?
> >
> > Not sure I understand this. How is yt.pmods compatible with yt.mods?
> > Do you mean both would have the same effect, but yt.pmods would use
> > the new mechanism for loading rather than the current standard one
> > which would be retained by yt.mods? If so, that sounds like an OK idea
> > to me, though if there is no side effect, it could lead to people
> > forgetting to sub in pmods, and then being stuck with bad, old
> > performance. Given that this will be most important where even a
> > single forgetful job could cost substantial allocation usage, maybe we
> > should think about making it automatic?
> >
> > Or perhaps I am misunderstanding...
>