[yt-dev] parallel_objects not freeing communicators?

Matthew Turk matthewturk at gmail.com
Fri Apr 18 16:34:23 PDT 2014


Hi Josh,

I suspect you're right, but I can't tell where the problem is arising.
What we do at the end of parallel_objects is this:

if parallel_capable:
    communication_system.pop()

I believe this was implemented under the assumption that the
communicators would be destroyed when they were garbage collected.
Evidently that is not the case, so a __del__ method should be
implemented for the Communicator object.  I think it will need to
call self.comm.Free().  Can you try that and see if it helps?
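
Something like this untested sketch is what I have in mind (it assumes
the mpi4py communicator lives on self.comm, and it skips the predefined
communicators, which must never be freed):

def __del__(self):
    # Free the underlying MPI communicator once this Communicator
    # object is garbage collected; the predefined communicators must
    # never be freed, so skip those.
    from mpi4py import MPI
    predefined = (MPI.COMM_NULL, MPI.COMM_WORLD, MPI.COMM_SELF)
    if self.comm is not None and self.comm not in predefined:
        self.comm.Free()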

-Matt

On Fri, Apr 18, 2014 at 6:36 PM, Josh Moloney
<Joshua.Moloney at colorado.edu> wrote:
> I have a script that I'm using to perform various analysis functions on
> halos from an Enzo simulation. In this script I'm using parallel_objects to
> split the halos up among multiple processors.
>
> for sto, halo in parallel_objects(halos, num_procs, storage=halo_props):
>     sto.result_id = halo.id
>     sto.result = analyze_halo(pf, halo, radius, old_time,
>                               current_time, f_ej)
>
> Within analyze_halo I use WeightedAverageQuantity on a region containing
> each halo, which from what I understand also uses parallel_objects.
>
> reg = pf.h.sphere(halo.center_of_mass(), radius)
> properties = {'metallicity': reg.quantities['WeightedAverageQuantity'](
>     'metallicity', 'Density').in_units('Zsun')}
>
> This works fine for a while, but after the script has analyzed several
> thousand halos it eventually crashes with an MPI exception 'Too many
> communicators'. I took a quick look at the code in
> parallel_analysis_interface, and it seems like parallel_objects creates new
> communicators with every call but never explicitly frees them. Does anyone
> know if this is actually the case, or is it an issue with my script or the
> MPI libraries on my machine? I'm using the experimental branch of yt-3.0.
> I should be able to get this script working fairly easily by just
> calculating my weighted average directly rather than calling quantities (and
> I probably don't want this being done in parallel anyway since I'm only
> looking at small regions), but if parallel_objects is actually leaking
> communicators then it should be fixed at some point.
>        - Josh
>
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>
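
As an aside, if you do end up computing the weighted average directly
rather than going through quantities, a serial version is only a couple
of lines.  This is a rough, untested sketch that reuses the field names
from your snippet and assumes WeightedAverageQuantity reduces to
sum(field * weight) / sum(weight):

reg = pf.h.sphere(halo.center_of_mass(), radius)
z = reg['metallicity']
w = reg['Density']
# Weighted average computed directly, serially, for one halo.
avg_z = ((z * w).sum() / w.sum()).in_units('Zsun')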


