[yt-users] Problem with Rockstar + time series in yt 3.2-dev

Kacper Kowalik xarthisius.kk at gmail.com
Wed Jun 3 09:53:51 PDT 2015


On 06/03/2015 08:06 AM, Matthew Turk wrote:
> Hi Brian,
> 
> If I had to guess, I'd say that it's not related to MPI, since you're
> excluding the infiniband.  I think it's more likely there's an issue
> with yt freeing or not freeing one or more of the Rockstar global data
> structures, and MPI is the one that catches it or throws the segfault
> somehow.  Can you try to get a core dump, check the most recent stack
> frame in all threads that lives inside Python space, and see if you
> can get a coarse estimate of where it's happening?
> 
> -Matt

Hi Brian,
I've debugged this a bit. The SIGSEGV happens in:

yt/analysis_modules/halo_finding/rockstar/rockstar.py:
RockstarHaloFinder.__del__
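
For anyone who wants a similarly coarse location without pulling a full
core dump, one option is Python's faulthandler module, which prints the
Python-level stack of every thread when a fatal signal such as SIGSEGV
arrives. This is a minimal sketch, not necessarily how the frame above was
found. The helper name and the per-process log file name are my own
choices, and on Python 2 faulthandler would have to be installed as a
separate backport package rather than coming from the standard library.

```python
import faulthandler
import os
import tempfile


def enable_crash_tracebacks(directory=None):
    """Dump Python tracebacks of all threads to a per-process log on a fatal signal.

    The returned file object must stay open for the lifetime of the process,
    because faulthandler writes to its file descriptor when the crash happens.
    """
    if directory is None:
        directory = tempfile.gettempdir()
    # One file per process, so output from different MPI ranks is not interleaved.
    # The file name pattern is an arbitrary choice for this sketch.
    path = os.path.join(directory, "faulthandler_%d.log" % os.getpid())
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this once, near the top of the driver script, before any halo finding.
crash_log = enable_crash_tracebacks()
```

After a crash, each process's log should list its innermost Python frames
first. For this problem I would expect that frame to be
RockstarHaloFinder.__del__.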

I think you can safely remove that method, since the worker pool is also
cleaned up in .run(). Once I'm 100% sure that's the right fix, I'll
issue a PR.
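
If you would rather not edit the installed source, the same effect can be
had by deleting the finalizer from the class at runtime. The sketch below
shows the idea on a stand-in class. FakeHaloFinder, its _cleanup method and
the drop_finalizer helper are all hypothetical names made up for
illustration, not yt code. The only assumption carried over from the real
class is that run() already performs the cleanup that __del__ repeats.

```python
def drop_finalizer(cls):
    """Remove a __del__ defined directly on cls; return True if one was removed."""
    if "__del__" in vars(cls):
        delattr(cls, "__del__")
        return True
    return False


class FakeHaloFinder(object):
    """Hypothetical stand-in for a halo finder that owns a worker pool."""

    def __init__(self):
        self.pool_open = True
        self.cleanups = 0

    def _cleanup(self):
        # Idempotent: releasing the pool twice must be harmless.
        if self.pool_open:
            self.pool_open = False
            self.cleanups += 1

    def run(self):
        try:
            pass  # the actual halo finding would happen here
        finally:
            # Explicit cleanup at a well-defined point in the program.
            self._cleanup()

    def __del__(self):
        # Runs at an unpredictable time, possibly during interpreter shutdown.
        self._cleanup()


removed = drop_finalizer(FakeHaloFinder)
finder = FakeHaloFinder()
finder.run()
```

Calling drop_finalizer on the real class before constructing it would be
the equivalent workaround, provided that class defines __del__ itself. I
have not tested that here.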

One word of caution: I don't see any significant speed
improvement with 8 procs versus the minimum case of 3 procs. However, that
may be because your datasets are fairly small.

Cheers,
Kacper

> On Tue, Jun 2, 2015 at 9:31 PM, Brian O'Shea <bwoshea at gmail.com> wrote:
>> Hi folks,
>>
>> I'm having some problems creating a time series of halo catalogs with
>> Rockstar on a small cosmology run, using the tip of yt 3.2-dev (changeset
>> a2b03516ed2c) with mpi4py v1.3.1 (and OpenMPI v1.4.3) installed on a local
>> Linux cluster.
>>
>> I'm pretty confident that it has something specifically to do with a time
>> series. When I use this script to call rockstar on a single dataset:
>>
>>     http://paste.yt-project.org/show/5586/
>>
>> with this command line:
>>
>>     mpirun -np 8 --mca btl ^openib python new_rockstar_ts.py --parallel
>>
>> everything works just fine, and does so for every RDNNNN dataset.  However,
>> when I uncomment lines 40-44 and comment out lines 47-53 in the same script
>> (i.e., like this: http://paste.yt-project.org/show/5587/) so that the code
>> now uses a time series of all of the RDNNNN datasets rather than a single
>> dataset, and use the same command line, I immediately get a seg fault that
>> appears to be related to the mpi4py package:
>>
>> http://paste.yt-project.org/show/5588/
>>
>> The datasets that I'm using for the time series can be found here:
>>
>> http://galactica.pa.msu.edu/~bwoshea/data/datasets/rockstar_timeseries.tar.gz
>>
>> (total size ~300 MB).
>>
>> Does anybody have any idea what's going on?
>>
>> Thanks!
>>
>> --Brian
>>
>> _______________________________________________
>> yt-users mailing list
>> yt-users at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org

