[yt-users] Problem with Rockstar + time series in yt 3.2-dev

Kacper Kowalik xarthisius.kk at gmail.com
Wed Jun 3 10:15:17 PDT 2015


On 06/03/2015 11:53 AM, Kacper Kowalik wrote:
> On 06/03/2015 08:06 AM, Matthew Turk wrote:
>> Hi Brian,
>>
>> If I had to guess, I'd say that it's not related to MPI, since you're
>> excluding InfiniBand.  I think it's more likely there's an issue
>> with yt freeing or not freeing one or more of the Rockstar global data
>> structures, and MPI is the one that catches it or throws the segfault
>> somehow.  Can you try to get a core dump, check the most recent stack
>> frame in all threads that lives inside Python space, and see if you
>> can get a coarse estimate of where it's happening?
>>
>> -Matt
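One way to see the Python-space frames Matt asks about, without digging through a core dump, is the `faulthandler` module (standard library in Python 3.3+; a backport of the same name was available on PyPI for Python 2). When the process receives a fatal signal such as SIGSEGV, it prints the Python traceback of every thread. The following is a minimal sketch, assuming OpenMPI, which exports `OMPI_COMM_WORLD_RANK` to each process, so every rank writes its own file. `enable_crash_tracebacks` and the `segv_rank` prefix are hypothetical names:

```python
import faulthandler
import os


def enable_crash_tracebacks(prefix="segv_rank"):
    """Dump every thread's Python stack to a per-rank file on a fatal signal.

    Assumes OpenMPI's OMPI_COMM_WORLD_RANK environment variable; falls back
    to rank "0" when it is absent (e.g. a serial run).
    """
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "0")
    # Deliberately left open: faulthandler keeps only the file descriptor,
    # so the file object must stay alive for as long as the handler is active.
    log = open("%s%s.txt" % (prefix, rank), "w")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

Calling it near the top of the script, before the halo finder is created, leaves one file per rank. Each file is empty unless that rank crashes, in which case it ends with the Python frame that was executing when the signal arrived.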
> 
> Hi Brian,
> I've debugged this a bit. SIGSEGV happens in:
> 
> yt/analysis_modules/halo_finding/rockstar/rockstar.py:
> RockstarHaloFinder.__del__
> 
> I think you can safely remove that method, as the worker pool cleanup also
> happens in .run(). Once I'm 100% sure it's the right thing to do, I'll
> issue a PR.
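Below is a minimal sketch of the pattern described above, in which cleanup happens inside `run()` so the class needs no `__del__`. `HypotheticalHaloFinder` and its list-based "pool" are placeholders. They are not yt's `RockstarHaloFinder` or its actual worker pool:

```python
class HypotheticalHaloFinder(object):
    """Hypothetical stand-in for a class that owns a worker pool."""

    def __init__(self):
        self.pool = None          # nothing is allocated at construction time
        self.shutdown_calls = 0   # counts cleanups, for illustration only

    def _start_pool(self):
        self.pool = ["worker-%d" % i for i in range(3)]  # placeholder workers

    def _stop_pool(self):
        # Guarded, so calling it when no pool exists is harmless.
        if self.pool is not None:
            self.pool = None
            self.shutdown_calls += 1

    def run(self):
        self._start_pool()
        try:
            return len(self.pool)  # placeholder for the real work
        finally:
            # Runs on success or on an exception, so no destructor is needed.
            self._stop_pool()
```

The `finally` clause runs whether the work succeeds or raises. An instance that is constructed but never run has nothing to release, and one that is run releases its pool exactly once.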

Ha! It's not even necessary. In your original script you only create an
instance of RockstarHaloFinder and then exit(). If you actually run it with
rh.run(), it will work fine and exit cleanly.

__del__() has always been a magical method to me. I'm not sure how to "fix"
it properly.
Cheers,
Kacper
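One common alternative to `__del__` in Python 3.4+ is `weakref.finalize`. A finalizer can be registered only once there is something to clean up, and it runs at most once: when called explicitly, when the object is garbage-collected, or at interpreter exit if it is still alive. The following is a minimal sketch. `HypotheticalFinder`, `_release_pool`, and the list standing in for a worker pool are illustrative names, not yt code:

```python
import weakref


def _release_pool(pool, log):
    # Module-level function, so the finalizer holds no reference to the owner
    # (a reference back to it would keep the object alive forever).
    log.append("released %d workers" % len(pool))
    del pool[:]


class HypotheticalFinder:
    """Hypothetical class with explicit close() plus a finalizer safety net."""

    def __init__(self, log):
        self.log = log
        self.pool = None
        self._finalizer = None  # nothing to clean up until run() allocates

    def run(self):
        if self.pool is None:
            self.pool = ["worker-0", "worker-1"]  # placeholder workers
            # Registered only now: an instance that is never run has no
            # cleanup at all, unlike a __del__ that always executes.
            self._finalizer = weakref.finalize(
                self, _release_pool, self.pool, self.log)
        return len(self.pool)

    def close(self):
        # Explicit cleanup. A finalizer fires at most once, so repeated
        # calls (or a later garbage collection) are no-ops.
        if self._finalizer is not None:
            self._finalizer()
```

Here `close()` is the normal path. The finalizer only covers the case where a run instance is dropped without being closed.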

> One word of caution: I don't really see any significant speed
> improvement with 8 procs versus the minimum case of 3 procs. However, that
> may be because your datasets are fairly small.
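"No significant speed improvement" can be quantified by comparing wall-clock times against ideal linear scaling. The following is a minimal sketch with a hypothetical helper name. It assumes the process counts equal the number of processes doing useful work. That is only an approximation, since in yt's Rockstar interface some processes act as a server or readers rather than halo finders, which is presumably why 3 is the minimum:

```python
def parallel_speedup(t_base, n_base, t_new, n_new):
    """Compare two runs of the same job at different process counts.

    t_base, t_new: wall-clock times (any consistent unit) for the two runs.
    n_base, n_new: number of processes assumed to share the work.
    Returns (measured speedup, ideal linear speedup, parallel efficiency).
    """
    if min(t_base, t_new, n_base, n_new) <= 0:
        raise ValueError("times and process counts must be positive")
    speedup = float(t_base) / t_new   # > 1 means the larger run was faster
    ideal = float(n_new) / n_base     # what perfect linear scaling would give
    return speedup, ideal, speedup / ideal
```

With made-up timings of 120 s on 3 processes and 90 s on 8, it reports a speedup of about 1.33 against an ideal of about 2.67, i.e. 50% efficiency. On small datasets, fixed per-run costs such as startup, communication, and I/O tend to dominate, which caps the achievable speedup.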
> 
> Cheers,
> Kacper
> 
>> On Tue, Jun 2, 2015 at 9:31 PM, Brian O'Shea <bwoshea at gmail.com> wrote:
>>> Hi folks,
>>>
>>> I'm having some problems creating a time series of halo catalogs with
>>> Rockstar on a small cosmology run, using the tip of yt 3.2-dev (changeset
>>> a2b03516ed2c) with mpi4py v1.3.1 (and OpenMPI v1.4.3) installed on a local
>>> Linux cluster.
>>>
>>> I'm pretty confident that it has something specifically to do with a time
>>> series. When I use this script to call rockstar on a single dataset:
>>>
>>>     http://paste.yt-project.org/show/5586/
>>>
>>> with this command line:
>>>
>>>     mpirun -np 8 --mca btl ^openib python new_rockstar_ts.py --parallel
>>>
>>> everything works just fine, and does so for every RDNNNN dataset.  However,
>>> when I uncomment lines 40-44 and comment out lines 47-53 in the same script
>>> (i.e., like this: http://paste.yt-project.org/show/5587/) so that the code
>>> now uses a time series of all of the RDNNNN datasets rather than a single
>>> dataset, and use the same command line, I immediately get a seg fault that
>>> appears to be related to the mpi4py package:
>>>
>>> http://paste.yt-project.org/show/5588/
>>>
>>> The datasets that I'm using for the time series can be found here:
>>>
>>> http://galactica.pa.msu.edu/~bwoshea/data/datasets/rockstar_timeseries.tar.gz
>>>
>>> (total size ~300 MB).
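The time series version of the script covers every RDNNNN output in the tarball. To check which outputs such a series would include, or to loop over them one at a time with the single-dataset script that does work, a standard-library sketch like the following can list them. It assumes the usual Enzo layout, in which each output RDNNNN (four digits) is a parameter file inside a directory of the same name. `find_enzo_outputs` is a hypothetical helper:

```python
import glob
import os


def find_enzo_outputs(root=".", prefix="RD"):
    """Return sorted paths to Enzo-style outputs laid out as RDNNNN/RDNNNN.

    Companion files such as RD0001/RD0001.hierarchy do not match the pattern
    and are skipped. Zero-padded numbers make lexical order chronological.
    """
    digits = "[0-9]" * 4
    pattern = os.path.join(root, prefix + digits, prefix + digits)
    return sorted(glob.glob(pattern))
```

The resulting list could be passed to yt's `DatasetSeries`, which accepts a list of filenames as well as a glob pattern.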
>>>
>>> Does anybody have any idea what's going on?
>>>
>>> Thanks!
>>>
>>> --Brian
>>>
>>> _______________________________________________
>>> yt-users mailing list
>>> yt-users at lists.spacepope.org
>>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>>