[yt-users] Problem with Rockstar + time series in yt 3.2-dev

Brian O'Shea bwoshea at gmail.com
Wed Jun 3 14:11:27 PDT 2015


Hi all,

Apologies for the delayed reply. I've been at a conference all day and
haven't had the chance to try any of this.  I'll give it a shot and report
back soon.  Thanks for your very rapid responses!

--Brian


On Wed, Jun 3, 2015 at 1:15 PM, Kacper Kowalik <xarthisius.kk at gmail.com>
wrote:

> On 06/03/2015 11:53 AM, Kacper Kowalik wrote:
> > On 06/03/2015 08:06 AM, Matthew Turk wrote:
> >> Hi Brian,
> >>
> >> If I had to guess, I'd say that it's not related to MPI, since you're
> >> excluding the infiniband.  I think it's more likely there's an issue
> >> with yt freeing or not freeing one or more of the Rockstar global data
> >> structures, and MPI is the one that catches it or throws the segfault
> >> somehow.  Can you try to get a core dump, check the most recent stack
> >> frame in all threads that lives inside Python space, and see if you
> >> can get a coarse estimate of where it's happening?
> >>
> >> -Matt
> >
> > Hi Brian,
> > I've debugged this a bit. SIGSEGV happens in:
> >
> > yt/analysis_modules/halo_finding/rockstar/rockstar.py:
> > RockstarHaloFinder.__del__
> >
> > I think you can safely remove that method, since the worker pool cleanup
> > also happens in .run(). Once I'm 100% sure that's the right approach,
> > I'll issue a PR.
>
> Ha! It's not even necessary. In your original script you only create an
> instance of RockstarHaloFinder and then exit(). If you actually run it
> with rh.run(), it will work fine and exit cleanly.
>
> __del__() has always been a magical method to me. I'm not sure how to
> "fix" it properly.
> Cheers,
> Kacper
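
A minimal sketch of the general pattern discussed above: cleanup that
runs deterministically at the end of run(), with __del__ kept only as a
guarded fallback. The class PoolOwner and its attributes are hypothetical
stand-ins for illustration. This is not yt's RockstarHaloFinder code, and
it is not a confirmed fix for the segfault.

```python
class PoolOwner:
    """Hypothetical stand-in for an object that owns a worker pool."""

    def __init__(self):
        self.pool = None    # nothing is allocated until run() is called
        self.events = []    # records what happened, for illustration only

    def run(self):
        self.pool = object()    # pretend to allocate the worker pool
        try:
            self.events.append("work")
        finally:
            self.close()        # deterministic cleanup at the end of run()

    def close(self):
        # Idempotent: does nothing if the pool was never created
        # or has already been released.
        if self.pool is not None:
            self.events.append("pool freed")
            self.pool = None

    def __del__(self):
        # Last-resort fallback. getattr() guards against a half-built
        # object whose __init__ never finished.
        if getattr(self, "pool", None) is not None:
            self.close()


never_run = PoolOwner()   # created but never run, as in the original script
never_run.close()         # safe: there is nothing to free

ran = PoolOwner()
ran.run()
ran.close()               # safe: run() has already cleaned up
```

With this layout, an instance that is created and then dropped without
calling run() has nothing to tear down, so the finalizer never touches
state that was not set up.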
>
> > One word of caution: I don't really see any significant speed
> > improvement with 8 procs versus the minimum case of 3 procs. However,
> > that may be because your datasets are fairly small.
> >
> > Cheers,
> > Kacper
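
A rough sketch of where the "minimum case of 3 procs" comes from,
assuming the total MPI task count is split as one server task plus
readers plus writers (my understanding of the yt Rockstar interface,
not something stated in this thread). The helper split_rockstar_tasks
is hypothetical and is not part of yt.

```python
def split_rockstar_tasks(total_tasks, num_readers=1):
    """Split an MPI task count into server, reader, and writer roles.

    Assumption: total_tasks == 1 server + num_readers + num_writers.
    """
    if num_readers < 1:
        raise ValueError("need at least one reader")
    num_writers = total_tasks - num_readers - 1   # 1 task is the server
    if num_writers < 1:
        raise ValueError(
            "need at least %d tasks for %d reader(s)"
            % (num_readers + 2, num_readers)
        )
    return {"server": 1, "readers": num_readers, "writers": num_writers}


minimum = split_rockstar_tasks(3)   # the smallest workable layout
eight = split_rockstar_tasks(8)     # the layout implied by "mpirun -np 8"
```

Under that assumption, going from 3 to 8 tasks with a single reader only
adds writers, which may not help much on small datasets.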
> >
> >> On Tue, Jun 2, 2015 at 9:31 PM, Brian O'Shea <bwoshea at gmail.com> wrote:
> >>> Hi folks,
> >>>
> >>> I'm having some problems creating a time series of halo catalogs with
> >>> Rockstar on a small cosmology run, using the tip of yt 3.2-dev
> >>> (changeset a2b03516ed2c) with mpi4py v1.3.1 (and OpenMPI v1.4.3)
> >>> installed on a local Linux cluster.
> >>>
> >>> I'm pretty confident that it has something specifically to do with a
> >>> time series. When I use this script to call rockstar on a single
> >>> dataset:
> >>>
> >>>     http://paste.yt-project.org/show/5586/
> >>>
> >>> with this command line:
> >>>
> >>>     mpirun -np 8 --mca btl ^openib python new_rockstar_ts.py --parallel
> >>>
> >>> everything works just fine, and does so for every RDNNNN dataset.
> >>> However, when I uncomment lines 40-44 and comment out lines 47-53 in
> >>> the same script (i.e., like this:
> >>> http://paste.yt-project.org/show/5587/) so that the code now uses a
> >>> time series of all of the RDNNNN datasets rather than a single dataset,
> >>> and use the same command line, I immediately get a seg fault that
> >>> appears to be related to the mpi4py package:
> >>>
> >>> http://paste.yt-project.org/show/5588/
> >>>
> >>> The datasets that I'm using for the time series can be found here:
> >>>
> >>>
> >>>     http://galactica.pa.msu.edu/~bwoshea/data/datasets/rockstar_timeseries.tar.gz
> >>>
> >>> (total size ~300 MB).
> >>>
> >>> Does anybody have any idea what's going on?
> >>>
> >>> Thanks!
> >>>
> >>> --Brian
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> yt-users mailing list
> >>> yt-users at lists.spacepope.org
> >>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
> >>>
> >>
> >
> >
>
>
>
>
>