[yt-dev] Zombie jobs on eudora?

Nathan Goldbaum nathan12343 at gmail.com
Wed Jun 11 22:07:33 PDT 2014


This works. I've implemented it for FLASH and hopefully will have the rest
of the frontends done before bed tonight.

Thanks, everyone, for your input on this!


On Wed, Jun 11, 2014 at 9:21 PM, Matthew Turk <matthewturk at gmail.com> wrote:

> On Wed, Jun 11, 2014 at 6:35 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
> > I've hackily tried a few things - unfortunately there doesn't seem to be a
> > way to do this without deleting FLASH's __del__ method.  That's unfortunate,
> > since I don't see a way to cleanly replace what it does without refactoring
> > the FLASH frontend to always close handles immediately after it uses them.
> >
> > FWIW, __del__ is defined several other places in yt:
> >
> > $ grin 'def __del__'
> > ./yt/analysis_modules/halo_finding/rockstar/rockstar.py:
> >   277 :     def __del__(self):
> > ./yt/analysis_modules/halo_merger_tree/enzofof_merger_tree.py:
> >   110 :     def __del__(self):
> > ./yt/frontends/chombo/data_structures.py:
> >   274 :     def __del__(self):
> > ./yt/frontends/fits/data_structures.py:
> >   651 :     def __del__(self):
> > ./yt/frontends/flash/data_structures.py:
> >   387 :     def __del__(self):
> > ./yt/frontends/pluto/data_structures.py:
> >   183 :     def __del__(self):
> > ./yt/geometry/geometry_handler.py:
> >    66 :     def __del__(self):
> > ./yt/geometry/grid_geometry_handler.py:
> >    62 :     def __del__(self):
> > ./yt/utilities/parallel_tools/parallel_analysis_interface.py:
> >   621 :     def __del__(self):
> >  1214 :     def __del__(self):
> >
> > The other frontends that handle hdf5 i/o like FLASH does will also likely
> > need to be fixed to avoid this behavior.
> >
> > Another option would be to figure out what is triggering the reference
> > cycles and add weakref.proxy's as needed.  I've just tried to do that but
> > can't seem to find where the culprit references are.
>
> This is what we used to do.  I'm almost certain they are now there
> because of the index and index proxy.
>
> I'm wondering if maybe we should have a filehandle object.  That's
> where most of the issues with __del__ happen, as you note above.  So
> if we had that, we'd end up in a situation where *it* (which would
> *only* be referenced by the Dataset object) would have a finalizer but
> the Dataset wouldn't.
>
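A minimal sketch of the file-handle container idea described above; the class
and attribute names here are illustrative, not yt's actual API:

    # Sketch only: a handle holder that is the sole object with a finalizer.
    # The Dataset keeps a normal reference to it and nothing references it
    # back, so the Dataset itself can sit in reference cycles and still be
    # collected.
    import h5py

    class FileHandleContainer(object):      # hypothetical name
        def __init__(self, filename):
            self.handle = h5py.File(filename, "r")

        def __del__(self):
            if self.handle is not None:
                self.handle.close()
                self.handle = None

    class Dataset(object):                  # stand-in for yt's Dataset
        def __init__(self, filename):
            # No __del__ here, so cycles involving the Dataset stay
            # collectable by the garbage collector.
            self._handle_container = FileHandleContainer(filename)

        @property
        def _handle(self):
            return self._handle_container.handle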
> >
> > I'm going to stop working on this for now. I hope my scratching at this
> > today has been helpful.
> >
> >
> >
> > On Wed, Jun 11, 2014 at 4:53 PM, Matthew Turk <matthewturk at gmail.com>
> > wrote:
> >>
> >> This is tricky because of the way handles are held onto by h5py, but
> >> maybe we can figure it out somehow.
> >>
> >> On Jun 11, 2014 5:41 PM, "Nathan Goldbaum" <nathan12343 at gmail.com>
> >> wrote:
> >>>
> >>> Oh I see.  Objects that implement __del__ cannot be garbage collected
> >>> if they participate in a cycle.  Full stop.  This is fixed in Python 3.4.
> >>>
> >>> http://objgraph.readthedocs.org/en/latest/uncollectable.html?highlight=del#uncollectable-garbage
> >>>
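A quick way to see this behavior on any Python before 3.4; the class name is
just for illustration:

    import gc

    class Leaky(object):
        def __del__(self):
            pass

    a = Leaky()
    b = Leaky()
    a.other = b        # create a reference cycle
    b.other = a
    del a, b

    gc.collect()
    # Before Python 3.4 (PEP 442), cyclic objects with __del__ are not freed;
    # the collector parks them in gc.garbage instead.
    print(gc.garbage)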
> >>>
> >>> On Wed, Jun 11, 2014 at 4:37 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> >>> wrote:
> >>>>
> >>>> Deleting FLASHDataset.__del__ does seem to allow the dataset to be
> >>>> garbage collected.  For some reason using super to call __del__
> >>>> doesn't work.
> >>>>
> >>>> That said, it seems that memory is still leaking, even though the
> >>>> datasets are being garbage collected.  In particular, it looks like the
> >>>> FLASHGrid objects are *not* being garbage collected.
> >>>>
> >>>> After spending a bit of time looking at backref graphs
> >>>> (http://i.imgur.com/zvAAYwA.png, http://i.imgur.com/4JIT5Qz.png), I
> >>>> ended up trying the following patch:
> >>>>
> >>>> http://paste.yt-project.org/show/4780/
> >>>>
> >>>> With this patch I see substantially better memory performance.  Where
> >>>> before peak memory usage was 1.3 GB after 20 outputs (and growing), now
> >>>> peak memory usage is only 300 MB, with visible deallocations in htop
> >>>> after each iteration of the loop.
> >>>>
> >>>> Counterintuitively, *not* calling __del__ seems to substantially improve
> >>>> memory performance.  In both cases I tried calling __del__ on the
> >>>> superclass using super(), but unfortunately I still saw memory leaks
> >>>> when I did that.  I have no idea why.
> >>>>
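For reference, the super()-based call being discussed would look roughly like
this in Python 2 syntax; both class bodies below are stand-ins, not yt's real
code:

    # Rough illustration only: a child __del__ that delegates to its parent.
    class Dataset(object):
        def __del__(self):
            print("base-class cleanup")

    class FLASHDataset(Dataset):
        def __init__(self, handle=None):
            self._handle = handle

        def __del__(self):
            if self._handle is not None:
                self._handle.close()            # close the open file handle
            # Delegate any remaining teardown to the parent class.  Note that
            # merely defining __del__ is what keeps cyclic instances out of
            # the collector before Python 3.4.
            super(FLASHDataset, self).__del__()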
> >>>>
> >>>> On Wed, Jun 11, 2014 at 3:56 PM, Matthew Turk <matthewturk at gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> That's fascinating. The superclass issues a bunch of deletes. Can you
> >>>>> try either removing the __del__ method (may leave dangling HDF5 file
> >>>>> handles) or calling super() from it?
> >>>>>
> >>>>> On Jun 11, 2014 5:54 PM, "Nathan Goldbaum" <nathan12343 at gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> So one difference relative to Enzo is that FLASHDataset implements
> >>>>>> __del__ while EnzoDataset does not.  This seems to be the reason that
> >>>>>> FLASHDataset objects are ending up in gc.garbage rather than being
> >>>>>> freed.
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jun 11, 2014 at 3:48 PM, Matthew Turk <matthewturk at gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> This shouldn't be preferentially affecting FLASH, though. I think it
> >>>>>>> is a leftover from when we moved to unify the index; when the
> >>>>>>> hierarchy had the classes, pf was already a weakref.
> >>>>>>>
> >>>>>>> On Jun 11, 2014 5:46 PM, "Nathan Goldbaum" <nathan12343 at gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> After noticing that the FLASHDataset objects seem to have
> >>>>>>>> backreferences to the data object aliases that are attached to the
> >>>>>>>> dataset, I made the following modification:
> >>>>>>>>
> >>>>>>>> diff -r d1de2160a4a8 yt/data_objects/static_output.py
> >>>>>>>> --- a/yt/data_objects/static_output.py  Wed Jun 11 13:25:17 2014 -0700
> >>>>>>>> +++ b/yt/data_objects/static_output.py  Wed Jun 11 15:42:56 2014 -0700
> >>>>>>>> @@ -498,7 +498,7 @@
> >>>>>>>>                  continue
> >>>>>>>>              cname = cls.__name__
> >>>>>>>>              if cname.endswith("Base"): cname = cname[:-4]
> >>>>>>>> -            self._add_object_class(name, cname, cls, {'pf':self})
> >>>>>>>> +            self._add_object_class(name, cname, cls, {'pf':weakref.proxy(self)})
> >>>>>>>>          if self.refine_by != 2 and hasattr(self, 'proj') and \
> >>>>>>>>              hasattr(self, 'overlap_proj'):
> >>>>>>>>              mylog.warning("Refine by something other than two: reverting to"
> >>>>>>>>
> >>>>>>>> After doing so, I find that the FLASHDataset objects are ending up
> >>>>>>>> in gc.garbage. The memory is still leaking, but now the garbage
> >>>>>>>> collector is at least able to find the objects that are supposed to
> >>>>>>>> be collected.
> >>>>>>>>
> >>>>>>>> From my reading of the docs, things will only end up in gc.garbage
> >>>>>>>> if they have a __del__ method that doesn't actually free the object.
> >>>>>>>> Any idea what might be happening here?
> >>>>>>>>
> >>>>>>>> -Nathan
> >>>>>>>>
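One way to dig into a question like that, sketched here with objgraph (which
this thread already uses); this is a generic sketch, not the script from the
paste:

    import gc
    import objgraph

    gc.collect()
    # List what is stuck, and how many objects still refer to each item.
    for obj in gc.garbage:
        print(type(obj), len(gc.get_referrers(obj)))

    # Render the chains of references keeping one of the stuck objects alive;
    # objgraph.show_backrefs writes a graph to the given filename.
    if gc.garbage:
        objgraph.show_backrefs(gc.garbage[0], max_depth=5,
                               filename='garbage_backrefs.png')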
> >>>>>>>>
> >>>>>>>> On Wed, Jun 11, 2014 at 3:37 PM, Matthew Turk
> >>>>>>>> <matthewturk at gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 11, 2014 at 5:28 PM, Nathan Goldbaum
> >>>>>>>>> <nathan12343 at gmail.com> wrote:
> >>>>>>>>> > On Wed, Jun 11, 2014 at 3:04 PM, Matthew Turk
> >>>>>>>>> > <matthewturk at gmail.com> wrote:
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >> Please do, yeah. That should help us track down the memory
> >>>>>>>>> >> increase.
> >>>>>>>>> >> It's possible FLASH needs additional work; have you checked the
> >>>>>>>>> >> refcounts for its grid objects?
> >>>>>>>>> >>
> >>>>>>>>> >
> >>>>>>>>> > With this script: http://paste.yt-project.org/show/4779/
> >>>>>>>>> >
> >>>>>>>>> > I also see steadily increasing memory usage, although it's not
> >>>>>>>>> > clear if that's just because each successive Enzo dataset is
> >>>>>>>>> > larger than the one before. The peak memory usage is 230 MB, so
> >>>>>>>>> > substantially better than with the FLASH dataset.
> >>>>>>>>>
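The pasted script is not reproduced here, but a loop of roughly this shape is
one way to watch per-output memory use; it assumes the yt-3.0 API (yt.load,
ds.index), and the filenames are placeholders:

    import gc
    import resource

    import yt

    # Placeholder output names; substitute a real time series.
    filenames = ["DD%04d/DD%04d" % (i, i) for i in range(20)]

    for fn in filenames:
        ds = yt.load(fn)
        ds.index                 # force the index (and grids) to be built
        del ds
        gc.collect()
        # ru_maxrss is reported in kilobytes on Linux; this prints peak RSS.
        rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
        print("%s: peak RSS %.1f MB" % (fn, rss_mb))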
> >>>>>>>>> My guess is there is a leak in FLASH. You can evaluate whether it
> >>>>>>>>> is the data size by looking at what the memory use is if you just
> >>>>>>>>> load the final one or the final ten.
> >>>>>>>>>
> >>>>>>>>> When I have a minute I will try to replicate here. I suspect it's
> >>>>>>>>> not important that it's cylindrical.
> >>>>>>>>>
> >>>>>>>>> >
> >>>>>>>>> > I can privately share the FLASH dataset Philip was originally
> >>>>>>>>> > having trouble with if that will help.
> >>>>>>>>> >
> >>>>>>>>> > The root grid seems to have a refcount of 7. I'm not sure how
> >>>>>>>>> > many of those references were generated by objgraph itself.
> >>>>>>>>> >
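As an aside on interpreting that number: sys.getrefcount and graph-drawing
tools each add temporary references of their own, so a raw count overstates
the "real" holders. A small illustration with a stand-in object:

    import gc
    import sys

    grid = object()                      # stand-in for a FLASHGrid instance
    holders = [grid, grid]               # two deliberate references

    # sys.getrefcount adds one extra reference for its own argument, so this
    # prints 4: the name 'grid', two list slots, plus the call's temporary.
    print(sys.getrefcount(grid))

    # gc.get_referrers lists the objects actually holding references, which
    # is usually more informative than the bare count.
    print(gc.get_referrers(grid))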

