[yt-dev] Zombie jobs on eudora?

Matthew Turk matthewturk at gmail.com
Wed Jun 11 21:21:23 PDT 2014


On Wed, Jun 11, 2014 at 6:35 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> I've hackily tried a few things - unfortunately there doesn't seem to be a
> way to do this without deleting FLASH's __del__ method.  That's unfortunate,
> since I don't see a way to cleanly replace what it does without refactoring
> the FLASH frontend to always close handles immediately after it uses them.
>
> FWIW, __del__ is defined several other places in yt:
>
> $ grin 'def __del__'
> ./yt/analysis_modules/halo_finding/rockstar/rockstar.py:
>   277 :     def __del__(self):
> ./yt/analysis_modules/halo_merger_tree/enzofof_merger_tree.py:
>   110 :     def __del__(self):
> ./yt/frontends/chombo/data_structures.py:
>   274 :     def __del__(self):
> ./yt/frontends/fits/data_structures.py:
>   651 :     def __del__(self):
> ./yt/frontends/flash/data_structures.py:
>   387 :     def __del__(self):
> ./yt/frontends/pluto/data_structures.py:
>   183 :     def __del__(self):
> ./yt/geometry/geometry_handler.py:
>    66 :     def __del__(self):
> ./yt/geometry/grid_geometry_handler.py:
>    62 :     def __del__(self):
> ./yt/utilities/parallel_tools/parallel_analysis_interface.py:
>   621 :     def __del__(self):
>  1214 :     def __del__(self):
>
> The other frontends that handle hdf5 i/o like FLASH does will also likely
> need to be fixed to avoid this behavior.
>
> Another option would be to figure out what is triggering the reference
> cycles and add weakref.proxy's as needed.  I've just tried to do that but
> can't seem to find where the culprit references are.

This is what we used to do.  I'm almost certain the culprit references
are now there because of the index and index proxy.
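
As a rough illustration of the weakref.proxy approach discussed above, here
is a minimal sketch. The Dataset and DataObject classes are hypothetical
stand-ins, not yt's actual classes, and the check relies on CPython's
reference counting. It shows only that a proxy back-reference lets the
parent be freed without the cycle collector, while a strong back-reference
does not.

```python
import gc
import weakref


class DataObject:
    """Hypothetical stand-in for a data object class attached to a dataset."""

    def __init__(self, pf):
        self.pf = pf


class Dataset:
    """Hypothetical stand-in for a dataset; not yt's actual class."""

    def __init__(self, use_proxy):
        # A strong back-reference creates a cycle: dataset -> child -> dataset.
        # A weakref.proxy back-reference does not keep the dataset alive.
        back_ref = weakref.proxy(self) if use_proxy else self
        self.child = DataObject(back_ref)


def freed_without_cycle_collector(use_proxy):
    """Return True if dropping the last name frees the dataset by refcount alone."""
    gc.disable()  # make sure only reference counting can free the object
    try:
        ds = Dataset(use_proxy)
        probe = weakref.ref(ds)  # lets us observe whether ds is still alive
        del ds
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left by the strong-reference case


print(freed_without_cycle_collector(use_proxy=True))
print(freed_without_cycle_collector(use_proxy=False))
```

On CPython this should report that the proxy variant is freed immediately
and the strong-reference variant is not. The trade-off is that a proxy
raises ReferenceError if it is used after the dataset has gone away.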

I'm wondering if maybe we should have a filehandle object.  That's
where most of the issues with __del__ happen, as you note above.  So
if we had that, we'd end up in a situation where *it* (which would
*only* be referenced by the Dataset object) would have a finalizer but
the Dataset wouldn't.
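
A minimal sketch of what such a filehandle object might look like, under
stated assumptions: FileHandle, Index and Dataset are hypothetical names,
and a plain open() stands in for an h5py handle. The only __del__ lives on
the small handle object, which refers to nothing but the file, so the
dataset can still sit in a reference cycle without carrying a finalizer
itself.

```python
import gc
import os
import tempfile


class FileHandle:
    """Hypothetical wrapper that owns one open file and holds no other references."""

    def __init__(self, path):
        self._fh = open(path, "rb")  # stand-in for an h5py file handle

    def close(self):
        fh = getattr(self, "_fh", None)  # may be missing if open() failed
        if fh is not None:
            fh.close()  # closing an already-closed file is a no-op

    def __del__(self):
        # The only finalizer in this sketch lives here, not on the dataset.
        self.close()


class Index:
    """Hypothetical index that points back at its dataset, forming a cycle."""

    def __init__(self, ds):
        self.ds = ds


class Dataset:
    """Hypothetical dataset with no __del__ of its own."""

    def __init__(self, path):
        self.handle = FileHandle(path)  # sole reference to the handle
        self.index = Index(self)        # dataset <-> index reference cycle


def handle_closed_after_collection():
    """Drop a cyclic dataset and report whether its file got closed."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"data")
    try:
        ds = Dataset(tmp.name)
        raw = ds.handle._fh  # keep the raw file object so we can inspect it
        del ds
        gc.collect()         # the cycle collector frees dataset and index
        return raw.closed
    finally:
        os.unlink(tmp.name)


print(handle_closed_after_collection())
```

The design choice here is that the handle never points back at the dataset
or the index, so it is never part of a cycle itself; it is released, and
the file closed, once the dataset that owns it is collected.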

>
> I'm going to stop working on this for now. I hope my scratching at this
> today has been helpful.
>
>
>
> On Wed, Jun 11, 2014 at 4:53 PM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> This is tricky because of the way handles are held onto by h5py, but maybe
>> we can figure it out somehow.
>>
>> On Jun 11, 2014 5:41 PM, "Nathan Goldbaum" <nathan12343 at gmail.com> wrote:
>>>
>>> Oh I see.  Objects that implement __del__ cannot be garbage collected if
>>> they participate in a cycle.  Full stop.  This is fixed in Python 3.4.
>>>
>>>
>>> http://objgraph.readthedocs.org/en/latest/uncollectable.html?highlight=del#uncollectable-garbage
>>>
>>>
>>> On Wed, Jun 11, 2014 at 4:37 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>>> wrote:
>>>>
>>>> Deleting FLASHDataset.__del__ does seem to allow the dataset to be
>>>> garbage collected.  For some reason using super to call __del__ doesn't
>>>> work.
>>>>
>>>> That said, it seems that memory is still leaking, even though the
>>>> datasets are being garbage collected.  In particular, it looks like the
>>>> FLASHGrid objects are *not* being garbage collected.
>>>>
>>>> After spending a bit of time looking at backref graphs
>>>> (http://i.imgur.com/zvAAYwA.png, http://i.imgur.com/4JIT5Qz.png), I ended up
>>>> trying the following patch:
>>>>
>>>> http://paste.yt-project.org/show/4780/
>>>>
>>>> With this patch I see substantially better memory performance.  Where
>>>> before peak memory usage was 1.3 GB after 20 outputs (and growing) now peak
>>>> memory usage is only 300 MB, with visible deallocations in htop after each
>>>> iteration of the loop.
>>>>
>>>> Counterintuitively, *not* calling __del__ seems to substantially improve
>>>> memory performance.  In both cases I tried calling __del__ on the superclass
>>>> using super() but unfortunately I still saw memory leaks when I did that.  I
>>>> have no idea why.
>>>>
>>>>
>>>> On Wed, Jun 11, 2014 at 3:56 PM, Matthew Turk <matthewturk at gmail.com>
>>>> wrote:
>>>>>
>>>>> That's fascinating. The superclass issues a bunch of deletes. Can you
>>>>> try either removing the del method (may leave dangling hdf5 files) or
>>>>> calling super() from it?
>>>>>
>>>>> On Jun 11, 2014 5:54 PM, "Nathan Goldbaum" <nathan12343 at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> So one difference relative to Enzo is that FLASHDataset implements
>>>>>> __del__ while EnzoDataset does not.  This seems to be the reason that
>>>>>> FLASHDataset objects are ending up in gc.garbage rather than being freed.
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 11, 2014 at 3:48 PM, Matthew Turk <matthewturk at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> This shouldn't be preferentially affecting flash, though. I think it
>>>>>>> is a leftover from when we moved to unify index; when the hierarchy had the
>>>>>>> classes, pf was already a weakref.
>>>>>>>
>>>>>>> On Jun 11, 2014 5:46 PM, "Nathan Goldbaum" <nathan12343 at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> After noticing that the FLASHDataset objects seem to have
>>>>>>>> backreferences to the data object aliases that are attached to the dataset,
>>>>>>>> I made the following modification:
>>>>>>>>
>>>>>>>> diff -r d1de2160a4a8 yt/data_objects/static_output.py
>>>>>>>> --- a/yt/data_objects/static_output.py  Wed Jun 11 13:25:17 2014 -0700
>>>>>>>> +++ b/yt/data_objects/static_output.py  Wed Jun 11 15:42:56 2014 -0700
>>>>>>>> @@ -498,7 +498,7 @@
>>>>>>>>                  continue
>>>>>>>>              cname = cls.__name__
>>>>>>>>              if cname.endswith("Base"): cname = cname[:-4]
>>>>>>>> -            self._add_object_class(name, cname, cls, {'pf':self})
>>>>>>>> +            self._add_object_class(name, cname, cls, {'pf':weakref.proxy(self)})
>>>>>>>>          if self.refine_by != 2 and hasattr(self, 'proj') and \
>>>>>>>>              hasattr(self, 'overlap_proj'):
>>>>>>>>              mylog.warning("Refine by something other than two: reverting to"
>>>>>>>>
>>>>>>>> After doing so, I find that the FLASHDataset objects are ending up
>>>>>>>> in gc.garbage. The memory is still leaking, but now the garbage collector is
>>>>>>>> able to at least find the objects that are supposed to be collected.
>>>>>>>>
>>>>>>>> From my reading of the docs things will only end up in gc.garbage if
>>>>>>>> they have a __del__ method that doesn't actually free the object.  Any idea
>>>>>>>> what might be happening here?
>>>>>>>>
>>>>>>>> -Nathan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 11, 2014 at 3:37 PM, Matthew Turk
>>>>>>>> <matthewturk at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jun 11, 2014 at 5:28 PM, Nathan Goldbaum
>>>>>>>>> <nathan12343 at gmail.com> wrote:
>>>>>>>>> > On Wed, Jun 11, 2014 at 3:04 PM, Matthew Turk
>>>>>>>>> > <matthewturk at gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> Please do, yeah. That should help us track down the memory increase.
>>>>>>>>> >> It's possible FLASH needs additional work; have you checked the
>>>>>>>>> >> refcounts for its grid objects?
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> > With this script: http://paste.yt-project.org/show/4779/
>>>>>>>>> >
>>>>>>>>> > I also see steadily increasing memory usage, although it's not clear
>>>>>>>>> > if that's just because each successive Enzo dataset is larger than the
>>>>>>>>> > one before. The peak memory usage is 230 MB, so substantially better
>>>>>>>>> > than the FLASH dataset.
>>>>>>>>>
>>>>>>>>> My guess is there is a leak in flash. You can evaluate whether it
>>>>>>>>> is the data size by looking at what the memory use is if you just load the
>>>>>>>>> final one or the final ten.
>>>>>>>>>
>>>>>>>>> When I have a minute I will try to replicate here. I suspect it's
>>>>>>>>> not important that it's cylindrical.
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > I can privately share the FLASH dataset Philip was originally having
>>>>>>>>> > trouble with if that will help.
>>>>>>>>> >
>>>>>>>>> > The root grid seems to have a refcount of 7. I'm not sure how many of
>>>>>>>>> > those references were generated by objgraph itself.
>>>>>>>>> >
>>>>>>>>> > _______________________________________________
>>>>>>>>> > yt-dev mailing list
>>>>>>>>> > yt-dev at lists.spacepope.org
>>>>>>>>> > http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>>>>>>>>> >


