[yt-dev] Zombie jobs on eudora?

Matthew Turk matthewturk at gmail.com
Tue Jun 10 20:13:40 PDT 2014


Hi Nathan,

All it requires is a call to .index; you don't need to do anything
else to get it to lose references.
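
One way to check whether an object is really freed after use is to hold only a weak reference to it, drop the strong reference, force a collection, and see whether the weak reference is dead. The sketch below uses only the standard library. Dataset, is_collected and held are hypothetical names for illustration and are not part of yt. With yt, make_obj would presumably be something like lambda: yt.load(path) and use_obj would be lambda ds: ds.index.

```python
import gc
import weakref


class Dataset:
    """Hypothetical stand-in for a dataset object; not yt's real class."""


def is_collected(make_obj, use_obj):
    """Build an object, use it, drop it, and report whether it was freed."""
    obj = make_obj()
    use_obj(obj)
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                 # drop the only strong reference we own
    gc.collect()            # also clear anything held only by cycles
    return ref() is None


held = []  # simulates something that wrongly keeps a reference

freed_plain = is_collected(Dataset, lambda ds: None)  # nothing retains it
freed_held = is_collected(Dataset, held.append)       # retained by `held`
print(freed_plain, freed_held)
```

If the second form of result shows up for a real dataset, something outside the calling code is still holding a reference to it.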

I'm still looking into it.

-Matt

On Tue, Jun 10, 2014 at 9:26 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>
>
> On Tue, Jun 10, 2014 at 10:59 AM, Matthew Turk <matthewturk at gmail.com>
> wrote:
>>
>> Do you have a reproducible script?
>
>
> This should do the trick: http://paste.yt-project.org/show/4767/
>
> (this is with an enzo dataset by the way)
>
> That script prints (on my machine):
>
> EnzoGrid    15065
> YTArray     1520
> list        704
> dict        2
> MaskedArray 1
>
> This indicates that roughly 15,000 EnzoGrid objects and 1520 YTArray
> objects have leaked.
>
> The list I'm printing out at the end of the script should be the objects
> that leaked during the loop over the Enzo dataset.  The
> objgraph.get_leaking_objects() function returns the list of all objects
> being tracked by the garbage collector that have no references but still
> have nonzero refcounts.
>
> This means the "original_leaks" list isn't necessarily a list of leaky
> objects - most of the things in there are singletons that the interpreter
> keeps around. To create a list of leaky objects produced by iterating over
> the loop I take the set difference of the output of get_leaking_objects()
> before and after iterating over the dataset.
>
>>
>> If you make a bunch of symlinks to
>> one flash file and load them all in sequence, does that replicate the
>> behavior?
>
>
> Yes, it seems to.  Compare the output of this script:
> http://paste.yt-project.org/show/4768/
>
> Vary the upper bound of the for loop's range from 0 to 5, creating
> symlinks to WindTunnel/windtunnel_4lev_hdf5_plt_cnt_0040 as needed.
>
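
Creating the symlinks for this kind of test can be scripted. The sketch below is an illustration only: make_symlinks and the link naming scheme are hypothetical, and a throwaway placeholder file stands in for the FLASH output so the example runs without any data. In the real test, each link would be loaded in turn in place of the placeholder.

```python
import os
import tempfile


def make_symlinks(target, link_dir, count):
    """Create `count` symlinks in link_dir that all point at `target`."""
    links = []
    for i in range(count):
        link = os.path.join(link_dir, f"link_{i:04d}")
        os.symlink(os.path.abspath(target), link)
        links.append(link)
    return links


# Demonstration with a placeholder file standing in for the FLASH output.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "plotfile")
    with open(target, "wb") as f:
        f.write(b"placeholder")

    links = make_symlinks(target, tmp, 5)
    n_links = len(links)

    # Every link should resolve to the same underlying file.
    resolved = {os.path.realpath(link) for link in links}
    print(n_links, "links ->", len(resolved), "file")
```

Because every link resolves to one file, any growth in memory across the loop comes from the per-dataset objects rather than from differences in the data.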
>>
>>
>> On Tue, Jun 10, 2014 at 12:57 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>> wrote:
>> >
>> >
>> >
>> > On Tue, Jun 10, 2014 at 10:45 AM, Matthew Turk <matthewturk at gmail.com>
>> > wrote:
>> >>
>> >> Hi Nathan,
>> >>
>> >> On Tue, Jun 10, 2014 at 12:43 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Jun 10, 2014 at 6:09 AM, Matthew Turk <matthewturk at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Nathan,
>> >> >>
>> >> >> On Mon, Jun 9, 2014 at 11:02 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>> >> >> > Hey all,
>> >> >> >
>> >> >> > I'm looking at a memory leak that Philip (cc'd) is seeing when
>> >> >> > iterating over a long list of FLASH datasets.  Just as an example
>> >> >> > of the type of behavior he is seeing - today he left his script
>> >> >> > running and ended up consuming 300 GB of RAM on a viz node.
>> >> >> >
>> >> >> > FWIW, the dataset is not particularly large - ~300 outputs and
>> >> >> > ~100 MB per output. These are also FLASH cylindrical coordinate
>> >> >> > simulations - so perhaps this behavior will only occur in
>> >> >> > curvilinear geometries?
>> >> >>
>> >> >> Hm, I don't know about that.
>> >> >>
>> >> >> >
>> >> >> > I've been playing with objgraph to try to understand what's
>> >> >> > happening. Here's the script I've been using:
>> >> >> > http://paste.yt-project.org/show/4762/
>> >> >> >
>> >> >> > Here's the output after one iteration of the for loop:
>> >> >> > http://paste.yt-project.org/show/4761/
>> >> >> >
>> >> >> > It seems that for some reason a lot of data is not being garbage
>> >> >> > collected.
>> >> >> >
>> >> >> > Could there be a reference counting bug somewhere down in a cython
>> >> >> > routine?
>> >> >>
>> >> >> Based on what you're running, the only Cython routines being called
>> >> >> are likely in the selection system.
>> >> >>
>> >> >> > Objgraph is unable to find backreferences to root grid tiles in
>> >> >> > the flash dataset, and all the other yt objects that I've looked
>> >> >> > at seem to have backreference graphs that terminate at a FLASHGrid
>> >> >> > object that represents a root grid tile in one of the datasets.
>> >> >> > That's the best guess I have - but definitely nothing conclusive.
>> >> >> > I'd appreciate any other ideas anyone else has to help debug this.
>> >> >>
>> >> >> I'm not entirely sure how to parse the output you've pasted, but I
>> >> >> do have a thought.  If you have a reproducible case, I can test it
>> >> >> myself.  I am wondering if this could be related to the way that
>> >> >> grid masks are cached.  You should be able to test this by adding
>> >> >> this line to _get_selector_mask in grid_patch.py, just before
>> >> >> "return mask"
>> >> >>
>> >> >> self._last_mask = self._last_selector_id = None
>> >> >>
>> >> >> Something like this patch:
>> >> >>
>> >> >> http://paste.yt-project.org/show/4316/
>> >> >
>> >> >
>> >> > Thanks for the code!  I will look into this today.
>> >> >
>> >> > Sorry for not explaining the random terminal output I pasted from
>> >> > objgraph :/
>> >> >
>> >> > It's a list of objects created after yt operates on one dataset and
>> >> > after the garbage collector is explicitly called. Each iteration of
>> >> > the loop sees the creation of objects representing the FLASH grids,
>> >> > hierarchy, and associated metadata.  With enough iterations this
>> >> > overhead from previous loop iterations begins to dominate the total
>> >> > memory budget.
>> >>
>> >> The code snippet I sent might help reduce it, but I think it speaks to
>> >> a deeper problem in that somehow the FLASH stuff isn't being GC'd
>> >> anywhere.  It really ought to be.
>> >>
>> >> Can you try also doing:
>> >>
>> >> yt.frontends.flash.FLASHDataset._skip_cache = True
>> >
>> >
>> > No effect, unfortunately.
>> >
>> >>
>> >> and seeing if that helps?
>> >>
>> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> -Matt
>> >> >>
>> >> >> >
>> >> >> > Thanks for your help in debugging this!
>> >> >> >
>> >> >> > -Nathan
>> >> >> >
>> >> >> _______________________________________________
>> >> >> yt-dev mailing list
>> >> >> yt-dev at lists.spacepope.org
>> >> >> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org