[yt-dev] Zombie jobs on eudora?

Nathan Goldbaum nathan12343 at gmail.com
Tue Jun 10 19:26:00 PDT 2014


On Tue, Jun 10, 2014 at 10:59 AM, Matthew Turk <matthewturk at gmail.com>
wrote:

> Do you have a reproducible script?


This should do the trick: http://paste.yt-project.org/show/4767/

(this is with an enzo dataset by the way)

That script prints (on my machine):

EnzoGrid    15065
YTArray     1520
list        704
dict        2
MaskedArray 1

Which indicates that about 15000 EnzoGrid objects and 1520 YTArray objects
have leaked.

The list I'm printing out at the end of the script should be the objects
that leaked during the loop over the Enzo dataset.  The
objgraph.get_leaking_objects() function returns the list of all objects
being tracked by the garbage collector that have no referrers (nothing else
the collector tracks points to them) but still have nonzero refcounts.

This means the "original_leaks" list isn't necessarily a list of leaky
objects - most of the things in there are singletons that the interpreter
keeps around. To create a list of leaky objects produced by iterating over
the loop I take the set difference of the output of get_leaking_objects()
before and after iterating over the dataset.
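
For anyone who wants to try the before/after idea without objgraph, here is
a minimal standard-library sketch. It is not the script from the paste
above: it swaps gc.get_objects() in for objgraph.get_leaking_objects(), so it
counts every GC-tracked object that survives the workload rather than only
the referrer-less ones, and the names new_objects_by_type and workload are
made up for illustration.

```python
import gc
from collections import Counter


def new_objects_by_type(workload):
    """Run workload() and count, by type name, the GC-tracked objects
    that exist afterwards but did not exist before.

    Assumption: this uses gc.get_objects() as a stand-in for
    objgraph.get_leaking_objects(), so it is a coarser filter.
    """
    gc.collect()
    # Keep the snapshot list itself alive: while we hold references to the
    # "before" objects, none of their id() values can be recycled.
    before = gc.get_objects()
    before_ids = {id(obj) for obj in before}

    workload()
    gc.collect()  # give the collector a chance to free cycles first

    after = gc.get_objects()
    # Do not report our own bookkeeping containers as survivors.
    before_ids.update((id(before), id(before_ids)))
    survivors = [obj for obj in after if id(obj) not in before_ids]
    return Counter(type(obj).__name__ for obj in survivors)
```

Calling new_objects_by_type(fn).most_common() on a function that loads and
processes one dataset should give a table in the same shape as the one
above, though the counts will not match objgraph's because the filter is
different.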


> If you make a bunch of symlinks to
> one flash file and load them all in sequence, does that replicate the
> behavior?
>

Yes, it seems to.  Compare the output of this script:
http://paste.yt-project.org/show/4768/

Adjust the range of the for loop from 0 to 5, creating symlinks to
WindTunnel/windtunnel_4lev_hdf5_plt_cnt_0040 as needed.
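
The symlinks can be created with something like the sketch below. It assumes
a POSIX filesystem; make_symlinks, the link directory, and the link_NNNN
names are hypothetical and not taken from the pasted script.

```python
import os


def make_symlinks(target, count, link_dir, prefix="link_"):
    """Create `count` symlinks in link_dir that all point at `target`.

    Returns the list of link paths. The naming scheme (prefix plus a
    zero-padded index) is an arbitrary choice for illustration.
    """
    target = os.path.abspath(target)
    os.makedirs(link_dir, exist_ok=True)
    links = []
    for i in range(count):
        link = os.path.join(link_dir, f"{prefix}{i:04d}")
        # lexists() is true for an existing link even if it is broken,
        # so rerunning the helper does not raise FileExistsError.
        if not os.path.lexists(link):
            os.symlink(target, link)
        links.append(link)
    return links
```

Each returned path can then be loaded in turn inside the for loop, in place
of a distinct output file.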


>
> On Tue, Jun 10, 2014 at 12:57 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
> >
> >
> >
> > On Tue, Jun 10, 2014 at 10:45 AM, Matthew Turk <matthewturk at gmail.com>
> > wrote:
> >>
> >> Hi Nathan,
> >>
> >> On Tue, Jun 10, 2014 at 12:43 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> >> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Jun 10, 2014 at 6:09 AM, Matthew Turk <matthewturk at gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi Nathan,
> >> >>
> >> >> On Mon, Jun 9, 2014 at 11:02 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> >> >> wrote:
> >> >> > Hey all,
> >> >> >
> >> >> > I'm looking at a memory leak that Philip (cc'd) is seeing when
> >> >> > iterating over a long list of FLASH datasets.  Just as an example of
> >> >> > the type of behavior he is seeing - today he left his script running
> >> >> > and ended up consuming 300 GB of RAM on a viz node.
> >> >> >
> >> >> > FWIW, the dataset is not particularly large - ~300 outputs and ~100 MB
> >> >> > per output. These are also FLASH cylindrical coordinate simulations -
> >> >> > so perhaps this behavior will only occur in curvilinear geometries?
> >> >>
> >> >> Hm, I don't know about that.
> >> >>
> >> >> >
> >> >> > I've been playing with objgraph to try to understand what's
> >> >> > happening.
> >> >> > Here's the script I've been using:
> >> >> > http://paste.yt-project.org/show/4762/
> >> >> >
> >> >> > Here's the output after one iteration of the for loop:
> >> >> > http://paste.yt-project.org/show/4761/
> >> >> >
> >> >> > It seems that for some reason a lot of data is not being garbage
> >> >> > collected.
> >> >> >
> >> >> > Could there be a reference counting bug somewhere down in a cython
> >> >> > routine?
> >> >>
> >> >> Based on what you're running, the only Cython routines being called
> >> >> are likely in the selection system.
> >> >>
> >> >> > Objgraph is unable to find backreferences to root grid tiles in the
> >> >> > flash dataset, and all the other yt objects that I've looked at seem
> >> >> > to have backreference graphs that terminate at a FLASHGrid object
> >> >> > that represents a root grid tile in one of the datasets.  That's the
> >> >> > best guess I have - but definitely nothing conclusive.  I'd appreciate
> >> >> > any other ideas anyone else has to help debug this.
> >> >>
> >> >> I'm not entirely sure how to parse the output you've pasted, but I do
> >> >> have a thought.  If you have a reproducible case, I can test it
> >> >> myself.  I am wondering if this could be related to the way that grid
> >> >> masks are cached.  You should be able to test this by adding this line
> >> >> to _get_selector_mask in grid_patch.py, just before "return mask"
> >> >>
> >> >> self._last_mask = self._last_selector_id = None
> >> >>
> >> >> Something like this patch:
> >> >>
> >> >> http://paste.yt-project.org/show/4316/
> >> >
> >> >
> >> > Thanks for the code!  I will look into this today.
> >> >
> >> > Sorry for not explaining the random terminal output I pasted from
> >> > objgraph :/
> >> >
> >> > It's a list of objects created after yt operates on one dataset and
> >> > after the garbage collector is explicitly called. Each iteration of the
> >> > loop sees the creation of objects representing the FLASH grids,
> >> > hierarchy, and associated metadata.  With enough iterations this
> >> > overhead from previous loop iterations begins to dominate the total
> >> > memory budget.
> >>
> >> The code snippet I sent might help reduce it, but I think it speaks to
> >> a deeper problem in that somehow the FLASH stuff isn't being GC'd
> >> anywhere.  It really ought to be.
> >>
> >> Can you try also doing:
> >>
> >> yt.frontends.flash.FLASHDataset._skip_cache = True
> >
> >
> > No effect, unfortunately.
> >
> >>
> >> and seeing if that helps?
> >>
> >> >
> >> >>
> >> >>
> >> >>
> >> >> -Matt
> >> >>
> >> >> >
> >> >> > Thanks for your help in debugging this!
> >> >> >
> >> >> > -Nathan
> >> >> >
> >> >> _______________________________________________
> >> >> yt-dev mailing list
> >> >> yt-dev at lists.spacepope.org
> >> >> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org