[yt-dev] Zombie jobs on eudora?

Nathan Goldbaum nathan12343 at gmail.com
Tue Jun 10 10:43:45 PDT 2014


On Tue, Jun 10, 2014 at 6:09 AM, Matthew Turk <matthewturk at gmail.com> wrote:

> Hi Nathan,
>
> On Mon, Jun 9, 2014 at 11:02 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> > Hey all,
> >
> > I'm looking at a memory leak that Philip (cc'd) is seeing when iterating
> > over a long list of FLASH datasets.  Just as an example of the type of
> > behavior he is seeing - today he left his script running and ended up
> > consuming 300 GB of RAM on a viz node.
> >
> > FWIW, the dataset is not particularly large - ~300 outputs and ~100 MB per
> > output. These are also FLASH cylindrical coordinate simulations - so perhaps
> > this behavior will only occur in curvilinear geometries?
>
> Hm, I don't know about that.
>
> >
> > I've been playing with objgraph to try to understand what's happening.
> > Here's the script I've been using: http://paste.yt-project.org/show/4762/
> >
> > Here's the output after one iteration of the for loop:
> > http://paste.yt-project.org/show/4761/
> >
> > It seems that for some reason a lot of data is not being garbage collected.
> >
> > Could there be a reference counting bug somewhere down in a cython routine?
>
> Based on what you're running, the only Cython routines being called
> are likely in the selection system.
>
> > Objgraph is unable to find backreferences to root grid tiles in the flash
> > dataset, and all the other yt objects that I've looked at seem to have
> > backreference graphs that terminate at a FLASHGrid object that represents a
> > root grid tile in one of the datasets.  That's the best guess I have - but
> > definitely nothing conclusive.  I'd appreciate any other ideas anyone else
> > has to help debug this.
>
> I'm not entirely sure how to parse the output you've pasted, but I do
> have a thought.  If you have a reproducible case, I can test it
> myself.  I am wondering if this could be related to the way that grid
> masks are cached.  You should be able to test this by adding this line
> to _get_selector_mask in grid_patch.py, just before "return mask"
>
> self._last_mask = self._last_selector_id = None
>
> Something like this patch:
>
> http://paste.yt-project.org/show/4316/


Thanks for the code!  I will look into this today.
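
To make the mask-caching idea concrete, here is a toy sketch of why a cached mask can keep memory alive and how resetting the cache just before "return mask" releases it. Everything in it (ToyGrid, Mask, the clear_cache flag, keying the cache on id(selector)) is hypothetical and only loosely modeled on the one-line change suggested above; it is not yt's actual grid_patch.py code.

```python
import gc
import weakref


class Mask:
    """Hypothetical stand-in for a large boolean mask array."""

    def __init__(self, n):
        self.data = bytearray(n)


class ToyGrid:
    """Hypothetical grid object; NOT yt's real grid class."""

    def __init__(self, clear_cache=False):
        self._last_mask = None
        self._last_selector_id = None
        self.clear_cache = clear_cache

    def _get_selector_mask(self, selector):
        # Reuse the cached mask if the same selector is asked for again.
        if id(selector) == self._last_selector_id:
            mask = self._last_mask
        else:
            mask = Mask(1024)
            self._last_mask = mask
            self._last_selector_id = id(selector)
        if self.clear_cache:
            # The suggested one-liner: drop the cache before returning.
            self._last_mask = self._last_selector_id = None
        return mask


def mask_survives(clear_cache):
    """Return True if the mask is still alive after the caller drops it."""
    grid = ToyGrid(clear_cache=clear_cache)
    mask = grid._get_selector_mask(object())
    ref = weakref.ref(mask)
    del mask
    gc.collect()
    # The grid itself is still alive here, so only its cache can hold the mask.
    return ref() is not None
```

In this toy, mask_survives(False) shows the grid's cache pinning the mask, while mask_survives(True) shows it being freed once the cache is reset. Whether the same thing explains the real leak is exactly what still needs testing.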

Sorry for not explaining the random terminal output I pasted from objgraph :/

It's a list of the objects that are created while yt operates on one dataset
and that are still alive after the garbage collector is explicitly called.
Each iteration of the loop creates new objects representing the FLASH grids,
hierarchy, and associated metadata, and none of them are freed.  With enough
iterations, this leftover overhead from previous loop iterations begins to
dominate the total memory budget.
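
For anyone who wants to reproduce this kind of measurement without objgraph, here is a minimal stdlib-only sketch of the same idea: count live objects by type, run one iteration, force a collection, and count again. It is not the script from the paste; FakeGrid, process_dataset, and the module-level _leak list are hypothetical stand-ins that simulate a stray reference.

```python
import gc
from collections import Counter


class FakeGrid:
    """Hypothetical stand-in for a per-dataset grid object."""


_leak = []  # simulates a stray reference that keeps grids alive


def process_dataset(n_grids, leak):
    """Pretend to load one dataset; optionally leak its grids."""
    grids = [FakeGrid() for _ in range(n_grids)]
    if leak:
        _leak.extend(grids)


def type_counts():
    """Count gc-tracked objects by type name after a forced collection."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after):
    """Return the types whose live-object count increased."""
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] > before[name]
    }
```

Calling type_counts() before and after each loop iteration and passing the two results to growth() gives a per-type tally of what survived, which is roughly the information in the objgraph output I pasted.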


>
>
> -Matt
>
> >
> > Thanks for your help in debugging this!
> >
> > -Nathan
> >
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>