[yt-dev] Zombie jobs on eudora?

Matthew Turk matthewturk at gmail.com
Tue Jun 10 10:45:20 PDT 2014


Hi Nathan,

On Tue, Jun 10, 2014 at 12:43 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>
>
> On Tue, Jun 10, 2014 at 6:09 AM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Hi Nathan,
>>
>> On Mon, Jun 9, 2014 at 11:02 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>> wrote:
>> > Hey all,
>> >
>> > I'm looking at a memory leak that Philip (cc'd) is seeing when iterating
>> > over a long list of FLASH datasets.  Just as an example of the type of
>> > behavior he is seeing - today he left his script running and ended up
>> > consuming 300 GB of RAM on a viz node.
>> >
>> > FWIW, the dataset is not particularly large - ~300 outputs and ~100 MB
>> > per
>> > output. These are also FLASH cylindrical coordinate simulations - so
>> > perhaps
>> > this behavior will only occur in curvilinear geometries?
>>
>> Hm, I don't know about that.
>>
>> >
>> > I've been playing with objgraph to try to understand what's happening.
>> > Here's the script I've been using:
>> > http://paste.yt-project.org/show/4762/
>> >
>> > Here's the output after one iteration of the for loop:
>> > http://paste.yt-project.org/show/4761/
>> >
>> > It seems that for some reason a lot of data is not being garbage
>> > collected.
>> >
>> > Could there be a reference counting bug somewhere down in a cython
>> > routine?
>>
>> Based on what you're running, the only Cython routines being called
>> are likely in the selection system.
>>
>> > Objgraph is unable to find backreferences to root grid tiles in the
>> > FLASH
>> > dataset, and all the other yt objects that I've looked at seem to have
>> > backreference graphs that terminate at a FLASHGrid object that
>> > represents a
>> > root grid tile in one of the datasets.  That's the best guess I have -
>> > but
>> > definitely nothing conclusive.  I'd appreciate any other ideas anyone
>> > else
>> > has to help debug this.
>>
>> I'm not entirely sure how to parse the output you've pasted, but I do
>> have a thought.  If you have a reproducible case, I can test it
>> myself.  I am wondering if this could be related to the way that grid
>> masks are cached.  You should be able to test this by adding this line
>> to _get_selector_mask in grid_patch.py, just before "return mask"
>>
>> self._last_mask = self._last_selector_id = None
>>
>> Something like this patch:
>>
>> http://paste.yt-project.org/show/4316/
>
>
> Thanks for the code!  I will look into this today.
>
> Sorry for not explaining the random terminal output I pasted from objgraph
> :/
>
> It's a list of objects created after yt operates on one dataset and after
> the garbage collector is explicitly called. Each iteration of the loop sees
> the creation of objects representing the FLASH grids, hierarchy, and
> associated metadata.  With enough iterations this overhead from previous
> loop iterations begins to dominate the total memory budget.

The code snippet I sent might help reduce the growth, but I think it
points to a deeper problem: somehow the FLASH objects aren't being
garbage collected at all, even though they really ought to be.
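
If you want to confirm that, here's a rough, untested sketch (class
names from memory) of a check you could drop in at the end of each
loop iteration: force a collection, then count the FLASH objects the
collector still considers reachable.

import gc

def count_flash_leftovers():
    # Force a collection, then count FLASH-related objects that are
    # still reachable.  If this number grows with each dataset,
    # something is holding references to the old grids/hierarchies.
    gc.collect()
    names = ("FLASHDataset", "FLASHHierarchy", "FLASHGrid")
    return sum(type(obj).__name__ in names for obj in gc.get_objects())

If that count climbs with every dataset, the leak is at the
dataset/hierarchy level rather than down in a Cython routine.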

Can you also try doing:

yt.frontends.flash.FLASHDataset._skip_cache = True

and seeing if that helps?
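
For reference, a rough sketch of how that would slot into the driver
loop (the glob pattern and the per-dataset work are placeholders for
whatever the real script does).  If I remember right, _skip_cache just
keeps each dataset instance out of yt's module-level dataset cache, so
it shouldn't change any results:

import gc
import glob

import yt
from yt.frontends.flash.api import FLASHDataset

# Keep each new dataset out of yt's module-level dataset cache.
FLASHDataset._skip_cache = True

# Placeholder pattern -- substitute the real FLASH output filenames.
for fn in sorted(glob.glob("*_hdf5_plt_cnt_*")):
    ds = yt.load(fn)
    # ... the existing per-dataset analysis goes here ...
    del ds
    gc.collect()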

>
>>
>>
>>
>> -Matt
>>
>> >
>> > Thanks for your help in debugging this!
>> >
>> > -Nathan
>> >


