[yt-dev] Reducing memory usage in time series

Matthew Turk matthewturk at gmail.com
Tue Feb 4 10:44:52 PST 2014


Hi John,

On Tue, Feb 4, 2014 at 1:21 PM, John Wise <jwise at physics.gatech.edu> wrote:
> Hi all,
>
> I've been trying to run rockstar in a sizable Enzo simulation (150k grids)
> with ~100 outputs, where it's running out of memory just loading the
> hierarchies.  One hierarchy instance consumes almost 1GB!  I've found this
> to be a problem specific not to rockstar but to time series objects in general.

Hm.  How are you iterating over the parameter files?  With the time
series we try to do a load/retain on demand system, where the
parameter files and their hierarchies are only kept around as long as
they need to be.  Devin and Hilary looked at how this worked with
Rockstar, and I thought they concluded that it was okay.
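
A minimal sketch of the load/retain-on-demand idea, using made-up
stand-in classes (FakeOutput, LazySeries and peak_live are hypothetical
names, not yt's actual classes). It assumes CPython's reference
counting, where an object is freed as soon as its last reference goes
away:

```python
class FakeOutput:
    """Hypothetical stand-in for a parameter file; not a yt class."""
    live = 0  # number of instances currently alive

    def __init__(self, name):
        self.name = name
        FakeOutput.live += 1

    def __del__(self):
        # Runs when the last reference disappears (immediately on CPython)
        FakeOutput.live -= 1


class LazySeries:
    """Hypothetical time series that stores filenames, not loaded outputs."""

    def __init__(self, filenames):
        self.filenames = list(filenames)

    def __iter__(self):
        for fn in self.filenames:
            # Load on demand; the series itself keeps no reference afterwards
            yield FakeOutput(fn)


def peak_live(series):
    """Largest number of live outputs seen inside the loop body."""
    peak = 0
    for pf in series:
        # Rebinding pf released the previous output before we get here
        peak = max(peak, FakeOutput.live)
    return peak
```

With this pattern only one output is alive inside the loop body, however
long the series is. Anything that keeps a reference past the iteration
(a list of results, a cache, a reference cycle) defeats it, which is why
the way the loop is written matters.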

>
> My solution is to explicitly delete the hierarchy's metadata and grids.
> Since I haven't contributed to yt-3.0 yet, I wanted to run this by everyone
> before submitting a PR.
>
> My question is about coding style, in that I see very few __del__()
> functions now.  In my working version, I've defined a __del__ function for
> the grid_geometry_handler as
>
>     def __del__(self):
>         del self.grid_dimensions
>         del self.grid_left_edge
>         del self.grid_right_edge
>         del self.grid_levels
>         del self.grid_particle_count
>         del self.grids
>
> When I delete pf._instantiated_hierarchy after each loop of a time series
> iterator, I don't see any excessive memory usage anymore.  It just reuses
> the allocated memory from the previous iteration, which is totally fine by
> me.  However, when I include this in a __del__ function for a static_output,
> I still see excessive memory usage, which is bizarre to me.

Hmm.

I'm of two minds on this.  On the one hand, I am not really *opposed*
to destructors, but I don't like that they are necessary.  Because the
hierarchy is weirdly self-referential to the static output, this
sometimes causes problems and the garbage collector doesn't pick it
up.  However, when the parameter file is deallocated, it *should*
deallocate all of the arrays.  Whether it does or not may be related
to the system allocator, and whether it reuses the memory is
potentially also related to that.  On the other hand, I'd rather fix
the issue of having a separate index and static output object, and
break the reference cycle between them.
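
To illustrate the reference cycle in isolation, here is a small sketch
with hypothetical Output and Index classes (stand-ins for the static
output and its hierarchy, not yt's real code). It compares a strong
back-reference with a weakref.proxy, which is one possible way to break
such a cycle and not necessarily the fix yt would adopt. It assumes
CPython reference counting. Note also that on Python versions before
3.4, a cycle whose objects define __del__ is not collected at all and
ends up in gc.garbage, which may be relevant to the __del__ question.

```python
import gc
import weakref


class Index:
    """Hypothetical stand-in for the hierarchy/index object."""

    def __init__(self, output, use_weakref):
        # A strong back-reference creates the cycle output -> index -> output;
        # a weak proxy lets the index reach the output without owning it.
        self.output = weakref.proxy(output) if use_weakref else output
        self.grids = [0] * 1000  # placeholder for the large grid arrays


class Output:
    """Hypothetical stand-in for the static output (parameter file)."""

    def __init__(self, use_weakref):
        self.index = Index(self, use_weakref)


def freed_without_gc(use_weakref):
    """Return True if dropping the last name frees the object right away."""
    was_enabled = gc.isenabled()
    gc.disable()  # isolate reference counting from the cyclic collector
    try:
        out = Output(use_weakref)
        probe = weakref.ref(out)  # observes out without keeping it alive
        del out
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up the cycle left behind by the strong case
```

With the strong back-reference, freed_without_gc(False) returns False:
the object lingers until the cyclic collector gets around to it. With
the weak proxy, freed_without_gc(True) returns True and the arrays go
away as soon as the output does.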

So I guess where I come down on this is: I'd like to fix the
underlying issue, which is something I have been working on off and
on.  But since you are measurably seeing an improvement with this
change, I'm okay with it going in.  Hopefully it will become obsolete
eventually.  ;-)

Incidentally, I would still like to see how the time series is
iterating, and how the references pass through the system.

A related paper you might find interesting:
http://www.dlr.de/sc/en/Portaldata/15/Resources/dokumente/PyHPC2013/submissions/pyhpc2013_submission_6.pdf

-Matt

>
> Should I define a new routine in the grid_geometry_handler, something like
> clear_hierarchy(), or keep the __del__ function?  I ask because I want to
> keep in line with the overall structure of yt-3.0.  This could also be
> included in the clear_all_data() call.
>
> What do people think the best approach would be?
>
> Thanks,
> John
>
> --
> John Wise
> Assistant Professor of Physics
> Center for Relativistic Astrophysics, Georgia Tech
> http://cosmo.gatech.edu
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org


