[yt-dev] Issue #1031: Bug Report: Memory leak when rendering projection plots in parallel (yt_analysis/yt)

ggoodwin52 issues-reply at bitbucket.org
Wed Jun 10 13:34:36 PDT 2015


New issue 1031: Bug Report: Memory leak when rendering projection plots in parallel
https://bitbucket.org/yt_analysis/yt/issue/1031/bug-report-memory-leak-when-rendering

ggoodwin52:

I emailed this bug report to the yt-users list before finding this repository; sorry for the double post.

I believe I have found a memory leak when iterating through a DatasetSeries in parallel to render a projection plot of each data file in the series. I work in a CFD research group whose code is built on BoxLib, so I have been using yt to generate images of my simulations. The DatasetSeries objects I work with are fairly large, on the order of 6000 to 7000 datasets per series. To speed up image rendering, I have been using yt's parallelism to iterate through the series with Open MPI.
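
The core of the script follows yt's standard parallel time-series pattern; a simplified sketch is below (the glob pattern and field name are placeholders, not the exact ones from the attached script):

    import yt
    yt.enable_parallelism()  # requires mpi4py; call before loading any data

    # Placeholder glob pattern for the BoxLib plotfiles in the series.
    ts = yt.DatasetSeries("plt?????")

    # piter() distributes the datasets across the MPI processes.
    for ds in ts.piter():
        p = yt.ProjectionPlot(ds, "x", "temperature")  # placeholder field
        p.save()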

The RAM usage per MPI process increases steadily over the script's runtime. When running the attached script with 16 MPI processes on a 16-core machine (one process per core), the RAM usage starts at about 2.56 GB per process and gradually climbs past 25 GB per process, at which point the script typically crashes because the machine no longer has enough memory to continue.
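
For anyone who wants to watch the growth, the per-process peak RSS can be logged inside the loop. This is just a sketch using Python's standard resource module, not something in the attached script, and it extends the loop shown above:

    import resource

    def peak_rss_gb():
        # ru_maxrss is reported in kilobytes on Linux.
        kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return kb / (1024.0 * 1024.0)

    for ds in ts.piter():
        p = yt.ProjectionPlot(ds, "x", "temperature")  # placeholder field
        p.save()
        print("%s: peak RSS %.2f GB" % (ds, peak_rss_gb()))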

My simulations do produce datasets whose memory requirements grow with simulation time, but not nearly at the scale of the RAM growth described above. I determined that the crashes are due to a memory leak by restarting the script after a crash, using only the dataset files that had not yet been processed. Because the restarted run picks up at the same point in the simulation, the memory required to render each projection is the same as it was just before the crash, yet the per-process RAM usage starts back near the baseline, which rules out dataset size as the cause of the growth.
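
One mitigation I have not verified (a sketch only, assuming the growth comes from references or cycles that survive each iteration rather than from yt-internal caches) is to explicitly drop references and force garbage collection between datasets, again extending the loop above:

    import gc

    for ds in ts.piter():
        p = yt.ProjectionPlot(ds, "x", "temperature")  # placeholder field
        p.save()
        del p         # drop the plot before the next dataset loads
        gc.collect()  # force collection of any reference cycles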

I also ran this script on my university's cluster, using several different Open MPI installations, and always got the same result. Additionally, the crashes always occur at approximately the same point in the script's runtime.

Attached to this email is a copy of the script I'm using. I run it on my 16-core mini-cluster with the command:

mpirun -np 16 --cpus-per-proc 1 br0.05_temp.py

I also attached a sample image of the projection plots I'm generating.

Any help with this bug is appreciated!