[yt-users] Parallelism in yt Applied to Large Datasets

Nathan Goldbaum nathan12343 at gmail.com
Wed Dec 6 13:16:46 PST 2017


On Wed, Dec 6, 2017 at 3:12 PM, Britton Smith <brittonsmith at gmail.com>
wrote:

> Running out of RAM when iterating over datasets is also pretty common.
> This might have something to do with how the iterator yields a loaded
> dataset.  Either way, one thing I've found that helps is to explicitly
> delete objects and clear out data.  Every yt data container has a
> clear_data method that you can call to free up its field arrays.  You
> might call that in addition to explicitly deleting the objects, and
> running your own garbage collection with gc.collect() seems to help
> quite a bit too.  Even then, I still often see RAM issues with this
> type of analysis, so I typically write the script so that it can be
> restarted from where it left off, usually by checking for the existence
> of output files.
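>
> A rough, untested sketch of the pattern I mean (the output filename and
> the sphere are just placeholders for whatever you're actually computing):
>
> import gc
> import os
> import yt
>
> ts = yt.load("/path/mydata_hdf5_plt*")
> for ds in ts.piter():
>     outfile = "results_%s.txt" % ds
>     if os.path.exists(outfile):
>         continue  # already done; lets the script restart where it left off
>     sp = ds.sphere("c", (100.0, "kpc"))
>     # ... compute what you need from sp and write it to outfile ...
>     sp.clear_data()  # drop the cached field arrays
>     del sp, ds
>     gc.collect()     # force a collection pass before the next dataset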
>

If you can put together an example where this is happening, I'd love to see
it. It's possible that we have a memory leak somewhere. In principle you
shouldn't need to manually manage memory with del or gc.collect() in Python.


>
> Britton
>
> On Wed, Dec 6, 2017 at 12:21 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
>
>>
>>
>> On Wed, Dec 6, 2017 at 2:17 PM, Jason Galyardt <jason.galyardt at gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am currently using dataset parallelization, like so:
>>>
>>> ts = yt.load("/path/mydata_hdf5_plt*")
>>> results = {}
>>> for sto, ds in ts.piter(storage=results):
>>>    # processing on each dataset using:
>>>    # cut regions
>>>    # center of mass calculations
>>>    # weighted averages
>>>    # projections
>>>
>>
>> It's possible that any one of these operations is causing your RAM to
>> blow up. It might even be a bug in yt that's triggering a memory leak.
>>
>> Is there any chance you can narrow it down to a single operation that's
>> responsible?
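>>
>> One low-tech way to narrow it down (just a sketch using the standard
>> library; note that ru_maxrss is reported in kilobytes on Linux and in
>> bytes on macOS) is to print the peak memory after each step and comment
>> the operations out one at a time:
>>
>> import resource
>>
>> def report(stage):
>>     # peak resident set size of this process so far
>>     peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>     print("%s: peak RSS = %d" % (stage, peak))
>>
>> # inside the piter loop, call e.g. report("after projection"),
>> # report("after cut region"), and so on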
>>
>>
>>>
>>> if yt.is_root():
>>>     # process aggregate results
>>>
>>> I'll try Britton's suggestion of specifying the number of processors to
>>> use as a subset of the total available.
>>>
>>
>> Yes, that would probably help. Right now I suspect that every core on a
>> compute node is loading its own output file at the same time, so many
>> multi-GB datasets end up in memory on the same node at once. You probably
>> only want one or a couple of outputs per compute node.
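>>
>> Something like the following (untested) should keep each dataset confined
>> to a work group of processors rather than giving every core its own
>> dataset; depending on your yt version, the number of groups is set with
>> the parallel keyword to DatasetSeries, as here, or with the njobs keyword
>> to piter that Britton mentions below:
>>
>> import yt
>> yt.enable_parallelism()
>>
>> # e.g. 4 work groups: with 16 MPI ranks, each dataset gets 4 cores
>> ts = yt.DatasetSeries("/path/mydata_hdf5_plt*", parallel=4)
>> results = {}
>> for sto, ds in ts.piter(storage=results):
>>     sto.result_id = str(ds)
>>     sto.result = ...  # whatever you compute for this dataset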
>>
>>
>>>
>>> Also, it may or may not be relevant, but the cluster I'm using does not
>>> support MPICH2 (installed by conda as a dependency for mpi4py) due to lack
>>> of InfiniBand support. I have removed the mpi4py and mpich2 conda packages
>>> and reinstalled with the OpenMPI implementation:
>>>
>>> conda remove mpi4py
>>> conda remove mpich2
>>> conda install -c mpi4py mpi4py
>>>
>>> To check that openmpi is now installed, you can do the following (an
>>> asterisk will appear next to any installed packages):
>>>
>>> conda search -c mpi4py mpi4py
>>> conda search -c mpi4py openmpi
>>>
>>> On my cluster, these commands show that I have the following packages
>>> installed:
>>>
>>> mpi4py
>>> *  2.0.0            py27_openmpi_2  mpi4py
>>>
>>> openmpi
>>> *  1.10.2                        1  mpi4py
>>>
>>>
>>> Is it possible that this is causing my RAM consumption problems?
>>>
>>
>> Maybe, but I kind of doubt it. If you're running yt on a cluster, you
>> probably don't want the mpi4py from conda at all; instead, build mpi4py
>> from source (with "pip install mpi4py") against the native MPI library
>> for that cluster. In my experience mpi4py is pretty good about detecting
>> the proper MPI library at compile time.
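>>
>> As a quick sanity check (assuming your mpi4py was built against an MPI-3
>> library, which provides Get_library_version), you can ask mpi4py which
>> MPI it was actually compiled against:
>>
>> from mpi4py import MPI
>> # should name your cluster's native MPI, not the conda-provided MPICH
>> print(MPI.Get_library_version())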
>>
>>
>>>
>>> Thanks for the help,
>>> Jason
>>>
>>>
>>>
>>> On Wed, Dec 6, 2017 at 2:45 PM, <yt-users-request at lists.spacepope.org>
>>> wrote:
>>>
>>>> Today's Topics:
>>>>
>>>>    1. Parallelism in yt Applied to Large Datasets (Jason Galyardt)
>>>>    2. Re: Parallelism in yt Applied to Large Datasets (Nathan Goldbaum)
>>>>    3. Re: Parallelism in yt Applied to Large Datasets (Scott Feister)
>>>>    4. Re: Parallelism in yt Applied to Large Datasets (Britton Smith)
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Jason Galyardt <jason.galyardt at gmail.com>
>>>> To: Discussion of the yt analysis package <yt-users at lists.spacepope.org>
>>>> Date: Wed, 6 Dec 2017 09:08:29 -0500
>>>> Subject: [yt-users] Parallelism in yt Applied to Large Datasets
>>>> Hi yt Folks,
>>>>
>>>> I've written a script that uses a yt DatasetSeries object to analyze a
>>>> time series dataset generated by FLASH. It worked beautifully, until I
>>>> tried to run it on a new cluster with significantly larger HDF5 files (4 GB
>>>> to greater than 8 GB per file). Now, while running the script, the RAM
>>>> usage just grows and grows until the OS kills the job.
>>>>
>>>> It seems to me that I need to use domain decomposition to process these
>>>> large files. So, my question to the group is this: is it possible to use
>>>> both domain decomposition *and* parallel time series processing in a single
>>>> script? This would require that yt be able to subdivide the available MPI
>>>> processors into a number of work groups, each work group handling a single
>>>> input file.
>>>>
>>>> Cheers,
>>>> Jason
>>>>
>>>> ------
>>>> Jason Galyardt
>>>> University of Georgia
>>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Nathan Goldbaum <nathan12343 at gmail.com>
>>>> To: Discussion of the yt analysis package <yt-users at lists.spacepope.org>
>>>> Date: Wed, 06 Dec 2017 14:25:01 +0000
>>>> Subject: Re: [yt-users] Parallelism in yt Applied to Large Datasets
>>>> That depends on what sort of analysis you are doing. Not all tasks in
>>>> yt are parallel-aware.
>>>>
>>>> On Wed, Dec 6, 2017 at 8:08 AM Jason Galyardt <jason.galyardt at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi yt Folks,
>>>>>
>>>>> I've written a script that uses a yt DatasetSeries object to analyze a
>>>>> time series dataset generated by FLASH. It worked beautifully, until I
>>>>> tried to run it on a new cluster with significantly larger HDF5 files (4 GB
>>>>> to greater than 8 GB per file). Now, while running the script, the RAM
>>>>> usage just grows and grows until the OS kills the job.
>>>>>
>>>>> It seems to me that I need to use domain decomposition to process
>>>>> these large files. So, my question to the group is this: is it possible to
>>>>> use both domain decomposition *and* parallel time series processing in a
>>>>> single script? This would require that yt be able to subdivide the
>>>>> available MPI processors into a number of work groups, each work group
>>>>> handling a single input file.
>>>>>
>>>>> Cheers,
>>>>> Jason
>>>>>
>>>>> ------
>>>>> Jason Galyardt
>>>>> University of Georgia
>>>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Scott Feister <sfeister at gmail.com>
>>>> To: Discussion of the yt analysis package <yt-users at lists.spacepope.org>
>>>> Date: Wed, 6 Dec 2017 11:30:18 -0800
>>>> Subject: Re: [yt-users] Parallelism in yt Applied to Large Datasets
>>>> Hi Jason,
>>>>
>>>> I don't know how to do both domain and time decomposition in yt, but I
>>>> have been doing time-series analysis in yt of some fairly massive FLASH
>>>> HDF5 outputs (~20 GB each) without a problem. If you'd like to share the
>>>> script with me (you can send to feister at flash.uchicago.edu), I can
>>>> take a look and see if I notice anything particularly wasting RAM. Maybe
>>>> there's a simpler solution than resorting to domain decomposition!
>>>>
>>>> Best,
>>>>
>>>> Scott
>>>>
>>>>
>>>> Scott Feister, Ph.D.
>>>> Postdoctoral Researcher, Flash Center for Computational Science
>>>> University of Chicago, Department of Astronomy and Astrophysics
>>>>
>>>> On Wed, Dec 6, 2017 at 6:25 AM, Nathan Goldbaum <nathan12343 at gmail.com>
>>>> wrote:
>>>>
>>>>> That depends on what sort of analysis you are doing. Not all tasks in
>>>>> yt are parallel-aware.
>>>>>
>>>>> On Wed, Dec 6, 2017 at 8:08 AM Jason Galyardt <
>>>>> jason.galyardt at gmail.com> wrote:
>>>>>
>>>>>> Hi yt Folks,
>>>>>>
>>>>>> I've written a script that uses a yt DatasetSeries object to analyze
>>>>>> a time series dataset generated by FLASH. It worked beautifully, until I
>>>>>> tried to run it on a new cluster with significantly larger HDF5 files (4 GB
>>>>>> to greater than 8 GB per file). Now, while running the script, the RAM
>>>>>> usage just grows and grows until the OS kills the job.
>>>>>>
>>>>>> It seems to me that I need to use domain decomposition to process
>>>>>> these large files. So, my question to the group is this: is it possible to
>>>>>> use both domain decomposition *and* parallel time series processing in a
>>>>>> single script? This would require that yt be able to subdivide the
>>>>>> available MPI processors into a number of work groups, each work group
>>>>>> handling a single input file.
>>>>>>
>>>>>> Cheers,
>>>>>> Jason
>>>>>>
>>>>>> ------
>>>>>> Jason Galyardt
>>>>>> University of Georgia
>>>>>>
>>>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Britton Smith <brittonsmith at gmail.com>
>>>> To: Discussion of the yt analysis package <yt-users at lists.spacepope.org>
>>>> Date: Wed, 6 Dec 2017 11:45:10 -0800
>>>> Subject: Re: [yt-users] Parallelism in yt Applied to Large Datasets
>>>> Hi Scott,
>>>>
>>>> yt can do the multi-level parallelism you're talking about, i.e.,
>>>> parallelism over multiple datasets and in the operations on a single
>>>> dataset.  I would start by looking here:
>>>> http://yt-project.org/docs/dev/analyzing/parallel_computation.html#parallelization-over-multiple-objects-and-datasets
>>>>
>>>> Namely, have a look at the use of "piter" when looping over the
>>>> DatasetSeries.  With that function, you can specify the number of jobs
>>>> (the njobs keyword) to be less than the total number of processors you
>>>> have available.  This will give you work groups with multiple processors
>>>> for each dataset.  Then, as long as the operations you're doing have been
>>>> parallelized, things will just work, i.e., each operation will employ all
>>>> the cores of its work group.
>>>>
>>>> If you need to do some custom parallelization at the dataset level, I
>>>> also suggest having a look at the parallel_objects command:
>>>> http://yt-project.org/docs/dev/analyzing/parallel_computation.html#parallelizing-over-multiple-objects
>>>>
>>>> This has a similar structure to piter, only it is a more general looping
>>>> construct that lets you split the iterations of a loop across separate
>>>> processors or work groups.  parallel_objects is also nestable, so you can
>>>> have nested loops that continually break things down further.
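>>>>
>>>> A rough sketch of nesting the two levels (untested; here the number of
>>>> dataset work groups is set with the parallel keyword to DatasetSeries,
>>>> and the sphere centers are just placeholders):
>>>>
>>>> import yt
>>>> yt.enable_parallelism()
>>>>
>>>> ts = yt.DatasetSeries("/path/mydata_hdf5_plt*", parallel=2)
>>>> for ds in ts.piter():
>>>>     centers = [[0.25, 0.25, 0.25], [0.5, 0.5, 0.5], [0.75, 0.75, 0.75]]
>>>>     # split this dataset's work group again, over the list of centers
>>>>     for center in yt.parallel_objects(centers, njobs=2):
>>>>         sp = ds.sphere(center, (100.0, "kpc"))
>>>>         # ... operate on sp with the cores of this sub-group ...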
>>>>
>>>> I hope this helps.  Please, feel free to come back if you have more
>>>> specific questions on parallelizing your analysis.
>>>>
>>>> Britton
>>>>
>>>>
>>>>
>>>> On Wed, Dec 6, 2017 at 11:30 AM, Scott Feister <sfeister at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> I don't know how to do both domain and time decomposition in yt, but I
>>>>> have been doing time-series analysis in yt of some fairly massive FLASH
>>>>> HDF5 outputs (~20 GB each) without a problem. If you'd like to share the
>>>>> script with me (you can send to feister at flash.uchicago.edu), I can
>>>>> take a look and see if I notice anything particularly wasting RAM. Maybe
>>>>> there's a simpler solution than resorting to domain decomposition!
>>>>>
>>>>> Best,
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> Scott Feister, Ph.D.
>>>>> Postdoctoral Researcher, Flash Center for Computational Science
>>>>> University of Chicago, Department of Astronomy and Astrophysics
>>>>>
>>>>> On Wed, Dec 6, 2017 at 6:25 AM, Nathan Goldbaum <nathan12343 at gmail.com
>>>>> > wrote:
>>>>>
>>>>>> That depends on what sort of analysis you are doing. Not all tasks in
>>>>>> yt are parallel-aware.
>>>>>>
>>>>>> On Wed, Dec 6, 2017 at 8:08 AM Jason Galyardt <
>>>>>> jason.galyardt at gmail.com> wrote:
>>>>>>
>>>>>>> Hi yt Folks,
>>>>>>>
>>>>>>> I've written a script that uses a yt DatasetSeries object to analyze
>>>>>>> a time series dataset generated by FLASH. It worked beautifully, until I
>>>>>>> tried to run it on a new cluster with significantly larger HDF5 files (4 GB
>>>>>>> to greater than 8 GB per file). Now, while running the script, the RAM
>>>>>>> usage just grows and grows until the OS kills the job.
>>>>>>>
>>>>>>> It seems to me that I need to use domain decomposition to process
>>>>>>> these large files. So, my question to the group is this: is it possible to
>>>>>>> use both domain decomposition *and* parallel time series processing in a
>>>>>>> single script? This would require that yt be able to subdivide the
>>>>>>> available MPI processors into a number of work groups, each work group
>>>>>>> handling a single input file.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jason
>>>>>>>
>>>>>>> ------
>>>>>>> Jason Galyardt
>>>>>>> University of Georgia
>>>>>>>
>>
>
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>
>