<div dir="ltr"><div><div><div><div>Hi all,<br><br></div>I am currently parallelizing over datasets, like so:<br><br>ts = yt.load("/path/mydata_hdf5_plt*")<br>results = {}<br>for sto, ds in ts.piter(storage=results):<br></div>   # processing on each dataset using:<br></div><div>   # cut regions</div><div>   # center of mass calculations</div><div>   # weighted averages</div><div>   # projections<br></div><div><br></div>if yt.is_root():<br></div>    # process aggregate results<br><div><div><div><div><div><div><div class="gmail_extra"><br></div><div class="gmail_extra">I'll try Britton's suggestion of setting the number of jobs in piter (the njobs keyword) to fewer than the total number of available processors.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Also, this may or may not be relevant, but the cluster I'm using does not support MPICH2 (which conda installs as a dependency of mpi4py) because it lacks InfiniBand support. I removed the mpi4py and mpich2 conda packages and reinstalled mpi4py built against OpenMPI:</div><div class="gmail_extra"><br></div><div class="gmail_extra">conda remove mpi4py</div><div class="gmail_extra">conda remove mpich2</div><div class="gmail_extra">conda install -c mpi4py mpi4py</div><div class="gmail_extra"><br></div><div class="gmail_extra">To check that OpenMPI is now installed, run the following (an asterisk appears next to each installed package):</div><div class="gmail_extra"><br></div><div class="gmail_extra">conda search -c mpi4py mpi4py</div><div class="gmail_extra">conda search -c mpi4py openmpi</div><div class="gmail_extra"><br></div><div class="gmail_extra">On my cluster, these commands show that the following packages are installed:</div><div class="gmail_extra"><br></div><div class="gmail_extra">mpi4py<br></div><div class="gmail_extra">*  2.0.0            py27_openmpi_2  mpi4py</div><div class="gmail_extra"><br></div><div class="gmail_extra">openmpi</div><div class="gmail_extra">*  1.10.2                        1  mpi4py</div>
<div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Is it possible that this is causing my RAM consumption problems?</div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks for the help,</div><div class="gmail_extra">Jason</div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 6, 2017 at 2:45 PM,  <span dir="ltr"><<a href="mailto:yt-users-request@lists.spacepope.org" target="_blank">yt-users-request@lists.spacepope.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Send yt-users mailing list submissions to<br>

        <a href="mailto:yt-users@lists.spacepope.org">yt-users@lists.spacepope.org</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org" rel="noreferrer" target="_blank">http://lists.spacepope.org/<wbr>listinfo.cgi/yt-users-<wbr>spacepope.org</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:yt-users-request@lists.spacepope.org">yt-users-request@lists.<wbr>spacepope.org</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:yt-users-owner@lists.spacepope.org">yt-users-owner@lists.<wbr>spacepope.org</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of yt-users digest..."<br>

<br>Today's Topics:<br>

<br>

   1. Parallelism in yt Applied to Large Datasets (Jason Galyardt)<br>

   2. Re: Parallelism in yt Applied to Large Datasets (Nathan Goldbaum)<br>

   3. Re: Parallelism in yt Applied to Large Datasets (Scott Feister)<br>

   4. Re: Parallelism in yt Applied to Large Datasets (Britton Smith)<br>

<br><br>---------- Forwarded message ----------<br>From: Jason Galyardt <<a href="mailto:jason.galyardt@gmail.com">jason.galyardt@gmail.com</a>><br>To: Discussion of the yt analysis package <<a href="mailto:yt-users@lists.spacepope.org">yt-users@lists.spacepope.org</a>><br>Date: Wed, 6 Dec 2017 09:08:29 -0500<br>Subject: [yt-users] Parallelism in yt Applied to Large Datasets<br><div dir="ltr"><div><div><div><div><div>Hi yt Folks,<br><br></div>I've written a script that uses a yt DatasetSeries object to analyze a time series dataset generated by FLASH. It worked beautifully until I tried to run it on a new cluster with significantly larger HDF5 files (from 4 GB to more than 8 GB per file). Now, while the script runs, its RAM usage grows steadily until the OS kills the job. <br><br>It seems to me that I need to use domain decomposition to process these large files. So, my question to the group is this: is it possible to use both domain decomposition *and* parallel time series processing in a single script? This would require that yt be able to subdivide the available MPI processors into a number of work groups, each work group handling a single input file.<br><br></div>Cheers,<br></div>Jason<br><br>------<br></div>Jason Galyardt<br></div>University of Georgia<br><br></div>
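The work-group idea in the question above can be made concrete with a small sketch in plain Python (no MPI or yt needed). It is hypothetical: it assumes the MPI ranks are split into contiguous blocks and that input files are dealt out to work groups round-robin. The helpers `split_ranks` and `assign_files` and the file names are made up for illustration, and yt's own scheduling may differ.

```python
# Hypothetical illustration of work groups; this is not yt's internal scheduler.


def split_ranks(n_procs, n_groups):
    """Split MPI ranks 0..n_procs-1 into n_groups contiguous work groups.

    When n_procs is not evenly divisible by n_groups, the first
    (n_procs % n_groups) groups each get one extra rank.
    """
    if not 1 <= n_groups <= n_procs:
        raise ValueError("need 1 <= n_groups <= n_procs")
    base, extra = divmod(n_procs, n_groups)
    groups = []
    start = 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def assign_files(files, n_groups):
    """Deal input files out to work groups round-robin."""
    return {g: files[g::n_groups] for g in range(n_groups)}


if __name__ == "__main__":
    # Hypothetical file names standing in for FLASH plotfiles.
    files = ["plt_0000", "plt_0001", "plt_0002", "plt_0003", "plt_0004"]
    n_groups = 2
    groups = split_ranks(8, n_groups)
    work = assign_files(files, n_groups)
    for g, ranks in enumerate(groups):
        # All ranks in a group would share each of the group's files
        # (domain decomposition), processing one file at a time.
        print("group {}: ranks {} -> files {}".format(g, ranks, work[g]))
```

Running it prints `group 0: ranks [0, 1, 2, 3] -> files ['plt_0000', 'plt_0002', 'plt_0004']` and `group 1: ranks [4, 5, 6, 7] -> files ['plt_0001', 'plt_0003']`, so each file is handled by one group of four ranks.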

<br><br>---------- Forwarded message ----------<br>From: Nathan Goldbaum <<a href="mailto:nathan12343@gmail.com">nathan12343@gmail.com</a>><br>To: Discussion of the yt analysis package <<a href="mailto:yt-users@lists.spacepope.org">yt-users@lists.spacepope.org</a>><br>Date: Wed, 06 Dec 2017 14:25:01 +0000<br>Subject: Re: [yt-users] Parallelism in yt Applied to Large Datasets<br><div><div dir="auto">That depends on what sort of analysis you are doing. Not all tasks in yt are parallel-aware.</div></div>

<br><br>---------- Forwarded message ----------<br>From: Scott Feister <<a href="mailto:sfeister@gmail.com">sfeister@gmail.com</a>><br>To: Discussion of the yt analysis package <<a href="mailto:yt-users@lists.spacepope.org">yt-users@lists.spacepope.org</a>><br>Date: Wed, 6 Dec 2017 11:30:18 -0800<br>Subject: Re: [yt-users] Parallelism in yt Applied to Large Datasets<br><div dir="ltr"><div><div><div>Hi Jason,<br><br></div>I don't know how to do both domain and time decomposition in yt, but I have been doing time-series analysis in yt of some fairly massive FLASH 

HDF5 outputs (~20 GB each) without a problem. If you'd like to share the script with me (you can send to <a href="mailto:feister@flash.uchicago.edu" target="_blank">feister@flash.uchicago.edu</a>), I can take a look and see if I notice anything particularly wasting RAM. Maybe there's a simpler solution than resorting to domain decomposition!<br><br></div>Best,<br><br></div>Scott<br></div><div class="gmail_extra"><br clear="all"><div><div class="gmail-m_3381639536185017290gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Scott Feister, Ph.D.<br>Postdoctoral Researcher, Flash Center for Computational Science<br></div><div>University of Chicago, Department of Astronomy and Astrophysics<br></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>

</div>

<br><br>---------- Forwarded message ----------<br>From: Britton Smith <<a href="mailto:brittonsmith@gmail.com">brittonsmith@gmail.com</a>><br>To: Discussion of the yt analysis package <<a href="mailto:yt-users@lists.spacepope.org">yt-users@lists.spacepope.org</a>><br>Date: Wed, 6 Dec 2017 11:45:10 -0800<br>Subject: Re: [yt-users] Parallelism in yt Applied to Large Datasets<br><div dir="ltr">Hi Scott,<div><br></div><div>yt can do the multi-level parallelism you're talking about, i.e., parallelism over multiple datasets and in the operations on a single dataset.  I would start by looking here:</div><div><a href="http://yt-project.org/docs/dev/analyzing/parallel_computation.html#parallelization-over-multiple-objects-and-datasets" target="_blank">http://yt-project.org/docs/<wbr>dev/analyzing/parallel_<wbr>computation.html#<wbr>parallelization-over-multiple-<wbr>objects-and-datasets</a><br></div><div><br></div><div>Namely, have a look at the use of "piter" when looping over the DatasetSeries.  With that function, you can set the number of jobs (the njobs keyword) to fewer than the total number of processors you have available.  This will give you work groups with multiple processors for each dataset.  
Then, as long as the operations you're trying to do have been parallelized, things will just work, i.e., that operation will employ all the cores of that work group.</div><div><br></div><div>If you need to do some custom parallelization at the dataset level, I also suggest having a look at the parallel_objects command:</div><div><a href="http://yt-project.org/docs/dev/analyzing/parallel_computation.html#parallelizing-over-multiple-objects" target="_blank">http://yt-project.org/docs/<wbr>dev/analyzing/parallel_<wbr>computation.html#<wbr>parallelizing-over-multiple-<wbr>objects</a><br></div><div><br></div><div>This has a similar structure to piter, except that it is a more general looping construct that lets you split the iterations of a loop across separate processors or work groups.  parallel_objects is also nestable, so you can have nested loops that break the work down further at each level.</div><div><br></div><div>I hope this helps.  Please feel free to come back if you have more specific questions on parallelizing your analysis.</div><div><br></div><div>Britton</div><div><br></div><div><br></div></div>
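Britton's description of njobs and nested parallel_objects loops can be modeled with another hypothetical plain-Python sketch. It does not call yt: `chunk`, `nested_plan`, and the dataset and region names are illustrative, and the contiguous-block, round-robin division is an assumption, not a statement of how yt actually schedules work. The outer level plays the role of `piter(njobs=...)` over datasets; the inner level plays the role of a nested `parallel_objects` loop over objects within one dataset.

```python
# Hypothetical model of nested work division; this does not call yt.


def chunk(items, n):
    """Split a list into n contiguous, nearly equal chunks."""
    base, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def nested_plan(n_procs, datasets, outer_njobs, objects, inner_njobs):
    """Return {rank: [(dataset, object), ...]} for a two-level split.

    Outer level: ranks form outer_njobs work groups and datasets are dealt
    to the groups round-robin.  Inner level: each group's ranks form
    inner_njobs sub-groups and objects are dealt to those round-robin.
    Every rank in a sub-group gets the same (dataset, object) pairs; a
    parallel-aware operation would then divide each pair's work among them.
    """
    if outer_njobs < 1 or inner_njobs < 1:
        raise ValueError("njobs values must be positive")
    if n_procs // outer_njobs < inner_njobs:
        raise ValueError("not enough ranks for inner_njobs sub-groups")
    plan = {rank: [] for rank in range(n_procs)}
    groups = chunk(list(range(n_procs)), outer_njobs)
    for g, group_ranks in enumerate(groups):
        subgroups = chunk(group_ranks, inner_njobs)
        for ds in datasets[g::outer_njobs]:
            for s, sub_ranks in enumerate(subgroups):
                for obj in objects[s::inner_njobs]:
                    for rank in sub_ranks:
                        plan[rank].append((ds, obj))
    return plan


if __name__ == "__main__":
    # 8 ranks, 2 work groups over datasets, 2 sub-groups over objects.
    plan = nested_plan(8, ["plt_0000", "plt_0001"], 2,
                       ["region_a", "region_b"], 2)
    for rank in sorted(plan):
        print(rank, plan[rank])
```

With 8 ranks, `outer_njobs=2`, and `inner_njobs=2`, ranks 0 and 1 share `("plt_0000", "region_a")`, ranks 2 and 3 share `("plt_0000", "region_b")`, and ranks 4 to 7 do the same for `plt_0001`. The check on `n_procs // outer_njobs` keeps every sub-group non-empty, so no object is silently dropped.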

<br>______________________________<wbr>_________________<br>

yt-users mailing list<br>

<a href="mailto:yt-users@lists.spacepope.org">yt-users@lists.spacepope.org</a><br>

<a href="http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org" rel="noreferrer" target="_blank">http://lists.spacepope.org/<wbr>listinfo.cgi/yt-users-<wbr>spacepope.org</a><br>

<br></blockquote></div><br></div></div></div></div></div></div></div></div>