[yt-svn] commit/yt-doc: 3 new changesets

Bitbucket commits-noreply at bitbucket.org
Mon Apr 16 13:57:33 PDT 2012


3 new commits in yt-doc:


https://bitbucket.org/yt_analysis/yt-doc/changeset/6bd720248b01/
changeset:   6bd720248b01
user:        jwise77
date:        2012-03-16 14:21:16
summary:     Updating example to reflect API change.
affected #:  1 file

diff -r 45a9e83d961a1addf095dc5abaadcc8a92962338 -r 6bd720248b01353f932094cb45d2b351991fb5d7 source/analysis_modules/halo_profiling.rst
--- a/source/analysis_modules/halo_profiling.rst
+++ b/source/analysis_modules/halo_profiling.rst
@@ -25,7 +25,7 @@
 .. code-block:: python
 
   import yt.analysis_modules.halo_profiler.api as HP
-  hp = HP.halo_profiler("DD0242/DD0242")
+  hp = HP.HaloProfiler("DD0242/DD0242")
 
 Most of the halo profiler's options are configured with keyword arguments given at 
 instantiation.  These options are:

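For reference, the keyword-argument configuration mentioned above might look
roughly like the following sketch.  The ``add_profile`` and ``make_profiles``
calls and the specific keywords shown are assumptions drawn from the rest of
the halo profiler documentation, not part of this changeset, so check them
against the docstrings before use:

.. code-block:: python

   import yt.analysis_modules.halo_profiler.api as HP

   # Options are passed as keyword arguments at instantiation.
   hp = HP.HaloProfiler("DD0242/DD0242",
                        halo_list_file="HopAnalysis.out",
                        n_profile_bins=50)

   # Request a weighted radial profile, then build it for every halo.
   hp.add_profile("Temperature", weight_field="CellMassMsun",
                  accumulation=False)
   hp.make_profiles()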


https://bitbucket.org/yt_analysis/yt-doc/changeset/d81f7ebfcc84/
changeset:   d81f7ebfcc84
user:        jwise77
date:        2012-04-16 22:37:33
summary:     Merging
affected #:  5 files

diff -r 6bd720248b01353f932094cb45d2b351991fb5d7 -r d81f7ebfcc84073348af5861c1e7ce962279c692 source/advanced/parallel_computation.rst
--- a/source/advanced/parallel_computation.rst
+++ b/source/advanced/parallel_computation.rst
@@ -197,15 +197,60 @@
 This example above can be modified to loop over anything that can be saved to
 a Python list: halos, data files, arrays, and more.
 
+Parallel Time Series Analysis
+-----------------------------
+
+The same :func:`parallel_objects` machinery discussed above is turned on by
+default when using a ``TimeSeries`` object (see :ref:`time-series-analysis`) to
+iterate over simulation outputs.  The syntax for this is very simple.  As an
+example, we can use the following script to find the angular momentum vector in
+a 1 pc sphere centered on the maximum density cell in a large number of
+simulation outputs:
+
+.. code-block:: python
+
+   from yt.mods import *
+   all_files = glob.glob("DD*/output_*")
+   all_files.sort()
+   ts = TimeSeriesData.from_filenames(all_files, Parallel = True)
+   sphere = ts.sphere("max", (1.0, "pc"))
+   L_vecs = sphere.quantities["AngularMomentumVector"]()
+
+Note that this script can be run in serial or parallel with an arbitrary number
+of processors.  When running in parallel, each output is given to a different
+processor.  By default, ``Parallel`` is set to ``True``, so you do not have to
+explicitly set ``Parallel = True`` as in the above example.
+
+You can also request a fixed number of processors to calculate each
+angular momentum vector.  For example, this script will calculate each angular
+momentum vector using a workgroup of four processors.
+
+.. code-block:: python
+
+   from yt.mods import *
+   all_files = glob.glob("DD*/output_*")
+   all_files.sort()
+   ts = TimeSeriesData.from_filenames(all_files, Parallel = 4)
+   sphere = ts.sphere("max", (1.0, "pc"))
+   L_vecs = sphere.quantities["AngularMomentumVector"]()
+
+If you do not want to use ``parallel_objects`` parallelism when using a
+``TimeSeries`` object, set ``Parallel = False``.  When running Python in parallel,
+this will use all of the available processors to evaluate the requested
+operation on each simulation output.  Some care and possibly trial and error
+might be necessary to determine the best settings for your simulation
+outputs.
+   
+
 Parallel Performance, Resources, and Tuning
 -------------------------------------------
 
-Optimizing parallel jobs in YT is difficult; there are many parameters
-that affect how well and quickly the job runs.
-In many cases, the only way to find out what the minimum (or optimal)
-number of processors is, or amount of memory needed, is through trial and error.
-However, this section will attempt to provide some insight into what are good
-starting values for a given parallel task.
+Optimizing parallel jobs in YT is difficult; there are many parameters that
+affect how well and quickly the job runs.  In many cases, the only way to find
+out what the minimum (or optimal) number of processors is, or amount of memory
+needed, is through trial and error.  However, this section will attempt to
+provide some insight into what good starting values are for a given parallel
+task.
 
 Grid Decomposition
 ++++++++++++++++++

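The :func:`parallel_objects` loop referred to in the new section can be
sketched as below.  This is only a rough illustration of the pattern; the
``storage`` keyword and the ``sto.result`` / ``sto.result_id`` attributes are
assumptions based on the surrounding documentation:

.. code-block:: python

   from yt.mods import *
   import glob

   all_files = glob.glob("DD*/output_*")
   all_files.sort()

   my_storage = {}
   # Each MPI task receives a subset of the outputs; results are gathered
   # into my_storage on every task once the loop finishes.
   for sto, fn in parallel_objects(all_files, 2, storage=my_storage):
       pf = load(fn)
       sto.result_id = fn
       sto.result = pf.h.find_max("Density")

   for fn, vals in sorted(my_storage.items()):
       print fn, vals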

diff -r 6bd720248b01353f932094cb45d2b351991fb5d7 -r d81f7ebfcc84073348af5861c1e7ce962279c692 source/analysis_modules/merger_tree.rst
--- a/source/analysis_modules/merger_tree.rst
+++ b/source/analysis_modules/merger_tree.rst
@@ -31,6 +31,8 @@
 Clearly, another requirement is that Python has the
 `sqlite3 library <http://docs.python.org/library/sqlite3.html>`_
 installed.
+This should be built along with everything else yt needs
+if the ``install_script.sh`` was used.
 
 The merger tree can be calculated in parallel, and if necessary, it will run
 the halo finding in parallel as well. Please see the note below about the
@@ -77,18 +79,19 @@
 at the same time (`see more here <http://www.sqlite.org/lockingv3.html#how_to_corrupt>`_).
 NFS disks can store files on multiple physical hard drives, and it can take time
 for changes made by one task to appear to all the parallel tasks.
+Only one task of the merger tree ever interacts with the database,
+so these dangers are minimal,
+but in general it's a good idea to know something about the disk used to
+store the database.
 
-The Merger Tree takes extra caution to ensure that every task sees the exact
-same version of the database before writing to it, and only one task
-ever writes to the database at a time.
-This is accomplished by using MPI Barriers and md5 hashing of the database
-between writes.
 In general, it is recommended to keep the database on a 'real disk' 
-(/tmp for example, if all the tasks are on the same SMP node) if possible,
+(/tmp for example, if all the tasks are on the same SMP node,
+or a RAM disk for extra speed) if possible,
 but it should work on a NFS disk as well.
-If the database must be stored on a NFS disk, the documentation for the NFS protocol
-should be consulted to see what settings are available that can minimize the potential for
-file replication problems of the database.
+If a temporary disk is used to store the database while it's being built,
+remember to copy the file to a permanent disk after the merger tree script
+is finished.
+
 
 Running and Using the Halo Merger Tree
 --------------------------------------
@@ -155,16 +158,18 @@
 If the halos are to be found during the course of building the merger tree,
 run with an appropriate number of tasks to the size of the dataset and the
 halo finder used.
-The merger tree itself, which compares halo membership in parallel very effectively,
-is almost completely constrained by the
-read/write times of the SQLite file.
+The speed of the merger tree itself,
+which compares halo membership in parallel very effectively,
+is almost completely constrained by the read/write times of the SQLite file.
 In tests with the halos pre-located, there is not much speedup beyond two MPI tasks.
 There is no negative effect with running the merger tree with more tasks (which is
 why if halos are to be found by the merger tree, the merger tree should be
-run with as many tasks as that step requires), but there is no benefit.
+run with as many tasks as that step requires), and indeed if the simulation
+is a large one, running in parallel does provide memory parallelism,
+which is important.
 
-How The Database Is Handled
----------------------------
+How The Database Is Handled In Analysis Restarts
+------------------------------------------------
 
 The Merger Tree is designed to allow the merger tree database to be built
 incrementally.
@@ -178,6 +183,12 @@
 referencing the same database as before.
 By referencing the same database as before, work does not need to be repeated.
 
+If the merger tree process is interrupted before completion (say, if the
+job's walltime is exceeded and the scheduler kills it), just run the exact
+same job again.
+The merger tree will check to see what work has already been completed, and
+resume where it left off.
+
 Additional Parameters
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -197,10 +208,6 @@
     rebuild the database regardless of whether or not the halo files or
     database exist on disk already.
     Default: False.
-  * ``sleep`` (float) - The amount of time in seconds tasks waits between
-    checks to make sure the SQLite database file is globally-identical.
-    This time is used to allow a parallel file system to synch up globally.
-    The value may not be negative or zero. Default: 1.
   * ``index`` (bool) - Whether to add an index to the SQLite file. True makes
     SQL searches faster at the cost of additional disk space. Default=True.
 

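As a point of reference for the database discussion above, a merger tree run
might be driven by a script along these lines.  This is a hedged sketch: the
``restart_files`` and ``database`` keywords are assumptions, ``index`` comes
from the parameter list above, and the file pattern is only illustrative:

.. code-block:: python

   from yt.mods import *
   from yt.analysis_modules.halo_merger_tree.api import MergerTree
   import glob

   # Datasets, oldest to newest.
   files = sorted(glob.glob("DD????/DD????"))

   # Build (or resume building) the database on a local disk for speed;
   # copy it to permanent storage once the script finishes.
   MergerTree(restart_files=files, database="/tmp/halos.db", index=True)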

diff -r 6bd720248b01353f932094cb45d2b351991fb5d7 -r d81f7ebfcc84073348af5861c1e7ce962279c692 source/analyzing/time_series_analysis.rst
--- a/source/analyzing/time_series_analysis.rst
+++ b/source/analyzing/time_series_analysis.rst
@@ -20,8 +20,13 @@
 But this is not really very nice.  This ends up requiring a lot of maintenance.
 The :class:`~yt.data_objects.time_series.TimeSeriesData` object has been
 designed to remove some of this clunkiness and present an easier, more unified
-approach to analyzing sets of data.  Furthermore, future versions of yt will
-automatically parallelize operations conducted on time series of data.
+approach to analyzing sets of data.  Even better,
+:class:`~yt.data_objects.time_series.TimeSeriesData` works in parallel by
+default (see :ref:`parallel-computation`), so you can use a ``TimeSeriesData``
+object to quickly and easily parallelize your analysis.  Since doing the same
+analysis task on many simulation outputs is 'embarrassingly' parallel, this
+naturally allows for almost arbitrary speedup - limited only by the number of
+available processors and the number of simulation outputs.
 
 The idea behind the current implementation of time series analysis is that
 the underlying data and the operators that act on that data can and should be
@@ -117,5 +122,5 @@
    print ms
 
 This allows you to create your own analysis tasks that will be then available
-to time series data objects.  In the future, this will allow for transparent
-parallelization.
+to time series data objects.  Since ``TimeSeriesData`` objects iterate over
+filenames in parallel by default, this allows for transparent parallelization. 

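A minimal sketch of the transparent parallelization described above, assuming
the ``piter`` iteration method on ``TimeSeriesData`` (the field name is only
illustrative):

.. code-block:: python

   from yt.mods import *
   import glob

   all_files = sorted(glob.glob("DD*/output_*"))
   ts = TimeSeriesData.from_filenames(all_files)

   # Under MPI each output goes to a different processor; in serial this is
   # an ordinary loop over the outputs.
   for pf in ts.piter():
       print pf, pf.h.find_max("Density")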

diff -r 6bd720248b01353f932094cb45d2b351991fb5d7 -r d81f7ebfcc84073348af5861c1e7ce962279c692 source/reference/api/extension_types.rst
--- a/source/reference/api/extension_types.rst
+++ b/source/reference/api/extension_types.rst
@@ -90,6 +90,7 @@
    ~yt.visualization.image_writer.map_to_colors
    ~yt.visualization.image_writer.strip_colormap_data
    ~yt.visualization.image_writer.splat_points
+   ~yt.visualization.image_writer.annotate_image
 
 We also provide a module that is very good for generating EPS figures,
 particularly with complicated layouts.





https://bitbucket.org/yt_analysis/yt-doc/changeset/8ba1a20b12e2/
changeset:   8ba1a20b12e2
user:        jwise77
date:        2012-04-16 22:54:43
summary:     Adding multiplot(_yt), single_plot, and return_cmap to eps_writer
docstrings.
affected #:  1 file

diff -r d81f7ebfcc84073348af5861c1e7ce962279c692 -r 8ba1a20b12e2966f9e0c7d438ee9e7eac0b7b1c8 source/reference/api/extension_types.rst
--- a/source/reference/api/extension_types.rst
+++ b/source/reference/api/extension_types.rst
@@ -99,6 +99,10 @@
    :toctree: generated/
 
    ~yt.visualization.eps_writer.DualEPS
+   ~yt.visualization.eps_writer.single_plot
+   ~yt.visualization.eps_writer.multiplot
+   ~yt.visualization.eps_writer.multiplot_yt
+   ~yt.visualization.eps_writer.return_cmap
 
 .. _image-panner-api:

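For context, the newly documented helpers might be used along the following
lines.  This is a loose sketch only; the ``single_plot`` and ``save_fig``
calls shown here are assumptions and should be checked against the
docstrings added in this changeset:

.. code-block:: python

   from yt.mods import *
   import yt.visualization.eps_writer as eps

   pf = load("DD0242/DD0242")
   pc = PlotCollection(pf, center=[0.5, 0.5, 0.5])
   p = pc.add_slice("Density", 0)

   # Convert a single yt plot into a publication-quality EPS figure.
   fig = eps.single_plot(p)
   fig.save_fig("density_slice", format="eps")
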
Repository URL: https://bitbucket.org/yt_analysis/yt-doc/

--

This is a commit notification from bitbucket.org. You are receiving
this because you have the service enabled, addressing the recipient of
this email.


