[yt-svn] commit/yt: 5 new changesets

commits-noreply at bitbucket.org commits-noreply at bitbucket.org
Sat Jul 19 16:06:31 PDT 2014


5 new commits in yt:

https://bitbucket.org/yt_analysis/yt/commits/e320b4a52a26/
Changeset:   e320b4a52a26
Branch:      yt-3.0
User:        ngoldbaum
Date:        2014-07-18 23:06:54
Summary:     Updating the parallelism docs.
Affected #:  3 files

diff -r 894d44f837d4d259550bb7b741091424226375b7 -r e320b4a52a2628c9d437f3221cfcb32bc1abf507 doc/source/analyzing/ionization_cube.py
--- a/doc/source/analyzing/ionization_cube.py
+++ b/doc/source/analyzing/ionization_cube.py
@@ -1,21 +1,24 @@
-from yt.mods import *
+import yt
 from yt.utilities.parallel_tools.parallel_analysis_interface \
     import communication_system
-import h5py, glob, time
 
-@derived_field(name = "IonizedHydrogen",
-               units = r"\frac{\rho_{HII}}{rho_H}")
+import h5py
+import time
+import numpy as np
+
+@yt.derived_field(name="IonizedHydrogen", units="",
+                  display_name=r"\frac{\rho_{HII}}{\rho_H}")
 def IonizedHydrogen(field, data):
     return data["HII_Density"]/(data["HI_Density"]+data["HII_Density"])
 
-ts = DatasetSeries.from_filenames("SED800/DD*/*.index", parallel = 8)
+ts = yt.DatasetSeries("SED800/DD*/*.index", parallel=8)
 
 ionized_z = np.zeros(ts[0].domain_dimensions, dtype="float32")
 
 t1 = time.time()
-for pf in ts.piter():
-    z = pf.current_redshift
-    for g in parallel_objects(pf.index.grids, njobs = 16):
+for ds in ts.piter():
+    z = ds.current_redshift
+    for g in parallel_objects(ds.index.grids, njobs = 16):
         i1, j1, k1 = g.get_global_startindex() # Index into our domain
         i2, j2, k2 = g.get_global_startindex() + g.ActiveDimensions
         # Look for the newly ionized gas

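For reference, the grid-iteration pattern used in the updated ``ionization_cube.py`` above looks roughly like the following sketch when written out in full.  The ``Density`` field, the placeholder output path, and the final all-reduce are illustrative assumptions (the real script accumulates an ionization redshift instead), and it assumes root-level grids as in the docs example.

.. code-block:: python

   import numpy as np
   import yt
   from yt.utilities.parallel_tools.parallel_analysis_interface import \
       parallel_objects, communication_system

   yt.enable_parallelism()

   ds = yt.load("SED800/DD0040/DD0040")  # placeholder single output
   domain_array = np.zeros(ds.domain_dimensions, dtype="float32")

   for g in parallel_objects(ds.index.grids, njobs=16):
       i1, j1, k1 = g.get_global_startindex()  # index into our domain
       i2, j2, k2 = g.get_global_startindex() + g.ActiveDimensions
       # Deposit this grid's values into the matching slab of the domain array.
       domain_array[i1:i2, j1:j2, k1:k2] = g["Density"]
       g.clear_data()

   # Combine the per-task partial arrays over MPI.
   comm = communication_system.communicators[-1]
   domain_array = comm.mpi_allreduce(domain_array, op="max")
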
diff -r 894d44f837d4d259550bb7b741091424226375b7 -r e320b4a52a2628c9d437f3221cfcb32bc1abf507 doc/source/analyzing/parallel_computation.rst
--- a/doc/source/analyzing/parallel_computation.rst
+++ b/doc/source/analyzing/parallel_computation.rst
@@ -22,14 +22,11 @@
    :ref:`derived-quantities`)
  * 1-, 2-, and 3-D profiles (:ref:`generating-profiles-and-histograms`)
  * Halo finding (:ref:`halo_finding`)
- * Merger tree (:ref:`merger_tree`)
- * Two point functions (:ref:`two_point_functions`)
  * Volume rendering (:ref:`volume_rendering`)
- * Radial column density (:ref:`radial-column-density`)
  * Isocontours & flux calculations (:ref:`extracting-isocontour-information`)
 
 This list covers just about every action ``yt`` can take!  Additionally, almost all
-scripts will benefit from parallelization without any modification.  The goal
+scripts will benefit from parallelization with minimal modification.  The goal
 of Parallel-``yt`` has been to retain API compatibility and abstract all
 parallelism.
 
@@ -45,14 +42,15 @@
 
     $ pip install mpi4py
 
-Once that has been installed, you're all done!  You just need to launch your 
-scripts with ``mpirun`` (or equivalent) and signal to ``yt`` that you want to run 
-them in parallel.  In general, that's all it takes to get a speed benefit on a 
+Once that has been installed, you're all done!  You just need to launch your
+scripts with ``mpirun`` (or equivalent) and signal to ``yt`` that you want to
+run them in parallel by invoking the ``yt.enable_parallelism()`` function in
+your script.  In general, that's all it takes to get a speed benefit on a
 multi-core machine.  Here is an example on an 8-core desktop:
 
 .. code-block:: bash
 
-    $ mpirun -np 8 python script.py --parallel
+    $ mpirun -np 8 python script.py
 
 Throughout its normal operation, ``yt`` keeps you aware of what is happening with
 regular messages to the stderr usually prefaced with: 
@@ -71,10 +69,9 @@
 
 in the case of two cores being used.
 
-It's important to note that all of the processes listed in `capabilities` work
--- and no additional work is necessary to parallelize those processes.
-Furthermore, the ``yt`` command itself recognizes the ``--parallel`` option, so
-those commands will work in parallel as well.
+It's important to note that all of the processes listed in :ref:`capabilities`
+work in parallel -- and no additional work is necessary to parallelize those
+processes.
 
 Running a ``yt`` script in parallel
 -----------------------------------
@@ -85,11 +82,12 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   pf = load("RD0035/RedshiftOutput0035")
-   v, c = pf.h.find_max("density")
+   import yt
+   yt.enable_parallelism()
+   ds = yt.load("RD0035/RedshiftOutput0035")
+   v, c = ds.find_max("density")
    print v, c
-   p = ProjectionPlot(pf, "x", "density")
+   p = yt.ProjectionPlot(ds, "x", "density")
    p.save()
 
 If this script is run in parallel, two of the most expensive operations -
@@ -99,7 +97,7 @@
 
 .. code-block:: bash
 
-   $ mpirun -np 16 python2.7 my_script.py --parallel
+   $ mpirun -np 16 python2.7 my_script.py
 
 .. note::
 
@@ -126,11 +124,11 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   pf = load("RD0035/RedshiftOutput0035")
-   v, c = pf.h.find_max("density")
-   p = ProjectionPlot(pf, "x", "density")
-   if is_root():
+   import yt
+   ds = yt.load("RD0035/RedshiftOutput0035")
+   v, c = ds.find_max("density")
+   p = yt.ProjectionPlot(ds, "x", "density")
+   if yt.is_root():
        print v, c
        p.save()
 
@@ -144,17 +142,17 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
+   import yt
 
    def print_and_save_plot(v, c, plot, print=True):
        if print:
           print v, c
        plot.save()
 
-   pf = load("RD0035/RedshiftOutput0035")
-   v, c = pf.h.find_max("density")
-   p = ProjectionPlot(pf, "x", "density")
-   only_on_root(print_and_save_plot, v, c, plot, print=True)
+   ds = yt.load("RD0035/RedshiftOutput0035")
+   v, c = ds.find_max("density")
+   p = yt.ProjectionPlot(ds, "x", "density")
+   yt.only_on_root(print_and_save_plot, v, c, p, print=True)
 
 Types of Parallelism
 --------------------
@@ -174,19 +172,17 @@
 The following operations use spatial decomposition:
 
   * Halo finding
-  * Merger tree
-  * Two point functions
   * Volume rendering
-  * Radial column density
 
 Grid Decomposition
 ++++++++++++++++++
 
-The alternative to spatial decomposition is a simple round-robin of the grids.
-This process allows ``yt`` to pool data access to a given Enzo data file, which
-ultimately results in faster read times and better parallelism.
+The alternative to spatial decomposition is a simple round-robin of data chunks,
+which could be grids, octs, or whatever chunking mechanism is used by the code
+frontend being used.  This process allows ``yt`` to pool data access to a given
+data file, which ultimately results in faster read times and better parallelism.
 
-The following operations use grid decomposition:
+The following operations use chunk decomposition:
 
   * Projections
   * Slices
@@ -211,16 +207,15 @@
 ---------------------------
 
 It is easy within ``yt`` to parallelize a list of tasks, as long as those tasks
-are independent of one another.
-Using object-based parallelism, the function :func:`parallel_objects` will
-automatically split up a list of tasks over the specified number of processors
-(or cores).
-Please see this heavily-commented example:
+are independent of one another. Using object-based parallelism, the function
+:func:`~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_objects`
+will automatically split up a list of tasks over the specified number of
+processors (or cores).  Please see this heavily-commented example:
 
 .. code-block:: python
    
    # As always...
-   from yt.mods import *
+   import yt
    
    import glob
    
@@ -249,19 +244,19 @@
    # If data does not need to be combined after the loop is done, the line
    # would look like:
    #       for fn in parallel_objects(fns, num_procs):
-   for sto, fn in parallel_objects(fns, num_procs, storage = my_storage):
+   for sto, fn in yt.parallel_objects(fns, num_procs, storage = my_storage):
 
        # Open a data file, remembering that fn is different on each task.
-       pf = load(fn)
-       dd = pf.h.all_data()
+       ds = yt.load(fn)
+       dd = ds.all_data()
 
        # This copies fn and the min/max of density to the local copy of
        # my_storage
        sto.result_id = fn
-       sto.result = dd.quantities["Extrema"]("density")
+       sto.result = dd.quantities.extrema("density")
 
        # Makes and saves a plot of the gas density.
-       p = ProjectionPlot(pf, "x", "density")
+       p = yt.ProjectionPlot(ds, "x", "density")
        p.save()
 
    # At this point, as the loop exits, the local copies of my_storage are
@@ -270,7 +265,7 @@
    # tasks have produced.
    # Below, the values in my_storage are printed by only one task. The other
    # tasks do nothing.
-   if is_root()
+   if yt.is_root():
        for fn, vals in sorted(my_storage.items()):
            print fn, vals
 
@@ -282,43 +277,33 @@
 Parallel Time Series Analysis
 -----------------------------
 
-The same :func:`parallel_objects` machinery discussed above is turned on by
-default when using a ``DatasetSeries`` object (see :ref:`time-series-analysis`)
-to iterate over simulation outputs.  The syntax for this is very simple.  As an
-example, we can use the following script to find the angular momentum vector in
-a 1 pc sphere centered on the maximum density cell in a large number of
-simulation outputs:
+The same ``parallel_objects`` machinery discussed above is turned on by
+default when using a :class:`~yt.data_objects.time_series.DatasetSeries` object
+(see :ref:`time-series-analysis`) to iterate over simulation outputs.  The
+syntax for this is very simple.  As an example, we can use the following script
+to find the angular momentum vector in a 1 pc sphere centered on the maximum
+density cell in a large number of simulation outputs:
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = True)
-   sphere = ts.sphere("max", (1.0, "pc"))
-   L_vecs = sphere.quantities["AngularMomentumVector"]()
+   import yt
+   yt.enable_parallelism()
+
+   ts = yt.load("DD*/output_*")
+
+   storage = {}
+
+   for sto, ds in ts.piter():
+       sphere = ds.sphere("max", (1.0, "pc"))
+       sto.result = sphere.quantities.angular_momentum_vector()
+       sto.result_id = str(ds)
+
+   for L in sorted(storage.items()):
+       print L
 
 Note that this script can be run in serial or parallel with an arbitrary number
 of processors.  When running in parallel, each output is given to a different
-processor.  By default, parallel is set to ``True``, so you do not have to
-explicitly set ``parallel = True`` as in the above example. 
-
-One could get the same effect by iterating over the individual parameter files
-in the DatasetSeries object:
-
-.. code-block:: python
-
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = True)
-   my_storage = {}
-   for sto,pf in ts.piter(storage=my_storage):
-       sphere = pf.sphere("max", (1.0, "pc"))
-       L_vec = sphere.quantities["AngularMomentumVector"]()
-       sto.result_id = pf.parameter_filename
-       sto.result = L_vec
-
-   L_vecs = []
-   for fn, L_vec in sorted(my_storage.items()):
-       L_vecs.append(L_vec)
-
+processor.
 
 You can also request a fixed number of processors to calculate each
 angular momentum vector.  For example, this script will calculate each angular
@@ -328,16 +313,18 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = 4)
-   sphere = ts.sphere("max", (1.0, "pc))
-   L_vecs = sphere.quantities["AngularMomentumVector"]()
+   import yt
+   ts = yt.DatasetSeries("DD*/output_*", parallel = 4)
+   
+   for ds in ts.piter():
+       sphere = ds.sphere("max", (1.0, "pc"))
+       L_vecs = sphere.quantities.angular_momentum_vector()
 
 If you do not want to use ``parallel_objects`` parallelism when using a
-TimeSeries object, set ``parallel = False``.  When running python in parallel,
+DatasetSeries object, set ``parallel = False``.  When running python in parallel,
 this will use all of the available processors to evaluate the requested
 operation on each simulation output.  Some care and possibly trial and error
-might be necessary to estimate the correct settings for your Simulation
+might be necessary to estimate the correct settings for your simulation
 outputs.
 
 Parallel Performance, Resources, and Tuning
@@ -350,53 +337,50 @@
 provide some insight into what are good starting values for a given parallel
 task.
 
-Grid Decomposition
-++++++++++++++++++
+Chunk Decomposition
++++++++++++++++++++
 
 In general, these types of parallel calculations scale very well with number of
-processors.
-They are also fairly memory-conservative.
-The two limiting factors is therefore the number of grids in the dataset,
-and the speed of the disk the data is stored on.
-There is no point in running a parallel job of this kind with more processors
-than grids, because the extra processors will do absolutely nothing, and will
-in fact probably just serve to slow down the whole calculation due to the extra
-overhead.
-The speed of the disk is also a consideration - if it is not a high-end parallel
-file system, adding more tasks will not speed up the calculation if the disk
-is already swamped with activity.
+processors.  They are also fairly memory-conservative.  The two limiting factors
+are therefore the number of chunks in the dataset, and the speed of the disk the
+data is stored on.  There is no point in running a parallel job of this kind
+with more processors than chunks, because the extra processors will do absolutely
+nothing, and will in fact probably just serve to slow down the whole calculation
+due to the extra overhead.  The speed of the disk is also a consideration - if
+it is not a high-end parallel file system, adding more tasks will not speed up
+the calculation if the disk is already swamped with activity.
 
-The best advice for these sort of calculations is to run 
-with just a few processors and go from there, seeing if it the runtime
-improves noticeably.
+The best advice for these sorts of calculations is to run with just a few
+processors and go from there, seeing if the runtime improves noticeably.
 
 **Projections, Slices, and Cutting Planes**
 
 Projections, slices and cutting planes are the most common methods of creating
 two-dimensional representations of data.  All three have been parallelized in a
-grid-based fashion.
+chunk-based fashion.
 
  * Projections: projections are parallelized utilizing a quad-tree approach.
    Data is loaded for each processor, typically by a process that consolidates
-   open/close/read operations, and each grid is then iterated over and cells
-   are deposited into a data structure that stores values corresponding to
-   positions in the two-dimensional plane.  This provides excellent load
-   balancing, and in serial is quite fast.  However, as of ``yt`` 2.3, the
-   operation by which quadtrees are joined across processors scales poorly;
-   while memory consumption scales well, the time to completion does not.  As
-   such, projections can often be done very fast when operating only on a single
-   processor!  The quadtree algorithm can be used inline (and, indeed, it is
-   for this reason that it is slow.)  It is recommended that you attempt to
-   project in serial before projecting in parallel; even for the very largest
-   datasets (Enzo 1024^3 root grid with 7 levels of refinement) in the absence
-   of IO the quadtree algorithm takes only three minutes or so on a decent
-   processor.
- * Slices: to generate a slice, grids that intersect a given slice are iterated
-   over and their finest-resolution cells are deposited.  The grids are
+   open/close/read operations, and each grid is then iterated over and cells are
+   deposited into a data structure that stores values corresponding to positions
+   in the two-dimensional plane.  This provides excellent load balancing, and in
+   serial is quite fast.  However, the operation by which quadtrees are joined
+   across processors scales poorly; while memory consumption scales well, the
+   time to completion does not.  As such, projections can often be done very
+   fast when operating only on a single processor!  The quadtree algorithm can
+   be used inline (and, indeed, it is for this reason that it is slow.)  It is
+   recommended that you attempt to project in serial before projecting in
+   parallel; even for the very largest datasets (Enzo 1024^3 root grid with 7
+   levels of refinement) in the absence of IO the quadtree algorithm takes only
+   three minutes or so on a decent processor.
+
+ * Slices: to generate a slice, chunks that intersect a given slice are iterated
+   over and their finest-resolution cells are deposited.  The chunks are
    decomposed via standard load balancing.  While this operation is parallel,
    **it is almost never necessary to slice a dataset in parallel**, as all data is
    loaded on demand anyway.  The slice operation has been parallelized so as to
    enable slicing when running *in situ*.
+
  * Cutting planes: cutting planes are parallelized exactly as slices are.
    However, in contrast to slices, because the data-selection operation can be
    much more time consuming, cutting planes often benefit from parallelism.
@@ -404,7 +388,7 @@
 Object-Based
 ++++++++++++
 
-Like grid decomposition, it does not help to run with more processors than the
+Like chunk decomposition, it does not help to run with more processors than the
 number of objects to be iterated over.
 There is also the matter of the kind of work being done on each object, and
 whether it is disk-intensive, cpu-intensive, or memory-intensive.
@@ -436,37 +420,28 @@
 
 **Halo-Finding**
 
-Halo finding, along with the merger tree that uses halo finding, operates
-on the particles in the volume, and is therefore mostly grid-agnostic.
-Generally, the biggest concern for halo finding is the amount of memory needed.
-There is subtle art in estimating the amount of memory needed for halo finding,
-but a rule of thumb is that Parallel HOP (:func:`parallelHF`) is the most
-memory-intensive, followed by plain HOP (:func:`HaloFinder`),
-with Friends of Friends (:func:`FOFHaloFinder`) being
-the most memory-conservative.
-It has been found that :func:`parallelHF` needs roughly
-1 MB of memory per 5,000
-particles, although recent work has improved this and the memory requirement
-is now smaller than this. But this is a good starting point for beginning to
-calculate the memory required for halo-finding.
-
-**Two point functions**
-
-Please see :ref:`tpf_strategies` for more details.
+Halo finding, along with the merger tree that uses halo finding, operates on the
+particles in the volume, and is therefore mostly chunk-agnostic.  Generally, the
+biggest concern for halo finding is the amount of memory needed.  There is
+subtle art in estimating the amount of memory needed for halo finding, but a
+rule of thumb is that the HOP halo finder (:func:`HaloFinder`) is the most
+memory intensive, with Friends of Friends (:func:`FOFHaloFinder`) being the
+most memory-conservative.  It has been found that :func:`parallelHF` needs
+roughly 1 MB of memory per 5,000 particles, although recent work has improved
+this and the memory requirement is now smaller than this. But this is a good
+starting point for beginning to calculate the memory required for halo-finding.
 
 **Volume Rendering**
 
-The simplest way to think about volume rendering, and the radial column density
-module that uses it, is that it load-balances over the grids in the dataset.
-Each processor is given roughly the same sized volume to operate on.
-In practice, there are just a few things to keep in mind when doing volume
-rendering.
-First, it only uses a power of two number of processors.
-If the job is run with 100 processors, only 64 of them will actually do anything.
-Second, the absolute maximum number of processors is the number of grids.
-But in order to keep work distributed evenly, typically the number of processors
-should be no greater than one-eighth or one-quarter the number of processors
-that were used to produce the dataset.
+The simplest way to think about volume rendering is that it load-balances over
+the i/o chunks in the dataset.  Each processor is given roughly the same sized
+volume to operate on.  In practice, there are just a few things to keep in mind
+when doing volume rendering.  First, it only uses a power of two number of
+processors.  If the job is run with 100 processors, only 64 of them will
+actually do anything.  Second, the absolute maximum number of processors is the
+number of chunks.  In order to keep work distributed evenly, typically the
+number of processors should be no greater than one-eighth or one-quarter the
+number of processors that were used to produce the dataset.
 
 Additional Tips
 ---------------
@@ -500,21 +475,21 @@
     
     .. code-block:: python
     
-       from yt.mods import *
+       import yt
        import time
        
-       pf = load("DD0152")
+       ds = yt.load("DD0152")
        t0 = time.time()
        bigstuff, hugestuff = StuffFinder(pf)
-       BigHugeStuffParallelFunction(pf, bigstuff, hugestuff)
+       BigHugeStuffParallelFunction(ds, bigstuff, hugestuff)
        t1 = time.time()
        for i in range(1000000):
            tinystuff, ministuff = GetTinyMiniStuffOffDisk("in%06d.txt" % i)
-           array = TinyTeensyParallelFunction(pf, tinystuff, ministuff)
+           array = TinyTeensyParallelFunction(ds, tinystuff, ministuff)
            SaveTinyMiniStuffToDisk("out%06d.txt" % i, array)
        t2 = time.time()
        
-       if is_root()
+       if yt.is_root():
            print "BigStuff took %.5e sec, TinyStuff took %.5e sec" % (t1 - t0, t2 - t1)
   
   * Remember that if the script handles disk IO explicitly, and does not use
@@ -526,7 +501,7 @@
     
     .. code-block:: python
        
-       if is_root()
+       if yt.is_root():
            file = open("out.txt", "w")
            file.write(stuff)
            file.close()

diff -r 894d44f837d4d259550bb7b741091424226375b7 -r e320b4a52a2628c9d437f3221cfcb32bc1abf507 doc/source/reference/api/api.rst
--- a/doc/source/reference/api/api.rst
+++ b/doc/source/reference/api/api.rst
@@ -742,6 +742,7 @@
    ~yt.funcs.time_execution
    ~yt.analysis_modules.level_sets.contour_finder.identify_contours
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_blocking_call
+   ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_objects
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_passthrough
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_root_only
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_simple_proxy

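The updated ``parallel_computation.rst`` above replaces the ``--parallel`` flag with an explicit ``yt.enable_parallelism()`` call.  A minimal runnable sketch of the root-only output pattern it describes is below; the ``RD0035`` path comes from the docs, and the keyword argument is renamed to ``verbose`` here because ``print`` cannot be used as an argument name in Python 2.

.. code-block:: python

   import yt

   yt.enable_parallelism()

   def print_and_save_plot(v, c, plot, verbose=True):
       if verbose:
           print v, c
       plot.save()

   ds = yt.load("RD0035/RedshiftOutput0035")
   v, c = ds.find_max("density")
   p = yt.ProjectionPlot(ds, "x", "density")

   # only_on_root runs the callback on the root MPI task and is a no-op elsewhere.
   yt.only_on_root(print_and_save_plot, v, c, p, verbose=True)

Launch it with ``mpirun -np 8 python my_script.py``; no ``--parallel`` flag is needed.
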

https://bitbucket.org/yt_analysis/yt/commits/7347f08d92d7/
Changeset:   7347f08d92d7
Branch:      yt-3.0
User:        ngoldbaum
Date:        2014-07-18 23:33:49
Summary:     Merging to clear conflicts.
Affected #:  3 files

diff -r b04c9b3692db47c99a367f0decda46fcbbca3801 -r 7347f08d92d79194a7ca3b841498a40affcc8a11 doc/source/analyzing/ionization_cube.py
--- a/doc/source/analyzing/ionization_cube.py
+++ b/doc/source/analyzing/ionization_cube.py
@@ -1,14 +1,17 @@
-from yt.mods import *
+import yt
 from yt.utilities.parallel_tools.parallel_analysis_interface \
     import communication_system
-import h5py, glob, time
 
-@derived_field(name = "IonizedHydrogen",
-               units = r"\frac{\rho_{HII}}{rho_H}")
+import h5py
+import time
+import numpy as np
+
+@yt.derived_field(name="IonizedHydrogen", units="",
+                  display_name=r"\frac{\rho_{HII}}{\rho_H}")
 def IonizedHydrogen(field, data):
     return data["HII_Density"]/(data["HI_Density"]+data["HII_Density"])
 
-ts = DatasetSeries.from_filenames("SED800/DD*/*.index", parallel = 8)
+ts = yt.DatasetSeries("SED800/DD*/*.index", parallel=8)
 
 ionized_z = np.zeros(ts[0].domain_dimensions, dtype="float32")
 

diff -r b04c9b3692db47c99a367f0decda46fcbbca3801 -r 7347f08d92d79194a7ca3b841498a40affcc8a11 doc/source/analyzing/parallel_computation.rst
--- a/doc/source/analyzing/parallel_computation.rst
+++ b/doc/source/analyzing/parallel_computation.rst
@@ -22,14 +22,11 @@
    :ref:`derived-quantities`)
  * 1-, 2-, and 3-D profiles (:ref:`generating-profiles-and-histograms`)
  * Halo finding (:ref:`halo_finding`)
- * Merger tree (:ref:`merger_tree`)
- * Two point functions (:ref:`two_point_functions`)
  * Volume rendering (:ref:`volume_rendering`)
- * Radial column density (:ref:`radial-column-density`)
  * Isocontours & flux calculations (:ref:`extracting-isocontour-information`)
 
 This list covers just about every action ``yt`` can take!  Additionally, almost all
-scripts will benefit from parallelization without any modification.  The goal
+scripts will benefit from parallelization with minimal modification.  The goal
 of Parallel-``yt`` has been to retain API compatibility and abstract all
 parallelism.
 
@@ -45,14 +42,15 @@
 
     $ pip install mpi4py
 
-Once that has been installed, you're all done!  You just need to launch your 
-scripts with ``mpirun`` (or equivalent) and signal to ``yt`` that you want to run 
-them in parallel.  In general, that's all it takes to get a speed benefit on a 
+Once that has been installed, you're all done!  You just need to launch your
+scripts with ``mpirun`` (or equivalent) and signal to ``yt`` that you want to
+run them in parallel by invoking the ``yt.enable_parallelism()`` function in
+your script.  In general, that's all it takes to get a speed benefit on a
 multi-core machine.  Here is an example on an 8-core desktop:
 
 .. code-block:: bash
 
-    $ mpirun -np 8 python script.py --parallel
+    $ mpirun -np 8 python script.py
 
 Throughout its normal operation, ``yt`` keeps you aware of what is happening with
 regular messages to the stderr usually prefaced with: 
@@ -71,10 +69,9 @@
 
 in the case of two cores being used.
 
-It's important to note that all of the processes listed in `capabilities` work
--- and no additional work is necessary to parallelize those processes.
-Furthermore, the ``yt`` command itself recognizes the ``--parallel`` option, so
-those commands will work in parallel as well.
+It's important to note that all of the processes listed in :ref:`capabilities`
+work in parallel -- and no additional work is necessary to parallelize those
+processes.
 
 Running a ``yt`` script in parallel
 -----------------------------------
@@ -85,11 +82,12 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ds = load("RD0035/RedshiftOutput0035")
+   import yt
+   yt.enable_parallelism()
+   ds = yt.load("RD0035/RedshiftOutput0035")
    v, c = ds.find_max("density")
    print v, c
-   p = ProjectionPlot(ds, "x", "density")
+   p = yt.ProjectionPlot(ds, "x", "density")
    p.save()
 
 If this script is run in parallel, two of the most expensive operations -
@@ -99,7 +97,7 @@
 
 .. code-block:: bash
 
-   $ mpirun -np 16 python2.7 my_script.py --parallel
+   $ mpirun -np 16 python2.7 my_script.py
 
 .. note::
 
@@ -126,11 +124,11 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ds = load("RD0035/RedshiftOutput0035")
+   import yt
+   ds = yt.load("RD0035/RedshiftOutput0035")
    v, c = ds.find_max("density")
-   p = ProjectionPlot(ds, "x", "density")
-   if is_root():
+   p = yt.ProjectionPlot(ds, "x", "density")
+   if yt.is_root():
        print v, c
        p.save()
 
@@ -144,17 +142,17 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
+   import yt
 
    def print_and_save_plot(v, c, plot, print=True):
        if print:
           print v, c
        plot.save()
 
-   ds = load("RD0035/RedshiftOutput0035")
+   ds = yt.load("RD0035/RedshiftOutput0035")
    v, c = ds.find_max("density")
-   p = ProjectionPlot(ds, "x", "density")
-   only_on_root(print_and_save_plot, v, c, plot, print=True)
+   p = yt.ProjectionPlot(ds, "x", "density")
+   yt.only_on_root(print_and_save_plot, v, c, p, print=True)
 
 Types of Parallelism
 --------------------
@@ -174,19 +172,17 @@
 The following operations use spatial decomposition:
 
   * Halo finding
-  * Merger tree
-  * Two point functions
   * Volume rendering
-  * Radial column density
 
 Grid Decomposition
 ++++++++++++++++++
 
-The alternative to spatial decomposition is a simple round-robin of the grids.
-This process allows ``yt`` to pool data access to a given Enzo data file, which
-ultimately results in faster read times and better parallelism.
+The alternative to spatial decomposition is a simple round-robin of data chunks,
+which could be grids, octs, or whatever chunking mechanism is used by the code
+frontend being used.  This process allows ``yt`` to pool data access to a given
+data file, which ultimately results in faster read times and better parallelism.
 
-The following operations use grid decomposition:
+The following operations use chunk decomposition:
 
   * Projections
   * Slices
@@ -211,16 +207,15 @@
 ---------------------------
 
 It is easy within ``yt`` to parallelize a list of tasks, as long as those tasks
-are independent of one another.
-Using object-based parallelism, the function :func:`parallel_objects` will
-automatically split up a list of tasks over the specified number of processors
-(or cores).
-Please see this heavily-commented example:
+are independent of one another. Using object-based parallelism, the function
+:func:`~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_objects`
+will automatically split up a list of tasks over the specified number of
+processors (or cores).  Please see this heavily-commented example:
 
 .. code-block:: python
    
    # As always...
-   from yt.mods import *
+   import yt
    
    import glob
    
@@ -249,19 +244,19 @@
    # If data does not need to be combined after the loop is done, the line
    # would look like:
    #       for fn in parallel_objects(fns, num_procs):
-   for sto, fn in parallel_objects(fns, num_procs, storage = my_storage):
+   for sto, fn in yt.parallel_objects(fns, num_procs, storage = my_storage):
 
        # Open a data file, remembering that fn is different on each task.
-       ds = load(fn)
+       ds = yt.load(fn)
        dd = ds.all_data()
 
        # This copies fn and the min/max of density to the local copy of
        # my_storage
        sto.result_id = fn
-       sto.result = dd.quantities["Extrema"]("density")
+       sto.result = dd.quantities.extrema("density")
 
        # Makes and saves a plot of the gas density.
-       p = ProjectionPlot(ds, "x", "density")
+       p = yt.ProjectionPlot(ds, "x", "density")
        p.save()
 
    # At this point, as the loop exits, the local copies of my_storage are
@@ -270,7 +265,7 @@
    # tasks have produced.
    # Below, the values in my_storage are printed by only one task. The other
    # tasks do nothing.
-   if is_root()
+   if yt.is_root():
        for fn, vals in sorted(my_storage.items()):
            print fn, vals
 
@@ -282,43 +277,33 @@
 Parallel Time Series Analysis
 -----------------------------
 
-The same :func:`parallel_objects` machinery discussed above is turned on by
-default when using a ``DatasetSeries`` object (see :ref:`time-series-analysis`)
-to iterate over simulation outputs.  The syntax for this is very simple.  As an
-example, we can use the following script to find the angular momentum vector in
-a 1 pc sphere centered on the maximum density cell in a large number of
-simulation outputs:
+The same ``parallel_objects`` machinery discussed above is turned on by
+default when using a :class:`~yt.data_objects.time_series.DatasetSeries` object
+(see :ref:`time-series-analysis`) to iterate over simulation outputs.  The
+syntax for this is very simple.  As an example, we can use the following script
+to find the angular momentum vector in a 1 pc sphere centered on the maximum
+density cell in a large number of simulation outputs:
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = True)
-   sphere = ts.sphere("max", (1.0, "pc"))
-   L_vecs = sphere.quantities["AngularMomentumVector"]()
+   import yt
+   yt.enable_parallelism()
+
+   ts = yt.load("DD*/output_*")
+
+   storage = {}
+
+   for sto, ds in ts.piter():
+       sphere = ds.sphere("max", (1.0, "pc"))
+       sto.result = sphere.quantities.angular_momentum_vector()
+       sto.result_id = str(ds)
+
+   for L in sorted(storage.items()):
+       print L
 
 Note that this script can be run in serial or parallel with an arbitrary number
 of processors.  When running in parallel, each output is given to a different
-processor.  By default, parallel is set to ``True``, so you do not have to
-explicitly set ``parallel = True`` as in the above example. 
-
-One could get the same effect by iterating over the individual datasets
-in the DatasetSeries object:
-
-.. code-block:: python
-
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = True)
-   my_storage = {}
-   for sto,ds in ts.piter(storage=my_storage):
-       sphere = ds.sphere("max", (1.0, "pc"))
-       L_vec = sphere.quantities["AngularMomentumVector"]()
-       sto.result_id = ds.parameter_filename
-       sto.result = L_vec
-
-   L_vecs = []
-   for fn, L_vec in sorted(my_storage.items()):
-       L_vecs.append(L_vec)
-
+processor.
 
 You can also request a fixed number of processors to calculate each
 angular momentum vector.  For example, this script will calculate each angular
@@ -328,16 +313,18 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = 4)
-   sphere = ts.sphere("max", (1.0, "pc))
-   L_vecs = sphere.quantities["AngularMomentumVector"]()
+   import yt
+   ts = yt.DatasetSeries("DD*/output_*", parallel = 4)
+   
+   for ds in ts.piter():
+       sphere = ds.sphere("max", (1.0, "pc"))
+       L_vecs = sphere.quantities.angular_momentum_vector()
 
 If you do not want to use ``parallel_objects`` parallelism when using a
-TimeSeries object, set ``parallel = False``.  When running python in parallel,
+DatasetSeries object, set ``parallel = False``.  When running python in parallel,
 this will use all of the available processors to evaluate the requested
 operation on each simulation output.  Some care and possibly trial and error
-might be necessary to estimate the correct settings for your Simulation
+might be necessary to estimate the correct settings for your simulation
 outputs.
 
 Parallel Performance, Resources, and Tuning
@@ -350,53 +337,50 @@
 provide some insight into what are good starting values for a given parallel
 task.
 
-Grid Decomposition
-++++++++++++++++++
+Chunk Decomposition
++++++++++++++++++++
 
 In general, these types of parallel calculations scale very well with number of
-processors.
-They are also fairly memory-conservative.
-The two limiting factors is therefore the number of grids in the dataset,
-and the speed of the disk the data is stored on.
-There is no point in running a parallel job of this kind with more processors
-than grids, because the extra processors will do absolutely nothing, and will
-in fact probably just serve to slow down the whole calculation due to the extra
-overhead.
-The speed of the disk is also a consideration - if it is not a high-end parallel
-file system, adding more tasks will not speed up the calculation if the disk
-is already swamped with activity.
+processors.  They are also fairly memory-conservative.  The two limiting factors
+are therefore the number of chunks in the dataset, and the speed of the disk the
+data is stored on.  There is no point in running a parallel job of this kind
+with more processors than chunks, because the extra processors will do absolutely
+nothing, and will in fact probably just serve to slow down the whole calculation
+due to the extra overhead.  The speed of the disk is also a consideration - if
+it is not a high-end parallel file system, adding more tasks will not speed up
+the calculation if the disk is already swamped with activity.
 
-The best advice for these sort of calculations is to run 
-with just a few processors and go from there, seeing if it the runtime
-improves noticeably.
+The best advice for these sorts of calculations is to run with just a few
+processors and go from there, seeing if the runtime improves noticeably.
 
 **Projections, Slices, and Cutting Planes**
 
 Projections, slices and cutting planes are the most common methods of creating
 two-dimensional representations of data.  All three have been parallelized in a
-grid-based fashion.
+chunk-based fashion.
 
  * Projections: projections are parallelized utilizing a quad-tree approach.
    Data is loaded for each processor, typically by a process that consolidates
-   open/close/read operations, and each grid is then iterated over and cells
-   are deposited into a data structure that stores values corresponding to
-   positions in the two-dimensional plane.  This provides excellent load
-   balancing, and in serial is quite fast.  However, as of ``yt`` 2.3, the
-   operation by which quadtrees are joined across processors scales poorly;
-   while memory consumption scales well, the time to completion does not.  As
-   such, projections can often be done very fast when operating only on a single
-   processor!  The quadtree algorithm can be used inline (and, indeed, it is
-   for this reason that it is slow.)  It is recommended that you attempt to
-   project in serial before projecting in parallel; even for the very largest
-   datasets (Enzo 1024^3 root grid with 7 levels of refinement) in the absence
-   of IO the quadtree algorithm takes only three minutes or so on a decent
-   processor.
- * Slices: to generate a slice, grids that intersect a given slice are iterated
-   over and their finest-resolution cells are deposited.  The grids are
+   open/close/read operations, and each grid is then iterated over and cells are
+   deposited into a data structure that stores values corresponding to positions
+   in the two-dimensional plane.  This provides excellent load balancing, and in
+   serial is quite fast.  However, the operation by which quadtrees are joined
+   across processors scales poorly; while memory consumption scales well, the
+   time to completion does not.  As such, projections can often be done very
+   fast when operating only on a single processor!  The quadtree algorithm can
+   be used inline (and, indeed, it is for this reason that it is slow.)  It is
+   recommended that you attempt to project in serial before projecting in
+   parallel; even for the very largest datasets (Enzo 1024^3 root grid with 7
+   levels of refinement) in the absence of IO the quadtree algorithm takes only
+   three minutes or so on a decent processor.
+
+ * Slices: to generate a slice, chunks that intersect a given slice are iterated
+   over and their finest-resolution cells are deposited.  The chunks are
    decomposed via standard load balancing.  While this operation is parallel,
    **it is almost never necessary to slice a dataset in parallel**, as all data is
    loaded on demand anyway.  The slice operation has been parallelized so as to
    enable slicing when running *in situ*.
+
  * Cutting planes: cutting planes are parallelized exactly as slices are.
    However, in contrast to slices, because the data-selection operation can be
    much more time consuming, cutting planes often benefit from parallelism.
@@ -404,7 +388,7 @@
 Object-Based
 ++++++++++++
 
-Like grid decomposition, it does not help to run with more processors than the
+Like chunk decomposition, it does not help to run with more processors than the
 number of objects to be iterated over.
 There is also the matter of the kind of work being done on each object, and
 whether it is disk-intensive, cpu-intensive, or memory-intensive.
@@ -436,37 +420,28 @@
 
 **Halo-Finding**
 
-Halo finding, along with the merger tree that uses halo finding, operates
-on the particles in the volume, and is therefore mostly grid-agnostic.
-Generally, the biggest concern for halo finding is the amount of memory needed.
-There is subtle art in estimating the amount of memory needed for halo finding,
-but a rule of thumb is that Parallel HOP (:func:`parallelHF`) is the most
-memory-intensive, followed by plain HOP (:func:`HaloFinder`),
-with Friends of Friends (:func:`FOFHaloFinder`) being
-the most memory-conservative.
-It has been found that :func:`parallelHF` needs roughly
-1 MB of memory per 5,000
-particles, although recent work has improved this and the memory requirement
-is now smaller than this. But this is a good starting point for beginning to
-calculate the memory required for halo-finding.
-
-**Two point functions**
-
-Please see :ref:`tpf_strategies` for more details.
+Halo finding, along with the merger tree that uses halo finding, operates on the
+particles in the volume, and is therefore mostly chunk-agnostic.  Generally, the
+biggest concern for halo finding is the amount of memory needed.  There is
+subtle art in estimating the amount of memory needed for halo finding, but a
+rule of thumb is that the HOP halo finder (:func:`HaloFinder`) is the most
+memory intensive, with Friends of Friends (:func:`FOFHaloFinder`) being the
+most memory-conservative.  It has been found that :func:`parallelHF` needs
+roughly 1 MB of memory per 5,000 particles, although recent work has improved
+this and the memory requirement is now smaller than this. But this is a good
+starting point for beginning to calculate the memory required for halo-finding.
 
 **Volume Rendering**
 
-The simplest way to think about volume rendering, and the radial column density
-module that uses it, is that it load-balances over the grids in the dataset.
-Each processor is given roughly the same sized volume to operate on.
-In practice, there are just a few things to keep in mind when doing volume
-rendering.
-First, it only uses a power of two number of processors.
-If the job is run with 100 processors, only 64 of them will actually do anything.
-Second, the absolute maximum number of processors is the number of grids.
-But in order to keep work distributed evenly, typically the number of processors
-should be no greater than one-eighth or one-quarter the number of processors
-that were used to produce the dataset.
+The simplest way to think about volume rendering is that it load-balances over
+the i/o chunks in the dataset.  Each processor is given roughly the same sized
+volume to operate on.  In practice, there are just a few things to keep in mind
+when doing volume rendering.  First, it only uses a power of two number of
+processors.  If the job is run with 100 processors, only 64 of them will
+actually do anything.  Second, the absolute maximum number of processors is the
+number of chunks.  In order to keep work distributed evenly, typically the
+number of processors should be no greater than one-eighth or one-quarter the
+number of processors that were used to produce the dataset.
 
 Additional Tips
 ---------------
@@ -500,12 +475,12 @@
     
     .. code-block:: python
     
-       from yt.mods import *
+       import yt
        import time
-       
-       ds = load("DD0152")
+
+       ds = yt.load("DD0152")
        t0 = time.time()
-       bigstuff, hugestuff = StuffFinder(ds)
+       bigstuff, hugestuff = StuffFinder(pf)
        BigHugeStuffParallelFunction(ds, bigstuff, hugestuff)
        t1 = time.time()
        for i in range(1000000):
@@ -514,7 +489,7 @@
            SaveTinyMiniStuffToDisk("out%06d.txt" % i, array)
        t2 = time.time()
        
-       if is_root()
+       if yt.is_root():
            print "BigStuff took %.5e sec, TinyStuff took %.5e sec" % (t1 - t0, t2 - t1)
   
   * Remember that if the script handles disk IO explicitly, and does not use
@@ -526,7 +501,7 @@
     
     .. code-block:: python
        
-       if is_root()
+       if yt.is_root():
            file = open("out.txt", "w")
            file.write(stuff)
            file.close()

diff -r b04c9b3692db47c99a367f0decda46fcbbca3801 -r 7347f08d92d79194a7ca3b841498a40affcc8a11 doc/source/reference/api/api.rst
--- a/doc/source/reference/api/api.rst
+++ b/doc/source/reference/api/api.rst
@@ -742,6 +742,7 @@
    ~yt.funcs.time_execution
    ~yt.analysis_modules.level_sets.contour_finder.identify_contours
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_blocking_call
+   ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_objects
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_passthrough
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_root_only
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_simple_proxy


https://bitbucket.org/yt_analysis/yt/commits/2293f9d3e294/
Changeset:   2293f9d3e294
Branch:      yt-3.0
User:        ngoldbaum
Date:        2014-07-19 18:47:39
Summary:     Responding to PR comments.  Adding some text to yt3differences about changes
for parallelism.
Affected #:  2 files

diff -r 7347f08d92d79194a7ca3b841498a40affcc8a11 -r 2293f9d3e294ebdc65f6700a06103f847021100c doc/source/analyzing/parallel_computation.rst
--- a/doc/source/analyzing/parallel_computation.rst
+++ b/doc/source/analyzing/parallel_computation.rst
@@ -293,7 +293,7 @@
 
    storage = {}
 
-   for sto, ds in ts.piter():
+   for sto, ds in ts.piter(storage=storage):
        sphere = ds.sphere("max", (1.0, "pc"))
        sto.result = sphere.quantities.angular_momentum_vector()
        sto.result_id = str(ds)
@@ -480,7 +480,7 @@
 
        ds = yt.load("DD0152")
        t0 = time.time()
-       bigstuff, hugestuff = StuffFinder(pf)
+       bigstuff, hugestuff = StuffFinder(ds)
        BigHugeStuffParallelFunction(ds, bigstuff, hugestuff)
        t1 = time.time()
        for i in range(1000000):

diff -r 7347f08d92d79194a7ca3b841498a40affcc8a11 -r 2293f9d3e294ebdc65f6700a06103f847021100c doc/source/yt3differences.rst
--- a/doc/source/yt3differences.rst
+++ b/doc/source/yt3differences.rst
@@ -21,6 +21,9 @@
 
 Here's a quick reference for how to update your code to work with yt-3.0.
 
+  * Importing yt is now as simple as ``import yt``.  The docs have been
+    extensively updated to reflect this new style.  ``from yt.mods import *``
+    still works, but we are discouraging its use going forward.
   * Fields can be accessed by a name, but are named internally as ``(fluid_type,
     fluid_name)``.
   * Fields on-disk will be in code units, and will be named ``(code_name,
@@ -36,6 +39,11 @@
     return a single tuple if you only ask for one field.
   * Units can be tricky, and they try to keep you from making weird things like
     ``ergs`` + ``g``.  See :ref:`units` for more information.
+  * Previously, yt would capture command line arguments when being imported.
+    This no longer happens.  As a side effect, it is no longer necessary to
+    specify ``--parallel`` at the command line when running a parallel 
+    computation. Use ``yt.enable_parallelism()`` instead.  See 
+    :ref:`parallel-computation` for more detail.
 
 Cool New Things
 ---------------

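The fix above hands the storage dictionary to ``piter`` so that each task's results are actually collected.  Put together with the rest of the doc example, the corrected parallel time-series pattern reads roughly as follows (the paths are the placeholders used in the docs):

.. code-block:: python

   import yt

   yt.enable_parallelism()

   ts = yt.load("DD*/output_*")

   storage = {}

   # Each task works on a subset of the outputs; the per-output results are
   # gathered into `storage` on every task once the loop finishes.
   for sto, ds in ts.piter(storage=storage):
       sphere = ds.sphere("max", (1.0, "pc"))
       sto.result = sphere.quantities.angular_momentum_vector()
       sto.result_id = str(ds)

   if yt.is_root():
       for name, L_vec in sorted(storage.items()):
           print name, L_vec
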

https://bitbucket.org/yt_analysis/yt/commits/685c5a89e6e4/
Changeset:   685c5a89e6e4
Branch:      yt-3.0
User:        ngoldbaum
Date:        2014-07-19 18:48:22
Summary:     Merging with mainline tip.
Affected #:  13 files

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af doc/source/analyzing/fields.rst
--- /dev/null
+++ b/doc/source/analyzing/fields.rst
@@ -0,0 +1,19 @@
+Particle Fields
+====================================
+Naturally, particle fields contain properties of particles rather than
+grid cells.  Many of these fields have corresponding grid fields that
+can be populated by "depositing" the particle values onto a yt grid.
+
+General Particle Fields
+------------------------------------
+Every particle will contain both a ``particle_position`` and ``particle_velocity``
+that tracks the position and velocity (respectively) in code units.
+
+
+SPH Fields
+------------------------------------
+For gas particles from SPH simulations, each particle will typically carry
+a field for the smoothing length `h`, which is roughly equivalent to
+`(m/\rho)^{1/3}`, where `m` and `\rho` are the particle mass and density
+respectively.  This can be useful for doing neighbour finding.
+

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af doc/source/analyzing/objects.rst
--- a/doc/source/analyzing/objects.rst
+++ b/doc/source/analyzing/objects.rst
@@ -152,6 +152,11 @@
 Combining Objects: Boolean Data Objects
 ---------------------------------------
 
+.. note:: Boolean Data Objects have not yet been ported to yt 3.0 from
+    yt 2.x.  If you are interested in aiding in this port, please contact
+    the yt-dev mailing list.  Until it is ported, the functionality described
+    below will not work.
+
 A special type of data object is the *boolean* data object.
 It works only on three-dimensional objects.
 It is built by relating already existing data objects with boolean operators.

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af doc/source/examining/loading_data.rst
--- a/doc/source/examining/loading_data.rst
+++ b/doc/source/examining/loading_data.rst
@@ -156,7 +156,7 @@
 yt has support for reading Gadget data in both raw binary and HDF5 formats.  It
 is able to access the particles as it would any other particle dataset, and it
 can apply smoothing kernels to the data to produce both quantitative analysis
-and visualization.
+and visualization.  See also the section :ref:`loading-sph-data`.
 
 Gadget data in HDF5 format can be loaded with the ``load`` command:
 
@@ -367,7 +367,8 @@
 yt also supports loading Tipsy data.  Many of its characteristics are similar
 to how Gadget data is loaded; specifically, it shares its definition of
 indexing and mesh-identification with that described in
-:ref:`particle-indexing-criteria`.  
+:ref:`particle-indexing-criteria`.  Like with Gadget, see
+:ref:`loading-sph-data` for more details.
 
 .. code-block:: python
 
@@ -903,3 +904,20 @@
 ---------------------
 
 .. notebook:: Loading_Generic_Particle_Data.ipynb
+
+.. _loading-sph-data:
+
+SPH Particle Data
+-----------------
+For all of the SPH frontends, yt uses Cython-based SPH smoothing to create
+deposited mesh fields from individual particle fields.  This uses a standard
+M4 smoothing kernel and the ``SmoothingLength`` field to calculate SPH sums,
+filling in the mesh fields.  This gives you the ability to both track individual
+particles (useful for tasks like following contiguous clouds of gas that would
+require a clump finder in grid data) as well as doing standard grid-based analysis.
+The ``SmoothingLength`` variable is also useful for determining which particles
+can interact with each other, since particles more distant than twice the
+smoothing length do not typically see each other in SPH simulations.  By
+changing the value of the ``SmoothingLength`` and then re-depositing particles
+onto the grid, you can also effectively mimic what your data would look like at
+lower resolution.

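As a quick illustration of the smoothing-length relation mentioned above, the stored ``SmoothingLength`` can be compared against the ``(m/rho)^(1/3)`` estimate.  The snapshot path and the on-disk field names below are assumptions for a Gadget HDF5 output and may differ between frontends.

.. code-block:: python

   import numpy as np
   import yt

   ds = yt.load("snapshot_033/snap_033.0.hdf5")  # placeholder Gadget HDF5 snapshot
   ad = ds.all_data()

   m = ad[("PartType0", "Masses")]           # assumed on-disk field names
   rho = ad[("PartType0", "Density")]
   h = ad[("PartType0", "SmoothingLength")]

   # Rough estimate: h should scale like (m / rho)**(1/3).
   h_est = (m / rho)**(1.0 / 3.0)

   print "median stored h:     ", np.median(h)
   print "median (m/rho)^(1/3):", np.median(h_est)
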
diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af doc/source/reference/api/api.rst
--- a/doc/source/reference/api/api.rst
+++ b/doc/source/reference/api/api.rst
@@ -456,18 +456,6 @@
    ~yt.analysis_modules.halo_merger_tree.enzofof_merger_tree.EnzoFOFMergerTree
    ~yt.analysis_modules.halo_merger_tree.enzofof_merger_tree.plot_halo_evolution
 
-Halo Profiling
-^^^^^^^^^^^^^^
-
-yt provides a comprehensive halo profiler that can filter, center, and analyze
-halos en masse.
-
-.. autosummary::
-   :toctree: generated/
-
-   ~yt.analysis_modules.halo_profiler.multi_halo_profiler.HaloProfiler
-   ~yt.analysis_modules.halo_profiler.multi_halo_profiler.VirialFilter
-
 
 Two Point Functions
 ^^^^^^^^^^^^^^^^^^^
@@ -512,16 +500,6 @@
 Extension Types
 ---------------
 
-Coordinate Transformations
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-
-.. autosummary::
-   :toctree: generated/
-
-   ~yt.analysis_modules.coordinate_transformation.transforms.arbitrary_regrid
-   ~yt.analysis_modules.coordinate_transformation.transforms.spherical_regrid
-
 Cosmology, Star Particle Analysis, and Simulated Observations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -576,17 +554,6 @@
    ~yt.analysis_modules.radmc3d_export.RadMC3DInterface.RadMC3DLayer
    ~yt.analysis_modules.radmc3d_export.RadMC3DInterface.RadMC3DWriter
 
-Radial Column Density
-^^^^^^^^^^^^^^^^^^^^^
-
-If you'd like to calculate the column density out to a given point, from a
-specified center, yt can provide that information.
-
-.. autosummary::
-   :toctree: generated/
-
-   ~yt.analysis_modules.radial_column_density.radial_column_density.RadialColumnDensity
-
 Volume Rendering
 ^^^^^^^^^^^^^^^^
 

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af doc/source/visualizing/_cb_docstrings.inc
--- a/doc/source/visualizing/_cb_docstrings.inc
+++ b/doc/source/visualizing/_cb_docstrings.inc
@@ -388,23 +388,6 @@
    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
    p = yt.ProjectionPlot(ds, 'z', 'density', center='c', width=(20, 'kpc'))
    p.annotate_title('Density Plot')
-   s.save()
-
--------------
-
-.. function:: annotate_title(self, title='Plot'):
-
-   (This is a proxy for :class:`~yt.visualization.plot_modifications.TitleCallback`.)
-
-   Accepts a *title* and adds it to the plot
-
-.. python-script::
-
-   from yt.mods import *
-   ds = load("IsolatedGalaxy/galaxy0030/galaxy0030")
-   p = ProjectionPlot(ds, 'z', 'density', center='c', width=(20, 'kpc'))
-   p.annotate_title('Density plot')
->>>>>>> other
    p.save()
 
 Overplot quivers for the velocity field

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/analysis_modules/cosmological_observation/light_cone/light_cone.py
--- a/yt/analysis_modules/cosmological_observation/light_cone/light_cone.py
+++ b/yt/analysis_modules/cosmological_observation/light_cone/light_cone.py
@@ -389,169 +389,6 @@
                 attrs={"field_of_view": str(field_of_view),
                        "image_resolution": str(image_resolution)})
 
-    def rerandomize_light_cone_solution(self, new_seed, recycle=True, filename=None):
-        """
-        When making a projection for a light cone, only randomizations along the
-        line of sight make any given projection unique, since the lateral shifting
-        and tiling is done after the projection is made.  Therefore, multiple light
-        cones can be made from a single set of projections by introducing different
-        lateral random shifts and keeping all the original shifts along the line of
-        sight.  This routine will take in a new random seed and rerandomize the
-        parts of the light cone that do not contribute to creating a unique
-        projection object.  Additionally, this routine is built such that if the
-        same random seed is given for the rerandomizing, the solution will be
-        identical to the original.
-
-        This routine has now been updated to be a general solution rescrambler.
-        If the keyword recycle is set to True, then it will recycle.  Otherwise, it
-        will create a completely new solution.
-
-        new_sed : float
-            The new random seed.
-        recycle : bool
-            If True, the new solution will have the same shift in the line of
-            sight as the original solution.  Since the projections of each
-            slice are serialized and stored for the entire width of the box
-            (even if the width used is left than the total box), the projection
-            data can be deserialized instead of being remade from scratch.
-            This can greatly speed up the creation of a large number of light
-            cone projections.
-            Default: True.
-        filename : string
-            If given, a text file detailing the solution will be written out.
-            Default: None.
-
-        """
-
-        # Get rid of old halo mask, if one was there.
-        self.halo_mask = []
-
-        # Clean ds objects out of light cone solution.
-        for my_slice in self.light_cone_solution:
-            if my_slice.has_key('object'):
-                del my_slice['object']
-
-        if recycle:
-            mylog.debug("Recycling solution made with %s with new seed %s." %
-                        (self.original_random_seed, new_seed))
-            self.recycle_random_seed = int(new_seed)
-        else:
-            mylog.debug("Creating new solution with random seed %s." % new_seed)
-            self.original_random_seed = int(new_seed)
-            self.recycle_random_seed = 0
-
-        self.recycle_solution = recycle
-
-        # Keep track of fraction of volume in common between the original and
-        # recycled solution.
-        my_volume = 0.0
-        total_volume = 0.0
-
-        # For box coherence, keep track of effective depth travelled.
-        box_fraction_used = 0.0
-
-        # Seed random number generator with new seed.
-        np.random.seed(int(new_seed))
-
-        for q, output in enumerate(self.light_cone_solution):
-            # It is necessary to make the same number of calls to the random
-            # number generator so the original solution willbe produced if the
-            # same seed is given.
-
-            # Get projection axis and center.
-            # If using box coherence, only get random axis and center if enough
-            # of the box has been used, or if box_fraction_used will be greater
-            # than 1 after this slice.
-            if (q == 0) or (self.minimum_coherent_box_fraction == 0) or \
-                    (box_fraction_used > self.minimum_coherent_box_fraction) or \
-                    (box_fraction_used + self.light_cone_solution[q]['box_depth_fraction'] > 1.0):
-                # Get random projection axis and center.
-                # If recycling, axis will get thrown away since it is used in
-                # creating a unique projection object.
-                newAxis = np.random.randint(0, 3)
-
-                newCenter = [np.random.random() for i in range(3)]
-                box_fraction_used = 0.0
-            else:
-                # Same axis and center as previous slice, but with depth center shifted.
-                newAxis = self.light_cone_solution[q-1]['projection_axis']
-                newCenter = copy.deepcopy(self.light_cone_solution[q-1]['projection_center'])
-                newCenter[newAxis] += \
-                    0.5 * (self.light_cone_solution[q]['box_depth_fraction'] +
-                           self.light_cone_solution[q-1]['box_depth_fraction'])
-                if newCenter[newAxis] >= 1.0:
-                    newCenter[newAxis] -= 1.0
-
-            if recycle:
-                output['projection_axis'] = self.master_solution[q]['projection_axis']
-            else:
-                output['projection_axis'] = newAxis
-
-            box_fraction_used += self.light_cone_solution[q]['box_depth_fraction']
-
-            # Make list of rectangle corners to calculate common volume.
-            newCube = np.zeros(shape=(len(newCenter), 2))
-            oldCube = np.zeros(shape=(len(newCenter), 2))
-            for w in range(len(newCenter)):
-                if (w == self.master_solution[q]['projection_axis']):
-                    oldCube[w] = [self.master_solution[q]['projection_center'][w] -
-                                  0.5 * self.master_solution[q]['box_depth_fraction'],
-                                  self.master_solution[q]['projection_center'][w] +
-                                  0.5 * self.master_solution[q]['box_depth_fraction']]
-                else:
-                    oldCube[w] = [self.master_solution[q]['projection_center'][w] -
-                                  0.5 * self.master_solution[q]['box_width_fraction'],
-                                  self.master_solution[q]['projection_center'][w] +
-                                  0.5 * self.master_solution[q]['box_width_fraction']]
-
-                if (w == output['projection_axis']):
-                    if recycle:
-                        newCube[w] = oldCube[w]
-                    else:
-                        newCube[w] = \
-                          [newCenter[w] -
-                           0.5 * self.master_solution[q]['box_depth_fraction'],
-                           newCenter[w] +
-                           0.5 * self.master_solution[q]['box_depth_fraction']]
-                else:
-                    newCube[w] = [newCenter[w] -
-                                  0.5 * self.master_solution[q]['box_width_fraction'],
-                                  newCenter[w] +
-                                  0.5 * self.master_solution[q]['box_width_fraction']]
-
-            my_volume += common_volume(oldCube, newCube,
-                                           periodic=np.array([[0, 1],
-                                                              [0, 1],
-                                                              [0, 1]]))
-            total_volume += output['box_depth_fraction'] * \
-              output['box_width_fraction']**2
-
-            # Replace centers for every axis except the line of sight axis.
-            for w in range(len(newCenter)):
-                if not(recycle and
-                       (w == self.light_cone_solution[q]['projection_axis'])):
-                    self.light_cone_solution[q]['projection_center'][w] = \
-                      newCenter[w]
-
-        if recycle:
-            mylog.debug("Fractional common volume between master and recycled solution is %.2e" % \
-                        (my_volume / total_volume))
-        else:
-            mylog.debug("Fraction of total volume in common with old solution is %.2e." % \
-                        (my_volume / total_volume))
-            self.master_solution = [copy.deepcopy(q) \
-                                    for q in self.light_cone_solution]
-
-        # Write solution to a file.
-        if filename is not None:
-            self._save_light_cone_solution(filename=filename)
-
-    def restore_master_solution(self):
-        "Reset the active light cone solution to the master solution."
-        self.light_cone_solution = [copy.deepcopy(q) \
-                                    for q in self.master_solution]
->>>>>>> other
-
     @parallel_root_only
     def _save_light_cone_solution(self, filename="light_cone.dat"):
         "Write out a text file with information on light cone solution."

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/analysis_modules/halo_finding/halo_objects.py
--- a/yt/analysis_modules/halo_finding/halo_objects.py
+++ b/yt/analysis_modules/halo_finding/halo_objects.py
@@ -1813,11 +1813,12 @@
 
 
 class GenericHaloFinder(HaloList, ParallelAnalysisInterface):
-    def __init__(self, ds, ds, dm_only=True, padding=0.0):
+    def __init__(self, ds, data_source, dm_only=True, padding=0.0):
         ParallelAnalysisInterface.__init__(self)
         self.ds = ds
         self.index = ds.index
-        self.center = (np.array(ds.right_edge) + np.array(ds.left_edge)) / 2.0
+        self.center = (np.array(data_source.right_edge) +
+                       np.array(data_source.left_edge)) / 2.0
 
     def _parse_halolist(self, threshold_adjustment):
         groups = []
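
The fix above is more than cosmetic: ``def __init__(self, ds, ds, ...)`` repeats a
parameter name, which Python rejects at compile time.  A standalone sketch of the
failure mode (not yt code):

    # Python refuses duplicate argument names when the function is compiled.
    try:
        exec("def bad(ds, ds): pass")
    except SyntaxError as err:
        print(err)  # duplicate argument 'ds' in function definition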

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/data_objects/data_containers.py
--- a/yt/data_objects/data_containers.py
+++ b/yt/data_objects/data_containers.py
@@ -114,6 +114,10 @@
             self.set_field_parameter(key, val)
 
     @property
+    def pf(self):
+        return getattr(self, 'ds', None)
+
+    @property
     def index(self):
         if self._index is not None:
             return self._index
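
The new ``pf`` property above is a backward-compatibility alias for the renamed
``ds`` attribute.  A minimal sketch of the pattern (the class here is a toy
stand-in, not yt's real container hierarchy):

    class DataContainer(object):
        def __init__(self, ds):
            self.ds = ds

        @property
        def pf(self):
            # Old scripts that still reference obj.pf transparently get obj.ds;
            # getattr with a default avoids AttributeError on half-built objects.
            return getattr(self, 'ds', None)

    dc = DataContainer(ds="fake_dataset")
    assert dc.pf is dc.ds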

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/utilities/fits_image.py
--- a/yt/utilities/fits_image.py
+++ b/yt/utilities/fits_image.py
@@ -13,6 +13,7 @@
 import numpy as np
 from yt.funcs import mylog, iterable, fix_axis, ensure_list
 from yt.visualization.fixed_resolution import FixedResolutionBuffer
+from yt.visualization.plot_window import get_sanitized_center
 from yt.data_objects.construction_data_containers import YTCoveringGridBase
 from yt.utilities.on_demand_imports import _astropy
 from yt.units.yt_array import YTQuantity
@@ -293,19 +294,20 @@
         The axis of the slice. One of "x","y","z", or 0,1,2.
     fields : string or list of strings
         The fields to slice
-    coord : float, tuple, or YTQuantity
-        The coordinate of the slice along *axis*. Can be a (value,
-        unit) tuple, a YTQuantity, or a float. If a float, it will be
-        interpreted as in units of code_length.
+    center : A sequence of floats, a string, or a tuple.
+         The coordinate of the center of the image. If set to 'c', 'center' or
+         left blank, the plot is centered on the middle of the domain. If set to
+         'max' or 'm', the center will be located at the maximum of the
+         ('gas', 'density') field. Units can be specified by passing in center
+         as a tuple containing a coordinate and string unit name or by passing
+         in a YTArray.  If a list or unitless array is supplied, code units are
+         assumed.
     """
-    def __init__(self, ds, axis, fields, coord, **kwargs):
+    def __init__(self, ds, axis, fields, center="c", **kwargs):
         fields = ensure_list(fields)
         axis = fix_axis(axis, ds)
-        if isinstance(coord, tuple):
-            coord = ds.quan(coord[0], coord[1]).in_units("code_length").value
-        elif isinstance(coord, YTQuantity):
-            coord = coord.in_units("code_length").value
-        slc = ds.slice(axis, coord, **kwargs)
+        center = get_sanitized_center(center, ds)
+        slc = ds.slice(axis, center[axis], **kwargs)
         w, frb = construct_image(slc)
         super(FITSSlice, self).__init__(frb, fields=fields, wcs=w)
         for i, field in enumerate(fields):
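
With the change above, ``FITSSlice`` takes a plot-window-style ``center`` instead of
a raw ``coord`` and slices at ``center[axis]``.  A hedged usage sketch (the dataset
path and field name are placeholders, and the accepted center values are assumed
from the docstring):

    import yt
    from yt.utilities.fits_image import FITSSlice

    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")

    # 'c' (domain center) is now the default; 'max' or an explicit coordinate
    # should also be accepted via get_sanitized_center.
    fits_slc = FITSSlice(ds, "z", "density", center="c")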

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/utilities/grid_data_format/writer.py
--- a/yt/utilities/grid_data_format/writer.py
+++ b/yt/utilities/grid_data_format/writer.py
@@ -227,7 +227,7 @@
         if unit_name in dataset_units:
             value, units = dataset_units[unit_name]
         else:
-            attr = getattr(pf, unit_name)
+            attr = getattr(ds, unit_name)
             value = float(attr)
             units = str(attr.units)
         d = g.create_dataset(unit_name, data=value)
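
The small change above fixes what is presumably a stale name left over from the
``pf`` to ``ds`` rename.  The lookup it restores, sketched with ``length_unit`` as
the example attribute (the dataset path is a placeholder):

    import yt

    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")

    # Unit attributes on a yt-3.0 dataset are unitful quantities, so they split
    # cleanly into a float value and a unit string for the HDF5 attributes.
    attr = getattr(ds, "length_unit")
    value = float(attr)
    units = str(attr.units)
    print(value, units)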

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/visualization/plot_window.py
--- a/yt/visualization/plot_window.py
+++ b/yt/visualization/plot_window.py
@@ -807,10 +807,10 @@
                     pass
                 elif np.nanmax(image) == np.nanmin(image):
                     msg = "Plot image for field %s has zero dynamic " \
-                          "range. Min = Max = %d." % (f, np.nanmax(image))
+                          "range. Min = Max = %f." % (f, np.nanmax(image))
                 elif np.nanmax(image) <= 0:
                     msg = "Plot image for field %s has no positive " \
-                          "values.  Max = %d." % (f, np.nanmax(image))
+                          "values.  Max = %f." % (f, np.nanmax(image))
                 elif not np.any(np.isfinite(image)):
                     msg = "Plot image for field %s is filled with NaNs." % (f,)
                 if msg is not None:
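
Why the ``%d`` to ``%f`` switch above matters: ``%d`` truncates a floating-point
extremum to an integer, so a small but nonzero maximum is reported as ``0`` in the
warning message.

    val = 4e-4
    print("Min = Max = %d." % val)  # -> Min = Max = 0.       (value truncated)
    print("Min = Max = %f." % val)  # -> Min = Max = 0.000400.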

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/visualization/profile_plotter.py
--- a/yt/visualization/profile_plotter.py
+++ b/yt/visualization/profile_plotter.py
@@ -797,19 +797,19 @@
                     positive_values = data[data > 0.0]
                     if len(positive_values) == 0:
                         mylog.warning("Profiled field %s has no positive "
-                                      "values.  Max = %d." %
+                                      "values.  Max = %f." %
                                       (f, np.nanmax(data)))
                         mylog.warning("Switching to linear colorbar scaling.")
-                        zmin = data.min()
+                        zmin = np.nanmin(data)
                         z_scale = 'linear'
                         self._field_transform[f] = linear_transform
                     else:
                         zmin = positive_values.min()
                         self._field_transform[f] = log_transform
                 else:
-                    zmin = data.min()
+                    zmin = np.nanmin(data)
                     self._field_transform[f] = linear_transform
-                zlim = [zmin, data.max()]
+                zlim = [zmin, np.nanmax(data)]
 
             fp = self._font_properties
             f = self.profile.data_source._determine_fields(f)[0]
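
The ``np.nanmin``/``np.nanmax`` switch above keeps a stray NaN in the profiled data
from poisoning the colorbar limits:

    import numpy as np

    data = np.array([1.0, np.nan, 3.0])
    print(data.min(), data.max())            # nan nan  -- NaN propagates
    print(np.nanmin(data), np.nanmax(data))  # 1.0 3.0  -- NaNs ignored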

diff -r 2293f9d3e294ebdc65f6700a06103f847021100c -r 685c5a89e6e48c42df0ebbf10e6b21a21903d0af yt/visualization/volume_rendering/transfer_function_helper.py
--- a/yt/visualization/volume_rendering/transfer_function_helper.py
+++ b/yt/visualization/volume_rendering/transfer_function_helper.py
@@ -19,7 +19,6 @@
 from yt.visualization.volume_rendering.api import ColorTransferFunction
 from yt.visualization._mpl_imports import FigureCanvasAgg
 from matplotlib.figure import Figure
-from IPython.core.display import Image
 from yt.extern.six.moves import StringIO
 import numpy as np
 
@@ -190,6 +189,7 @@
         ax.set_ylim(y.max()*1.0e-3, y.max()*2)
 
         if fn is None:
+            from IPython.core.display import Image
             f = StringIO()
             canvas.print_figure(f)
             f.seek(0)
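
Moving the ``IPython.core.display.Image`` import inside the ``fn is None`` branch is
the usual deferred-import pattern: the module stays importable in environments
without IPython, and the dependency is only paid for when a figure is actually
returned inline.  A minimal sketch (function and argument names are illustrative):

    def show_or_save(figure_bytes, fn=None):
        """Return an inline image when no filename is given, else write to disk."""
        if fn is None:
            # Imported lazily so plain scripts never need IPython installed.
            from IPython.core.display import Image
            return Image(data=figure_bytes, format="png")
        with open(fn, "wb") as f:
            f.write(figure_bytes)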


https://bitbucket.org/yt_analysis/yt/commits/7688ec7a09af/
Changeset:   7688ec7a09af
Branch:      yt-3.0
User:        chummels
Date:        2014-07-20 01:06:25
Summary:     Merged in ngoldbaum/yt/yt-3.0 (pull request #1039)

Updating the parallelism docs.
Affected #:  4 files

diff -r bcc3145074ddc908d41861917e277371edf62f65 -r 7688ec7a09af1bf76b46dd9a085071c85dbdec4f doc/source/analyzing/ionization_cube.py
--- a/doc/source/analyzing/ionization_cube.py
+++ b/doc/source/analyzing/ionization_cube.py
@@ -1,14 +1,17 @@
-from yt.mods import *
+import yt
 from yt.utilities.parallel_tools.parallel_analysis_interface \
     import communication_system
-import h5py, glob, time
 
- at derived_field(name = "IonizedHydrogen",
-               units = r"\frac{\rho_{HII}}{rho_H}")
+import h5py
+import time
+import numpy as np
+
+ at yt.derived_field(name="IonizedHydrogen", units="",
+                  display_name=r"\frac{\rho_{HII}}{\rho_H}")
 def IonizedHydrogen(field, data):
     return data["HII_Density"]/(data["HI_Density"]+data["HII_Density"])
 
-ts = DatasetSeries.from_filenames("SED800/DD*/*.index", parallel = 8)
+ts = yt.DatasetSeries("SED800/DD*/*.index", parallel=8)
 
 ionized_z = np.zeros(ts[0].domain_dimensions, dtype="float32")
 

diff -r bcc3145074ddc908d41861917e277371edf62f65 -r 7688ec7a09af1bf76b46dd9a085071c85dbdec4f doc/source/analyzing/parallel_computation.rst
--- a/doc/source/analyzing/parallel_computation.rst
+++ b/doc/source/analyzing/parallel_computation.rst
@@ -22,14 +22,11 @@
    :ref:`derived-quantities`)
  * 1-, 2-, and 3-D profiles (:ref:`generating-profiles-and-histograms`)
  * Halo finding (:ref:`halo_finding`)
- * Merger tree (:ref:`merger_tree`)
- * Two point functions (:ref:`two_point_functions`)
  * Volume rendering (:ref:`volume_rendering`)
- * Radial column density (:ref:`radial-column-density`)
  * Isocontours & flux calculations (:ref:`extracting-isocontour-information`)
 
 This list covers just about every action ``yt`` can take!  Additionally, almost all
-scripts will benefit from parallelization without any modification.  The goal
+scripts will benefit from parallelization with minimal modification.  The goal
 of Parallel-``yt`` has been to retain API compatibility and abstract all
 parallelism.
 
@@ -45,14 +42,15 @@
 
     $ pip install mpi4py
 
-Once that has been installed, you're all done!  You just need to launch your 
-scripts with ``mpirun`` (or equivalent) and signal to ``yt`` that you want to run 
-them in parallel.  In general, that's all it takes to get a speed benefit on a 
+Once that has been installed, you're all done!  You just need to launch your
+scripts with ``mpirun`` (or equivalent) and signal to ``yt`` that you want to
+run them in parallel by invoking the ``yt.enable_parallelism()`` function in
+your script.  In general, that's all it takes to get a speed benefit on a
 multi-core machine.  Here is an example on an 8-core desktop:
 
 .. code-block:: bash
 
-    $ mpirun -np 8 python script.py --parallel
+    $ mpirun -np 8 python script.py
 
 Throughout its normal operation, ``yt`` keeps you aware of what is happening with
 regular messages to the stderr usually prefaced with: 
@@ -71,10 +69,9 @@
 
 in the case of two cores being used.
 
-It's important to note that all of the processes listed in `capabilities` work
--- and no additional work is necessary to parallelize those processes.
-Furthermore, the ``yt`` command itself recognizes the ``--parallel`` option, so
-those commands will work in parallel as well.
+It's important to note that all of the processes listed in :ref:`capabilities`
+work in parallel -- and no additional work is necessary to parallelize those
+processes.
 
 Running a ``yt`` script in parallel
 -----------------------------------
@@ -85,11 +82,12 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ds = load("RD0035/RedshiftOutput0035")
+   import yt
+   yt.enable_parallelism()
+   ds = yt.load("RD0035/RedshiftOutput0035")
    v, c = ds.find_max("density")
    print v, c
-   p = ProjectionPlot(ds, "x", "density")
+   p = yt.ProjectionPlot(ds, "x", "density")
    p.save()
 
 If this script is run in parallel, two of the most expensive operations -
@@ -99,7 +97,7 @@
 
 .. code-block:: bash
 
-   $ mpirun -np 16 python2.7 my_script.py --parallel
+   $ mpirun -np 16 python2.7 my_script.py
 
 .. note::
 
@@ -126,11 +124,11 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ds = load("RD0035/RedshiftOutput0035")
+   import yt
+   ds = yt.load("RD0035/RedshiftOutput0035")
    v, c = ds.find_max("density")
-   p = ProjectionPlot(ds, "x", "density")
-   if is_root():
+   p = yt.ProjectionPlot(ds, "x", "density")
+   if yt.is_root():
        print v, c
        p.save()
 
@@ -144,17 +142,17 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
+   import yt
 
    def print_and_save_plot(v, c, plot, verbose=True):
       if verbose:
          print v, c
       plot.save()
 
-   ds = load("RD0035/RedshiftOutput0035")
+   ds = yt.load("RD0035/RedshiftOutput0035")
    v, c = ds.find_max("density")
-   p = ProjectionPlot(ds, "x", "density")
-   only_on_root(print_and_save_plot, v, c, plot, print=True)
+   p = yt.ProjectionPlot(ds, "x", "density")
+   yt.only_on_root(print_and_save_plot, v, c, p, verbose=True)
 
 Types of Parallelism
 --------------------
@@ -174,19 +172,17 @@
 The following operations use spatial decomposition:
 
   * Halo finding
-  * Merger tree
-  * Two point functions
   * Volume rendering
-  * Radial column density
 
 Grid Decomposition
 ++++++++++++++++++
 
-The alternative to spatial decomposition is a simple round-robin of the grids.
-This process allows ``yt`` to pool data access to a given Enzo data file, which
-ultimately results in faster read times and better parallelism.
+The alternative to spatial decomposition is a simple round-robin of data chunks,
+which could be grids, octs, or whatever chunking mechanism the code frontend in
+question uses.  This process allows ``yt`` to pool data access to a given
+data file, which ultimately results in faster read times and better parallelism.
 
-The following operations use grid decomposition:
+The following operations use chunk decomposition:
 
   * Projections
   * Slices
@@ -211,16 +207,15 @@
 ---------------------------
 
 It is easy within ``yt`` to parallelize a list of tasks, as long as those tasks
-are independent of one another.
-Using object-based parallelism, the function :func:`parallel_objects` will
-automatically split up a list of tasks over the specified number of processors
-(or cores).
-Please see this heavily-commented example:
+are independent of one another. Using object-based parallelism, the function
+:func:`~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_objects`
+will automatically split up a list of tasks over the specified number of
+processors (or cores).  Please see this heavily-commented example:
 
 .. code-block:: python
    
    # As always...
-   from yt.mods import *
+   import yt
    
    import glob
    
@@ -249,19 +244,19 @@
    # If data does not need to be combined after the loop is done, the line
    # would look like:
    #       for fn in parallel_objects(fns, num_procs):
-   for sto, fn in parallel_objects(fns, num_procs, storage = my_storage):
+   for sto, fn in yt.parallel_objects(fns, num_procs, storage = my_storage):
 
        # Open a data file, remembering that fn is different on each task.
-       ds = load(fn)
+       ds = yt.load(fn)
        dd = ds.all_data()
 
        # This copies fn and the min/max of density to the local copy of
        # my_storage
        sto.result_id = fn
-       sto.result = dd.quantities["Extrema"]("density")
+       sto.result = dd.quantities.extrema("density")
 
        # Makes and saves a plot of the gas density.
-       p = ProjectionPlot(ds, "x", "density")
+       p = yt.ProjectionPlot(ds, "x", "density")
        p.save()
 
    # At this point, as the loop exits, the local copies of my_storage are
@@ -270,7 +265,7 @@
    # tasks have produced.
    # Below, the values in my_storage are printed by only one task. The other
    # tasks do nothing.
-   if is_root()
+   if yt.is_root():
        for fn, vals in sorted(my_storage.items()):
            print fn, vals
 
@@ -282,43 +277,33 @@
 Parallel Time Series Analysis
 -----------------------------
 
-The same :func:`parallel_objects` machinery discussed above is turned on by
-default when using a ``DatasetSeries`` object (see :ref:`time-series-analysis`)
-to iterate over simulation outputs.  The syntax for this is very simple.  As an
-example, we can use the following script to find the angular momentum vector in
-a 1 pc sphere centered on the maximum density cell in a large number of
-simulation outputs:
+The same ``parallel_objects`` machinery discussed above is turned on by
+default when using a :class:`~yt.data_objects.time_series.DatasetSeries` object
+(see :ref:`time-series-analysis`) to iterate over simulation outputs.  The
+syntax for this is very simple.  As an example, we can use the following script
+to find the angular momentum vector in a 1 pc sphere centered on the maximum
+density cell in a large number of simulation outputs:
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = True)
-   sphere = ts.sphere("max", (1.0, "pc"))
-   L_vecs = sphere.quantities["AngularMomentumVector"]()
+   import yt
+   yt.enable_parallelism()
+
+   ts = yt.load("DD*/output_*")
+
+   storage = {}
+
+   for sto, ds in ts.piter(storage=storage):
+       sphere = ds.sphere("max", (1.0, "pc"))
+       sto.result = sphere.quantities.angular_momentum_vector()
+       sto.result_id = str(ds)
+
+   for L in sorted(storage.items()):
+       print L
 
 Note that this script can be run in serial or parallel with an arbitrary number
 of processors.  When running in parallel, each output is given to a different
-processor.  By default, parallel is set to ``True``, so you do not have to
-explicitly set ``parallel = True`` as in the above example. 
-
-One could get the same effect by iterating over the individual datasets
-in the DatasetSeries object:
-
-.. code-block:: python
-
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = True)
-   my_storage = {}
-   for sto,ds in ts.piter(storage=my_storage):
-       sphere = ds.sphere("max", (1.0, "pc"))
-       L_vec = sphere.quantities["AngularMomentumVector"]()
-       sto.result_id = ds.parameter_filename
-       sto.result = L_vec
-
-   L_vecs = []
-   for fn, L_vec in sorted(my_storage.items()):
-       L_vecs.append(L_vec)
-
+processor.
 
 You can also request a fixed number of processors to calculate each
 angular momentum vector.  For example, this script will calculate each angular
@@ -328,16 +313,18 @@
 
 .. code-block:: python
 
-   from yt.pmods import *
-   ts = DatasetSeries.from_filenames("DD*/output_*", parallel = 4)
-   sphere = ts.sphere("max", (1.0, "pc))
-   L_vecs = sphere.quantities["AngularMomentumVector"]()
+   import yt
+   ts = yt.DatasetSeries("DD*/output_*", parallel = 4)
+   
+   for ds in ts.piter():
+       sphere = ds.sphere("max", (1.0, "pc"))
+       L_vecs = sphere.quantities.angular_momentum_vector()
 
 If you do not want to use ``parallel_objects`` parallelism when using a
-TimeSeries object, set ``parallel = False``.  When running python in parallel,
+DatasetSeries object, set ``parallel = False``.  When running python in parallel,
 this will use all of the available processors to evaluate the requested
 operation on each simulation output.  Some care and possibly trial and error
-might be necessary to estimate the correct settings for your Simulation
+might be necessary to estimate the correct settings for your simulation
 outputs.
 
 Parallel Performance, Resources, and Tuning
@@ -350,53 +337,50 @@
 provide some insight into what are good starting values for a given parallel
 task.
 
-Grid Decomposition
-++++++++++++++++++
+Chunk Decomposition
++++++++++++++++++++
 
 In general, these types of parallel calculations scale very well with number of
-processors.
-They are also fairly memory-conservative.
-The two limiting factors is therefore the number of grids in the dataset,
-and the speed of the disk the data is stored on.
-There is no point in running a parallel job of this kind with more processors
-than grids, because the extra processors will do absolutely nothing, and will
-in fact probably just serve to slow down the whole calculation due to the extra
-overhead.
-The speed of the disk is also a consideration - if it is not a high-end parallel
-file system, adding more tasks will not speed up the calculation if the disk
-is already swamped with activity.
+processors.  They are also fairly memory-conservative.  The two limiting factors
+are therefore the number of chunks in the dataset, and the speed of the disk the
+data is stored on.  There is no point in running a parallel job of this kind
+with more processors than chunks, because the extra processors will do absolutely
+nothing, and will in fact probably just serve to slow down the whole calculation
+due to the extra overhead.  The speed of the disk is also a consideration - if
+it is not a high-end parallel file system, adding more tasks will not speed up
+the calculation if the disk is already swamped with activity.
 
-The best advice for these sort of calculations is to run 
-with just a few processors and go from there, seeing if it the runtime
-improves noticeably.
+The best advice for these sorts of calculations is to run with just a few
+processors and go from there, seeing if the runtime improves noticeably.
 
 **Projections, Slices, and Cutting Planes**
 
 Projections, slices and cutting planes are the most common methods of creating
 two-dimensional representations of data.  All three have been parallelized in a
-grid-based fashion.
+chunk-based fashion.
 
  * Projections: projections are parallelized utilizing a quad-tree approach.
    Data is loaded for each processor, typically by a process that consolidates
-   open/close/read operations, and each grid is then iterated over and cells
-   are deposited into a data structure that stores values corresponding to
-   positions in the two-dimensional plane.  This provides excellent load
-   balancing, and in serial is quite fast.  However, as of ``yt`` 2.3, the
-   operation by which quadtrees are joined across processors scales poorly;
-   while memory consumption scales well, the time to completion does not.  As
-   such, projections can often be done very fast when operating only on a single
-   processor!  The quadtree algorithm can be used inline (and, indeed, it is
-   for this reason that it is slow.)  It is recommended that you attempt to
-   project in serial before projecting in parallel; even for the very largest
-   datasets (Enzo 1024^3 root grid with 7 levels of refinement) in the absence
-   of IO the quadtree algorithm takes only three minutes or so on a decent
-   processor.
- * Slices: to generate a slice, grids that intersect a given slice are iterated
-   over and their finest-resolution cells are deposited.  The grids are
+   open/close/read operations, and each grid is then iterated over and cells are
+   deposited into a data structure that stores values corresponding to positions
+   in the two-dimensional plane.  This provides excellent load balancing, and in
+   serial is quite fast.  However, the operation by which quadtrees are joined
+   across processors scales poorly; while memory consumption scales well, the
+   time to completion does not.  As such, projections can often be done very
+   fast when operating only on a single processor!  The quadtree algorithm can
+   be used inline (and, indeed, it is for this reason that it is slow.)  It is
+   recommended that you attempt to project in serial before projecting in
+   parallel; even for the very largest datasets (Enzo 1024^3 root grid with 7
+   levels of refinement) in the absence of IO the quadtree algorithm takes only
+   three minutes or so on a decent processor.
+
+ * Slices: to generate a slice, chunks that intersect a given slice are iterated
+   over and their finest-resolution cells are deposited.  The chunks are
    decomposed via standard load balancing.  While this operation is parallel,
    **it is almost never necessary to slice a dataset in parallel**, as all data is
    loaded on demand anyway.  The slice operation has been parallelized so as to
    enable slicing when running *in situ*.
+
  * Cutting planes: cutting planes are parallelized exactly as slices are.
    However, in contrast to slices, because the data-selection operation can be
    much more time consuming, cutting planes often benefit from parallelism.
@@ -404,7 +388,7 @@
 Object-Based
 ++++++++++++
 
-Like grid decomposition, it does not help to run with more processors than the
+Like chunk decomposition, it does not help to run with more processors than the
 number of objects to be iterated over.
 There is also the matter of the kind of work being done on each object, and
 whether it is disk-intensive, cpu-intensive, or memory-intensive.
@@ -436,37 +420,28 @@
 
 **Halo-Finding**
 
-Halo finding, along with the merger tree that uses halo finding, operates
-on the particles in the volume, and is therefore mostly grid-agnostic.
-Generally, the biggest concern for halo finding is the amount of memory needed.
-There is subtle art in estimating the amount of memory needed for halo finding,
-but a rule of thumb is that Parallel HOP (:func:`parallelHF`) is the most
-memory-intensive, followed by plain HOP (:func:`HaloFinder`),
-with Friends of Friends (:func:`FOFHaloFinder`) being
-the most memory-conservative.
-It has been found that :func:`parallelHF` needs roughly
-1 MB of memory per 5,000
-particles, although recent work has improved this and the memory requirement
-is now smaller than this. But this is a good starting point for beginning to
-calculate the memory required for halo-finding.
-
-**Two point functions**
-
-Please see :ref:`tpf_strategies` for more details.
+Halo finding, along with the merger tree that uses halo finding, operates on the
+particles in the volume, and is therefore mostly chunk-agnostic.  Generally, the
+biggest concern for halo finding is the amount of memory needed.  There is
+subtle art in estimating the amount of memory needed for halo finding, but a
+rule of thumb is that the HOP halo finder (:func:`HaloFinder`) is the most
+memory-intensive, while Friends of Friends (:func:`FOFHaloFinder`) is the most
+memory-conservative.  It has been found that :func:`parallelHF` needs
+roughly 1 MB of memory per 5,000 particles, although recent work has improved
+this and the memory requirement is now smaller than this. But this is a good
+starting point for beginning to calculate the memory required for halo-finding.
 
 **Volume Rendering**
 
-The simplest way to think about volume rendering, and the radial column density
-module that uses it, is that it load-balances over the grids in the dataset.
-Each processor is given roughly the same sized volume to operate on.
-In practice, there are just a few things to keep in mind when doing volume
-rendering.
-First, it only uses a power of two number of processors.
-If the job is run with 100 processors, only 64 of them will actually do anything.
-Second, the absolute maximum number of processors is the number of grids.
-But in order to keep work distributed evenly, typically the number of processors
-should be no greater than one-eighth or one-quarter the number of processors
-that were used to produce the dataset.
+The simplest way to think about volume rendering is that it load-balances over
+the i/o chunks in the dataset.  Each processor is given roughly the same sized
+volume to operate on.  In practice, there are just a few things to keep in mind
+when doing volume rendering.  First, it only uses a power of two number of
+processors.  If the job is run with 100 processors, only 64 of them will
+actually do anything.  Second, the absolute maximum number of processors is the
+number of chunks.  In order to keep work distributed evenly, typically the
+number of processors should be no greater than one-eighth or one-quarter the
+number of processors that were used to produce the dataset.
 
 Additional Tips
 ---------------
@@ -500,10 +475,10 @@
     
     .. code-block:: python
     
-       from yt.mods import *
+       import yt
        import time
-       
-       ds = load("DD0152")
+
+       ds = yt.load("DD0152")
        t0 = time.time()
        bigstuff, hugestuff = StuffFinder(ds)
        BigHugeStuffParallelFunction(ds, bigstuff, hugestuff)
@@ -514,7 +489,7 @@
            SaveTinyMiniStuffToDisk("out%06d.txt" % i, array)
        t2 = time.time()
        
-       if is_root()
+       if yt.is_root():
            print "BigStuff took %.5e sec, TinyStuff took %.5e sec" % (t1 - t0, t2 - t1)
   
   * Remember that if the script handles disk IO explicitly, and does not use
@@ -526,7 +501,7 @@
     
     .. code-block:: python
        
-       if is_root()
+       if yt.is_root():
            file = open("out.txt", "w")
            file.write(stuff)
            file.close()
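
Pulling the updated documentation together, a hedged, self-contained sketch of the
recommended workflow -- enable parallelism, iterate a time series with ``piter`` and
a storage dict, and report only from the root task (file patterns are placeholders):

    import yt
    yt.enable_parallelism()

    ts = yt.DatasetSeries("DD*/output_*", parallel=True)

    storage = {}
    for sto, ds in ts.piter(storage=storage):
        sphere = ds.sphere("max", (1.0, "pc"))
        sto.result_id = str(ds)
        sto.result = sphere.quantities.angular_momentum_vector()

    if yt.is_root():
        for name, L_vec in sorted(storage.items()):
            print name, L_vec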

diff -r bcc3145074ddc908d41861917e277371edf62f65 -r 7688ec7a09af1bf76b46dd9a085071c85dbdec4f doc/source/reference/api/api.rst
--- a/doc/source/reference/api/api.rst
+++ b/doc/source/reference/api/api.rst
@@ -698,6 +698,7 @@
    ~yt.funcs.time_execution
    ~yt.analysis_modules.level_sets.contour_finder.identify_contours
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_blocking_call
+   ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_objects
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_passthrough
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_root_only
    ~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_simple_proxy

diff -r bcc3145074ddc908d41861917e277371edf62f65 -r 7688ec7a09af1bf76b46dd9a085071c85dbdec4f doc/source/yt3differences.rst
--- a/doc/source/yt3differences.rst
+++ b/doc/source/yt3differences.rst
@@ -21,6 +21,9 @@
 
 Here's a quick reference for how to update your code to work with yt-3.0.
 
+  * Importing yt is now as simple as ``import yt``.  The docs have been
+    extensively updated to reflect this new style.  ``from yt.mods import *``
+    still works, but we are discouraging its use going forward.
   * Fields can be accessed by a name, but are named internally as ``(fluid_type,
     fluid_name)``.
   * Fields on-disk will be in code units, and will be named ``(code_name,
@@ -36,6 +39,11 @@
     return a single tuple if you only ask for one field.
   * Units can be tricky, and they try to keep you from making weird things like
     ``ergs`` + ``g``.  See :ref:`units` for more information.
+  * Previously, yt would capture command line arguments when being imported.
+    This no longer happens.  As a side effect, it is no longer necessary to
+    specify ``--parallel`` at the command line when running a parallel 
+    computation. Use ``yt.enable_parallelism()`` instead.  See 
+    :ref:`parallel-computation` for more detail.
 
 Cool New Things
 ---------------
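
As a quick before/after for the bullet points above (the dataset path is a placeholder):

    # yt 2.x style (still works, but discouraged):
    #   from yt.mods import *
    #   run with:  mpirun -np 8 python script.py --parallel

    # yt 3.0 style:
    import yt
    yt.enable_parallelism()          # replaces the --parallel command-line flag

    ds = yt.load("RD0035/RedshiftOutput0035")
    ad = ds.all_data()
    print ad["gas", "density"]       # fields are (fluid_type, field_name) tuples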

Repository URL: https://bitbucket.org/yt_analysis/yt/
