[yt-dev] Issue #743: Derived quantities are difficult to make unit aware (yt_analysis/yt)

Thu Dec 5 09:17:31 PST 2013

New issue 743: Derived quantities are difficult to make unit aware
https://bitbucket.org/yt_analysis/yt/issue/743/derived-quantities-are-difficult-to-make

Matthew Turk:

Derived quantities currently know nothing about the units of the values they return.  This is a problem because we perform several operations on those values that are hard to do in a unit-neutral fashion: they can be scalars as well as arrays, and either or both might be unitful.

Here is the current logic for how they are computed:

```
#!python
        retvals = [ [] for i in range(self.n_ret)]
        chunks = self._data_source.chunks([], chunking_style="io")
        for ds in parallel_objects(chunks, -1):
            rv = self.func(ds, *args, **kwargs)
            if not iterable(rv): rv = (rv,)
            for i in range(self.n_ret): retvals[i].append(rv[i])
        # Note that we do some fancy footwork here.
        # _par_combine_object and its affiliated alltoall function
        # assume that the *long* axis is the last one.  However,
        # our long axis is the first one!
        rv = []
        for my_list in retvals:
            data = YTArray(my_list).transpose()
            rv.append(self.comm.par_combine_object(data,
                        datatype="array", op="cat").transpose())
        retvals = rv
        return self.c_func(self._data_source, *retvals)
```
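To make the "fancy footwork" comment concrete, here is a small NumPy-only sketch of the transpose, combine, transpose pattern. `cat_last_axis` is a hypothetical stand-in that joins along the last axis, as the comment says the combine step assumes; it is not the actual `par_combine_object` implementation, and the two "ranks" are just local arrays.

```python
import numpy as np


def cat_last_axis(per_rank):
    """Hypothetical stand-in for a parallel "cat" that joins along the last axis."""
    return np.concatenate(per_rank, axis=-1)


# Two pretend ranks, each holding per-chunk 3-vectors with shape (n_chunks, 3).
rank0 = np.arange(6.0).reshape(2, 3)
rank1 = np.arange(6.0, 15.0).reshape(3, 3)

# Transpose so the long (chunk) axis is last, combine, then transpose back.
combined = cat_last_axis([rank0.transpose(), rank1.transpose()]).transpose()
```

Note that nothing in this path carries units: once the per-chunk values are packed into a plain array, whatever units they had must agree already or be tracked separately.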

Now, there are only a relatively small number of derived quantities, and some likely have to be rewritten to take advantage of the particle type and fluid type systems.  Additionally, we wanted to add them "on demand" similar to fluid and particle fields.  So I see a few different options.

1. Rewrite them to be unit-aware.  This could also include fixes for how reduction operations are set up and an explicit noting of how many return values are to be expected.
2. Sanitize all arrays, record the units they are transformed into (they must be in the same units when viewed as ndarrays, or else we lose unitful protections), and then re-cast after passing between processors.
3. Hack around it somehow.
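As a rough illustration of option 2, the sketch below converts every chunk to one target unit, combines the bare ndarrays, and reattaches the recorded unit afterwards. Everything here is hypothetical: the `(values, unit)` tuples, the `TO_GRAMS` table, and the `sanitize`, `combine`, and `recast` helpers stand in for the real unit machinery and the interprocessor step, and do not reflect yt's API.

```python
import numpy as np

# Hypothetical conversion factors to grams; illustrative only.
TO_GRAMS = {"g": 1.0, "kg": 1.0e3}


def sanitize(chunks, target_unit):
    """Convert each (values, unit) pair to target_unit and strip the unit."""
    bare = []
    for values, unit in chunks:
        factor = TO_GRAMS[unit] / TO_GRAMS[target_unit]
        bare.append(np.asarray(values, dtype="float64") * factor)
    # Record the unit once; every bare array is now expressed in it.
    return bare, target_unit


def combine(bare_arrays):
    """Stand-in for passing data between processors; it never sees units."""
    return np.concatenate(bare_arrays)


def recast(bare, unit):
    """Reattach the recorded unit after the combine step."""
    return bare, unit


chunks = [(np.array([1.0, 2.0]), "kg"), (np.array([500.0]), "g")]
bare, unit = sanitize(chunks, "g")
combined, combined_unit = recast(combine(bare), unit)
```

The key constraint from the list above shows up in `sanitize`: if the chunks were stripped without first converting to a common unit, the concatenated ndarray would silently mix kilograms and grams.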

I think that the first option is likely the best.  We can redefine the derived quantity system to behave more like the new field system, and we can also take the opportunity to make it more explicit.  Many of the problems we've had in the past trace back to `n_ret` (the expected number of return values) being unclear.
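One possible shape for a more explicit system, purely as a sketch: each derived quantity declares its return units up front, so the number of return values is implied by the declaration rather than by a separate `n_ret`. The class names, the `units` attribute, and the `process_chunk` / `reduce` split are all assumptions made for illustration, not an existing yt interface, and plain `(value, unit)` tuples stand in for unitful arrays.

```python
import numpy as np


class DerivedQuantitySketch:
    """Hypothetical base class: subclasses declare their return units."""

    units = ()  # one unit string per return value; its length replaces n_ret

    def process_chunk(self, chunk):
        raise NotImplementedError

    def reduce(self, columns):
        raise NotImplementedError

    def __call__(self, chunks):
        per_chunk = []
        for chunk in chunks:
            rv = tuple(self.process_chunk(chunk))
            if len(rv) != len(self.units):
                raise ValueError(
                    "expected %d return values, got %d" % (len(self.units), len(rv))
                )
            per_chunk.append(rv)
        # One array per return value, with the chunk axis as the long axis.
        columns = [np.array(col) for col in zip(*per_chunk)]
        reduced = self.reduce(columns)
        # Pair every reduced value with its declared unit.
        return [(value, unit) for value, unit in zip(reduced, self.units)]


class TotalMassSketch(DerivedQuantitySketch):
    units = ("g",)

    def process_chunk(self, chunk):
        return (chunk["mass"].sum(),)

    def reduce(self, columns):
        return [columns[0].sum()]


chunks = [{"mass": np.array([1.0, 2.0])}, {"mass": np.array([3.0])}]
total = TotalMassSketch()(chunks)
```

With the units declared on the class, a mismatch between what a chunk function returns and what the quantity promises is caught immediately instead of surfacing later in the reduction.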

I will first attempt to do 2, then 3, then 1.  I think 1 is where we will end up, however.

The derived quantities currently in the code:

* TotalMass
* CenterOfMass
* WeightedAverageQuantity
* WeightedVariance
* BulkVelocity
* AngularMomentumVector
* StarAngularMomentumVector
* ParticleAngularMomentumVector
* BaryonSpinParameter
* ParticleSpinParameter
* IsBound
* Extrema
* Action
* MaxLocation
* MinLocation
* TotalQuantity
* ParticleDensityCenter
* HalfMass

I think several of these should be removed, or rewritten so that they can be merged.  For instance, we can join the various spin parameter computations, potentially the variance and averaging computations, and possibly simplify the TotalMass computation in terms of TotalQuantity.  We may be able to alias WeightedAverageQuantity and CenterOfMass as well.
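The aliasing idea rests on the center of mass being a mass-weighted average of position. A minimal NumPy sketch of that relationship follows; the function names are hypothetical, and units and parallel reduction are ignored.

```python
import numpy as np


def weighted_average(values, weights):
    """Weighted average along the first axis; handles (N,) and (N, 3) inputs."""
    values = np.asarray(values, dtype="float64")
    weights = np.asarray(weights, dtype="float64")
    return np.tensordot(weights, values, axes=(0, 0)) / weights.sum()


def center_of_mass(positions, masses):
    """Center of mass expressed as a mass-weighted average of position."""
    return weighted_average(positions, masses)


positions = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
masses = np.array([1.0, 3.0])
com = center_of_mass(positions, masses)
```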

Responsible: MatthewTurk