[yt-dev] Interacting with data in yt 3.0 (was Field units from code to code)

Matthew Turk matthewturk at gmail.com
Fri Mar 30 03:48:32 PDT 2012

In general, I agree with the idea Nathan put out.  (Also, I think this
is a fine time to have a bikeshed discussion.  Many of the underlying
assumptions about how yt works were laid out a long time ago.)  But,
I'm not entirely sure I understand how different it would be --
conceptually, yes, I see what you're getting at, that we'd have a set
number of attributes.  In what I was thinking of for the geometry
refactor so far, I'm trying to get rid of the "hierarchy" as existing
for every data set, and instead rely on what amounts to an
object-finder and io-coordinator, which I'm calling a geometry
handler.  It sounds like what you would like is:

1) Get rid of accessing parameters with an implicit __getitem__ on the
parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]).  I'm
+10 on this.
2) Move units into the .units object (I'm mostly with Casey on this,
but I think it should be a part of the field_info object)
3) Have things like current_time, domain_dimensions and so on move
into basic_info and make them dict objects.
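
As a rough sketch of how points 1 and 3 differ in practice (the class
below is hypothetical and is not existing yt code; only the
HydroMethod parameter name comes from this thread), code-specific
values could sit in an explicit parameters dict while the generic
quantities stay as plain attributes:

```python
class Dataset:
    """Hypothetical dataset object; an illustration, not yt's actual class."""

    def __init__(self, current_time, domain_dimensions, parameters):
        # Generic quantities that every frontend would have to supply.
        self.current_time = current_time
        self.domain_dimensions = domain_dimensions
        # Code-specific values live in an explicit dict, so there is no
        # implicit __getitem__ on the dataset itself.
        self.parameters = dict(parameters)


ds = Dataset(
    current_time=0.5,
    domain_dimensions=(32, 32, 32),
    parameters={"HydroMethod": 2},
)

# Explicit, visibly code-specific lookup.
hydro_method = ds.parameters["HydroMethod"]
```

With this layout ds["HydroMethod"] raises a TypeError, which makes it
obvious at the call site when a script depends on one particular
simulation code.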

I think of those, I'm in favor of one and two, but somewhat opposed to
#3.  Right now we have these attributes mandated for subclasses of


The only ones here that I think would be okay to move out of
properties would be the cosmology items, and even those I'm -0 on.

But, in general, the idea of moving from this two-stage system of
parameter file (rather than dataset) and hierarchy (rather than an
implicitly-handled geometry) is something I am in support of.  The
geometry is something that should nearly *always* be handled by the
backend, rather than by the user.  So having the library require
pf.h.sphere(...) is less than ideal, since it's exposing something
relatively unfortunate (that building a hundred thousand grid objects
can take some time).

The main ways that the static output is interacted with:

* Parameter information specific to a simulation code
* Properties that yt needs to know about
* To get at the hierarchy
* Input to plot collections

The main ways that the hierarchy is interacted with:

* Getting data objects
* Finding max
* Statistics about the simulation
* Inspecting individual grids (much less common use case now than it was before)

All of these use cases are still valid, but I think it's clear that
accessing individual grids and accessing simulation-specific
parameters are not "generic" functions.  What a lot of this discussion
has really brought up for me is that we're talking about *generic*
functionality, not code-specific functionality, and we right now do
not have the best enumeration of functionality and where it lies.

With the geometry_refactor, I'd like to consolidate functionality into
the main "dataset" object.  The geometry can still provide access to
the individual grids (of course) but data objects, finding max,
getting stats about the simulation, etc, should all go into the main
dataset object, and the geometry handler can simply be created on the
fly if necessary.
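
A minimal sketch of that "created on the fly" idea, assuming a
hypothetical GeometryHandler class and a made-up filename (neither is
meant to describe the actual geometry_refactor code): the dataset
builds the handler the first time a data object is requested and
reuses it afterwards.

```python
class GeometryHandler:
    """Hypothetical stand-in for the object-finder / IO coordinator."""

    instances_built = 0  # counts how often the expensive setup runs

    def __init__(self, dataset):
        GeometryHandler.instances_built += 1
        self.dataset = dataset

    def select_sphere(self, center, radius):
        # A real handler would locate the grids intersecting the sphere;
        # this placeholder just records the request.
        return {"type": "sphere", "center": center, "radius": radius}


class Dataset:
    def __init__(self, filename):
        self.filename = filename
        self._geometry = None  # nothing is built at load time

    @property
    def geometry(self):
        # Build the handler on first use, then reuse it.
        if self._geometry is None:
            self._geometry = GeometryHandler(self)
        return self._geometry

    def sphere(self, center, radius):
        # Users ask the dataset; the geometry stays behind the scenes.
        return self.geometry.select_sphere(center, radius)


ds = Dataset("my_output")  # made-up filename
built_before = GeometryHandler.instances_built
sp = ds.sphere((0.5, 0.5, 0.5), 0.1)
sp2 = ds.sphere((0.5, 0.5, 0.5), 0.2)
built_after = GeometryHandler.instances_built
```

Loading the dataset costs nothing geometry-wise, and the two sphere
requests share a single handler.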

This brings up two points, though --

1) Does our method of instantiating objects still hold up?  i.e.,
ds.sphere(...) and so on?  Or does our dataset object then become
overcrowded?  I would also like to move *all* plotting objects into
whatever we end up deciding is the location data containers come from,
which for instance could look like ds.plot("slice", "x") (for
instance, although we can bikeshed that later), which would return a
plot window.
2) Datasets and time series should behave, if not identically, at
least consistently in their APIs.  Moving to a completely ds-mediated
mechanism for generating, accessing and inspecting data opens up the
ability to then construct very nice and simple proxy objects.  As an
example, while something like this is technically possible with
the current Time Series API, it's a bit tricky:

ts = TimeSeriesData.from_filenames(...)
plot = ts.plot("slice", "x", (100.0, 'au'))
ts.seek(dt = (100, 'years'))
ts.seek(dt = (10, 'years'))

(The time-slider, as Tom likes to call it ...)
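
To make the proxy idea concrete, here is a small sketch under heavy
assumptions: the Dataset and TimeSeries classes are hypothetical
stand-ins, times are bare floats rather than (value, unit) tuples, and
plot() just returns a string.  The point is only that a series can
forward anything it does not define to whichever dataset is current.

```python
class Dataset:
    """Hypothetical single-output dataset."""

    def __init__(self, current_time):
        self.current_time = current_time

    def plot(self, kind, axis):
        # Placeholder for returning a plot window.
        return f"{kind} along {axis} at t={self.current_time}"


class TimeSeries:
    """Hypothetical proxy that mirrors the single-dataset API."""

    def __init__(self, datasets):
        self.datasets = sorted(datasets, key=lambda d: d.current_time)
        self.index = 0

    @property
    def current(self):
        return self.datasets[self.index]

    def seek(self, dt):
        # Move to the output closest to (current time + dt).
        target = self.current.current_time + dt
        self.index = min(
            range(len(self.datasets)),
            key=lambda i: abs(self.datasets[i].current_time - target),
        )

    def __getattr__(self, name):
        # Only called for names the series itself lacks.  Guard the
        # series' own state to avoid recursion if it is not set yet.
        if name in ("datasets", "index"):
            raise AttributeError(name)
        return getattr(self.current, name)


ts = TimeSeries([Dataset(t) for t in (0.0, 10.0, 100.0)])
first = ts.plot("slice", "x")
ts.seek(dt=100.0)
later = ts.plot("slice", "x")
```

Because plot() and current_time are resolved through the current
dataset, the same call gives a different answer after seek(), which is
roughly the time-slider behaviour.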

In general, this idea of moving toward more thoughtful
dataset-construction, rather than the hokey parameter file + hierarchy
construction brings with it a mindset shift which I'd like to spread
to the time series, which can continue to be a focus.

What do you think?


On Thu, Mar 29, 2012 at 7:08 PM, Casey W. Stark <caseywstark at gmail.com> wrote:
> +1 on datasets, although I would like to see the unit object(s) at the field
> level.
> On Thu, Mar 29, 2012 at 4:04 PM, Cameron Hummels
> <chummels at astro.columbia.edu> wrote:
>> +1 on datasets.
>> On 3/29/12 6:58 PM, Nathan Goldbaum wrote:
>>> +1.  I'd also be up to help out with the sprint.  Doing a virtual sprint
>>> using a google hangout might help mitigate some of the distance problems.
>>> While we're bringing up Enzo-isms that we should get rid of, I think it
>>> might be a good idea to make a conceptual shift in the basic python UI.
>>> Instead of referring to the interface between the user and the data as a
>>> parameter file, I think we should be talking about datasets.  One
>>> would instantiate a dataset just like we do now with parameter files:
>>> ds = load(filename)
>>> A dataset would also have some universal attributes which would present
>>> themselves to the user as a dict, e.g. ds.units, ds.parameters,
>>> ds.basic_info (like current_time, timestep, filename, and simulation code),
>>> and ds.hierarchy (not sure how that would interfere with the geometry
>>> refactor).
>>> This may be a painting-the-bike-shed discussion, but I think this shift
>>> will help new users understand how to access their data.  Thoughts?
>>> On Mar 29, 2012, at 3:40 PM, Matthew Turk<matthewturk at gmail.com>  wrote:
>>>> Hi Nathan and Casey,
>>>> I agree with what both of you have said.  The Orion/Nyx units should
>>>> be made to be consistent, but more importantly I think we should
>>>> continue breaking away from Enzo-isms in the code.
>>>> As it stands, all of the universal fields call underlying Enzo-named
>>>> aliases -- Density, ThermalEnergy, etc etc.  I hope we can have a 3.0
>>>> out within a calendar year, hopefully by the end of this year.  (I've
>>>> been pushing on the geometry refactor, although recently other efforts
>>>> have been paying off which has decreased my output there.)  I am much,
>>>> much less doubtful than Casey is that we can do this; in fact, I'm
>>>> completely in favor of this and I think it would be relatively
>>>> straightforward to implement.
>>>> In the existing system we have a mechanism for aliasing fields.  What
>>>> we can do is provide an additional translation system where we
>>>> enumerate the fields that are available for items in UniversalFields,
>>>> and then construct aliases to those.  This would mean changing what is
>>>> aliased in existing non-Enzo frontends, and adding aliases in Enzo.
>>>> The style of name Casey proposes is what I would also agree with:
>>>> underscores, lower cases, and erring on the side of verbosity.  The
>>>> fields off hand that we would need to do this for (in their current
>>>> enzo-isms):
>>>> x-velocity =>  velocity_x (same for y, z)
>>>> Density =>  density
>>>> TotalEnergy =>  ?
>>>> GasEnergy =>  thermal_energy_specific (and thermal_energy_density)
>>>> Temperature =>  temperature
>>>> and so on.
>>>> Once we have these aliases in place, an overall cleanup of
>>>> UniversalFields should take place.  One place we should clean up is
>>>> ensuring that there are no conditionals; rather than conditionals
>>>> inside the functions, we should place those conditionals inside the
>>>> parameter file types.  So for instance, if you have a field that is
>>>> calculated differently depending on the parameter HydroMethod (in Enzo
>>>> for instance) you simply set a validator on the field requiring the
>>>> parameter be set to a particular value, and then only the field which
>>>> satisfies that validator will be called when requested.
>>>> So we've gotten rid of a bunch of enzo-isms in the parameter files;
>>>> after fields, what else can we address?  And, I'd be up for sprinting
>>>> on this (which should take just a few hours) basically any time next
>>>> week or after.  I'd also be up for talking more about geometry
>>>> refactoring, if anyone is interested, but it's not quite to the point
>>>> that I think I am satisfied enough with the architecture to request
>>>> input / contributions.  Sometimes (especially with big architectural
>>>> things like this) I think it's a shame we do all of our work
>>>> virtually, as I think a lot of this would be easier to bang out in
>>>> person for a couple hours.
>>>> -Matt
>>>> On Wed, Mar 28, 2012 at 6:14 PM, Casey W. Stark<caseywstark at gmail.com>
>>>>  wrote:
>>>>> Hi Nathan.
>>>>> I'm also worried about this and I agree that fields with the same name
>>>>> should all be consistent. I would support some sort of cleanup of
>>>>> frontend fields, and I can get the Nyx fields in line and help with Enzo.
>>>>> I doubt we can do this, but I would prefer changing the field names as
>>>>> part of the removing enzo-isms and geometry handling refactoring pushes.
>>>>> For instance, the field in Orion could be thermal_energy_density and the
>>>>> field in Enzo could be specific_thermal_energy. I also noticed this issue
>>>>> when I was using "Density" in Enzo (proper density in cgs) and "density"
>>>>> in Nyx (comoving density in cgs).
>>>>> Best,
>>>>> Casey
>>>>> On Wed, Mar 28, 2012 at 1:47 PM, Nathan Goldbaum<goldbaum at ucolick.org>
>>>>> wrote:
>>>>>> Hi all,
>>>>>> On IRC today we noticed that Orion defines its ThermalEnergy field per
>>>>>> unit volume but Enzo and FLASH define ThermalEnergy per unit mass.  Is
>>>>>> this a problem?  Since yt defaults to the Enzo field names, should we try
>>>>>> to make sure that all fields are defined using the same units as in Enzo?
>>>>>> Is there a convention for how different codes should define derived fields
>>>>>> that are aliased to Enzo fields?
>>>>>> One problem for this particular example is that the Pressure field is
>>>>>> defined in terms of ThermalEnergy in universal_fields.py so the units of
>>>>>> ThermalEnergy become important if a user merely wants the gas pressure in
>>>>>> the simulation.
>>>>>> One possible solution for this issue would be the units overhaul we're
>>>>>> planning. If all fields are associated with a unit object, we can simply
>>>>>> query the units to ensure that units are taken care of correctly and
>>>>>> code-to-code comparisons aren't sensitive to the units chosen for fields
>>>>>> in the frontend.
>>>>>> Personally, I think it would be best if we could make sure that all of
>>>>>> the fields aliased to Enzo fields have the same units.
>>>>>> Nathan Goldbaum
>>>>>> Graduate Student
>>>>>> Astronomy & Astrophysics, UCSC
>>>>>> goldbaum at ucolick.org
>>>>>> http://www.ucolick.org/~goldbaum
>>>>>> _______________________________________________
>>>>>> yt-dev mailing list
>>>>>> yt-dev at lists.spacepope.org
>>>>>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org