[Yt-dev] Field definitions, derived fields, whats-in-a-file and the "deliberate_fields" branch

Matthew Turk matthewturk at gmail.com
Wed Nov 9 09:05:30 PST 2011


Hi all,

Over the last couple months, Casey and I have been working -- on and
off! -- on a new branch of the code called "deliberate_fields."  This
branch will change, in a substantial but easy-to-update way, how
fields are handled in yt.

I recognize this email is long.  But if you use non-standard fields, a
bunch of derived fields, unit modifications, any of that, it may
affect you.  So I ask that you *please* read it and, if you like,
contribute back to the discussion.

This is one of the items I really want to have done for a hypothetical
2.3 release.

= Background =

The way fields work currently was designed a bit haphazardly.  They
use FieldInfoContainers, objects which share state and which contain
unions of the known derived fields and the known IO-based fields.  One
of the problems with this is that the only thing that separates a
derived field from a known field is the function that generates the
field: the IO-based fields all use a lambda which returns None, and
the non-IO-based fields return actual fields.  This is pretty
sub-optimal, and it actually lands us in trouble when (for instance)
we have fields wandering around named things like "Thermal_Energy" and
"ThermalEnergy"; the mechanism by which one is selected and the other
not is problematic, and to get around infinite recursion, hacks have
had to be applied.

As it stands, to find a field, the shared-state "field info" on a
parameter file is queried; this then will try to check universal
fields.  But because of how the fields are stored, the field info
cascade can also operate in reverse.  The big problem is that the
field selection mechanism doesn't seem to have a bus factor >= 1.0.
And, it has a number of hacks to make it work with conflicting field
definitions and the like.

Unfortunately, layering these hacks on top of each other makes it much
harder for other codes to be supported; translations are not reliable,
and sometimes cause too many levels of recursion to be added.
Something simpler is necessary.

= What this does =

Essentially, this creates multi-level, explicit fallbacks.  The field
info container, which was a bloated, weird shared state object, is now
simply a dictionary subclass with a "fallback" option.  When you
create one, you can either create it in isolation (with no fallback)
or with a fallback.  When you query it, if it does not have a field,
it checks its fallback.  There are, additionally, two new functions
for IO: the translation function and the null function.  The first is
to translate, for instance, "density" to "Density" and the second is
to indicate that a field is expected to be found in an output from the
simulation code.
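
To make the fallback idea concrete, here is a minimal sketch; it is not
the code in the branch.  The names FallbackFieldInfo, null_func and
translation_func are hypothetical, and the sketch assumes a field is
represented simply by a function taking (field, data).

```python
class FallbackFieldInfo(dict):
    """Hypothetical dict subclass that defers to an optional fallback."""

    def __init__(self, fallback=None):
        super().__init__()
        self.fallback = fallback

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent locally.
        if self.fallback is None:
            raise KeyError(key)
        return self.fallback[key]

    def __contains__(self, key):
        # A field counts as present if any level of the chain defines it.
        if dict.__contains__(self, key):
            return True
        return self.fallback is not None and key in self.fallback


def null_func(field, data):
    # Marks a field as expected in the simulation output; computes nothing.
    return None


def translation_func(target_name):
    # Builds a field function that simply reads another field by name.
    def _translate(field, data):
        return data[target_name]
    return _translate


# Assumed usage: "density" is an alias for the on-disk "Density" field.
base = FallbackFieldInfo()
base["Density"] = null_func
aliases = FallbackFieldInfo(fallback=base)
aliases["density"] = translation_func("Density")
```

Looking up aliases["Density"] falls through to base, while a container
created with no fallback raises KeyError for anything it does not hold.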

Each simulation code now has two field info objects affiliated with
it: the "known" fields, which may appear in files, and the
non-known (i.e., code-specific derived) fields.  These live as the
attributes _fieldinfo_known and _fieldinfo_fallback, respectively, on
the StaticOutput subclass corresponding to a simulation code.  When the
Hierarchy (not static output) is instantiated, the first step is to
create a new field_info object.  This has, as a fallback, the
_fieldinfo_fallback, which itself has as a fallback the
universally-known derived fields.  The hierarchy then queries the
output file for which fields are available.  This process then looks
for a corresponding field in _fieldinfo_known, and if it finds it, it
adds it to the field_info object, *overriding* any possible derived
fields.  (In this manner, for instance, yt will not recalculate a
"CoolingTime" field if one exists in the output.)
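
The setup order described above can be sketched as follows.  This is an
illustration under stated assumptions, not the branch's implementation:
FallbackFieldInfo and build_field_info are hypothetical names, and
plain strings stand in for real field definitions.

```python
class FallbackFieldInfo(dict):
    """Hypothetical dict subclass that defers to an optional fallback."""

    def __init__(self, fallback=None):
        super().__init__()
        self.fallback = fallback

    def __missing__(self, key):
        # Reached only when the key is not stored at this level.
        if self.fallback is None:
            raise KeyError(key)
        return self.fallback[key]


def build_field_info(fields_on_disk, known, code_derived):
    """Assemble a per-hierarchy field_info as described in the text."""
    # Start with an empty container that falls back to the derived fields.
    field_info = FallbackFieldInfo(fallback=code_derived)
    for name in fields_on_disk:
        # Plain dict membership: only fields stored directly in `known`.
        if name in known:
            # Stored locally, so it shadows any derived field of that name.
            field_info[name] = known[name]
    return field_info


# Universal derived fields, then code-specific derived fields on top.
universal = FallbackFieldInfo()
universal["CoolingTime"] = "derived CoolingTime"
code_derived = FallbackFieldInfo(fallback=universal)

# Fields this (imaginary) code may write to its output files.
known = FallbackFieldInfo()
known["Density"] = "on-disk Density"
known["CoolingTime"] = "on-disk CoolingTime"

with_cooling = build_field_info(["Density", "CoolingTime"], known, code_derived)
without_cooling = build_field_info(["Density"], known, code_derived)
```

Here with_cooling["CoolingTime"] resolves to the on-disk definition,
while without_cooling["CoolingTime"] falls through two levels to the
universal derived one.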

= What it aims to do in the future =

This will be utilized in three main ways:

1) Making it more clear which fields belong to which code, and which
come from disk and which are derived
2) Helping move IO into fields, to optimize for geometries and data containers
3) Making units more clear and specific

This is all designed around better supporting the GDF.

= Where from here? =

It would be hugely beneficial if you could test this and report back.
I have created a pull request:

https://bitbucket.org/yt_analysis/yt/pull-request/27/field-overhaul-to-utilize-explicit

This is by no means a settled matter; I think we need to have testing
on this, buy-in from developers and users, and to make sure that old
code doesn't break.  The test cases all pass for me for Enzo.

Before this can be merged, I would hope we can get some testing from:

 * Enzo
 * Nyx
 * FLASH
 * Orion

and any other codes that can hear me.

Thanks very much for your time; please let me know if you have any
questions, concerns, jokes, comments, improvements, CDs of your band,
suggestions, and so on.  For a change this major I'd like to keep
discussion on list, so the record of this is a bit more prominent.

Best,

Matt


