[Yt-dev] Field definitions, derived fields, whats-in-a-file and the "deliberate_fields" branch

Wed Nov 9 12:25:55 PST 2011

Hi Matt,

This sounds like a much-needed overhaul. However, I'm not quite clear
on exactly what this will entail, or how it will work once
implemented. Could you or Casey provide an example of a new field or
two, demonstrating how these dictionaries, fallbacks, and Null
functions work? I think this is likely a very simple thing, but I'm
having trouble visualizing it.

thanks,

j

On Wed, Nov 9, 2011 at 9:05 AM, Matthew Turk <matthewturk at gmail.com> wrote:
> Hi all,
>
> Over the last couple months, Casey and I have been working -- on and
> off! -- on a new branch of the code called "deliberate_fields."  This
> branch will change, in a substantial but easy-to-update way, how
> fields are handled in yt.
>
> I recognize this email is long.  But if you use non-standard fields, a
> bunch of derived fields, unit modifications, any of that, it may
> affect you.  So I ask that you *please* read it and, if you like,
> contribute back to the discussion.
>
> This is one of the items I really want to have done for a hypothetical
> 2.3 release.
>
> = Background =
>
> The way fields work currently was designed a bit haphazardly.  They
> use FieldInfoContainers, objects which share state and which contain
> unions of the known derived fields and the known IO-based fields.  One
> of the problems with this is that the only thing that separates a
> derived field from a known field is the function that generates the
> field: the IO-based fields all use a lambda which returns None, and
> the non-IO based fields return actual field data.  This is pretty
> sub-optimal, and it actually lands us in trouble when (for instance)
> we have fields wandering around named things like "Thermal_Energy" and
> "ThermalEnergy"; the mechanism by which one is selected and the other
> not is problematic, and to get around infinite recursion, hacks have
> had to be applied.
>
> As it stands, to find a field, the shared-state "field info" on a
> parameter file is queried; this then will try to check universal
> fields.  But because of how the fields are stored, the field info
> cascade can also operate in reverse.  The big problem is that the
> field selection mechanism doesn't seem to have a bus factor >= 1.0.
> And, it has a number of hacks to make it work with conflicting field
> definitions and the like.
>
> Unfortunately, layering these hacks on top of each other makes it much
> harder for other codes to be supported; translations are not reliable,
> and sometimes cause too many levels of recursion to be added.
> Something simpler is necessary.
>
> = What this does =
>
> Essentially, this creates multi-level, explicit fallbacks.  The field
> info container, which was a bloated, weird shared state object, is now
> simply a dictionary subclass with a "fallback" option.  When you
> create one, you can either create it in isolation (with no fallback)
> or with a fallback.  When you query it, if it does not have a field,
> it checks its fallback.  There are, additionally, two new functions
> for IO: the translation function and the null function.  The first is
> to translate, for instance, "density" to "Density" and the second is
> to indicate that a field is expected to be found in an output from the
> simulation code.
>
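A minimal sketch of the lookup described above, in case it helps to
visualize it.  This is not the code on the deliberate_fields branch:
the names FieldInfo, null_func, and translate are hypothetical, and a
field is assumed to be a function taking (field name, data).

```python
# Hypothetical sketch; none of these names are taken from yt itself.

class FieldInfo(dict):
    """Maps field name -> generating function, with an optional fallback."""

    def __init__(self, fallback=None):
        super().__init__()
        self.fallback = fallback

    def __missing__(self, key):
        # dict.__getitem__ calls this when the key is not stored locally.
        if self.fallback is None:
            raise KeyError(key)
        return self.fallback[key]


def null_func(field, data):
    # Marker only: the field is expected to be read from the output file.
    return None


def translate(on_disk_name):
    # Build a function that serves a field under the code's on-disk name.
    def _translated(field, data):
        return data[on_disk_name]
    return _translated


# Universal derived fields: no fallback of their own.
universal = FieldInfo()
universal["DensitySquared"] = lambda field, data: [x * x for x in data["density"]]

# Code-specific fields fall back to the universal ones.
code_fallback = FieldInfo(fallback=universal)
code_fallback["Density"] = translate("density")

# Fields that may appear on disk are registered with the null function.
known = FieldInfo()
known["density"] = null_func

data = {"density": [1.0, 2.0, 3.0]}
density = code_fallback["Density"]("Density", data)
squared = code_fallback["DensitySquared"]("DensitySquared", data)
```

Here "Density" is found directly in code_fallback, while
"DensitySquared" is only found by walking to the fallback.  A name
defined nowhere in the chain raises KeyError.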
> Each simulation code now has two affiliated field info objects: the
> "known" fields, which may appear in files, and the non-known (i.e.,
> code-specific derived) fields.  These live as the attributes
> _fieldinfo_known and _fieldinfo_fallback, respectively, on the
> StaticOutput subclass corresponding to a simulation code.  When the
> Hierarchy (not static output) is instantiated, the first step is to
> create a new field_info object.  This has, as a fallback, the
> _fieldinfo_fallback, which itself has as a fallback the
> universally-known derived fields.  The hierarchy then queries the
> output file for which fields are available.  This process then looks
> for a corresponding field in _fieldinfo_known, and if it finds it, it
> adds it to the field_info object, *overriding* any possible derived
> fields.  (In this manner, for instance, yt will not recalculate a
> "CoolingTime" field if one exists in the output.)
>
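The setup sequence above could be sketched roughly as follows.  Only
the attribute names _fieldinfo_fallback and _fieldinfo_known come from
the description; FieldInfo, FakeStaticOutput, build_field_info, and
the placeholder derived field are assumptions made for illustration.

```python
# Hypothetical sketch of the hierarchy setup; not yt's actual classes.

class FieldInfo(dict):
    def __init__(self, fallback=None):
        super().__init__()
        self.fallback = fallback

    def __missing__(self, key):
        # Defer to the fallback when the key is not stored locally.
        if self.fallback is None:
            raise KeyError(key)
        return self.fallback[key]


def null_func(field, data):
    # Marker: the field is read from the output, not computed.
    return None


def derived_cooling_time(field, data):
    # Placeholder values standing in for a real derived-field calculation.
    return [0.0 for _ in data["Density"]]


# Universally-known derived fields sit at the bottom of the chain.
universal = FieldInfo()
universal["CoolingTime"] = derived_cooling_time


class FakeStaticOutput:
    # Code-specific derived fields, falling back to the universal ones.
    _fieldinfo_fallback = FieldInfo(fallback=universal)
    # Fields that may appear in this code's output files.
    _fieldinfo_known = FieldInfo()


FakeStaticOutput._fieldinfo_known["Density"] = null_func
FakeStaticOutput._fieldinfo_known["CoolingTime"] = null_func


def build_field_info(output_cls, fields_on_disk):
    # Step 1: a fresh container whose fallback is the code-specific one.
    field_info = FieldInfo(fallback=output_cls._fieldinfo_fallback)
    # Step 2: fields found on disk override any derived definition.
    for name in fields_on_disk:
        if name in output_cls._fieldinfo_known:
            field_info[name] = output_cls._fieldinfo_known[name]
    return field_info


with_ct = build_field_info(FakeStaticOutput, ["Density", "CoolingTime"])
without_ct = build_field_info(FakeStaticOutput, ["Density"])
```

In with_ct, "CoolingTime" resolves to the null function, so it would
be read from disk.  In without_ct, the same lookup falls through to
the derived definition.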
> = What it aims to do in the future =
>
> This will be utilized in three main ways:
>
> 1) Making it clearer which fields belong to which code, and which
> come from disk and which are derived
> 2) Helping move IO into fields, to optimize for geometries and data
> containers
> 3) Making units clearer and more specific
>
> All of this is designed around better supporting the GDF.
>
> = Where from here? =
>
> It would be hugely beneficial if you could test this and report back.
> I have created a pull request:
>
> https://bitbucket.org/yt_analysis/yt/pull-request/27/field-overhaul-to-utilize-explicit
>
> This is by no means a settled matter; I think we need to have testing
> on this, buy-in from developers and users, and to make sure that old
> code doesn't break.  The test cases all pass for me for Enzo.
>
> Before this can be merged, I would hope we can get some testing from:
>
>  * Enzo
>  * Nyx
>  * FLASH
>  * Orion
>
> and any other codes that can hear me.
>
> Thanks very much for your time; please let me know if you have any
> questions, concerns, jokes, comments, improvements, CDs of your band,
> suggestions, and so on.  For a change this major I'd like to keep
> discussion on list, so the record of this is a bit more prominent.
>
> Best,
>
> Matt