[yt-dev] Interacting with data in yt 3.0 (was Field units from code to code)

Matthew Turk matthewturk at gmail.com
Mon Apr 2 10:47:26 PDT 2012


Hi Casey,

On Mon, Apr 2, 2012 at 1:01 PM, Casey W. Stark <caseywstark at gmail.com> wrote:
> I think I forgot to reply -- Tuesday works for me and Wednesday is good
> before 11 or after 12:30 Pacific.
>
> We can sort this out during the hangout, but which issue are we focusing on?
> Is this more for the units system, renaming fields in the 3.0 branch, or the
> dataset change? (or maybe something else that was mentioned, there were a
> lot)

How about 1:00 PM Pacific on Wednesday?  I was thinking we'd work
in yt-refactor and change up the fields.

-Matt

>
> Best,
> Casey
>
>
> On Fri, Mar 30, 2012 at 12:59 PM, Matthew Turk <matthewturk at gmail.com>
> wrote:
>>
>> On Fri, Mar 30, 2012 at 1:22 PM, Nathan Goldbaum <goldbaum at ucolick.org>
>> wrote:
>> >> 1) Get rid of accessing parameters with an implicit __getitem__ on the
>> >> parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]).  I'm
>> >> +10 on this.
>> >> 2) Move units into the .units object (I'm mostly with Casey on this,
>> >> but I think it should be a part of the field_info object)
>> >> 3) Have things like current_time, domain_dimensions and so on move
>> >> into basic_info and make them dict objects.
>> >>
>> >> I think of those, I'm in favor of one and two, but somewhat opposed to
>> >> #3.  Right now we have these attributes mandated for subclasses of
>> >> StaticOutput:
>> >
>> > I'd say #3 is the least important.  I'd be fine with the dataset object
>> > having some non-dict attributes that describe the nature of the dataset
>> > rather than storing them all in a basic_info dict.  One thing to think
>> > about: if we want to support pure-particle datasets, then we should drop the
>> > notion of refine_by as a basic dataset attribute.
>>
>> I think whether refine_by sticks around depends on how we end up
>> wanting to address fluid quantities in particle datasets.  One
>> possibility for handling SPH data is to grid it, and while I don't
>> want to lock us into that (myopic at best) I don't want to exclude it
>> as an ultimate possibility.  But yes, in general, I agree.  As I have
>> been working on the geometry refactor, the number of times refine_by
>> is accessed has been going down, as for the most part it relies on (for
>> instance) the projection code knowing how to handle data from grids,
>> which has been pushed back onto the grids instead.  Now projections
>> simply receive data that is ordered spatially, and that data is
>> appropriately added.
>>
>> >
>> >> With the geometry_refactor, I'd like to consolidate functionality into
>> >> the main "dataset" object.  The geometry can still provide access to
>> >> the individual grids (of course) but data objects, finding max,
>> >> getting stats about the simulation, etc, should all go into the main
>> >> dataset object, and the geometry handler can simply be created on the
>> >> fly if necessary.
>> >
>> > Why not get access to objects through a geometry attribute that hangs
>> > off of the dataset object?  If I wanted to instantiate a sphere object, I
>> > would just do:
>> >
>> > sp = ds.geometry.sphere()
>> >
>> > This is pretty much the same as the pf.h.sphere() syntax in place right
>> > now but allows for arbitrary selection embedded inside of the new geometry
>> > code.
>>
>> That's how I was implementing it.  I just wasn't sure this was as
>> clean.  Having the plots then hang off the geometry feels a little
>> funny.
>>
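A minimal sketch of the "created on the fly" idea discussed above, showing both spellings (ds.geometry.sphere() and a ds.sphere() pass-through).  The class names, the build_count counter and the file name are hypothetical, made up for illustration; this is not the geometry_refactor code.

```python
class GeometryHandler:
    """Hypothetical stand-in for the object-finder / IO coordinator."""

    def __init__(self, dataset):
        self.dataset = dataset
        # Pretend this is the expensive step (e.g. building grid objects).
        dataset.build_count += 1

    def sphere(self, center, radius):
        # Return a plain description instead of a real data container.
        return {"type": "sphere", "center": tuple(center), "radius": radius}


class Dataset:
    """Hypothetical dataset that builds its geometry only when first needed."""

    def __init__(self, filename):
        self.filename = filename
        self.build_count = 0  # how many times the geometry was constructed
        self._geometry = None

    @property
    def geometry(self):
        # Created on first access, then cached for later calls.
        if self._geometry is None:
            self._geometry = GeometryHandler(self)
        return self._geometry

    def sphere(self, center, radius):
        # Convenience pass-through so users need not touch the geometry.
        return self.geometry.sphere(center, radius)


ds = Dataset("example_output")                   # no geometry built yet
sp1 = ds.geometry.sphere((0.5, 0.5, 0.5), 0.1)   # geometry built here
sp2 = ds.sphere((0.5, 0.5, 0.5), 0.1)            # reuses the cached handler
```

Either spelling goes through the same cached handler, so the choice between them is about how crowded the dataset namespace gets, not about cost.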
>> Also, I don't think I explicitly commented on Casey's hangout
>> suggestion -- I am in favor.  Could we do Tuesday afternoon (late
>> morning CA time) or Wednesday same?
>>
>> -Matt
>>
>> >
>> > Nathan Goldbaum
>> > Graduate Student
>> > Astronomy & Astrophysics, UCSC
>> > goldbaum at ucolick.org
>> > http://www.ucolick.org/~goldbaum
>> >
>> > On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
>> >
>> >> In general, I agree with the idea Nathan put out.  (Also, I think this
>> >> is a fine time to have a bikeshed discussion.  Many of the underlying
>> >> assumptions about how yt works were laid out a long time ago.)  But,
>> >> I'm not entirely sure I understand how different it would be --
>> >> conceptually, yes, I see what you're getting at, that we'd have a set
>> >> number of attributes.  In what I was thinking of for the geometry
>> >> refactor so far I'm trying to get rid of the "hierarchy" as existing
>> >> for every data set, and instead relying on what amounts to an
>> >> object-finder and io-coordinator, which I'm calling a geometry
>> >> handler.  It sounds like what you would like is:
>> >>
>> >> 1) Get rid of accessing parameters with an implicit __getitem__ on the
>> >> parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]).  I'm
>> >> +10 on this.
>> >> 2) Move units into the .units object (I'm mostly with Casey on this,
>> >> but I think it should be a part of the field_info object)
>> >> 3) Have things like current_time, domain_dimensions and so on move
>> >> into basic_info and make them dict objects.
>> >>
>> >> I think of those, I'm in favor of one and two, but somewhat opposed to
>> >> #3.  Right now we have these attributes mandated for subclasses of
>> >> StaticOutput:
>> >>
>> >> refine_by
>> >> dimensionality
>> >> current_time
>> >> domain_dimensions
>> >> domain_left_edge
>> >> domain_right_edge
>> >> unique_identifier
>> >> current_redshift
>> >> cosmological_simulation
>> >> omega_matter
>> >> omega_lambda
>> >> hubble_constant
>> >>
>> >> The only ones here that I think would be okay to move out of
>> >> properties would be the cosmology items, and even those I'm -0 on
>> >> moving.
>> >>
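To make the distinction concrete, here is a rough sketch in which the generic properties from the list above are plain attributes and code-specific values sit in an explicit parameters dict instead of behind __getitem__.  The Dataset class here is hypothetical (it is not StaticOutput), the example values are invented, and the cosmology attributes are left out only to keep it short.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Dataset:
    """Hypothetical container; attribute names come from the list above."""
    refine_by: int
    dimensionality: int
    current_time: float
    domain_dimensions: Tuple[int, ...]
    domain_left_edge: Tuple[float, ...]
    domain_right_edge: Tuple[float, ...]
    unique_identifier: str
    # Code-specific values live in an explicit dict, not behind __getitem__.
    parameters: Dict[str, Any] = field(default_factory=dict)


ds = Dataset(
    refine_by=2,
    dimensionality=3,
    current_time=0.0,
    domain_dimensions=(32, 32, 32),
    domain_left_edge=(0.0, 0.0, 0.0),
    domain_right_edge=(1.0, 1.0, 1.0),
    unique_identifier="example",
    parameters={"SomethingThatOnlyExistsInOneCode": 1},
)

# Generic properties are plain attributes that every frontend must supply ...
domain_width = tuple(
    right - left
    for left, right in zip(ds.domain_left_edge, ds.domain_right_edge)
)

# ... while code-specific lookups say explicitly where the value comes from.
value = ds.parameters.get("SomethingThatOnlyExistsInOneCode")
missing = ds.parameters.get("NotInThisCode")  # None rather than a surprise
```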
>> >> But, in general, the idea of moving from this two-stage system of
>> >> parameter file (rather than dataset) and hierarchy (rather than an
>> >> implicitly-handled geometry) is something I am in support of.  The
>> >> geometry is something that should nearly *always* be handled by the
>> >> backend, rather than by the user.  So having the library require
>> >> pf.h.sphere(...) is less than ideal, since it's exposing something
>> >> relatively unfortunate (that building a hundred thousand grid objects
>> >> can take some time).
>> >>
>> >> The main ways that the static output is interacted with:
>> >>
>> >> * Parameter information specific to a simulation code
>> >> * Properties that yt needs to know about
>> >> * To get at the hierarchy
>> >> * Input to plot collections
>> >>
>> >> The main ways that the hierarchy is interacted with:
>> >>
>> >> * Getting data objects
>> >> * Finding max
>> >> * Statistics about the simulation
>> >> * Inspecting individual grids (much less common use case now than it
>> >> was before)
>> >>
>> >> All of these use cases are still valid, but I think it's clear that
>> >> accessing individual grids and accessing simulation-specific
>> >> parameters are not "generic" functions.  What a lot of this discussion
>> >> has really brought up for me is that we're talking about *generic*
>> >> functionality, not code-specific functionality, and we right now do
>> >> not have the best enumeration of functionality and where it lies.
>> >>
>> >> With the geometry_refactor, I'd like to consolidate functionality into
>> >> the main "dataset" object.  The geometry can still provide access to
>> >> the individual grids (of course) but data objects, finding max,
>> >> getting stats about the simulation, etc, should all go into the main
>> >> dataset object, and the geometry handler can simply be created on the
>> >> fly if necessary.
>> >>
>> >> This brings up two points, though --
>> >>
>> >> 1) Does our method of instantiating objects still hold up?  i.e.,
>> >> ds.sphere(...) and so on?  Or does our dataset object then become
>> >> overcrowded?  I would also like to move *all* plotting objects into
>> >> whatever we end up deciding is the location data containers come from,
>> >> which for instance could look like ds.plot("slice", "x") (for
>> >> instance, although we can bikeshed that later), which would return a
>> >> plot window.
>> >> 2) Datasets and time series should behave, if not identically, at
>> >> least consistently in their APIs.  Moving to a completely ds-mediated
>> >> mechanism for generating, accessing and inspecting data opens up the
>> >> ability to then construct very nice and simple proxy objects.  As an
>> >> example, while something like this is technically possible with
>> >> the current Time Series API, it's a bit tricky:
>> >>
>> >> ts = TimeSeriesData.from_filenames(...)
>> >> plot = ts.plot("slice", "x", (100.0, 'au'))
>> >> ts.seek(dt = (100, 'years'))
>> >> plot.save()
>> >> ts.seek(dt = (10, 'years'))
>> >> plot.save()
>> >>
>> >> (The time-slider, as Tom likes to call it ...)
>> >>
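One way the proxy idea could work, as a sketch only: the plot keeps a reference to the series rather than to a single dataset, so it follows seek().  Every class here is hypothetical, times are bare floats (years) rather than (value, unit) tuples, and save() just returns a file name; none of this is the existing Time Series API.

```python
class Dataset:
    """Hypothetical single output with a time in years."""

    def __init__(self, name, time):
        self.name = name
        self.time = time


class PlotProxy:
    """Holds a reference to the series, not to one dataset."""

    def __init__(self, series, kind, axis):
        self.series = series
        self.kind = kind
        self.axis = axis

    def save(self):
        # Resolve the dataset at save time so the plot follows seek().
        return f"{self.series.current.name}_{self.kind}_{self.axis}.png"


class TimeSeries:
    """Hypothetical series that tracks a 'current' dataset."""

    def __init__(self, datasets):
        self.datasets = sorted(datasets, key=lambda d: d.time)
        self.index = 0

    @property
    def current(self):
        return self.datasets[self.index]

    def seek(self, dt):
        # Move to the dataset closest in time to (current time + dt).
        target = self.current.time + dt
        self.index = min(
            range(len(self.datasets)),
            key=lambda i: abs(self.datasets[i].time - target),
        )

    def plot(self, kind, axis):
        return PlotProxy(self, kind, axis)


ts = TimeSeries([
    Dataset("out0000", 0.0),
    Dataset("out0001", 100.0),
    Dataset("out0002", 110.0),
])
plot = ts.plot("slice", "x")
ts.seek(dt=100.0)
first = plot.save()    # "out0001_slice_x.png"
ts.seek(dt=10.0)
second = plot.save()   # "out0002_slice_x.png"
```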
>> >> In general, this idea of moving toward more thoughtful
>> >> dataset-construction, rather than the hokey parameter file + hierarchy
>> >> construction brings with it a mindset shift which I'd like to spread
>> >> to the time series, which can continue to be a focus.
>> >>
>> >> What do you think?
>> >>
>> >> -Matt
>> >>
>> >> On Thu, Mar 29, 2012 at 7:08 PM, Casey W. Stark <caseywstark at gmail.com>
>> >> wrote:
>> >>> +1 on datasets, although I would like to see the unit object(s) at the
>> >>> field level.
>> >>>
>> >>>
>> >>> On Thu, Mar 29, 2012 at 4:04 PM, Cameron Hummels
>> >>> <chummels at astro.columbia.edu> wrote:
>> >>>>
>> >>>> +1 on datasets.
>> >>>>
>> >>>>
>> >>>> On 3/29/12 6:58 PM, Nathan Goldbaum wrote:
>> >>>>>
>> >>>>> +1.  I'd also be up to help out with the sprint.  Doing a virtual
>> >>>>> sprint using a google hangout might help mitigate some of the
>> >>>>> distance problems.
>> >>>>>
>> >>>>> While we're bringing up Enzo-isms that we should get rid of, I think
>> >>>>> it might be a good idea to make a conceptual shift in the basic
>> >>>>> python UI.  Instead of referring to the interface between the user
>> >>>>> and the data as a parameter file, I think we should be talking about
>> >>>>> datasets.  One would instantiate a dataset just like we do now with
>> >>>>> parameter files:
>> >>>>>
>> >>>>> ds = load(filename)
>> >>>>>
>> >>>>> A dataset would also have some universal attributes which would
>> >>>>> present themselves to the user as a dict, e.g. ds.units,
>> >>>>> ds.parameters, ds.basic_info (like current_time, timestep, filename,
>> >>>>> and simulation code), and ds.hierarchy (not sure how that would
>> >>>>> interfere with the geometry refactor).
>> >>>>>
>> >>>>> This may be a painting-the-bike-shed discussion, but I think this
>> >>>>> shift will help new users understand how to access their data.
>> >>>>> Thoughts?
>> >>>>>
>> >>>>> On Mar 29, 2012, at 3:40 PM, Matthew Turk <matthewturk at gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hi Nathan and Casey,
>> >>>>>>
>> >>>>>> I agree with what both of you have said.  The Orion/Nyx units
>> >>>>>> should be made to be consistent, but more importantly I think we
>> >>>>>> should continue breaking away from Enzo-isms in the code.
>> >>>>>>
>> >>>>>> As it stands, all of the universal fields call underlying
>> >>>>>> Enzo-named aliases -- Density, ThermalEnergy, etc.  I hope we can
>> >>>>>> have a 3.0 out within a calendar year, hopefully by the end of this
>> >>>>>> year.  (I've been pushing on the geometry refactor, although
>> >>>>>> recently other efforts have been paying off, which has decreased my
>> >>>>>> output there.)  I am much, much less doubtful than Casey is that we
>> >>>>>> can do this; in fact, I'm completely in favor of this and I think
>> >>>>>> it would be relatively straightforward to implement.
>> >>>>>>
>> >>>>>> In the existing system we have a mechanism for aliasing fields.
>> >>>>>> What we can do is provide an additional translation system where we
>> >>>>>> enumerate the fields that are available for items in
>> >>>>>> UniversalFields, and then construct aliases to those.  This would
>> >>>>>> mean changing what is aliased in existing non-Enzo frontends, and
>> >>>>>> adding aliases in Enzo.  The style of name Casey proposes is what I
>> >>>>>> would also agree with: underscores, lower case, and erring on the
>> >>>>>> side of verbosity.  The fields off hand that we would need to do
>> >>>>>> this for (in their current Enzo-isms):
>> >>>>>>
>> >>>>>> x-velocity =>  velocity_x (same for y, z)
>> >>>>>> Density =>  density
>> >>>>>> TotalEnergy =>  ?
>> >>>>>> GasEnergy =>  thermal_energy_specific (and thermal_energy_density)
>> >>>>>> Temperature =>  temperature
>> >>>>>>
>> >>>>>> and so on.
>> >>>>>>
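The translation table above could be sketched as a plain dict plus a lookup helper.  The resolve() function and the enzo_fields set are hypothetical, not yt's existing alias mechanism.  TotalEnergy is deliberately left out because its new name is still an open question, and thermal_energy_density is left out because it would be a derived field rather than a straight rename.

```python
# New-style name -> current Enzo-style name, taken from the list above.
ALIASES = {
    "velocity_x": "x-velocity",
    "velocity_y": "y-velocity",
    "velocity_z": "z-velocity",
    "density": "Density",
    "thermal_energy_specific": "GasEnergy",
    "temperature": "Temperature",
}


def resolve(name, available):
    """Return the field name to read for `name`, following one alias hop."""
    if name in available:
        return name
    target = ALIASES.get(name)
    if target in available:
        return target
    raise KeyError(f"No field or alias found for {name!r}")


# Hypothetical set of on-disk field names for an Enzo-like frontend.
enzo_fields = {
    "Density", "x-velocity", "y-velocity", "z-velocity",
    "GasEnergy", "Temperature",
}

on_disk = resolve("density", enzo_fields)       # "Density"
unchanged = resolve("Density", enzo_fields)     # already available
```

A non-Enzo frontend would carry its own table mapping the same new-style names onto whatever it stores, so UniversalFields only ever asks for the new-style names.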
>> >>>>>> Once we have these aliases in place, an overall cleanup of
>> >>>>>> UniversalFields should take place.  One place we should clean up is
>> >>>>>> ensuring that there are no conditionals; rather than conditionals
>> >>>>>> inside the functions, we should place those conditionals inside the
>> >>>>>> parameter file types.  So for instance, if you have a field that is
>> >>>>>> calculated differently depending on the parameter HydroMethod (in
>> >>>>>> Enzo, for instance), you simply set a validator on the field
>> >>>>>> requiring the parameter be set to a particular value, and then only
>> >>>>>> the field which satisfies that validator will be called when
>> >>>>>> requested.
>> >>>>>>
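A sketch of the validator approach, under stated assumptions: FieldRegistry and its methods are hypothetical, and the HydroMethod values (0 and 2) and the two formulas are placeholders chosen for illustration, not Enzo's actual definitions.  The point is only that the branch moves out of the field function and into the registration.

```python
class FieldRegistry:
    """Hypothetical registry: several definitions may share one field name."""

    def __init__(self):
        self._candidates = {}

    def add_field(self, name, function, required=None):
        # `required` maps parameter names to the value they must have.
        entry = (required or {}, function)
        self._candidates.setdefault(name, []).append(entry)

    def get(self, name, parameters, data):
        # Call the first definition whose validator matches the parameters.
        for required, function in self._candidates.get(name, []):
            if all(parameters.get(k) == v for k, v in required.items()):
                return function(data)
        raise KeyError(f"No definition of {name!r} is valid here")


registry = FieldRegistry()

# Two definitions of the same field, each free of internal conditionals.
registry.add_field(
    "thermal_energy",
    lambda d: d["TotalEnergy"],
    required={"HydroMethod": 2},
)
registry.add_field(
    "thermal_energy",
    lambda d: d["TotalEnergy"] - 0.5 * d["speed"] ** 2,
    required={"HydroMethod": 0},
)

data = {"TotalEnergy": 10.0, "speed": 2.0}
a = registry.get("thermal_energy", {"HydroMethod": 2}, data)  # 10.0
b = registry.get("thermal_energy", {"HydroMethod": 0}, data)  # 8.0
```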
>> >>>>>> So we've gotten rid of a bunch of enzo-isms in the parameter files;
>> >>>>>> after fields, what else can we address?  And, I'd be up for
>> >>>>>> sprinting on this (which should take just a few hours) basically
>> >>>>>> any time next week or after.  I'd also be up for talking more about
>> >>>>>> geometry refactoring, if anyone is interested, but it's not quite
>> >>>>>> to the point that I think I am satisfied enough with the
>> >>>>>> architecture to request input / contributions.  Sometimes
>> >>>>>> (especially with big architectural things like this) I think it's a
>> >>>>>> shame we do all of our work virtually, as I think a lot of this
>> >>>>>> would be easier to bang out in person for a couple hours.
>> >>>>>>
>> >>>>>> -Matt
>> >>>>>>
>> >>>>>> On Wed, Mar 28, 2012 at 6:14 PM, Casey W. Stark <caseywstark at gmail.com>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> Hi Nathan.
>> >>>>>>>
>> >>>>>>> I'm also worried about this and I agree that fields with the same
>> >>>>>>> name should all be consistent. I would support some sort of
>> >>>>>>> cleanup of frontend fields, and I can get the Nyx fields in line
>> >>>>>>> and help with Enzo.
>> >>>>>>>
>> >>>>>>> I doubt we can do this, but I would prefer changing the field
>> >>>>>>> names as part of the removing enzo-isms and geometry handling
>> >>>>>>> refactoring pushes. For instance, the field in Orion could be
>> >>>>>>> thermal_energy_density and the field in Enzo could be
>> >>>>>>> specific_thermal_energy. I also noticed this issue when I was
>> >>>>>>> using "Density" in Enzo (proper density in cgs) and "density" in
>> >>>>>>> Nyx (comoving density in cgs).
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Casey
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wed, Mar 28, 2012 at 1:47 PM, Nathan Goldbaum <goldbaum at ucolick.org>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi all,
>> >>>>>>>>
>> >>>>>>>> On IRC today we noticed that Orion defines its ThermalEnergy
>> >>>>>>>> field per unit volume but Enzo and FLASH define ThermalEnergy per
>> >>>>>>>> unit mass.  Is this a problem?  Since yt defaults to the Enzo
>> >>>>>>>> field names, should we try to make sure that all fields are
>> >>>>>>>> defined using the same units as in Enzo?  Is there a convention
>> >>>>>>>> for how different codes should define derived fields that are
>> >>>>>>>> aliased to Enzo fields?
>> >>>>>>>>
>> >>>>>>>> One problem for this particular example is that the Pressure
>> >>>>>>>> field is defined in terms of ThermalEnergy in universal_fields.py
>> >>>>>>>> so the units of ThermalEnergy become important if a user merely
>> >>>>>>>> wants the gas pressure in the simulation.
>> >>>>>>>>
>> >>>>>>>> One possible solution for this issue would be the units overhaul
>> >>>>>>>> we're planning. If all fields are associated with a unit object,
>> >>>>>>>> we can simply query the units to ensure that units are taken care
>> >>>>>>>> of correctly and code-to-code comparisons aren't sensitive to the
>> >>>>>>>> units chosen for fields in the frontend.
>> >>>>>>>>
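To illustrate why the convention matters for Pressure, here is a small sketch for an ideal gas, where P = (gamma - 1) * rho * e for a specific energy e, or P = (gamma - 1) * u for an energy density u.  The Field class and its string unit tags are a hypothetical stand-in for the planned unit object, and gamma = 5/3 and the numbers are assumed values for the example; this is not the code in universal_fields.py.

```python
from dataclasses import dataclass

GAMMA = 5.0 / 3.0  # assumed adiabatic index for this example


@dataclass(frozen=True)
class Field:
    value: float
    units: str  # e.g. "erg/g" or "erg/cm**3"


def pressure(thermal_energy, density):
    """Ideal-gas pressure from either convention of thermal energy."""
    if thermal_energy.units == "erg/g":
        # Per unit mass: multiply by density to get an energy density.
        energy_density = thermal_energy.value * density.value
    elif thermal_energy.units == "erg/cm**3":
        # Already per unit volume: use it directly.
        energy_density = thermal_energy.value
    else:
        raise ValueError(f"Unexpected units: {thermal_energy.units}")
    return Field((GAMMA - 1.0) * energy_density, "dyne/cm**2")


rho = Field(2.0, "g/cm**3")
per_mass = Field(3.0, "erg/g")          # Enzo/FLASH-style convention
per_volume = Field(6.0, "erg/cm**3")    # Orion-style convention

p1 = pressure(per_mass, rho)
p2 = pressure(per_volume, rho)
```

Both calls describe the same gas and give the same pressure, because the function queries the units instead of assuming one convention.  Without the unit tag, the per-volume value would be multiplied by the density a second time.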
>> >>>>>>>> Personally, I think it would be best if we could make sure that
>> >>>>>>>> all of the fields aliased to Enzo fields have the same units.
>> >>>>>>>>
>> >>>>>>>> Nathan Goldbaum
>> >>>>>>>> Graduate Student
>> >>>>>>>> Astronomy & Astrophysics, UCSC
>> >>>>>>>>
>> >>>>>>>> goldbaum at ucolick.org
>> >>>>>>>> http://www.ucolick.org/~goldbaum
>> >>>>>>>>
>> >>>>>>>> _______________________________________________
>> >>>>>>>> yt-dev mailing list
>> >>>>>>>> yt-dev at lists.spacepope.org
>> >>>>>>>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org


