[yt-dev] new ART front end (artio)

Sun Nov 18 15:28:16 PST 2012

Hi Matt,

Thanks for writing!  I'm sorry I didn't get to this sooner; I had a lot going on this weekend.

Of course, and you warned us.  Hope it's been a good busy.

The documentation of code frontends for 2.x should be reasonably up to date, but there is not yet any documentation of code frontends for the 3.0 work, where things like object selection and so on have been completely reworked.  I recognize this as a deficiency, particularly as now is the time when other developers can and should start trying to use 3.0, so I will endeavor to document things as time allows.  Please don't hesitate to ask questions!

Understandable, particularly when the interfaces may be in flux.  I didn't mean it as a criticism, but rather
a statement of fact.  As a first go, it might be useful to mark certain parts of the 2.x documentation as deprecated.

We currently are able to read parameters from artio and set a few fields and units.  This
development has produced the following questions:

- Is there a list of the properties of StaticOutput that are required?  Each front-end has a slightly
different list and it's not clear which ones are in current use or required (except when the code
throws an exception on load).  For now we're setting the following:

dimensionality
refine_by
domain_dimensions
domain_left_edge
domain_right_edge
min_level
max_level
current_time

and in the case of cosmological simulations:
cosmological_simulation
omega_lambda
omega_matter
hubble_constant
current_redshift

Yup, I believe most are needed.  I think max_level is used in some places; min_level is probably not used much at all (as in, allowing the base class to set the default should be okay), and I think the others are all enumerated in the wiki or on the doc page about writing a new frontend.
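For illustration, here is a minimal standalone sketch of a frontend's parameter-parsing step setting the attributes listed above, assuming the artio header has already been read into a plain dict.  The class name and every header key (num_root_cells_1d, scale_factor, and so on) are hypothetical; this is not yt's actual StaticOutput base class.

```python
# Hypothetical sketch: a standalone stand-in for a frontend's StaticOutput
# subclass. The header keys are illustrative, not artio's real parameter names.
import numpy as np


class ARTIOStaticOutputSketch:
    def __init__(self, header):
        self.header = header
        self._parse_parameter_file()

    def _parse_parameter_file(self):
        h = self.header
        # Geometry and refinement attributes from the list above.
        self.dimensionality = 3
        self.refine_by = 2
        self.domain_dimensions = np.array([h["num_root_cells_1d"]] * 3, dtype="int64")
        self.domain_left_edge = np.zeros(3)
        self.domain_right_edge = np.ones(3)
        self.min_level = 0  # could likely be left to the base class default
        self.max_level = h["max_refinement_level"]
        self.current_time = h["time"]
        # Cosmology attributes; zeroed for non-cosmological runs.
        self.cosmological_simulation = int(h.get("cosmological", False))
        if self.cosmological_simulation:
            self.omega_lambda = h["omega_lambda"]
            self.omega_matter = h["omega_matter"]
            self.hubble_constant = h["hubble"]
            # Redshift from the expansion factor a: z = 1/a - 1.
            self.current_redshift = 1.0 / h["scale_factor"] - 1.0
        else:
            self.omega_lambda = self.omega_matter = self.hubble_constant = 0.0
            self.current_redshift = 0.0


pf = ARTIOStaticOutputSketch({
    "num_root_cells_1d": 128, "max_refinement_level": 10, "time": 0.5,
    "cosmological": True, "omega_lambda": 0.7, "omega_matter": 0.3,
    "hubble": 0.7, "scale_factor": 0.5,
})
```

With a scale factor of 0.5 this gives current_redshift == 1.0.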

and
parameters["initial_redshift"]
parameters["HydroMethod"]

I would like to get rid of these.  I have no idea why the first is required (where did this error get thrown?), and the second is really a proxy for questions like "How do I calculate thermal energy?" or "Do these values need to be center-differenced to get cell-centered quantities?"  I'd like to see the places where they appear outside of yt/frontends identified and removed.  Both uses can be handled much more simply by derived fields.
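As a rough illustration of the derived-field approach, the sketch below has each frontend register its own definition of thermal energy, so nothing outside the frontend needs to inspect parameters["HydroMethod"].  The registry, the frontend keys, and the on-disk field names are hypothetical stand-ins, not yt's field API.

```python
# Hypothetical sketch: each frontend registers its own definition of a derived
# field, so no code outside the frontend has to inspect parameters["HydroMethod"].
# The registry, frontend keys and on-disk field names are all illustrative.
import numpy as np

FRONTEND_FIELDS = {}


def add_field(frontend, name):
    """Decorator registering the wrapped function as field `name` for one frontend."""
    def register(func):
        FRONTEND_FIELDS.setdefault(frontend, {})[name] = func
        return func
    return register


@add_field("stores_internal_energy", "ThermalEnergy")
def _thermal_from_internal(data):
    # Internal energy density is on disk; divide by density for energy per mass.
    return data["InternalEnergy"] / data["Density"]


@add_field("stores_total_energy", "ThermalEnergy")
def _thermal_from_total(data):
    # Total energy density is on disk; subtract the kinetic part first.
    v2 = data["vx"] ** 2 + data["vy"] ** 2 + data["vz"] ** 2
    return (data["TotalEnergy"] - 0.5 * data["Density"] * v2) / data["Density"]


def get_field(frontend, name, data):
    return FRONTEND_FIELDS[frontend][name](data)


cells = {"Density": np.array([2.0]), "InternalEnergy": np.array([4.0]),
         "TotalEnergy": np.array([5.0]), "vx": np.array([1.0]),
         "vy": np.zeros(1), "vz": np.zeros(1)}
```

Both definitions answer the same question ("what is the thermal energy?") for their own storage convention, which is the information HydroMethod was standing in for.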

I think initial_redshift was my guess at the proper name, given a comment somewhere in the code saying that CosmologicalInitialRedshift or something
like it was deprecated.  I agree that pushing as much as possible into the front-ends is ideal, even if those front-ends simply call up the class hierarchy.

What about units?  time_units?

Casey's going to be working on this, but for now, putting units into the "units" dictionary would work fine.  I will be working with Casey to re-work how things like length units, code units for fields, and so on are accessed.
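Until that rework lands, filling the dictionaries might look something like the sketch below.  The convention (multiplying a code-unit value by units[name] gives the value in that unit), the key names, and the helper make_unit_dicts are assumptions for illustration, not a description of how yt defines them.

```python
# Hypothetical sketch of populating the "units" and "time_units" dictionaries.
# Convention assumed here: code_value * units[name] gives the value in `name`.
CM_PER_KPC = 3.0856775814913673e21
CM_PER_MPC = 1.0e3 * CM_PER_KPC
SEC_PER_YEAR = 3.15576e7  # Julian year


def make_unit_dicts(length_unit_cm, time_unit_s):
    # length_unit_cm: size of one code length unit in cm (e.g. the box size).
    # time_unit_s: size of one code time unit in seconds.
    units = {
        "cm": length_unit_cm,
        "kpc": length_unit_cm / CM_PER_KPC,
        "mpc": length_unit_cm / CM_PER_MPC,
    }
    time_units = {
        "s": time_unit_s,
        "years": time_unit_s / SEC_PER_YEAR,
        "Myr": time_unit_s / (1.0e6 * SEC_PER_YEAR),
    }
    return units, time_units


# Example: a 100 Mpc box with a code time unit of 1 Myr.
units, time_units = make_unit_dicts(100.0 * CM_PER_MPC, 1.0e6 * SEC_PER_YEAR)
```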

Okay.  If there's been any discussion on the list that I can use to get more up to date, let me know and I'll take
my catching-up offline.

- It looks like GeometryHandler has replaced AMRHierarchy as the preferred frontend interface?  Chombo
now uses GridGeometryHandler rather than AMRHierarchy, for example.  How does that affect io.py, given that
the online documentation (http://yt-project.org/doc/advanced/creating_frontend.html) describes Grid instances
being passed to methods in io.py?  What if we're using an OctreeGeometryHandler rather than a
GridGeometryHandler?

Yes, GeometryHandler is now the replacement for the AMRHierarchy.  The RAMSES frontend shows how this is done, but what it comes down to is that IO is now handled by relatively flexible "chunks".

Chunks are designed to allow code-specific frontends to define the best way to batch data IO.  If you look in RAMSES's io.py and Enzo's io.py, you can see that the methods that need to be implemented are:

_read_particle_selection(self, chunks, selector, fields) (usually delegates, as in the case of Enzo, to a type-specific reader)
_read_fluid_selection(self, chunks, selector, fields, size)

These are the basic methods that are called by the geometry handler's functions _read_fluid_fields and _read_particle_fields.  The arguments look like:

chunks: list of YTDataChunk objects
selector: a SelectorObject (see the definition in yt/geometry/selection_routines.pxd), which can tell you which cells or particles to include in an object
fields: list of fields of the form (ftype, fname), where ftype is a string (for the case of multi-fluid simulations) and fname is the name of the field.
size: the total (expected) number of cells that intersect, so that fields can be pre-allocated (to avoid memory fragmentation)

In the case of RAMSES, the chunks are RAMSESDomainSubsets, which contain information about fields.  For Enzo, they'll be objects that contain lists of grid objects as an attribute.
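To make the contract concrete, here is a toy sketch of _read_fluid_selection that walks the chunks, asks the selector which cells each object contributes, and fills arrays pre-allocated from size.  The chunk attribute .objs, the per-object .data dict, and the selector method select_cells are hypothetical stand-ins; the real YTDataChunk and SelectorObject interfaces differ.

```python
# Hypothetical sketch of the _read_fluid_selection contract described above.
# Chunks are assumed to expose .objs (e.g. grids or domain subsets), each obj a
# .data dict of per-cell arrays, and the selector a select_cells(obj) method
# returning a boolean mask; these are stand-ins, not yt's real interfaces.
import numpy as np


class SketchIOHandler:
    def _read_fluid_selection(self, chunks, selector, fields, size):
        # Pre-allocate one flat array per (ftype, fname) from the expected size,
        # so the selected cells can be filled in place.
        rv = {field: np.empty(size, dtype="float64") for field in fields}
        ind = 0
        for chunk in chunks:
            for obj in chunk.objs:
                mask = selector.select_cells(obj)
                n = int(mask.sum())
                if n == 0:
                    continue
                for ftype, fname in fields:
                    # ftype is ignored here: this toy format has one fluid.
                    rv[ftype, fname][ind:ind + n] = obj.data[fname][mask]
                ind += n
        if ind != size:
            raise RuntimeError("selected %d cells but expected %d" % (ind, size))
        return rv
```

Filling slices of a pre-sized array, rather than concatenating per-object results, is what the size argument makes possible.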

- What would be the best way to start developing a customized geometry handler?  Where are the major entry
points, and what functions are required vs optional?  Is it possible to begin by writing something coarse that doesn't
implement any performance features like chunking or parallelism?

I think the basic level would be a geometry handler that only implemented an "all" chunking system, which I do believe would work as of right now.  This would prevent parallelism at the moment, but there's also the possibility of simply arbitrarily subdividing for the other chunking systems -- which would also be a good first pass.  The only thing I suggest is raising a NotImplementedError for spatial chunking as of right now, as I'm still thinking about the best way to do this for Octree data, and once you've taken a look it would be useful to have your feedback as well.
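A coarse first pass along those lines might look like the sketch below: one "all" chunk, an "io" style that simply round-robins the domain list into groups, and spatial chunking that raises NotImplementedError.  The class, the Chunk tuple, and the per-domain cell counts are hypothetical; only the chunking-style names and the NotImplementedError suggestion come from the discussion above.

```python
# Hypothetical sketch of the minimal chunking strategy suggested above:
# one "all" chunk, an "io" style that arbitrarily subdivides, and spatial
# chunking left unimplemented for now.
from collections import namedtuple

# Stand-in for a data chunk: the objects it covers plus an expected cell count.
Chunk = namedtuple("Chunk", ["objs", "data_size"])


class ChunkingSketch:
    def __init__(self, domains, cells_per_domain):
        self.domains = list(domains)
        self.cells_per_domain = cells_per_domain

    def _chunk(self, chunking_style, ngroups=1):
        if chunking_style == "all":
            return self._chunk_all()
        if chunking_style == "io":
            return self._chunk_io(ngroups)
        if chunking_style == "spatial":
            return self._chunk_spatial()
        raise ValueError("unknown chunking style %r" % chunking_style)

    def _chunk_all(self):
        # Everything in a single chunk: no parallelism, but simple and correct.
        yield Chunk(self.domains, sum(self.cells_per_domain[d] for d in self.domains))

    def _chunk_io(self, ngroups):
        # Arbitrary subdivision: deal the domain list out into ngroups pieces.
        for i in range(ngroups):
            objs = self.domains[i::ngroups]
            if objs:
                yield Chunk(objs, sum(self.cells_per_domain[d] for d in objs))

    def _chunk_spatial(self):
        raise NotImplementedError("spatial chunking for octree data is not settled yet")
```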

Great, this is all very useful.  One issue is that spatial chunking is likely to be our preferred chunking method, given the on-disk
layout.  I'm hoping we can get something that functions as soon as possible, and use that to learn what is working and what
needs refactoring for performance or flexibility.  Feel free to offload things that need to be implemented or need to be tested in
order to make that happen.  Sam seemed to think that the RAMSES reader was broken for one of the test examples, but he'll need
to be more specific and report to the list.

As near as I can tell, the *absolute* minimum routines you need to implement are:

 * _count_selection
 * _identify_base_chunk
 * _chunk_*

You'll also probably have to implement the different chunks.
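The other two routines could be sketched roughly as below: _identify_base_chunk keeps only the domains a data object's selector touches, and _count_selection totals the selected cells so the IO layer knows how much to pre-allocate.  The data-object attributes (selector, _chunk_info, _current_chunk) and the selector method count_cells are assumptions for illustration, not yt 3.0's exact interfaces.

```python
# Hypothetical sketch of the other two required routines. The data-object
# attributes (selector, _chunk_info, _current_chunk) and the selector's
# count_cells() method are stand-ins, not yt 3.0's exact interfaces.
from collections import namedtuple

Chunk = namedtuple("Chunk", ["objs", "data_size"])


class MinimalGeometryHandlerSketch:
    def __init__(self, domains):
        self.domains = list(domains)

    def _count_selection(self, dobj, sub_objects):
        # Total number of cells the data object's selector picks out.
        return sum(dobj.selector.count_cells(obj) for obj in sub_objects)

    def _identify_base_chunk(self, dobj):
        # Keep only domains the selector touches, and cache them on dobj.
        if getattr(dobj, "_chunk_info", None) is None:
            dobj._chunk_info = [d for d in self.domains
                                if dobj.selector.count_cells(d) > 0]
        size = self._count_selection(dobj, dobj._chunk_info)
        dobj._current_chunk = Chunk(dobj._chunk_info, size)
```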

- We'd like to use the RAMSES and ART frontends as examples, since their data structures are very similar to our
own.  How current are those frontends in yt-3.0?  Are there any major pieces which are scheduled for deprecation
or refactoring that we should be aware of?  In RAMSES, for example, some field names are hard-coded in
RAMSESGeometryHandler._detect_fields.  Shouldn't this information be pulled from the fields interface?

Ah, the field hardcoding is because, as near as I can tell, RAMSES doesn't have an enumerated list of fields anywhere on disk.  This is something I've been meaning to upgrade to allow specification, as this will be important for other irregular or non-self-describing formats like Gadget binary and NMSU-ART.  For the RAMSES reader, I do hope to improve spatial data handling and eventually (but not in the first pass) move to a generalized distributed Octree.  That will require much more thought, but it is on the docket for some time after the first release of 3.0 goes out.
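One possible shape for allowing specification, sketched under assumptions: the reader uses a user-supplied field list when given, otherwise falls back to a hard-coded default, and reconciles either with the number of variables found on disk.  The function name detect_fields, the default list, and the generic "varN" naming are hypothetical.

```python
# Hypothetical sketch: let users specify field names for formats that do not
# store them on disk, falling back to a hard-coded default list.
DEFAULT_FLUID_FIELDS = ["Density", "x-velocity", "y-velocity", "z-velocity", "Pressure"]


def detect_fields(nvar_on_disk, user_fields=None):
    if user_fields is not None:
        fields = list(user_fields)
    else:
        fields = list(DEFAULT_FLUID_FIELDS)
    if len(fields) < nvar_on_disk:
        # Name any extra variables generically rather than guessing.
        fields += ["var%d" % i for i in range(len(fields), nvar_on_disk)]
    elif len(fields) > nvar_on_disk:
        raise ValueError("%d field names given but only %d variables on disk"
                         % (len(fields), nvar_on_disk))
    return fields
```

Raising on too many names, rather than silently truncating, keeps a mismatched specification from mislabeling data.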

Right, that's been a big flaw with NMSU-ART and early distributed ART formats.  Luckily this is corrected with artio.

The "ART" frontend (which either needs a better name or needs to be co-located with the new ART frontend, like how Enzo2.x and Enzo3.0 will be) has not been upgraded at all.  I have some scripts that mock this up, but Chris has been very busy lately and hasn't been able to synchronize with me.  The RAMSES frontend to the best of my knowledge is functional; I have tested it and it works, and I have also been working with Nick Moeckel to make sure that it meets his needs -- I have received the occasional email which has helped improve things like boundary conditions and so on.

I'm probably the only person on the planet who speaks all flavors of ART (that's certainly nothing to brag about).  If we're able
to get our version up and running, it should be possible to merge the two into one coherent interface that is able to auto-detect the
code version.  For now, however, that's too daunting a task.

We're really hesitant to write a regridding version of artio to support 2.4, but it depends on how alpha 3.0 ends up being.  If it
turns out to be the better choice, we can drop back to stable and catch up later.

Doug
