[yt-dev] new ART front end (artio)

Sun Nov 18 14:54:51 PST 2012

Hi Doug,

Thanks for writing!  I'm sorry I didn't get to this sooner; I had a lot
going on this weekend.

On Friday, November 16, 2012, Douglas Harvey Rudd wrote:

>  Hi all,
>
> Sam Leitner and I have just started writing a new yt frontend for the
> distributed version of ART (a completely different file format, but very
> similar units and structure to the current ART frontend).  artio refers to
> an interface library we use to read/write from the new file formats,
> similar in principle to RamsesReader.
>

> We're targeting yt-3.0 in hopes of using the new Oct tree support written
> for Ramses, and hopefully can help develop and generalize that part of yt.
> We'll be focusing on trying to get very basic AMR support implemented and
> leave particle support for a later phase.
>

That sounds great!  I'm excited to hear about this project, and eager to
help out however I can.

>
> The online documentation on frontends is out of date and lacking in some
> areas, so we'll probably be flooding the list with questions over the next
> few weeks.  I am not a current user of yt, so I'm also trying to catch up
> on general terminology and may ask some basic or ill-posed questions.
> Thanks for your patience.
>

The documentation of code frontends for 2.x should be reasonably up to
date, but there is not yet any documentation of code frontends for the 3.0
work, where things like object selection have been completely reworked.  I
recognize this as a deficiency, particularly now that other developers can
and should start trying to use 3.0, so I will endeavor, as time allows, to
document things.  Please don't hesitate to ask questions!

>
> We currently are able to read parameters from artio and set a few fields
> and units.  This development has produced the following questions:
>
> - Is there a list of the properties of StaticOutput that are required?
>   Each front-end has a slightly different list and it's not clear which
>   ones are in current use or required (except when the code throws an
>   exception on load).  For now we're setting the following:
>
> dimensionality
> refine_by
> domain_dimensions
> domain_left_edge
> domain_right_edge
> min_level
> max_level
> current_time
>
> and in the case of cosmological simulations:
> cosmological_simulation
> omega_lambda
> omega_matter
> hubble_constant
> current_redshift
>

Yup, I believe most are needed.  max_level is, I think, used in some
places; min_level is probably not used much, if at all (that is, letting
the base class set the default should be okay).  I think the others are all
enumerated in the wiki or in the doc page about writing a new frontend.
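To make the list concrete, here is a minimal sketch of a parameter-file parser that sets those attributes.  It does not import yt: StaticOutputStandIn is a hypothetical stand-in for the real base class, and every parameter key (root_grid_size, scale_factor, and so on) is a made-up name, not an actual artio parameter.  min_level is deliberately left to the base-class default.

```python
class StaticOutputStandIn:
    """Hypothetical stand-in for yt's StaticOutput base class."""
    min_level = 0  # assumed base-class default; not overridden below


class ARTIOStaticOutputSketch(StaticOutputStandIn):
    def __init__(self, parameters):
        # Hypothetical: parameters already read through artio into a dict.
        self.parameters = parameters
        self._parse_parameter_file()

    def _parse_parameter_file(self):
        p = self.parameters
        n = p["root_grid_size"]  # hypothetical key names throughout
        self.dimensionality = 3
        self.refine_by = 2
        self.domain_dimensions = [n, n, n]
        # Assumption: code length units span the box from 0 to 1.
        self.domain_left_edge = [0.0, 0.0, 0.0]
        self.domain_right_edge = [1.0, 1.0, 1.0]
        self.max_level = p["max_level"]
        self.current_time = p["time"]
        # Treat the run as cosmological only if it carries a scale factor.
        self.cosmological_simulation = 1 if "scale_factor" in p else 0
        if self.cosmological_simulation:
            self.omega_lambda = p["omega_lambda"]
            self.omega_matter = p["omega_matter"]
            self.hubble_constant = p["hubble"]
            # Redshift from the scale factor: z = 1/a - 1.
            self.current_redshift = 1.0 / p["scale_factor"] - 1.0
        else:
            self.omega_lambda = self.omega_matter = self.hubble_constant = 0.0
            self.current_redshift = 0.0


pf = ARTIOStaticOutputSketch({
    "root_grid_size": 128, "max_level": 8, "time": 0.3,
    "scale_factor": 0.5, "omega_lambda": 0.7, "omega_matter": 0.3,
    "hubble": 0.7,
})
print(pf.current_redshift, pf.min_level)
```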

>
> and
> parameters["initial_redshift"]
> parameters["HydroMethod"]
>

I would like to get rid of these.  I have no idea why the first is required
(where did this error get thrown?), and the second is really a proxy for
questions like "How do I calculate thermal energy?" or "Do these values
need to be center-differenced to get cell-centered quantities?"  I'd like
to see the places where they appear outside of yt/frontends identified and
removed.  Both uses can be handled much more simply by derived fields.
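A rough sketch of the derived-field approach, written without yt: the registry, register_field, and get_field below are hypothetical stand-ins for a frontend's field definitions, and the assumption that the code stores internal energy density as "GasEnergy" is illustrative only.  The point is that the frontend decides how thermal energy is computed, so generic code only asks for the field by name and never checks HydroMethod.

```python
# Hypothetical per-frontend field registry (not yt's actual API).
ARTIO_FIELDS = {}


def register_field(name):
    """Decorator that records a derived-field function under a name."""
    def decorator(func):
        ARTIO_FIELDS[name] = func
        return func
    return decorator


@register_field("ThermalEnergy")
def _thermal_energy(data):
    # Assumption: this code stores internal energy density as "GasEnergy",
    # so specific thermal energy is GasEnergy / Density.  A frontend for a
    # different hydro scheme would register its own function here.
    return [e / rho for e, rho in zip(data["GasEnergy"], data["Density"])]


def get_field(name, data):
    # Generic code: look the field up by name, with no HydroMethod check.
    return ARTIO_FIELDS[name](data)


print(get_field("ThermalEnergy", {"GasEnergy": [2.0, 9.0], "Density": [1.0, 3.0]}))
```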

>
>  what about units?  time_units?
>

Casey's going to be working on this, but for now, putting units into the
"units" dictionary would work fine.  I will be working with Casey to
re-work how things like length units, code units for fields, and so on are
accessed.
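As a rough sketch of filling those dictionaries, assuming the 2.x convention that units[name] is the factor that converts a value in code length units into that unit (and time_units likewise for time): set_units and the constant tables are hypothetical helpers, and the code length and time units would in practice come from the artio header.

```python
# Conversion constants: lengths in cm, times in s.
CM_PER = {
    "cm": 1.0,
    "m": 1.0e2,
    "km": 1.0e5,
    "au": 1.495978707e13,
    "pc": 3.0856775814913673e18,
}
CM_PER["kpc"] = 1.0e3 * CM_PER["pc"]
CM_PER["mpc"] = 1.0e6 * CM_PER["pc"]
SEC_PER = {"seconds": 1.0, "days": 86400.0, "years": 365.25 * 86400.0}


def set_units(length_unit_cm, time_unit_s):
    """Hypothetical helper: build units/time_units from the code units.

    Each entry is how many of that unit one code unit equals, so that
    code_value * units["kpc"] is the value in kpc.
    """
    units = {name: length_unit_cm / cm for name, cm in CM_PER.items()}
    time_units = {name: time_unit_s / s for name, s in SEC_PER.items()}
    return units, time_units


# Example: code length unit of 1 pc, code time unit of 1 (Julian) year.
units, time_units = set_units(CM_PER["pc"], SEC_PER["years"])
print(units["kpc"], time_units["days"])
```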

>
>
> - It looks like GeometryHandler has replaced AMRHierarchy as the preferred
>   frontend interface?  Chombo now uses GridGeometryHandler rather than
>   AMRHierarchy, for example.  How does that affect io.py, where the online
>   documentation (http://yt-project.org/doc/advanced/creating_frontend.html)
>   describes Grid instances being passed to methods in io.py?  What if we're
>   using an OctreeGeometryHandler rather than a GridGeometryHandler?
>

Yes, GeometryHandler is now the replacement for the AMRHierarchy.  The
RAMSES frontend shows how this is done, but what it comes down to is that
IO is now handled by relatively flexible "chunks".

Chunks are designed to allow code-specific frontends to define the best way
to batch data IO.  If you look in RAMSES's io.py and Enzo's io.py, you can
see that the methods that need to be implemented are:

 * _read_particle_selection(self, chunks, selector, fields) (usually
   delegates, as in the case of Enzo, to a type-specific reader)
 * _read_fluid_selection(self, chunks, selector, fields, size)

These are the basic methods that are called by the geometry handler's
functions _read_fluid_fields and _read_particle_fields.  The arguments look
like:

chunks: list of YTDataChunk objects
selector: a SelectorObject (you can see the definition in
yt/geometry/selection_routines.pxd), which can tell you which cells or
particles to include in an object
fields: list of fields of the form (ftype, fname), where ftype is a string
naming the fluid type (for the case of multi-fluid simulations) and fname
is the name of the field
size: the total (expected) number of cells that intersect, so that fields
can be pre-allocated (to avoid memory fragmentation)

In the case of RAMSES, the chunks are RAMSESDomainSubsets, which contain
information about fields.  For Enzo, they'll be objects that contain lists
of grid objects as an attribute.
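Here is a minimal sketch of how _read_fluid_selection can use those arguments: pre-allocate one array of length size per field, then walk the chunks and fill only the selected cells.  The Subset, Chunk, and DensityThreshold classes and the select_cells method are hypothetical stand-ins, not yt's actual chunk or SelectorObject API, and the "gas" field type is illustrative.

```python
import numpy as np


class ARTIOIOHandlerSketch:
    """Hypothetical IO handler illustrating _read_fluid_selection."""

    def _read_fluid_selection(self, chunks, selector, fields, size):
        # Pre-allocate one flat array per (ftype, fname) so filling never
        # has to grow or concatenate arrays.
        rv = {field: np.empty(size, dtype="float64") for field in fields}
        offset = 0
        for chunk in chunks:
            for obj in chunk.objs:
                # Assumed selector method: boolean mask over obj's cells.
                mask = selector.select_cells(obj)
                n = int(mask.sum())
                if n == 0:
                    continue
                for ftype, fname in fields:
                    rv[(ftype, fname)][offset:offset + n] = obj.data[fname][mask]
                offset += n
        if offset != size:
            raise RuntimeError(f"expected {size} cells, filled {offset}")
        return rv


# Minimal stand-ins for a domain subset, a chunk, and a selector.
class Subset:
    def __init__(self, data):
        self.data = {k: np.asarray(v, dtype="float64") for k, v in data.items()}


class Chunk:
    def __init__(self, objs):
        self.objs = objs


class DensityThreshold:
    def __init__(self, threshold):
        self.threshold = threshold

    def select_cells(self, obj):
        return obj.data["Density"] > self.threshold


chunks = [Chunk([Subset({"Density": [1.0, 5.0, 10.0]}),
                 Subset({"Density": [3.0, 7.0]})])]
out = ARTIOIOHandlerSketch()._read_fluid_selection(
    chunks, DensityThreshold(4.0), [("gas", "Density")], 3)
print(out[("gas", "Density")].tolist())
```

Passing size up front is what makes the single allocation possible; it would normally come from the geometry handler's count of selected cells.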

>
>  - What would be the best way to start developing a customized geometry
>    handler?  Where are the major entry points, and what functions are
>    required vs optional?  Is it possible to begin by writing something
>    coarse that doesn't implement any performance features like chunking or
>    parallelism?
>

I think the basic level would be a geometry handler that implements only an
"all" chunking system, which I believe would work as of right now.  This
would prevent parallelism for the moment, but you could also simply
subdivide arbitrarily for the other chunking systems, which would also be a
good first pass.  The only thing I suggest is raising a NotImplementedError
for spatial chunking for now, as I'm still thinking about the best way to do
this for Octree data; once you've taken a look, your feedback would be
useful as well.

As near as I can tell, the *absolute* minimum routines you need to
implement are:

 * _count_selection
 * _identify_base_chunk
 * _chunk_*

You'll also probably have to implement the different chunks.
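A minimal sketch of that starting point: an "all" chunk, a naive "io" chunking that subdivides one chunk per domain, and spatial chunking that raises NotImplementedError.  Only the method names come from the list above; the signatures, the DataChunk namedtuple (a simplified stand-in for YTDataChunk), and the Domain, ThresholdSelector, and DataObject helpers are all assumptions for illustration.

```python
from collections import namedtuple

# Simplified stand-in for YTDataChunk; these fields are assumptions.
DataChunk = namedtuple("DataChunk", ["dobj", "chunk_type", "objs", "size"])


class MinimalOctreeGeometryHandler:
    """Hypothetical handler with only 'all' and naive 'io' chunking."""

    def __init__(self, domains):
        self.domains = domains  # hypothetical: one subset per file/domain

    def _count_selection(self, dobj):
        # Total number of cells the data object's selector picks out.
        return sum(dobj.selector.count_cells(d) for d in self.domains)

    def _identify_base_chunk(self, dobj):
        # Record which domains intersect the object, and its total size.
        dobj._chunk_info = [d for d in self.domains
                            if dobj.selector.count_cells(d) > 0]
        dobj.size = self._count_selection(dobj)
        dobj._current_chunk = DataChunk(dobj, "all", dobj._chunk_info, dobj.size)

    def _chunk_all(self, dobj):
        # One chunk covering everything: simple, but no parallelism.
        yield DataChunk(dobj, "all", dobj._chunk_info, dobj.size)

    def _chunk_io(self, dobj):
        # Arbitrary subdivision as a first pass: one chunk per domain.
        for d in dobj._chunk_info:
            yield DataChunk(dobj, "io", [d], dobj.selector.count_cells(d))

    def _chunk_spatial(self, dobj, ngz=0):
        raise NotImplementedError("spatial chunking not yet designed for octrees")


# Hypothetical stand-ins for domains, a selector, and a data object.
class Domain:
    def __init__(self, values):
        self.values = values


class ThresholdSelector:
    def __init__(self, threshold):
        self.threshold = threshold

    def count_cells(self, domain):
        return sum(1 for v in domain.values if v > self.threshold)


class DataObject:
    def __init__(self, selector):
        self.selector = selector


handler = MinimalOctreeGeometryHandler([Domain([1, 5, 10]), Domain([0, 2]), Domain([7])])
dobj = DataObject(ThresholdSelector(4))
handler._identify_base_chunk(dobj)
print(dobj.size, [c.size for c in handler._chunk_io(dobj)])
```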

>
>
>  - We'd like to use the RAMSES and ART frontends as examples, since their
>    data structures are very similar to our own.  How current are those
>    frontends in yt-3.0?  Are there any major pieces which are scheduled for
>    deprecation or refactoring that we should be aware of?  In RAMSES, for
>    example, some field names are hard-coded in
>    RAMSESGeometryHandler._detect_fields.  Shouldn't this information be
>    pulled from the fields interface?
>

Ah, the field names are hard-coded because, as near as I can tell, RAMSES
doesn't store an enumerated list of fields anywhere on disk.  I've been
meaning to upgrade this to allow the fields to be specified, as that will
be important for other irregular or non-self-describing formats like Gadget
binary and NMSU-ART.  For the RAMSES reader, I do hope to improve its
handling of spatial data and eventually (though not in the first pass) move
to a generalized distributed Octree.  That will require much more thought,
but it is on the docket for some time after the first release of 3.0 goes
out.
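One way the "allow specification" idea could look, as a minimal sketch: keep a hard-coded default list for formats that don't describe their own fields, but let the caller override it.  FieldDetectionSketch, DEFAULT_FLUID_FIELDS, and the fluid_fields keyword are hypothetical names, not the current RAMSES code.

```python
# Hypothetical default names, standing in for a hard-coded field list.
DEFAULT_FLUID_FIELDS = ("Density", "x-velocity", "y-velocity",
                        "z-velocity", "Pressure")


class FieldDetectionSketch:
    """Hypothetical: fields are hard-coded by default but can be overridden."""

    def __init__(self, fluid_fields=None):
        self._detect_fields(fluid_fields)

    def _detect_fields(self, fluid_fields=None):
        # The format does not list its fields on disk, so fall back to the
        # defaults unless the caller specifies them when loading the data.
        if fluid_fields is None:
            fluid_fields = DEFAULT_FLUID_FIELDS
        self.field_list = list(fluid_fields)


print(FieldDetectionSketch().field_list[0])
print(FieldDetectionSketch(["Density", "Metallicity"]).field_list)
```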

The "ART" frontend (which either needs a better name or needs to be
co-located with the new ART frontend, as Enzo 2.x and Enzo 3.0 will be) has
not been upgraded at all.  I have some scripts that mock this up, but Chris
has been very busy lately and hasn't been able to synchronize with me.  To
the best of my knowledge the RAMSES frontend is functional; I have tested
it and it works, and I have also been working with Nick Moeckel to make
sure that it meets his needs.  I have received the occasional email, which
has helped improve things like boundary conditions.

>
>  Thanks for the help!
>

No, thank *you* for your work on this!  I'm sorry that the documentation
has not kept pace with development.  This is not because I don't recognize
its importance, but because I'm still working on the code and have not
prioritized it.  The secondary reason is that we don't have a good
collaborative process for evolving documentation of standards (something
like Confluence), so I am disinclined to write things, since they tend to
vanish into the ether.  I will attempt to push past this and log more
carefully what I work on and the design that goes into it.

Suggestions, feedback, and so on are all welcome.  In particular, because
3.0 is still alpha and under heavy development, feedback on how to improve
the design would be especially valuable.  Last week I was able to cut the
time it takes to read Enzo data by a factor of 4 just by re-evaluating some
design decisions; this is the kind of change I still want to explore.

Keep in touch,

Matt

>
>  Douglas Rudd
> Scientific Computing Consultant
> Research Computing Center, KICP
> University of Chicago
> drudd at uchicago.edu
>