[Yt-dev] Geometry, RAMSES and non-patch datasets

Tue Jun 21 21:00:44 PDT 2011

Hi Sam,

On Tue, Jun 21, 2011 at 8:07 AM, Sam Geen <samgeen at astro.ox.ac.uk> wrote:
> Hi,
>
> This is in reply to Matt's e-mail from 3 weeks ago (I only just realised I
> forgot to hit "confirm" on the yt-dev mailing list signup).

No worries!  :)

>
> I guess one solution to the problem would be to abstract what a "grid" is
> (I'm guessing a grid is a container for a geometrically consistent chunk of
> the entire simulation volume?) Then allow it to answer queries about its
> geometric properties itself. So for example, ask it
> "myGrid.IsInRegion(myWeirdGeometricConstruct)". I guess the trick is to
> figure out a flexible but simple interface for this, depending on how well
> you know the requirements for what the grid should be able to do. In
> general, I think this is the ideal situation, because as Matt says hammering
> every code into the same structure in memory creates slowdowns. One
> possibility is to create a few template memory structures, etc, to allow
> people to bolt together new implementations for each code.

A grid is indeed a container for a chunk of the simulation --
typically in patch-based AMR codes, each grid will be a (hopefully
large, but not too large) contiguous region.  This enables numpy to
take over, as it helps batch mathematical operations -- for instance,
for an operation like:

field1*field2

the cost of parsing, identifying the object as a buffer of contiguous
data, identifying the types, dispatching the correct function, and
then allocating and returning a new buffer is the startup cost
against which the actual operation of multiplication is weighed.
Batching operations by grid nicely reduces the ratio of startup cost
to operation cost.
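
As a rough illustration of this trade-off, here is a minimal sketch.
The 32^3 patch size and the random arrays standing in for field1 and
field2 are assumptions for the example, not actual yt fields.  Both
functions compute the same product; the first pays the per-call
startup cost once per cell, the second once per grid.

```python
import numpy as np

# Hypothetical stand-ins for two fields on a single 32^3 grid patch.
rng = np.random.default_rng(0)
field1 = rng.random((32, 32, 32))
field2 = rng.random((32, 32, 32))


def multiply_per_cell(a, b):
    """Multiply cell by cell, paying the dispatch cost for every cell."""
    out = np.empty_like(a)
    for idx in np.ndindex(a.shape):
        out[idx] = a[idx] * b[idx]
    return out


def multiply_batched(a, b):
    """Multiply the whole patch at once; the dispatch cost is paid once."""
    return a * b


per_cell = multiply_per_cell(field1, field2)
batched = multiply_batched(field1, field2)
```

Timing the two functions (for example with the standard timeit
module) should show the gap widening as the patch grows.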

Right now, the mechanism for geometric constructs is inverted from how
you describe -- when describing a sphere, for instance, the operation
is:

 * Query the Hierarchy object (which I would support renaming to
'mesh' or 'geometry' in a future iteration of the code, likely 3.0) to
identify grids that intersect the geometric region.  This is
accomplished through a "geometry mixin" that supplies various routines
to do this.
 * Query each intersecting grid's x,y,z values (for each cell) to
identify which cells intersect the region.
 * Return these values to the routine/user that requested them.
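
The three steps above can be sketched roughly as follows.  This is
not yt's actual implementation: the dictionary keys (left_edge,
right_edge, dims, data) and the function names are hypothetical, and
a cell is treated as inside the sphere when its center is, which is
an assumption about what "intersect" means here.

```python
import numpy as np


def grid_intersects_sphere(left_edge, right_edge, center, radius):
    """Coarse test: does the grid's bounding box touch the sphere?"""
    # Closest point of the box to the sphere center, axis by axis.
    closest = np.clip(center, left_edge, right_edge)
    return np.sum((closest - center) ** 2) <= radius ** 2


def cells_in_sphere(left_edge, right_edge, dims, center, radius):
    """Fine test: boolean mask of cells whose centers lie in the sphere."""
    axes = []
    for lo, hi, n in zip(left_edge, right_edge, dims):
        dx = (hi - lo) / n
        axes.append(lo + dx * (np.arange(n) + 0.5))
    x, y, z = np.meshgrid(*axes, indexing="ij")
    r2 = (x - center[0]) ** 2 + (y - center[1]) ** 2 + (z - center[2]) ** 2
    return r2 <= radius ** 2


def select_sphere(grids, center, radius):
    """Return the data values of all selected cells, grid by grid."""
    center = np.asarray(center, dtype=float)
    selected = []
    for g in grids:
        le = np.asarray(g["left_edge"], dtype=float)
        re = np.asarray(g["right_edge"], dtype=float)
        # Step 1: skip grids whose bounding box misses the region.
        if not grid_intersects_sphere(le, re, center, radius):
            continue
        # Step 2: mask the cells of each intersecting grid.
        mask = cells_in_sphere(le, re, g["dims"], center, radius)
        # Step 3: collect the values to hand back to the caller.
        selected.append(g["data"][mask])
    return np.concatenate(selected) if selected else np.empty(0)


# Two hypothetical 4^3 grids; only the first one touches the sphere.
grids = [
    {"left_edge": [0.0, 0.0, 0.0], "right_edge": [0.5, 0.5, 0.5],
     "dims": (4, 4, 4), "data": np.ones((4, 4, 4))},
    {"left_edge": [0.5, 0.5, 0.5], "right_edge": [1.0, 1.0, 1.0],
     "dims": (4, 4, 4), "data": np.ones((4, 4, 4))},
]
values = select_sphere(grids, center=[0.25, 0.25, 0.25], radius=0.2)
```

Only the bounding-box test needs the hierarchy as a whole; the
per-cell mask is computed independently for each grid.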

I think that this is compatible with what you have outlined, in
general.  What I had hoped to do was reduce the interaction between
IO and geometry as much as possible, simply because IO routines are
usually compiled code, whereas ideally I would like the geometry to
be performed in Python.  (As it stands it's usually done with
operations on bounding values of grids to find intersections.)

>
> In terms of choosing algorithms for different types of fluid blob (e.g. one
> for particles, one for grids), this can be done using functionoids for the
> algorithms (or at least functionoid wrappers) and then a functionoid factory
> for spawning the correct functionoid to use with the container. You'd have
> to wrap all this up in a simple interface again, otherwise it'd be
> impossible to use.
>
> I also suggested to Matt to create a "fluid blob" iterator that works for
> all types of fluid blob (SPH particle, octree grid cell, voronoi
> tessellation cell) but this might be very slow in Python. That said,
> iterating over "grid"s as chunks of the amr grid instead is a possibility.
> Having some kind of iterator option might be good, though, as doing things
> like tracking particles through different snapshots is something I've been
> doing extensively in my (pre-YT) work.

A generalized fluid blob iterator would be interesting; I think the
grids could be placed into this.  By extending the geometry mixin to
work with different methods, this could be feasible.  I wonder if
perhaps rethinking the idea of static geometries (determined at
instantiation) would assist with addressing SPH data.  I am inclined
to think this would be a way forward.  In looking over the code, it's
not clear to me that there are many places where grids are assumed,
other than the projections and the first pass of data selection.

(Projections as we do them now might port nicely to SPH, but it's not
yet clear to me.)
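
One way such an iterator might look, as a sketch only: the class
names PatchGeometry and ParticleGeometry and the iter_chunks method
are hypothetical and do not exist in yt.  The idea is that each
geometry hands back batches of (positions, values), so that a
consumer never needs to know whether a batch came from a grid or
from a block of particles, while numpy still operates on whole
chunks at a time.

```python
import numpy as np


class PatchGeometry:
    """Hypothetical geometry for patch AMR: one chunk per grid."""

    def __init__(self, grids):
        # grids: list of (cell_positions, cell_values) array pairs
        self.grids = grids

    def iter_chunks(self):
        for positions, values in self.grids:
            yield positions, values


class ParticleGeometry:
    """Hypothetical geometry for SPH data: fixed-size particle batches."""

    def __init__(self, positions, values, chunk_size=1024):
        self.positions = positions
        self.values = values
        self.chunk_size = chunk_size

    def iter_chunks(self):
        for start in range(0, len(self.values), self.chunk_size):
            stop = start + self.chunk_size
            yield self.positions[start:stop], self.values[start:stop]


def total_in_sphere(geometry, center, radius):
    """Sum the values of all fluid blobs whose position is in a sphere."""
    center = np.asarray(center, dtype=float)
    total = 0.0
    for positions, values in geometry.iter_chunks():
        r2 = np.sum((positions - center) ** 2, axis=1)
        total += values[r2 <= radius ** 2].sum()
    return total
```

The same total_in_sphere call works on either geometry, which is the
property a generalized geometry would need to provide.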

>
> I don't know how much of this is already known; my domain is Ramses, which
> is still very slow to use with my dataset (although Matthew has been very
> helpful in working on the Ramses side of things). I thus haven't looked too
> much at YT yet as it's still prohibitively slow to load my dataset and play
> with it.

I did manage to squeeze out what for me was an order-of-magnitude
improvement in RAMSES data instantiation, but I confess it is still
slow.  And there are other issues with it.  Right now Casey and I are
refactoring fields, and I have set up a testing infrastructure, so I
am feeling a bit more inclined to try more invasive changes after
splitting into 2.3 and 3.0 branches sometime later this summer.

Perhaps moving to a generalized geometry, into which the standard
patch/block AMR "hierarchy" paradigm would fit, would meet the needs
of generalized fluid operations...

-Matt

>
> Cheers,
>
> Sam
>
> On Tue, Jun 7, 2011 at 16:15 AM, Matthew Turk <matthewturk at gmail.com> wrote:
>
> Hi all,
>
> This is a portion of a conversation Sam Geen and I had off-list about
> where to make changes and how to insert abstractions to allow for
> generalized geometric reading of data; this would be useful for octree
> codes, particle codes, and non-rectilinear geometry.  We decided to
> "replay" the conversation on the mailing list to allow people to
> contribute their ideas and thoughts.  I spent a bit of time last night
> looking at the geometry usage in yt.
>
> Right now I see a few places this will need to be fixed:
>
>  * Data sources operate on the idea that grids act as a pre-selection
> for cells.  If we get the creation of grids -- without including any
> cell data inside them -- to be fast enough, this will not necessarily
> need to be changed.  (i.e., apply a 'regridding' step of empty grids.)
>  However, failing that, this will need to be abstracted into geometric
> selection.  For cylindrical coordinates this will need to be
> abstracted anyway.  The idea is that once you know which grids you
> want, you read them from disk, and then mask out the points that are
> not necessary.
>  * The IO is currently set up -- in parallel -- to read in chunks.
> Usually in parallel patch-based simulations, multiple grid patches are
> stored in a single file on disk.  So, these get chunked in IO to avoid
> too many fopen/seek/fclose operations (and the analogues in hdf5.)
> This will need to be rethought.  Obviously, there are still some
> analogues; however, it's not clear how -- without the actual
> re-gridding operation -- to keep the geometry selection and the IO
> separate.  I would prefer to try to do this as much as possible.  I
> think it's do-able, but I don't yet have a good strategy for it.
>
> My current feeling now is that the re-gridding may be a slightly
> necessary evil *at the moment*, but only for guiding the point
> selection.  It's currently been re-written to be based on hilbert
> curve locating, so each grid has a unique index in L-8 or something
> space.
>
> I believe that geometry and chunking of IO are the only issues at this
> time.  One possibility would actually be to move away from the idea of
> grids and instead think in terms of 'hilbert chunks'.  These would be the items
> that would be selected, read from disk, and mapped.  This might fit
> nicer with the Ramses method.
>
> What do you think?
>
> Best,
>
> Matt
>
> _______________________________________________
> Yt-dev mailing list
> Yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>
>