[Yt-dev] Geometry, RAMSES and non-patch datasets

Wed Jun 22 06:11:01 PDT 2011

OK, cool. So is there anything that I can do to help with the Ramses 
instantiation? I could just play about with implemeting Pymses with the 
current I/O interface, but if you're already doing this then I'll wait 
to see what you come up with. Also, if the I/O interface is changing 
soon then I might hold off.

Actually, if there's any design documentation not in the code paper or 
not obviously on the website floating around then it might be useful to 
have a look at this. Also, another vaguely related query - how much can 
Python talk to compiled code (C++, etc)? To what extent can you pass 
Python objects to C++ routines, assuming you know what the public 
methods are?

Sam

Matthew Turk wrote:
> Hi Sam,
>
> On Tue, Jun 21, 2011 at 8:07 AM, Sam Geen <samgeen at astro.ox.ac.uk> wrote:
>   
>> Hi,
>>
>> This is in reply to Matt's e-mail from 3 weeks ago (I only just realised I
>> forgot to hit "confirm" on the yt-dev mailing list signup).
>>     
>
> No worries!  :)
>
>   
>> I guess one solution to the problem would be to abstract what a "grid" is
>> (I'm guessing a grid is a container for a geometrically consistent chunk of
>> the entire simulation volume?) Then allow it to answer queries about its
>> geometric properties itself. So for example, ask it
>> "myGrid.IsInRegion(myWeirdGeometricConstruct)". I guess the trick is to
>> figure out a flexible but simple interface for this, depending on how well
>> you know the requirements for what the grid should be able to do. In
>> general, I think this is the ideal situation, because as Matt says hammering
>> every code into the same structure in memory creates slowdowns. One
>> possibility is to create a few template memory structures, etc, to allow
>> people to bolt together new implementations for each code.
>>     
>
> A grid is indeed a container for a chunk of the simulation --
> typically in patch-based AMR codes, these will be some (hopefully
> large but not too large) contiguous region.  This enables numpy to
> take over, as it helps batch mathematical operations -- for instance,
> for an operation like:
>
> field1*field2
>
> the startup cost of parsing, identifying the object as a buffer of
> contiguous data, identifying the types, dispatching the correct
> function, and then allocating and returning a new buffer is the
> startup cost against which the actual operation of multiplication is
> weighed.  The batching of operations with grids nicely coincides with
> reducing the ratio between startup to operation cost.
>
> Right now, the mechanism for geometric constructs is inverted from how
> you describe -- when describing a sphere, for instance, the operation
> is:
>
>  * Query the Hierarchy object (which I would support renaming to
> 'mesh' or 'geometry' in a future iteration of the code, likely 3.0) to
> identify grids that intersect the geometric region.  This is
> accomplished through a "geometry mixin" that supplies various routines
> to do this.
>  * Query each intersecting grid's x,y,z values (for each cell) to
> identify which cells intersect the region.
>  * Return these values to the routine/user that requested them.
>
> I think that this is compatible with what you have outlined, in
> general.  The issue I had hoped to avoid was to reduce the interaction
> between IO and geometry as much as possible, simply because IO
> routines are usually compiled code, whereas ideally I would like the
> geometry to be performed in Python.  (As it stands it's usually done
> with operations on bounding values of grids to find intersections.)
>
>   
>> In terms of choosing algorithms for different types of fluid blob (e.g. one
>> for particles, one for grids), this can be done using functionoids for the
>> algorithms (or at least functionoid wrappers) and then a functionoid factory
>> for spawning the correct functionoid to use with the container. You'd have
>> to wrap all this up in a simple interface again, otherwise it'd be
>> impossible to use.
>>
>> I also suggested to Matt to create a "fluid blob" iterator that works for
>> all types of fluid blob (SPH particle, octree grid cell, voronoi
>> tessellation cell) but this might be very slow in Python. That said,
>> iterating over "grid"s as chunks of the amr grid instead is a possibility.
>> Having some kind of iterator option might be good, though, as doing things
>> like tracking particles through different snapshots is something I've been
>> doing extensively in my (pre-YT) work.
>>     
>
> A generalized fluid blob iterator would be interesting; I think into
> this the grids could be placed.  By extending the geometry mixin to
> work with different methods, this could be feasible.  I wonder if
> perhaps rethinking the idea of static geometries (determined at
> instantiation) would assist with addressing SPH data.  I am inclined
> to think this would be a way forward.  In looking over the code, it's
> not clear to me that there are many places that grids are assumed
> except in the projections and the first-pass of data selection.
>
> (Projections as we do them now might port nicely to SPH, but it's not
> yet clear to me.)
>
>   
>> I don't know how much of this is already known; my domain is Ramses, which
>> is still very slow to use with my dataset (although Matthew has been very
>> helpful in working on the Ramses side of things). I thus haven't looked too
>> much at YT yet as it's still prohibitively slow to load my dataset and play
>> with it.
>>     
>
> I did manage to squeeze out what for me was an OOM improvement in
> RAMSES data instantiation, but I confess it is still slow.  And there
> are other issues with it.  Right now Casey and I are refactoring
> fields, and I have set up a testing infrastructure, so I am feeling
> bit more inclined to try more invasive changes after branching into
> 2.3 and 3.0 branches sometime later this summer.
>
> Perhaps moving to a generalized geometry, into which the standard
> patch/block AMR "hierarchy" paradigm would fit, would meet the
> necessary needs to do generalized fluid operations...
>
> -Matt
>
>   
>> Cheers,
>>
>> Sam
>>
>> On Tue, Jun 7, 2011 at 16:15 AM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Hi all,
>>
>> This is a portion of a conversation Sam Geen and I had off-list about
>> where to make changes and how to insert abstractions to allow for
>> generalized geometric reading of data; this would be useful for octree
>> codes, particles codes, and non-rectilinear geometry.  We decided to
>> "replay" the conversation on the mailing list to allow people to
>> contribute their ideas and thoughts.  I spent a bit of time last night
>> looking at the geometry usage in yt.
>>
>> Right now I see a few places this will need to be fixed:
>>
>>  * Data sources operate on the idea that grids act as a pre-selection
>> for cells.  If we get the creation of grids -- without including any
>> cell data inside them -- to be fast enough, this will not necessarily
>> need to be changed.  (i.e., apply a 'regridding' step of empty grids.)
>>  However, failing that, this will need to be abstracted into geometric
>> selection.  For cylindrical coordinates this will need to be
>> abstracted anyway.  The idea is that once you know which grids you
>> want, you read them from disk, and then mask out the points that are
>> not necessary.
>>  * The IO is currently set up -- in parallel -- to read in chunks.
>> Usually in parallel patch-based simulations, multiple grid patches are
>> stored in a single file on disk.  So, these get chunked in IO to avoid
>> too many fopen/seek/fclose operations (and the analogues in hdf5.)
>> This will need to be rethought.  Obviously, there are still some
>> analogues; however, it's not clear how -- without the actual
>> re-gridding operation -- to keep the geometry selection and the IO
>> separate.  I would prefer to try to do this as much as possible.  I
>> think it's do-able, but I don't yet have a good strategy for it.
>>
>> My current feeling now is that the re-gridding may be a slightly
>> necessary evil *at the moment*, but only for guiding the point
>> selection.  It's currently been re-written to be based on hilbert
>> curve locating, so each grid has a unique index in L-8 or something
>> space.
>>
>> I believe that geometry and chunking of IO are the only issues at this
>> time.  One possibility would actually be to move away from the idea of
>> grids and instead of 'hilbert chunks'.  So these would be the items
>> that would be selected, read from disk, and mapped.  This might fit
>> nicer with the Ramses method.
>>
>> What do you think?
>>
>> Best,
>>
>> Matt
>>
>> _______________________________________________
>> Yt-dev mailing list
>> Yt-dev at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>>
>>
>>     
> _______________________________________________
> Yt-dev mailing list
> Yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>