[yt-dev] RFC: SPH plan in yt 3.0

Nathan Goldbaum nathan12343 at gmail.com
Tue Jan 8 13:45:43 PST 2013


Hi Matt,

Can you elaborate on why it will be so hard to implement the smoothing 
kernel?

From my point of view it's straightforward to smooth the particles onto 
grids, or, if we avoid gridding, to splat the particles onto pixels 
according to the smoothing kernel.  While there are a number of possible 
choices of kernel, most codes (Gadget in particular) use some flavor of 
the cubic spline kernel, which has only one free parameter: the 
smoothing length, which is written to disk along with the particle data.
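
For concreteness, here's a minimal sketch of that kernel in the 
convention Gadget uses, where the smoothing length h is the radius of 
compact support (other codes normalize q = r/h differently, so the 
constants below are an assumption to check against the code at hand):

    import numpy as np

    def cubic_spline_kernel(r, h):
        # Cubic spline kernel with compact support of radius h
        # (Gadget's convention; some codes use a support of 2h).
        q = np.atleast_1d(np.asarray(r, dtype=float)) / h
        w = np.zeros_like(q)
        inner = q <= 0.5
        outer = (q > 0.5) & (q <= 1.0)
        w[inner] = 1.0 - 6.0 * q[inner] ** 2 + 6.0 * q[inner] ** 3
        w[outer] = 2.0 * (1.0 - q[outer]) ** 3
        return 8.0 / (np.pi * h ** 3) * w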

As for whether to grid the data or not, I think that it will be 
necessary for geometric selection of data.  One could in principle set 
up profiles over the whole simulation using the capabilities you've 
already completed, querying the fluid fields at the particle locations 
and calculating derived fields based on the fluid fields written to 
disk at each particle location.  However, it seems to me that 
calculating fields that require some sort of differencing, or 
calculating profiles over a geometric object, /requires/ a gridding step.

I think it should be possible to do most things without the expensive 
gridding, following your second suggestion for visualization based on 
the SPLASH algorithms, but I also think we should provide the capability 
to export to GDF or internally convert to gridded data in memory since 
it enables a lot of sophisticated analysis tasks that are more or less 
straightforward on grid data.
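
To make the splatting idea concrete, here is a toy sketch that scatters 
particles onto a pixel buffer, reusing cubic_spline_kernel from above.  
Note this is only a rough stand-in: SPLASH properly integrates the 
kernel along the line of sight, rather than reusing the 3D kernel shape 
as done here.

    import numpy as np

    def splat_particles(px, py, mass, h, bounds, npix):
        # Deposit each particle onto the pixels its kernel overlaps.
        x0, x1, y0, y1 = bounds
        dx = (x1 - x0) / npix
        dy = (y1 - y0) / npix
        image = np.zeros((npix, npix))
        for x, y, m, hp in zip(px, py, mass, h):
            # pixel index range covered by this particle's support
            i0 = max(int((x - hp - x0) / dx), 0)
            i1 = min(int((x + hp - x0) / dx) + 1, npix)
            j0 = max(int((y - hp - y0) / dy), 0)
            j1 = min(int((y + hp - y0) / dy) + 1, npix)
            for i in range(i0, i1):
                for j in range(j0, j1):
                    xc = x0 + (i + 0.5) * dx
                    yc = y0 + (j + 0.5) * dy
                    r = np.hypot(xc - x, yc - y)
                    image[i, j] += m * cubic_spline_kernel(r, hp)[0]
        return image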

That being said, I'm no SPH expert, so please correct me if I'm speaking 
out of turn here.

Cheers,

Nathan

On 1/8/13 5:18 PM, Matthew Turk wrote:
> Hi all,
>
> I am writing today to request comments and suggestions for supporting
> SPH data in yt.  This is part of a broader effort -- which includes
> the native Octree support -- in yt 3.0 to correctly support
> non-gridpatch data.
>
> This email contains a lot of information about where we are, and I was
> hoping to get some feedback from people who have more experience with
> SPH data, Gadget, AREPO, etc.  In particular, (if they're around and
> have the time) it'd be great to hear from Marcel Haas, Michael J.
> Roberts, and Chris Moody.
>
> Additionally, I am very interested in approaching this as an
> iterative, incremental process.  Minimum Viable Product (MVP) first,
> then improving as we learn and go along.
>
> = Background =
>
> As seen in this pull request:
>
> https://bitbucket.org/yt_analysis/yt-3.0/pull-request/11/initial-n-body-data-support
>
> I've implemented a first pass at N-body data support.  This proceeds
> through the following steps:
>
>    * Read header
>    * Create octree
>    * For each file in output:
>      * Add particles to octree
>      * When a cell exceeds a threshold number of particles (or when
> particles from more than one file are co-located), refine
>    * Directly query particles to get data.
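
For concreteness, a toy version of that insert-and-refine loop might 
look like the sketch below.  This is pure Python and purely 
illustrative: the refinement criterion is reduced to a single n_max 
threshold, and a real implementation would also track which file each 
particle came from and cap the refinement depth.

    class Oct:
        def __init__(self, center, width, n_max=8):
            self.center, self.width, self.n_max = center, width, n_max
            self.particles = []   # positions, stored at leaves only
            self.children = None  # list of 8 children once refined

        def insert(self, pos):
            if self.children is not None:
                self._child_for(pos).insert(pos)
                return
            self.particles.append(pos)
            if len(self.particles) > self.n_max:
                self._refine()

        def _refine(self):
            half = self.width / 2.0
            self.children = [
                Oct([self.center[d] + half / 2.0 * (((i >> d) & 1) * 2 - 1)
                     for d in range(3)], half, self.n_max)
                for i in range(8)]
            for p in self.particles:
                self._child_for(p).insert(p)
            self.particles = []

        def _child_for(self, pos):
            return self.children[sum(1 << d for d in range(3)
                                     if pos[d] > self.center[d])]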
>
> As it stands, for N-body data this is very poor at memory
> conservation.  In fact, it's extremely poor.  I have however written
> it so that in the future, we can move to a distributed memory octree,
> which should alleviate difficulties.
>
> For quantitative analysis of quantities that are *in the output*, this
> already works.  It does not yet provide any kind of kernel or density
> estimator.  By mandating refinement based on the file that the
> particles are in, load-on-demand operations can read data (more)
> efficiently.
>
> Many of the related items are outlined in the corresponding YTEP:
>
> https://yt.readthedocs.org/projects/ytep/en/latest/YTEPs/YTEP-0005.html
>
> Currently, it works for OWLS data.  However, since there has been
> something of a Gadget fragmentation, there is considerable difficulty
> in "supporting Gadget" when that includes supporting the binary
> format.  I've instead taken the approach of attempting to isolate
> nearly all the functionality into classes so that, at worst, someone
> could implement a couple functions, supply them to the Gadget
> interface, and it will use those to read data from disk.  Some
> integration work on this is still to come, but on some level it will
> be a 'build your own' system for some particle codes.  OWLS data
> (which was run with Gadget) on the other hand is in HDF5, is
> self-describing, has a considerable number of metadata items
> affiliated with it, and even outputs the Density estimate for the
> hydro particles.
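
As an illustration of the 'build your own' idea, the sketch below shows 
the shape such user-supplied functions might take.  Every name in it is 
hypothetical -- it is not the actual yt 3.0 interface -- and the byte 
layout assumes a standard format-1 Gadget binary.

    import numpy as np

    def my_read_header(filename):
        """Return basic metadata from a format-1 Gadget binary."""
        with open(filename, "rb") as f:
            f.seek(4)  # skip the leading Fortran record marker
            npart = np.fromfile(f, dtype=np.int32, count=6)
        return {"particle_counts": npart}

    def my_read_field(filename, offset, count, dtype=np.float32):
        """Read one field from a known byte offset; the caller works
        out offsets from its own block layout."""
        with open(filename, "rb") as f:
            f.seek(offset)
            return np.fromfile(f, dtype=dtype, count=count)

    # These would then be handed to the Gadget frontend, e.g. with a
    # (hypothetical) call such as:
    #   pf = GadgetStaticOutput("snap_000",
    #                           header_func=my_read_header,
    #                           field_func=my_read_field)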
>
> Where I'm now stuck is how to proceed with handling SPH data.  By
> extension, this also applies somewhat to handling tessellation codes
> like AREPO and PHURBAS.
>
> = Challenges =
>
> Analyzing quantitative data, where fields can be generated from
> fully-local information, is not difficult.  We can in fact largely
> consider this to be done.  This includes anything that does not
> require applying a smoothing kernel.
>
> The hard part comes in two areas:
>
>    * Processing SPH particles as fluid quantities
>    * Visualizing results
>
> Here are a few of our tools:
>
>    * Data selection and reading: we can get rectangular prisms and read
> them from disk relatively quickly.  Same for spheres.  We can also get
> boolean regions of these and read them from disk.
>    * Neighbor search in octrees.  Given Oct A at Level N, we can get
> all M (<=26) Octs that neighbor A at Levels <= N.
>    * Voronoi tessellations: we have a wrapper for Voro++ which is
> relatively performant, but not amazing.  The DTFE method provides a
> way to use this to our advantage, although this requires non-local
> information.  (The naive approach, of getting density by dividing the
> particle mass by the volume of the Voronoi cell, does not give good
> agreement, at least for the OWLS data.)
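
That naive Voronoi estimate is only a few lines with scipy, which makes 
the disagreement easy to reproduce; a sketch (not part of yt):

    import numpy as np
    from scipy.spatial import Voronoi, ConvexHull

    def naive_voronoi_density(pos, mass):
        # rho_i = m_i / V_i, with V_i the volume of particle i's
        # Voronoi cell; unbounded edge cells are left as NaN.
        vor = Voronoi(pos)
        rho = np.full(len(pos), np.nan)
        for i, ri in enumerate(vor.point_region):
            region = vor.regions[ri]
            if len(region) == 0 or -1 in region:
                continue  # cell extends to infinity
            rho[i] = mass[i] / ConvexHull(vor.vertices[region]).volume
        return rho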
>
> Here's what we want to support:
>
>    * Quantitative analysis (profiles, etc)
>    * Visualization (Projections, slices, probably not VR yet)
>    * Speed
>
> == Gridding Data ==
>
> In the past, one approach we have explored has been to simply grid all
> of the values.  Assuming you have a well-behaved kernel and a large
> number of particles, you can calculate at every cell-center the value
> of the necessary fields.  If this were done, we could reduce the
> complexity of the problem *after* gridding the data, but it could
> potentially increase the complexity of the problem *before* gridding
> the data.
>
> For instance, supposing we took our Octree and created blocks of
> 8^3 (like FLASH), we could then analyze the resultant Eulerian grid
> identically to other codes -- projections, slices, and so on would be
> done exactly as elsewhere.
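
A scatter-style deposit onto the cell centers of one such block might 
look roughly like this sketch (names illustrative; it reuses 
cubic_spline_kernel from earlier, and a real implementation would 
vectorize the loops):

    import numpy as np
    from scipy.spatial import cKDTree

    def smooth_onto_block(pos, mass, rho, h, field, left_edge, dx, nx=8):
        # Standard SPH estimate at each cell center x:
        #   A(x) = sum_j (m_j / rho_j) * A_j * W(|x - x_j|, h_j)
        idx = np.indices((nx, nx, nx)).reshape(3, -1).T
        centers = left_edge + (idx + 0.5) * dx
        tree = cKDTree(centers)
        out = np.zeros(len(centers))
        for j in range(len(pos)):
            w_j = mass[j] / rho[j] * field[j]
            for i in tree.query_ball_point(pos[j], h[j]):
                r = np.linalg.norm(centers[i] - pos[j])
                out[i] += w_j * cubic_spline_kernel(r, h[j])[0]
        return out.reshape(nx, nx, nx)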
>
> This does bring with it the challenge of applying the smoothing
> kernel.  Not only are there a handful of free parameters in the
> smoothing kernel, but it also requires handling edge effects.
> Individual octs are confined to a single domain; neighbors, however,
> are not.  So we'd potentially be in a situation where to calculate the
> values inside a single Oct, we would have to read from many different
> files.  From the standpoint of yt, though, this could be implemented
> quite simply using the new IO functions in 3.0.  We
> already have a _read_particles function, so the _read_fluid function
> would identify the necessary region, read the particles, and then
> proceed to smooth them.  We would need to ensure that we were not
> subject to edge effects (something Daniel Price goes into some detail
> about in the SPLASH paper) which may prove tricky.
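
One way to frame the edge-effect problem: a cell near an oct boundary 
must see particles whose centers lie outside the oct but whose kernels 
reach in.  A minimal sketch of that padded selection:

    import numpy as np

    def select_with_ghosts(pos, h, left_edge, right_edge):
        # Keep every particle whose kernel support overlaps the
        # region, not just those whose centers fall inside it;
        # dropping the overlap term is exactly the edge effect
        # discussed in the SPLASH paper.
        hs = h[:, None]
        return np.all((pos + hs >= left_edge) &
                      (pos - hs <= right_edge), axis=1)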
>
> It's been impressed upon me numerous times, however, that this is a
> sub-optimal solution.  So perhaps we should avoid it.
>
> == Not Gridding Data ==
>
> I spent some time thinking about this over the last little while, and
> it's no longer obvious to me that we need to grid the data at all even
> to live within the yt infrastructure.  What gridded data buys us is a
> clear "extent" of fluid influence (i.e., a particle's influence may
> extend beyond its center) and an easy way of transforming a fluid into
> an image.  The former is hard.  But I think the latter is easier than
> I realized before.
>
> Recent, still percolating changes in how yt handles pixelization and
> ray traversal should allow a simple mechanism for swapping out
> pixelization functions based on characteristics of the dataset.  A
> pixelization function takes a slice or projection data object (which
> is, in fact, an *actual* data object and not an image) and then
> converts them to an image.  For SPH datasets, we could mandate that
> the pixelization object be swapped out for something that can apply an
> SPH-appropriate method of pixelization (see Daniel Price's SPLASH
> paper, or the GigaPan paper, for a few examples).  We would likely
> want to change out the Projection object as well, and we may need to
> eliminate the ability to avoid IO during the pixelization procedure
> (i.e., the project-once-pixelize-many advantage of grid data).  But I
> think it would be doable.
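
The dispatch itself could be quite small; a hypothetical sketch (none 
of these names are yt's actual API):

    def pixelize_grid_cells(data_object, bounds, npix):
        """Stand-in for the existing cell-deposition pixelizer."""
        raise NotImplementedError

    def get_pixelizer(dataset):
        # The slice/projection stays a real data object; only the
        # final data-object -> image step is chosen per dataset.
        if getattr(dataset, "is_sph_data", False):
            return splat_particles  # the kernel splat sketched earlier
        return pixelize_grid_cells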
>
> So we could do this.  I think this is the harder road, but I think it
> is also possible.  We would *still* need to come up with a mechanism
> for applying the smoothing kernel.
>
> = Conclusions and Contributions =
>
> So I think we have two main paths forward -- gridding, and not
> gridding.  I would very, very much appreciate comments or feedback on
> either of these approaches.  I believe that, regardless of what we
> choose to do, we'll need to figure out how to best apply a smoothing
> kernel to data; I think this is probably going to be difficult.  But I
> am optimistic we are nearing the point of usability.  For pure N-body
> data, we are already there -- the trick is now to look at the data as
> fluids, not just particles.
>
> The best way to contribute help right now is to brainstorm and give
> feedback.  I'd love to hear ideas, criticisms, flaws in my logic, ways
> to improve things, and so on and so forth.  I think once we have an
> MVP, we will be better set up to move forward.
>
> If you'd like to add a reader for your own flavor of particle data, we
> should do that as well -- especially if you are keen to start
> exploring and experimenting.  The code so far can be found in the
> repository linked above.  I'd also appreciate comments (or acceptance)
> of the pull request.  I am also happy to hold a Hangout to discuss
> this, go over things, help set up readers or whatever.  I consider
> this a relatively high-priority project.  It will be a key feature of
> yt 3.0.
>
> (Speaking of which, today I've allotted time to focus on getting the
> last major patch-refinement feature done -- covering grids.  Hopefully
> patch-refinement people can test it out after that's done!)
>
> Thanks for your patience with such a long email!  :)
>
> -Matt