[yt-dev] RFC: SPH plan in yt 3.0

Matthew Turk matthewturk at gmail.com
Tue Jan 15 13:17:10 PST 2013


Hi Marcel et al,

Sorry for the delay; I was trying to wrap my head around this and also
push on some acceptance tests, etc.  I've added a first pass at a
Gadget binary format reader, although right now it *only* reads
vanilla Gadget files.  As I noted before, providing an easy way of
specifying a non-standard loader for these files is a high priority.
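
To make "vanilla" concrete: the reader just walks the Fortran-style
record markers of a format-1 snapshot and unpacks the fixed 256-byte
header.  A rough sketch of that parsing (not the actual yt code; the
field names and the omitted trailing fields are illustrative) looks
something like this:

import numpy as np

# Sketch of parsing a vanilla (SnapFormat=1) Gadget header.  Each block
# is wrapped in 4-byte record markers; the header is a fixed 256-byte
# struct.  Endianness handling and the cosmology/flag fields near the
# end of the header are omitted here.
def read_gadget_header(filename):
    with open(filename, "rb") as f:
        block_size = np.fromfile(f, dtype=np.int32, count=1)[0]
        assert block_size == 256, "doesn't look like a format-1 snapshot"
        hdr = {}
        hdr["npart"] = np.fromfile(f, dtype=np.uint32, count=6)
        hdr["massarr"] = np.fromfile(f, dtype=np.float64, count=6)
        hdr["time"] = np.fromfile(f, dtype=np.float64, count=1)[0]
        hdr["redshift"] = np.fromfile(f, dtype=np.float64, count=1)[0]
        hdr["flag_sfr"] = np.fromfile(f, dtype=np.int32, count=1)[0]
        hdr["flag_feedback"] = np.fromfile(f, dtype=np.int32, count=1)[0]
        hdr["nall"] = np.fromfile(f, dtype=np.uint32, count=6)
        hdr["flag_cooling"] = np.fromfile(f, dtype=np.int32, count=1)[0]
        hdr["num_files"] = np.fromfile(f, dtype=np.int32, count=1)[0]
        hdr["boxsize"] = np.fromfile(f, dtype=np.float64, count=1)[0]
        # ... Omega0, OmegaLambda, HubbleParam and padding follow ...
        return hdr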

On Thu, Jan 10, 2013 at 1:38 PM, Marcel Haas <mhaas at physics.rutgers.edu> wrote:
> Hi all,
>
>
> Thanks, Matt, for the awesome set-up! Apologies for the delay, but here are my two cents. I am experienced with SPH, but far less with yt than any of you, so pardon my ignorance from time to time. I have left some of Matt's email in, and near the end I have reacted to some other emails, as hopefully indicated.
>>
>> Currently, it works for OWLS data.  However, since there has been
>> something of a Gadget fragmentation, there is considerable difficulty
>> in "supporting Gadget" when that includes supporting the binary
>> format.
>
> As an aside, I might at some point spend some time on a converter to the OWLS-like HDF5 format for simulations saved in the Gadget binary format (mostly to use the Oppenheimer & Dave simulations together with some stuff I have developed). If I get to that, it may prove useful.

I agree, it definitely would.  Do you think you'd be at all interested
in trying this out on top of the binary reader I just added to yt?  It
might be possible to build a conversion path from that reader straight
to the OWLS-style format.
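
Just so we're talking about the same thing, the converter I'm imagining
would take the arrays read from the binary snapshot and write them back
out in a Gadget/OWLS-style HDF5 layout.  A minimal sketch with h5py
(dataset and attribute names here follow the usual Gadget HDF5
conventions; real OWLS files carry much more metadata than this) might
be:

import h5py

# `header` is a dict like the one returned by the binary reader above;
# `gas` is assumed to be a dict of numpy arrays keyed by field name,
# e.g. {"Coordinates": ..., "Velocities": ..., "Masses": ...,
#       "SmoothingLength": ..., "InternalEnergy": ...}.
def write_owls_like(filename, header, gas):
    with h5py.File(filename, "w") as f:
        h = f.create_group("Header")
        h.attrs["NumPart_ThisFile"] = header["npart"]
        h.attrs["BoxSize"] = header["boxsize"]
        h.attrs["Redshift"] = header["redshift"]
        pt0 = f.create_group("PartType0")
        for field, data in gas.items():
            pt0.create_dataset(field, data=data)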

>
>> Where I'm now stuck is how to proceed with handling SPH data.  By
>> extension, this also applies somewhat to handling tesselation codes
>> like AREPO and PHURBAS.
>>
> As far as my knowledge of AREPO goes, I think using the mesh-generating points of the Voronoi tessellation (if those are stored) should be pretty much the same as using SPH particles, although I am not sure what number of neighbors to use (structure may be smoothed out some).

My understanding is that in fact the tessellation structure is not
stored, but I think we can use a similar (albeit reduced, since it
needs < N_sph neighbors) technique to get the local density estimate
for AREPO data.  In fact, I think that might end up being easier and
faster than the SPH data.
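
To sketch what I mean by a reduced-neighbor estimate (this is a crude
stand-in rather than a committed approach, and the neighbor count is a
guess): with no tessellation stored, we can approximate the density at
each mesh-generating point from the mass enclosed within its k nearest
neighbors.

import numpy as np
from scipy.spatial import cKDTree

def knn_density(pos, mass, k=32):
    # pos: (N, 3) positions, mass: (N,) masses
    tree = cKDTree(pos)
    d, idx = tree.query(pos, k=k + 1)  # k+1: the point itself is included
    r_k = d[:, -1]                     # distance to the k-th neighbor
    m_enc = mass[idx].sum(axis=1)      # mass inside that sphere
    return m_enc / ((4.0 / 3.0) * np.pi * r_k**3)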

>
>>
>> Analyzing quantitative data, where fields can be generated from
>> fully-local information, is not difficult.  We can in fact largely
>> consider this to be done.  This includes anything that does not
>> require applying a smoothing kernel.
>>
> So, you mean that only includes DM and stars, right? Any given fluid quantity is only defined by smoothing over N_sph neighbors (note that this number is not the same for everybody, and maybe not stored in the output, but in order to have quantitative analysis physically consistent with the simulation, this number should be the same in the analysis as in the simulation). Extra complication: the smoothing length is not always equal to the distance to the N_sph'th neighbor, as there is a minimum set on the kernel; that is why in OWLS this number is stored in the output as well (as Nathan indicated, not every simulation group does this though!).

Yup, DM and stars only.  For the variation in N_sph, I think we can
actually allow considerable flexibility in how we define both the
smoothing kernel and its free parameters.  The minimum smoothing
length can also help narrow down which neighboring regions need to be
examined for particles.

The biggest stumbling block for me, conceptually, is how to search the
26 (or more) oct neighbors efficiently without holding all of the
particle data in memory, while still minimizing hits to the disk.  I'm
exploring a few options now, but I think it's going to come down to
some caching of particle data and clever ordering of octs in the tree.
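
The rough shape of what I'm considering (just a sketch of the idea, not
working yt code) is to order the octs so that spatial neighbors tend to
be visited close together in time (a Morton/Z-order pass, say) and keep
a small LRU cache of particle chunks, so that most neighbor lookups hit
memory rather than disk:

from collections import OrderedDict

class ParticleCache:
    def __init__(self, read_func, max_octs=512):
        self.read_func = read_func   # reads one oct's particles from disk
        self.max_octs = max_octs
        self._cache = OrderedDict()

    def get(self, oct_id):
        if oct_id in self._cache:
            self._cache.move_to_end(oct_id)
            return self._cache[oct_id]
        data = self.read_func(oct_id)
        self._cache[oct_id] = data
        if len(self._cache) > self.max_octs:
            self._cache.popitem(last=False)   # evict least recently used
        return data

def gather_for_oct(oct_id, neighbor_ids, cache):
    # this oct's particles plus those of its (up to 26) neighbors,
    # which is what the smoothing step would then operate on
    return [cache.get(oct_id)] + [cache.get(n) for n in neighbor_ids]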

>
>>
>>  * Data selection and reading: we can get rectangular prisms and read
>> them from disk relatively quickly.  Same for spheres.  We can also get
>> boolean regions of these and read them from disk.
>
> I am not completely sure what these prisms are (n00b…). Any fluid quantity inside some volume depends on particles just outside that volume as well. It sounds like that could be an issue.

Ah, sorry, I mean we can easily select particles that lie in a prism
defined by two corners.  As for the particles just outside the volume,
this is where I'm starting to think we may need to stride over prisms
that are somewhat larger than the oct leaf nodes we want to examine,
so that we draw from their neighbors as well.
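
Concretely, the padding I have in mind is just to accept any particle
whose smoothing sphere overlaps the prism, i.e. expand the selection
region by the smoothing length before testing containment.  A sketch,
assuming positions and smoothing lengths are already in hand:

import numpy as np

def select_with_padding(pos, hsml, left_edge, right_edge):
    # pos: (N, 3) positions, hsml: (N,) smoothing lengths,
    # left_edge/right_edge: (3,) corners of the prism
    lo = left_edge[np.newaxis, :] - hsml[:, np.newaxis]
    hi = right_edge[np.newaxis, :] + hsml[:, np.newaxis]
    return np.all((pos >= lo) & (pos <= hi), axis=1)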

>
>>  * Neighbor search in octrees.  Given Oct A at Level N, we can get
>> all M (<=26) Octs that neighbor A at Levels <= N.
>
> That is great, as long as the octs contain a good fraction of N_sph particles (such that fluid quantities are defined everywhere by the oct in question and its neighbors, and nothing more).

Right now my thought is we can target N_sph * 4 or so particles in
each oct, and then inside each oct assume a grid of, say, between 8
and 32 cells on a side, depending.  I know I said earlier I wanted to
avoid gridding, but after reading comments from both you and Nathan I
think as a first pass we may need to.  (See below.)
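
To illustrate the gridding step itself (the numbers and names here are
illustrative, and `kernel` is a placeholder for the cubic spline
discussed further down): each leaf oct, holding roughly 4 * N_sph
particles plus whatever we gather from its neighbors, gets a small
uniform grid, and we evaluate the kernel-weighted sum at every cell
center.

import numpy as np

def deposit_density(left_edge, right_edge, nc, pos, mass, hsml, kernel):
    # cell centers of an nc^3 grid covering this leaf oct
    edges = [np.linspace(left_edge[i], right_edge[i], nc + 1) for i in range(3)]
    centers = [0.5 * (e[:-1] + e[1:]) for e in edges]
    cx, cy, cz = np.meshgrid(*centers, indexing="ij")
    density = np.zeros((nc, nc, nc))
    # naive scatter: every particle contributes to every cell it overlaps
    for p, m, h in zip(pos, mass, hsml):
        r = np.sqrt((cx - p[0])**2 + (cy - p[1])**2 + (cz - p[2])**2)
        density += m * kernel(r, h)
    return density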

>
>>  * Voronoi tesselations: we have a wrapper for Voro++ which is
>> relatively performant, but not amazing.  The DTFE method provides a
>> way to use this to our advantage, although this requires non-local
>> information.  (The naive approach, of getting density by dividing the
>> particle mass by the volume of the voronoi cell, does not give good
>> agreement at least for the OWLS data.)
>>
> Indeed, for the same reasons as above: the volume of a Voronoi cell depends only on the nearest few neighbors, always far fewer than N_sph, whereas fluid quantities in SPH always depend on all N_sph neighbors. This is also the reason why e.g. AREPO and Gadget should be handled slightly differently (and is ultimately the reason why moving-mesh codes do hydro better).
>
>>
>> == Gridding Data ==
>>  vs.
>> == Not Gridding Data ==
>>
>
> Pro gridding: if you do it before you do anything else and separately store files with the gridded fluid quantities (disk-space-wise not the cutest solution), you can do it at any resolution you like, since, using the same definition as for SPH particles, all fluid quantities are defined everywhere in space: take a point, grow a sphere to contain N_sph neighbors, and so on…
>
> If not gridding indeed is an option (i.e. if we figure out how to define fluid quantities everywhere we need in a physically consistent way) then I would go for that. This may require some brainstorming though; it at least isn't completely obvious to me.
>

The best way to do that isn't obvious to me either.  But I think that
whatever solution we end up with will draw on much of the same
infrastructure as the gridding solution, since we will ultimately need
to evaluate the smoothing kernel at many different locations anyway.
So I am going to proceed with a gridding step, with the admission that
we hope to get rid of it in the future.  And unlike in the 2.x series,
I don't think we will face as much technical resistance to doing so.

>
>
> On Jan 8, 2013, at 4:45 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>> From my point of view it's straightforward to smooth the particles onto grids or, if we avoid gridding, splat the particles onto pixels according to the smoothing kernel.  While there are a number of possible choices of kernel, most codes (Gadget in particular) use various flavors of the cubic spline kernel, which has only one free parameter, the smoothing length, and that will be written to disk along with the particle data.
>
> Indeed. Two buts:
> - the contribution of a particle may go to one or more pixels, which requires a nasty integration of a projected kernel over square pixels.
> - the kernel has only one free parameter, and if it is written together with the particle data that is great. If not, it needs to be recomputed from the other particles, taking into account that it may be subject to a minimum in high-density regions. This is just something to keep in mind, and calculating the kernel for all particles is fairly expensive, but Gadget comes with efficient C routines to do it.
>
>

Using the Gadget routines is actually one route I intend to explore.
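
For reference, the kernel in question is the standard Gadget cubic
spline (in the convention where h is the radius at which the kernel
falls to zero); whether we call Gadget's C routines or evaluate it
ourselves, this is, as far as I know, the function being computed:

import numpy as np

def cubic_spline(r, h):
    # W(r, h) = 8/(pi h^3) * { 1 - 6q^2 + 6q^3   for 0 <= q <= 1/2
    #                          2 (1 - q)^3       for 1/2 < q <= 1
    #                          0                 otherwise },  q = r/h
    q = np.atleast_1d(r) / h
    w = np.zeros_like(q)
    inner = q <= 0.5
    outer = (q > 0.5) & (q <= 1.0)
    w[inner] = 1.0 - 6.0 * q[inner]**2 + 6.0 * q[inner]**3
    w[outer] = 2.0 * (1.0 - q[outer])**3
    return (8.0 / (np.pi * h**3)) * w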

>> As for whether to grid the data or not, I think that it will be necessary for geometric selection of data.  One could in principle set up profiles over the whole simulation using the capabilities that you've already completed, querying the fluid fields at the particle locations and calculating derived fields based on the fluid fields written to disk at each particle location.  However, it seems to me that calculating fields that require some sort of differencing, or calculating profiles over a geometric object, /requires/ a gridding step.
>>
> I am afraid Nathan is right here. If not, I'd be curious to know how to solve this! Especially if field properties are desired somewhere where there is no particle (the particles form an irregular and sometimes pretty sparsely sampled grid!).
>

Okay, good point.  This is ultimately what convinced me that we're not
(yet) ready for gridless analysis: geometric selection simply works
better when you can easily evaluate a value at every point.  So for
now, I am going to go down the gridding path.

Once I have an outline of how to do this (i.e., how to go from the
Octree -- which we have -- to a set of fluid particles) I'll write
back either here or in a YTEP to describe the process, request
feedback, and set about implementing.

>
>
> On Jan 10, 2013, at 4:24 AM, Christopher Moody <cemoody at ucsc.edu> wrote:
>
>> 1. Be able to mandate the full SPH octree ahead of time
>
> +1

Good idea.  Hadn't thought about this.  I don't yet have a
serialization format for the octree, but creating one should not be
too hard.
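
Not a committed design, but the kind of thing I mean by "not too
hard": a depth-first walk recording one refinement flag per node is
already enough to rebuild the same tree without touching the particles
again (the flags plus the domain bounds could then go into an HDF5
file or similar).

class Node:
    def __init__(self):
        self.children = None   # either None or a list of 8 Nodes

def serialize(node, out):
    # depth-first: one flag per node, 1 if it is refined into 8 children
    out.append(1 if node.children else 0)
    if node.children:
        for child in node.children:
            serialize(child, out)

def deserialize(flags, i=0):
    node = Node()
    refined = flags[i]
    i += 1
    if refined:
        node.children = []
        for _ in range(8):
            child, i = deserialize(flags, i)
            node.children.append(child)
    return node, i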

>
>> 2. Be able to operate over multiple pfs / merge them. How should the particle octree interact with the hydro one when we have both in a single snapshot/instant?
>
>
> Maybe they should not be created independently…?

Hm, this is also possible.  As it stands, for RAMSES I have not been
creating particle octrees at all yet.

>
>
> That's my input so far; I'd be happy to discuss this in a hangout, over email, or via whatever platform!

Let's do that.  I need to spend some time thinking this through
further and researching, but I will write back soon and we can set up
a hangout about it.

Thank you all for your input!  I am generally quite excited about
this, and I appreciate the thoughts and guidance.

-Matt

>
>
> Cheers, Marcel
>
>


