[yt-dev] RFC: SPH plan in yt 3.0

Matthew Turk matthewturk at gmail.com
Tue Jan 8 09:07:06 PST 2013


Hi Elizabeth,

Thanks for the info!  Looks like we could learn from them.

-Matt

On Tue, Jan 8, 2013 at 10:45 AM, Elizabeth Tasker
<tasker at astro1.sci.hokudai.ac.jp> wrote:
> Ah, found it:
>
> http://code.google.com/p/pynbody/
>
> Apparently, they now support AMR too. Let's take them out…. uh, I mean, look at their code with interest.
>
>
> Elizabeth
>
>
> On Jan 9, 2013, at 12:41 AM, Elizabeth Tasker <tasker at astro1.sci.hokudai.ac.jp> wrote:
>
>> Hi,
>>
>> Just something to consider:
>>
>> While I was at McMaster, there was quite a bit of talk about a python visualisation tool called 'pin body'. I think this might have been developed by Greg Stinson to work with GASOLINE.
>>
>> I'm unsure how far this progressed; grad students at Mac were definitely using it to create slices but I can't find any mention of it on the web.
>>
>> Potentially, this might be worth checking out, either as a possible incorporation or comparison?
>>
>> I can ask McMaster what the situation is if that would be helpful?
>>
>> As a side note, I'd find this very useful as a way of converting between simulation code formats, combined with its new initial condition generator.
>>
>> Elizabeth
>>
>>
>> On Jan 9, 2013, at 12:18 AM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am writing today to request comments and suggestions for supporting
>>> SPH data in yt.  This is part of a broader effort -- which includes
>>> the native Octree support -- in yt 3.0 to correctly support
>>> non-gridpatch data.
>>>
>>> This email contains a lot of information about where we are, and I was
>>> hoping to get some feedback from people who have more experience with
>>> SPH data, Gadget, AREPO, etc.  In particular, (if they're around and
>>> have the time) it'd be great to hear from Marcel Haas, Michael J.
>>> Roberts, and Chris Moody.
>>>
>>> Additionally, I am very interested in approaching this as an
>>> iterative, incremental process.  Minimum Viable Product (MVP) first,
>>> then improving as we learn and go along.
>>>
>>> = Background =
>>>
>>> As seen in this pull request:
>>>
>>> https://bitbucket.org/yt_analysis/yt-3.0/pull-request/11/initial-n-body-data-support
>>>
>>> I've implemented a first pass at N-body data support.  This proceeds
>>> through the following steps (a rough sketch of the loop follows the list):
>>>
>>> * Read header
>>> * Create octree
>>> * For each file in output:
>>>   * Add particles to octree
>>>   * When a cell exceeds a certain number of particles (or when
>>> particles from more than one file are co-located), refine
>>> * Directly query particles to get data.
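>>>
>>> As a rough illustration (a sketch only, not the actual yt
>>> implementation -- names like OctNode and the threshold of 64
>>> particles per cell are made up for the example), the insert/refine
>>> loop looks something like this:
>>>
>>> import numpy as np
>>>
>>> class OctNode:
>>>     def __init__(self, left_edge, right_edge, threshold=64,
>>>                  level=0, max_level=20):
>>>         self.left_edge = np.asarray(left_edge, dtype="float64")
>>>         self.right_edge = np.asarray(right_edge, dtype="float64")
>>>         self.threshold = threshold
>>>         self.level, self.max_level = level, max_level
>>>         self.particles = []      # list of (position, file_id)
>>>         self.children = None     # 2x2x2 array of OctNodes once refined
>>>
>>>     def insert(self, pos, file_id):
>>>         if self.children is not None:
>>>             self._child(pos).insert(pos, file_id)
>>>             return
>>>         self.particles.append((np.asarray(pos, dtype="float64"), file_id))
>>>         files = set(fid for _, fid in self.particles)
>>>         # Refine when the cell exceeds the particle threshold, or when
>>>         # particles from more than one file are co-located in it.
>>>         if (len(self.particles) > self.threshold or len(files) > 1) \
>>>                 and self.level < self.max_level:
>>>             self._refine()
>>>
>>>     def _refine(self):
>>>         center = 0.5 * (self.left_edge + self.right_edge)
>>>         self.children = np.empty((2, 2, 2), dtype=object)
>>>         for i in range(2):
>>>             for j in range(2):
>>>                 for k in range(2):
>>>                     le = np.where([i, j, k], center, self.left_edge)
>>>                     re = np.where([i, j, k], self.right_edge, center)
>>>                     self.children[i, j, k] = OctNode(
>>>                         le, re, self.threshold, self.level + 1, self.max_level)
>>>         # Redistribute the particles we were holding into the children.
>>>         old, self.particles = self.particles, []
>>>         for p, fid in old:
>>>             self._child(p).insert(p, fid)
>>>
>>>     def _child(self, pos):
>>>         center = 0.5 * (self.left_edge + self.right_edge)
>>>         i, j, k = (np.asarray(pos) >= center).astype(int)
>>>         return self.children[i, j, k]
>>>
>>> # Usage: build a unit-box tree from one file's worth of particles.
>>> root = OctNode([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
>>> for p in np.random.random((1000, 3)):
>>>     root.insert(p, file_id=0)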
>>>
>>> As it stands, for N-body data this is very poor at memory
>>> conservation.  In fact, it's extremely poor.  I have however written
>>> it so that in the future, we can move to a distributed memory octree,
>>> which should alleviate difficulties.
>>>
>>> For quantitative analysis of quantities that are *in the output*, this
>>> already works.  It does not yet provide any kind of kernel or density
>>> estimator.  By mandating refinement based on the file the particles
>>> reside in, load-on-demand operations can read data (more) efficiently.
>>>
>>> Many of the related items are outlined in the corresponding YTEP:
>>>
>>> https://yt.readthedocs.org/projects/ytep/en/latest/YTEPs/YTEP-0005.html
>>>
>>> Currently, it works for OWLS data.  However, since there has been
>>> something of a Gadget fragmentation, there is considerable difficulty
>>> in "supporting Gadget" when that includes supporting the binary
>>> format.  I've instead taken the approach of attempting to isolate
>>> nearly all the functionality into classes so that, at worst, someone
>>> could implement a couple functions, supply them to the Gadget
>>> interface, and it will use those to read data from disk.  Some
>>> integration work on this is still to come, but on some level it will
>>> be a 'build your own' system for some particle codes.  OWLS data
>>> (which was run with Gadget) on the other hand is in HDF5, is
>>> self-describing, has a considerable number of metadata items
>>> affiliated with it, and even outputs the Density estimate for the
>>> hydro particles.
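>>>
>>> As a quick illustration of why the HDF5 flavor is so much easier: the
>>> file is self-describing.  The group and dataset names below follow
>>> the Gadget-HDF5 convention ("PartType0" for gas, and so on); the
>>> exact names in a given snapshot may differ, so treat this as a sketch:
>>>
>>> import h5py
>>>
>>> with h5py.File("snapshot_000.hdf5", "r") as f:
>>>     header = dict(f["Header"].attrs)    # box size, particle counts, ...
>>>     gas = f["PartType0"]
>>>     pos = gas["Coordinates"][:]         # (N, 3) positions
>>>     rho = gas["Density"][:]             # SPH density, already on disk
>>>     hsml = gas["SmoothingLength"][:]    # smoothing lengths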
>>>
>>> Where I'm now stuck is how to proceed with handling SPH data.  By
>>> extension, this also applies somewhat to handling tessellation codes
>>> like AREPO and PHURBAS.
>>>
>>> = Challenges =
>>>
>>> Analyzing quantitative data, where fields can be generated from
>>> fully-local information, is not difficult.  We can in fact largely
>>> consider this to be done.  This includes anything that does not
>>> require applying a smoothing kernel.
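>>>
>>> For example, a temperature field needs only quantities stored with
>>> each particle.  A minimal sketch (treating the mean molecular weight
>>> as a constant for illustration; real data would derive it from the
>>> electron abundance):
>>>
>>> gamma, mu = 5.0 / 3.0, 0.59
>>> m_p, k_B = 1.6726e-24, 1.3807e-16   # proton mass, Boltzmann constant (cgs)
>>>
>>> def temperature(u):
>>>     # T = (gamma - 1) * u * mu * m_p / k_B, with u the specific
>>>     # internal energy (erg/g) carried by each particle.
>>>     return (gamma - 1.0) * u * mu * m_p / k_B
>>>
>>> Nothing in that calculation needs to know about any other particle.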
>>>
>>> The hard part comes in two areas:
>>>
>>> * Processing SPH particles as fluid quantities
>>> * Visualizing results
>>>
>>> Here are a few of our tools:
>>>
>>> * Data selection and reading: we can get rectangular prisms and read
>>> them from disk relatively quickly.  Same for spheres.  We can also get
>>> boolean regions of these and read them from disk.
>>> * Neighbor search in octrees.  Given Oct A at Level N, we can get
>>> all M (<=26) Octs that neighbor A at Levels <= N.
>>> * Voronoi tessellations: we have a wrapper for Voro++ which is
>>> relatively performant, but not amazing.  The DTFE method provides a
>>> way to use this to our advantage, although it requires non-local
>>> information.  (The naive approach of getting density by dividing the
>>> particle mass by the volume of its Voronoi cell does not give good
>>> agreement, at least for the OWLS data; a rough sketch of the DTFE
>>> estimate follows this list.)
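>>>
>>> The DTFE estimate mentioned above (following Schaap & van de
>>> Weygaert) is rho_i = (1 + D) * m_i / V_i, where V_i is the total
>>> volume of the Delaunay simplices sharing vertex i.  A rough sketch,
>>> ignoring the careful boundary handling a real implementation needs:
>>>
>>> import numpy as np
>>> from math import factorial
>>> from scipy.spatial import Delaunay
>>>
>>> def dtfe_density(pos, mass):
>>>     D = pos.shape[1]
>>>     tri = Delaunay(pos)
>>>     vol = np.zeros(len(pos))
>>>     for simplex in tri.simplices:
>>>         verts = pos[simplex]
>>>         # Volume of a D-simplex from the determinant of its edge vectors.
>>>         v = np.abs(np.linalg.det(verts[1:] - verts[0])) / factorial(D)
>>>         vol[simplex] += v
>>>     return (1.0 + D) * mass / vol
>>>
>>> # Example: equal-mass particles in a unit box.
>>> pts = np.random.random((1000, 3))
>>> rho = dtfe_density(pts, np.ones(len(pts)))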
>>>
>>> Here's what we want to support:
>>>
>>> * Quantitative analysis (profiles, etc)
>>> * Visualization (Projections, slices, probably not VR yet)
>>> * Speed
>>>
>>> == Gridding Data ==
>>>
>>> In the past, one approach we have explored has been to simply grid all
>>> of the values.  Assuming you have a well-behaved kernel and a large
>>> number of particles, you can calculate at every cell-center the value
>>> of the necessary fields.  If this were done, we could reduce the
>>> complexity of the problem *after* gridding the data, but it could
>>> potentially increase the complexity of the problem *before* gridding
>>> the data.
>>>
>>> For instance, supposing we took our Octree and created blocks of
>>> 8^3 (like FLASH), we could then analyze the resultant Eulerian grid
>>> identically to other codes -- projections, slices, and so on would be
>>> done exactly as elsewhere.  (A rough deposition sketch follows this
>>> paragraph.)
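>>>
>>> A minimal deposition sketch, assuming a common cubic-spline kernel
>>> with support 2h; the function names and the 8^3 block size are just
>>> illustrative, not an interface proposal:
>>>
>>> import numpy as np
>>>
>>> def cubic_spline(r, h):
>>>     # Standard M4 cubic spline in 3D, normalized by 1 / (pi * h**3).
>>>     q = r / h
>>>     w = np.zeros_like(q)
>>>     inner = q < 1.0
>>>     outer = (q >= 1.0) & (q < 2.0)
>>>     w[inner] = 1.0 - 1.5 * q[inner] ** 2 + 0.75 * q[inner] ** 3
>>>     w[outer] = 0.25 * (2.0 - q[outer]) ** 3
>>>     return w / (np.pi * h ** 3)
>>>
>>> def deposit_density(pos, mass, hsml, left_edge, right_edge, dims=(8, 8, 8)):
>>>     left_edge = np.asarray(left_edge, dtype="float64")
>>>     right_edge = np.asarray(right_edge, dtype="float64")
>>>     dims = np.asarray(dims)
>>>     dx = (right_edge - left_edge) / dims
>>>     # Cell-center coordinates of the block.
>>>     axes = [left_edge[i] + (np.arange(dims[i]) + 0.5) * dx[i]
>>>             for i in range(3)]
>>>     centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
>>>     rho = np.zeros(dims)
>>>     # "Scatter" each particle's mass onto every cell center it reaches.
>>>     for p, m, h in zip(pos, mass, hsml):
>>>         r = np.linalg.norm(centers - p, axis=-1)
>>>         rho += m * cubic_spline(r, h)
>>>     return rho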
>>>
>>> This does bring with it the challenge of applying the smoothing
>>> kernel.  Not only are there a handful of free parameters in the
>>> smoothing kernel, but it also requires handling edge effects.
>>> Individual octs are confined to a single domain; neighbors, however,
>>> are not.  So we'd potentially be in a situation where to calculate the
>>> values inside a single Oct, we would have to read from many different
>>> files.  From the standpoint of implementing in yt, though, we could
>>> very simply implement this using the new IO functions in 3.0.  We
>>> already have a _read_particles function, so the _read_fluid function
>>> would identify the necessary region, read the particles, and then
>>> proceed to smooth them.  We would need to ensure that we were not
>>> subject to edge effects (something Daniel Price goes into some detail
>>> about in the SPLASH paper) which may prove tricky.
>>>
>>> It's been impressed upon me numerous times, however, that this is a
>>> sub-optimal solution.  So perhaps we should avoid it.
>>>
>>> == Not Gridding Data ==
>>>
>>> I spent some time thinking about this over the last little while, and
>>> it's no longer obvious to me that we need to grid the data at all even
>>> to live within the yt infrastructure.  What gridded data buys us is a
>>> clear "extent" of fluid influence (i.e., a particle's influence may
>>> extend beyond its center) and an easy way of transforming a fluid into
>>> an image.  The former is hard.  But I think the latter is easier than
>>> I realized before.
>>>
>>> Recent, still-percolating changes in how yt handles pixelization and
>>> ray traversal should allow a simple mechanism for swapping out
>>> pixelization functions based on characteristics of the dataset.  A
>>> pixelization function takes a slice or projection data object (which
>>> is, in fact, an *actual* data object and not an image) and converts
>>> it to an image.  For SPH datasets, we could mandate that the
>>> pixelization object be swapped out for something that can apply an
>>> SPH-appropriate method of pixelization (see Daniel Price's SPLASH
>>> paper, or the GigaPan paper, for a few examples).  We would likely
>>> also want to change out the Projection object, and we may need to
>>> eliminate the ability to avoid IO during the pixelization procedure
>>> (i.e., the project-once-pixelize-many advantage of grid data).  But I
>>> think it would be doable.
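>>>
>>> To make that concrete, here is a very rough sketch of what an
>>> SPH-aware slice pixelizer could look like.  It reuses the
>>> cubic_spline kernel from the gridding sketch above, and none of
>>> these names are the actual yt interface:
>>>
>>> import numpy as np
>>>
>>> def pixelize_sph_slice(pos, hsml, field, weight, z_slice, bounds, buff_size):
>>>     xmin, xmax, ymin, ymax = bounds
>>>     nx, ny = buff_size
>>>     # Pixel-center coordinates in the slice plane.
>>>     xs = xmin + (np.arange(nx) + 0.5) * (xmax - xmin) / nx
>>>     ys = ymin + (np.arange(ny) + 0.5) * (ymax - ymin) / ny
>>>     px, py = np.meshgrid(xs, ys, indexing="ij")
>>>     image = np.zeros((nx, ny))
>>>     norm = np.zeros((nx, ny))
>>>     for p, h, f, w in zip(pos, hsml, field, weight):
>>>         dz = z_slice - p[2]
>>>         if abs(dz) > 2.0 * h:      # outside the kernel support of 2h
>>>             continue
>>>         r = np.sqrt((px - p[0]) ** 2 + (py - p[1]) ** 2 + dz ** 2)
>>>         kern = w * cubic_spline(r, h)
>>>         image += f * kern
>>>         norm += kern
>>>     return np.where(norm > 0, image / norm, 0.0)
>>>
>>> Projections would need the column-integrated (2D) form of the kernel
>>> instead, which is essentially what SPLASH does.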
>>>
>>> So we could do this.  I think this is the harder road, but I think it
>>> is also possible.  We would *still* need to come up with a mechanism
>>> for applying the smoothing kernel.
>>>
>>> = Conclusions and Contributions =
>>>
>>> So I think we have two main paths forward -- gridding, and not
>>> gridding.  I would very, very much appreciate comments or feedback on
>>> either of these approaches.  I believe that, regardless of what we
>>> choose to do, we'll need to figure out how to best apply a smoothing
>>> kernel to data; I think this is probably going to be difficult.  But I
>>> am optimistic we are nearing the point of usability.  For pure N-body
>>> data, we are already there -- the trick is now to look at the data as
>>> fluids, not just particles.
>>>
>>> The best way to contribute help right now is to brainstorm and give
>>> feedback.  I'd love to hear ideas, criticisms, flaws in my logic, ways
>>> to improve things, and so on and so forth.  I think once we have an
>>> MVP, we will be better set up to move forward.
>>>
>>> If you'd like to add a reader for your own flavor of particle data, we
>>> should do that as well -- especially if you are keen to start
>>> exploring and experimenting.  The code so far can be found in the
>>> repository linked above.  I'd also appreciate comments (or acceptance)
>>> of the pull request.  I am also happy to hold a Hangout to discuss
>>> this, go over things, help set up readers or whatever.  I consider
>>> this a relatively high-priority project.  It will be a key feature of
>>> yt 3.0.
>>>
>>> (Speaking of which, today I've allotted time to focus on getting the
>>> last major patch-refinement feature done -- covering grids.  Hopefully
>>> patch-refinement people can test it out after that's done!)
>>>
>>> Thanks for your patience with such a long email!  :)
>>>
>>> -Matt
>
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org


