[yt-dev] Pre-YTEP proposal: eliminate a global mesh for particles

Nathan Goldbaum nathan12343 at gmail.com
Fri May 27 16:34:05 PDT 2016


On Friday, May 27, 2016, Matthew Turk <matthewturk at gmail.com> wrote:

> Hi all,
>
> As some of you know, Meagan Lang and I have been working on a new
> system for indexing particles quickly.  I'm writing an email today as
> a pre-YTEP phase of getting feedback before moving forward on a plan
> of action.
>
> As of yt 3.3 (dev), the way particles work is as follows:
>
>  * During ingest, build a global octree mesh.  (This is problem #0.)
> This requires loading a uint64 into memory for every single particle,
> sorting it, and then building an octree.  The octree construction is
> pretty fast; the sorting is one of the slower parts, and a uint64 for
> every particle is a pretty bad limiter.  (1024^3 => 8 gigs for the
> array, then sorting.)
>  * A `regions` object is stored on the index, which is of dimensions
> (min(N_data_files,256))^3.  This stores a bitmask of which files fall
> into each cell in that conceptual unigrid object.
>  * The `regions` object gets queried whenever an object is selected,
> and then it gets used to identify which data files should be loaded.
>  * Whenever smoothing happens, 100% of the particles from the files
> (for the data object in question) are loaded into memory at once.
> These are queried, etc, through a "gather" method.  Each mesh point
> that is being evaluated finds its neighbors, then smooths onto
> itself.  If you make a projection of the full data set, with a
> standard `ProjectionPlot`, all of these things happen.
>  * Parallelism does not help SPH datasets for any smoothing operation
> or any `spatial` chunk operation.  Parallelism does not help the
> indexing operation.
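To make the memory cost of that ingest step concrete, here is the arithmetic from the estimate above in plain Python (one uint64 Morton key per particle, before sorting even starts):

```python
# Cost of the global-octree ingest step: one uint64 key per particle,
# which must then be sorted.  (Sorting out-of-place can double this.)
n_particles = 1024**3          # a modest modern particle run
bytes_per_key = 8              # sizeof(uint64)
index_bytes = n_particles * bytes_per_key
print(index_bytes / 1024**3)   # -> 8.0 GiB just for the key array
```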
>
> This is problematic in a number of ways; the biggest, most obvious way
> is that every time the dataset is loaded, a mesh octree is made, this
> takes up a ton of RAM, and it only gets used for smoothing deposition
> operations.  And, it's slow.
>
> And one thing I want to make very clear at this point: using it in
> this way is not ideal.  This mesh octree is supposed to provide an
> estimate of the density in a given location; then we smooth onto that.
> This preserves the "project once, reuse many" idea that we've had in
> yt in the past.  (For that particular purpose, it's fine.)  But, since
> we use an MX Octree, which by its nature is not balanced by particle
> count, we end up with empty zones.  Furthermore, during smoothing
> operations, we can run into cases where we under-find particles.
> (This can result in bad results for smoothing; one solution I'm
> exploring for this is using a balanced tree like a kD-tree, which is
> the obvious solution that I'm embarrassed I haven't explored before.)
> We aren't being as true to the data as we should be when we do this.
> Plus, if we do this for projections, we're making a *ton* more work
> than we need to.  We build an octree, smooth onto the octree, then
> immediately collapse it to a quad tree.  (Yes, really.)
>
> Meagan has been pushing forward work to implement a fast,
> memory-conservative indexing system.  It works, and it works really
> well, for indexing specific regions of the fileset that make up a
> dataset.  The idea is to develop a hierarchical indexing system that
> does not presuppose there is any sorting of particles within the
> files, but which can benefit from such a sorting.  It creates two
> levels of compressed bitmap indices, one for each data file (or
> sub-chunk of a data file), where the second level is sparse and only
> exists in regions where more than one data file has particles.
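A toy version of the two-level scheme, with Python sets standing in for the compressed bitmaps -- this is a sketch of the idea only, not Meagan's actual implementation (function name and parameters are illustrative):

```python
from collections import defaultdict

def build_index(files, coarse=4, refine=8):
    """files: {filename: list of (x, y, z) positions in [0, 1)^3}.
    Returns a coarse cell -> files map, plus a sparse refined map that
    is built only where more than one file touches a coarse cell."""
    def coarse_cell(p):
        return tuple(int(c * coarse) for c in p)

    # First level: one (conceptual) bitmap per data file.
    level1 = defaultdict(set)
    for fname, positions in files.items():
        for p in positions:
            level1[coarse_cell(p)].add(fname)

    # Second level is sparse: only "collision" cells are refined.
    collisions = {c for c, fs in level1.items() if len(fs) > 1}
    level2 = defaultdict(set)
    for fname, positions in files.items():
        for p in positions:
            cell = coarse_cell(p)
            if cell in collisions:
                fine = tuple(int(c * coarse * refine) % refine for c in p)
                level2[(cell, fine)].add(fname)
    return level1, level2
```

Selecting a region then means intersecting its cells with these maps to find exactly which files need to be read.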
>
> What we've discovered when attempting to apply this to smoothing is
> that the biggest problem comes from the notion that there is a single,
> fiducial "mesh" that exists everywhere.  What I mean by that is that
> if such a mesh existed, we could chunk over each file in whatever
> order we want, applying particles and smoothing them and whatnot.
> Generating such a mesh in a memory efficient way is very difficult,
> and we have yet to identify a mechanism by which we can do that
> without resulting in somewhere between N and N^2 reads of files for
> every smoothing operation.  Either we have a single, global mesh
> (which, unless we're wrong, requires either multiple passes over files
> until convergence *or* a full in-memory set of particles) or we do
> lots of over-reads to make sure that the mesh isn't dependent on which
> file we are reading and when.  An issue with this which comes up
> immediately is that if we don't have a fiducial mesh, then depositing
> a particle or smoothing a particle will give different results (which
> cannot be combined in a well-defined way) depending on the order it is
> done.
>
> What Meagan's built gives us a fast index, as well as an estimate of
> where regions of high particle count can be found; it further can give
> us a really good idea of how to build a quadtree that over-resolves
> structure.  (Comes in handy below.)
>
> I propose we eliminate the notion of a fixed global mesh for particles.
>
> There are a few places in the code that *right now* assume a global mesh:
>
>  * Projection and slice plots of "smoothed" quantities
>  * Smoothing operations that are not on user-defined (`arbitrary_grid`)
> meshes
>  * Volume renderings (which don't really work anyway)
>  * Ray tracing individual rays (particularly important for light ray
> and trident)
>
> I'll address each of these individually, but I want to start with this
> statement: we should not make assumptions about global meshes, but we
> should make such assumptions and decisions *available* to the people
> using the code.  This means, for instance, that we should continue to
> implement and make available smoothing operations, but expose them
> through an `arbitrary_grid` object.
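As an illustration of what deposition onto a user-defined mesh amounts to, here is a minimal nearest-grid-point deposit in pure numpy -- a sketch of the concept, not the actual `arbitrary_grid` machinery:

```python
import numpy as np

def deposit_ngp(positions, masses, left, right, dims):
    """Nearest-grid-point deposit of particle masses onto a fixed,
    user-specified grid spanning [left, right) with shape `dims`."""
    left = np.asarray(left, dtype="f8")
    right = np.asarray(right, dtype="f8")
    dims = np.asarray(dims)
    # Map each position to its integer cell index.
    idx = ((positions - left) / (right - left) * dims).astype("i8")
    idx = np.clip(idx, 0, dims - 1)
    grid = np.zeros(dims, dtype="f8")
    # Accumulate, handling repeated indices correctly.
    np.add.at(grid, tuple(idx.T), masses)
    return grid
```

The point is that the grid geometry is an explicit user decision here, rather than an implicit global assumption.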
>
> Projections and slices of non-local particle quantities (e.g.,
> density, temperature, etc., of gas particles) are presently handled by
> smoothing onto a mesh.  We should instead be doing some method of
> pixelizing these directly, either onto a quadtree that can be reused,
> or onto a buffer directly.  Ting-Wai To, an undergrad at Illinois, has
> built an implementation in Cython of a flat pixelizer that's
> reasonably fast.  What I would propose is using the quad tree mesh
> found during the indexing step, and then smooth over the particles
> onto this mesh directly.  This means we can use *just* an `io` chunk,
> rather than (implicitly) a `spatial` chunk for SPH particles when
> smoothing them.  Integrating the overall SPH smoothing into the
> visualization layer, rather than the data access layer, also means
> that we can rotate particles more easily.
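Ting-Wai To's Cython pixelizer isn't reproduced here, but as a rough illustration of the idea -- scatter each particle's kernel footprint directly onto a flat image buffer, with no intermediate octree -- here is a pure-Python sketch (Gaussian stand-in for the cubic spline kernel; names hypothetical):

```python
import numpy as np

def sph_pixelize(px, py, hsml, quantity, bounds, shape):
    """Scatter particles with smoothing lengths `hsml` directly onto a
    2D buffer covering `bounds` = (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = bounds
    nx, ny = shape
    dx, dy = (x1 - x0) / nx, (y1 - y0) / ny
    buf = np.zeros((nx, ny))
    for x, y, h, q in zip(px, py, hsml, quantity):
        # Only visit pixels within the kernel's support (here, 2h).
        i0 = max(int((x - 2 * h - x0) / dx), 0)
        i1 = min(int((x + 2 * h - x0) / dx) + 1, nx)
        j0 = max(int((y - 2 * h - y0) / dy), 0)
        j1 = min(int((y + 2 * h - y0) / dy) + 1, ny)
        for i in range(i0, i1):
            for j in range(j0, j1):
                cx = x0 + (i + 0.5) * dx
                cy = y0 + (j + 0.5) * dy
                r2 = ((cx - x) ** 2 + (cy - y) ** 2) / h**2
                if r2 < 4.0:
                    buf[i, j] += q * np.exp(-r2)
    return buf
```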
>
> Smoothing operations that are not on user-defined meshes, such as
> computing the total density inside a sphere, are currently using the
> same method as above.  This has two main uses -- computing some
> quantity and then doing a radial profile of it, and computing a
> density or something and comparing it to another quantity.  However,
> it is becoming increasingly clear that this is not what most people
> using SPH codes *want*.  It is not a particularly bad approximation as
> long as the resolution is high enough, but it's also not particularly
> good, and it is lossy.  Instead, we should be regarding the particles
> as particles, and computing profiles on *those*.  This could cause
> issues in regimes where a particle crosses bin boundaries when the
> mesh point doesn't, but I would assert that this is an ill-posed
> operation to begin with.
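Treating particles as particles, a radial profile reduces to weighted histograms over the particle radii -- a minimal numpy sketch (hypothetical function name, not yt's profile API):

```python
import numpy as np

def particle_profile(radii, values, weights, bins):
    """Weighted mean of `values` in radial bins, computed directly on
    the particles with no intermediate mesh."""
    wsum, _ = np.histogram(radii, bins=bins, weights=weights)
    vsum, _ = np.histogram(radii, bins=bins, weights=weights * values)
    with np.errstate(invalid="ignore"):
        return vsum / wsum  # NaN in empty bins
```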
>
> Volume rendering at present is problematic.  An octree is really the
> only way we *could* do things, but it's also not terribly good for it.
> We'd be smoothing, volume rendering, then visualizing.  The only
> reason we do it this way is because we only have a VolumeSource, and
> no SmoothedParticleSource.  We should instead implement a
> SmoothedParticleSource, which would itself be considerably faster and
> easier to implement, I would wager, than forcing the issue in the
> opposite direction.  There are remnants of this in the code from star
> particle rendering which could be used in this case.
>
> Finally, and the one I'm the most worried about, is ray tracing of
> individual rays.  We need to ensure that we are able to support the
> needs of communities that build on yt functionality, such as trident.
> I believe, though I may be incorrect, that there are essentially two
> modes we
> need to support here.  The first is specific query points, and the
> other is aggregate sums.  Query points are where you evaluate the
> value of some quantity at multiple points along a ray running from A
> .. B, and aggregates are where you evaluate the sum total of some
> value along the ray.
>
> In the former case, for query points, we actually can rely on existing
> functionality rather nicely.  We would be able to specify points (or
> even implicitly choose them to be at some regular interval or
> something), evaluate the smoothing kernel at those points, and return
> those values.  This is identical to how it operates now, except that
> the points would need to be specified in some manner rather than
> simply being the locations that the ray traverses an oct cell.  I
> believe this would improve the fidelity of the calculation.
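A sketch of this query-point mode using the standard SPH gather estimate, f(x) = sum_j (m_j/rho_j) f_j W(|x - x_j|, h_j) -- with a normalized Gaussian standing in for the kernel, and illustrative names throughout:

```python
import numpy as np

def sample_along_ray(start, end, n_samples, pos, hsml, mass, dens, field):
    """Evaluate an SPH field at evenly spaced query points on the
    segment from `start` to `end` via the gather estimate."""
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    points = start + t * (end - start)            # (n_samples, 3)
    out = np.zeros(n_samples)
    for j in range(len(pos)):
        r2 = np.sum((points - pos[j]) ** 2, axis=1) / hsml[j] ** 2
        w = np.exp(-r2) / (np.pi**1.5 * hsml[j] ** 3)  # normalized kernel
        out += (mass[j] / dens[j]) * field[j] * w
    return out
```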
>
> The latter case, for aggregation, could build on work such as that
> done by SPLASH for quickly evaluating integrals along lines of sight.
> This would also result in a better conservation of quantities, as the
> integral would have sample points defined by the extent of the
> smoothing kernel, rather than wherever the particle happened to be
> compared to the mesh points.
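For the aggregate mode, the SPLASH-style idea is to use the column-integrated kernel at each particle's impact parameter, so the sample points are set by the kernel's own extent. A sketch with a Gaussian kernel, which integrates along the line of sight in closed form to exp(-b^2/h^2)/(pi h^2) (endpoint effects of a finite ray are ignored here; names illustrative):

```python
import numpy as np

def ray_column_density(start, end, pos, mass, hsml):
    """Sum m_j * (column-integrated kernel at impact parameter b_j)
    over all particles, for a ray through `start` toward `end`."""
    d = end - start
    d = d / np.linalg.norm(d)
    total = 0.0
    for j in range(len(pos)):
        rel = pos[j] - start
        b2 = np.sum(rel**2) - np.dot(rel, d) ** 2  # impact parameter^2
        total += mass[j] * np.exp(-b2 / hsml[j] ** 2) / (np.pi * hsml[j] ** 2)
    return total
```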
>
> The fallback operation in all of these is to construct an
> `arbitrary_grid` object and smooth or deposit there.  And, we will
> still make available the ability to generate a "full" octree for when
> you really need one; this will just no longer be the default, and to
> get at it you might have to work a little.
>
> I am of the belief, after having been convinced by compelling evidence
> around performance, memory use and data fidelity, that we should do
> away with a single global mesh and instead move to having fast point
> selection and indexing, coupled with pushing the data representation
> close to the real data instead of faking it to look like a mesh.
> This is probably the most invasive change, especially as it undoes a
> lot of things I worked really hard on implementing and optimizing, but
> I think if we continue to do things as they are, we're just going to
> be stuck.  We could also evaluate having a global mesh as an option,
> but for deposition/smoothing, it's not going to be guaranteed to be
> optimal or correct.
>
> I'm not going to propose any syntactic sugar or anything of the like
> in this email, although that certainly can/should come up in the YTEP.
> (For instance, .grid() on a data object, perhaps?  But we also already
> have ds.r[] to get arbitrary grids, too!)  But, I wanted to send this
> to the list so that I can get a sense of any *very* broad brush
> feedback before moving forward with a YTEP, and before we start
> adjusting the code (in advance of a PR) to move in this direction.
> I'm certainly not proposing we do this for 3.3 (or rather, that we
> hold 3.3 for this.)  If anyone has any comments, opinions (strong or
> fleeting), suggestions, or if you maybe know how to fix all this stuff
> in a way that Meagan and I haven't thought of, we'd be happy to hear
> them.


Just a few initial comments after reading this over:

This is a big change, and will require a good amount of work to rework yt
internals after merging Meagan's work in. If we decide to do it, it will
require enough work that it may be worth thinking about doing it as part of
yt 4.0, which we can branch off after 3.3 is released. This would also free
us to think about other large changes, like reworking the field system to
be more friendly to code generation, symbolic definition of fields,
and acceleration of field computation via pipelining, as well as splitting
off analysis modules into subprojects outside yt.

We need to be careful to avoid breaking functionality as we change things,
and to provide clear migration paths if something needs to be removed. This
means expanding test coverage, and identifying functionality that will
break without some refactoring to use new per-particle focused data
processing algorithms. I think an ideal way to handle this is to adapt
existing interfaces to use new particle-focused data analysis rather than
mesh-focused analysis. Just as an example, using your radial profile
example, we could make it so the ProfileND class silently delegates to a
particle-based or SPH-based profiling algorithm for a particle field.
Ideally, we want to present the same high-level interfaces to users of
grid, particle, and unstructured mesh codes as much as possible.
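As a sketch of that delegation idea -- the class and method names below are illustrative, not the actual ProfileND API:

```python
import numpy as np

class FakeSource:
    """Minimal stand-in for a yt data object (illustrative only)."""
    def __init__(self, radii, values, kind):
        self.radii, self.values, self.kind = radii, values, kind

    def field_type(self, field):
        return self.kind

class Profile1D:
    """Same user-facing interface; backend chosen per field type."""
    def __init__(self, source, bins):
        self.source, self.bins = source, bins

    def compute(self):
        if self.source.field_type("density") == "particle":
            return self._particle_profile()   # histogram the particles
        return self._mesh_profile()           # existing mesh-based path

    def _particle_profile(self):
        counts, _ = np.histogram(self.source.radii, bins=self.bins,
                                 weights=self.source.values)
        return counts

    def _mesh_profile(self):
        raise NotImplementedError("existing mesh-based path")
```

The user writes the same profiling call either way; the dispatch on field type is internal.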

No matter what, we need to think carefully about this and try to avoid
biting off more than we can chew as a community or as individual
contributors. At the same time, we need to keep breaking
changes to a minimum -- I don't want to see a repeat of the mixed reactions
to the early yt 3.x releases due to broken or missing functionality that
didn't get ported from the 2.x codebase.


>
> Thanks for reading this far,
>
> Matt
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>