[yt-dev] RFC: (yt-3.0) Geometry handling redesign

Christopher Moody cemoody at ucsc.edu
Fri Feb 24 13:33:29 PST 2012


Hi Matt,

I think the geometry refactoring will have a huge impact on the
(octree-based) ART frontend, so I give it an emphatic +1! I've been trying to
wrap my head around our hacked-together grid-cum-octree process, and it's
been difficult to reconcile both ideas in a natural way. Having a geometry
handler and two different derived classes for the two architectures is, I
think, much clearer.  I see how this changes the organization of the
frontend, but I'm fuzzy on how this affects data_objects downstream - it
sounds like that needs rewriting?

You also mentioned that yt doesn't support SPH codes without casting
particles as grids in some way - is this improved with the refactoring?

chris


On Fri, Feb 24, 2012 at 11:27 AM, Nathan Goldbaum <goldbaum at ucolick.org> wrote:

> I agree with Casey about the particle UI, and I'm +1 on changing it.
>
> It should be easy to programmatically get the list of all particle types
> in a simulation and, from there, the list of all particle fields
> associated with each particle type.  The dict-of-dicts organization
> - obj.particles["PopIICluster"]["position_x"] - makes the most sense to
> me, since you can call particles.keys() and
> particles["PopIICluster"].keys() to discover what is available.
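>
> A quick sketch of what I mean (the type names and values here are
> purely illustrative):
>
>     # Hypothetical dict-of-dicts layout for particle access.
>     particles = {
>         "PopIICluster": {"position_x": [0.1, 0.5], "position_y": [0.2, 0.7]},
>         "DarkMatter":   {"position_x": [0.3],      "position_y": [0.9]},
>     }
>
>     print particles.keys()                    # all particle types
>     print particles["PopIICluster"].keys()    # all fields for that type
>     print particles["PopIICluster"]["position_x"]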
>
> Nathan Goldbaum
> Graduate Student
> Astronomy & Astrophysics, UCSC
> goldbaum at ucolick.org
> http://www.ucolick.org/~goldbaum
>
> On Feb 24, 2012, at 10:46 AM, Casey W. Stark wrote:
>
> Great stuff Matt.
>
> +1 on the presented GeometryHandler design. It doesn't sound like it
> changes performance much and it's a nicer organization.
>
> I like the chunking work, but I'm confused about syntax and what goes
> where. Does each data object define its own chunks method? In the chunks
> method, would it be clearer to return something like a "chunk" data object
> instead of self? This probably doesn't matter though, since a user would
> not care about type(ds).
>
> I haven't worked with particle types in yt (it's all DM to me!), but I
> would prefer the `obj.particles["position_x"]` syntax. With the current
> design, I don't like that I have to inspect fields to see if they are
> particle fields or on the grid. This is fine now because I am familiar with
> the fields in my data, but I remember this being confusing as a new user.
>
> For particle type selection, I think syntax like `obj.particles["PopIICluster"]["position_x"]`
> or even `obj.particles.type("PopIICluster")["position_x"]` is clearer. With
> a tuple it almost looks like you are selecting two fields. If we go with
> the tuple though, can we make the particle type the first element? I'm
> undecided about mandating a particle type, but we can hash that out in
> the hangout.
>
> Best,
> Casey
>
>
> On Fri, Feb 24, 2012 at 8:12 AM, Matthew Turk <matthewturk at gmail.com> wrote:
>
>> Hi all,
>>
>> This email is going to be long, but the changes I am proposing in it
>> will require feedback.  Next Monday at 1PM Eastern I'll be holding a
>> hangout on Google+, if you would like to join in and discuss it with
>> lower latency.  (Of course, if nobody shows up, that's okay too.)  We
>> don't really have a detailed review process for design decisions like
>> this, and I hope that this discussion will guide developing that kind
>> of review process in the future.
>>
>> Everything described here is in the branch geometry_handling at
>> https://bitbucket.org/MatthewTurk/yt-refactor
>>
>> = Background =
>>
>> Geometry in yt has always been designed around two things:
>>
>> 1) Grid patches
>> 2) Spatial location based on bounding boxes
>>
>> This has presented a number of excellent opportunities for optimizing
>> patch-based AMR data.  However, it has posed challenges for supporting
>> particle-based datasets.  (Particles inside patch-based data are
>> somewhat easier, although if you want to see one of the worst pieces
>> of yt, look no further than FakeGridForParticles, which mocks up
>> particles as grids with varying x,y,z, so that existing
>> fluid-selection routines can be used.)  Furthermore, the state of
>> support for octree codes is not good.  For relatively small (up to about 64)
>> domains in RAMSES, we can do analysis with relative ease -- volume
>> rendering, projections, fluid analysis, etc, all work very nicely.
>> However, what is required is a costly step of regridding octs at the
>> outset, to form grid patches with some efficiency.  This regridding is
>> not stored between runs, and it's not *efficient*.
>>
>> This has worked so far, but it is showing its strain.  As it stands,
>> yt cannot directly support SPH codes without somehow making SPH look
>> like grids.  Octree codes can't be supported without making them look
>> like grid codes.  Both of these are very slow operations.  And on top
>> of that, the current state of how data is read in, even for grid
>> codes, is a bit gross.  There are a number of conditionals, mechanisms
>> for striding over data (e.g., "_get_grids()", which is called in
>> parallel and iterates over grids, and sometimes you have to call
>> _get_point_indices and sometimes you don't), and probably worst of all
>> a lot of unnecessary memory duplication because arrays in
>> regions/spheres/etc are constructed by iterating over grids, filling
>> in a list of arrays, and then concatenating that list into a second
>> array.
>>
>> Furthermore, several times recently the question has come up of
>> handling and supporting more generic geometries: spherical, polar,
>> cylindrical.  Unfortunately, because we're so heavily based around the
>> idea of grids as is, these are difficult to support.  All of our
>> collision detection has been designed with rectilinear surfaces in
>> mind.  That being said, I'm really quite happy with how yt *functions*
>> for patch-based grids.  I'm just not sure that as it stands it is
>> maintainable, and I think that addressing the geometry issues could
>> ultimately also help out a great deal with scalability.  (More on that
>> below.)
>>
>> = Geometry Overhaul =
>>
>> For the last couple weeks, I have been working on an overhaul of the
>> geometry system in yt, attempting to handle parallel striding in a
>> format-agnostic way, deal with generating fields in as simple a way as
>> possible, and also attempting to speed things up.  I'm going to
>> summarize below the changes that I've made, the new design, and the
>> current status.
>>
>> == Design: Selectors ==
>>
>> As it stands, yt data objects (such as Region, Sphere, etc) are mostly
>> thought of as conduits for data.  They apply criteria to the domain
>> and select all of the fluid points or particles within a region or
>> that fit a specific non-spatial criterion.  I have extended this
>> concept to construct very lightweight objects called selectors,
>> similar to how ParticleIO is now conducted for Enzo.  These are
>> objects in Cython that accept either a set of bounding-box arrays or a
>> set of spatial points and decide if those points are included in a
>> given region.  This is essentially a combination of the
>> get_point_indices and get_cut_mask methods that currently exist.
>> These have all been designed to be GIL-free, so that in the future
>> they can be threaded using OpenMP (although it turns out not very much
>> time is spent on them.)  To date I have implemented spheres, regions,
>> disks, cutting planes, slices, ortho rays, and grid collections.  They
>> all give bitwise identical results to the old method and are almost
>> all substantially faster.  (Cutting planes improve the most, regions
>> are slightly slower, and the others are all about 25-40% faster.)
>>
>> One primary advantage is that these can (and do) trivially provide
>> counting methods, masking methods, and point-evaluation methods.  So
>> we can use the same routines to select particles and individual points
>> (for collision detection) that we use for selecting grid cells.  This
>> will allow reuse of data object code with particle outputs, and is a
>> primary advantage of the refactor.
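>>
>> To make the interface concrete, here is a rough Python sketch of a
>> selector (the class and method names are illustrative only; the real
>> implementations live in Cython):
>>
>>     import numpy as np
>>
>>     class RegionSelector(object):
>>         # Illustrative axis-aligned box selector.
>>         def __init__(self, left_edge, right_edge):
>>             self.left_edge = np.asarray(left_edge)
>>             self.right_edge = np.asarray(right_edge)
>>
>>         def select_points(self, x, y, z):
>>             # Masking method: boolean mask of the points inside the region.
>>             pts = np.array([x, y, z]).T
>>             return np.all((pts >= self.left_edge) &
>>                           (pts < self.right_edge), axis=1)
>>
>>         def count_points(self, x, y, z):
>>             # Counting method: how many points are selected.
>>             return self.select_points(x, y, z).sum()
>>
>>         def select_bboxes(self, left_edges, right_edges):
>>             # Bounding-box evaluation: which grids overlap the region.
>>             return np.all((right_edges > self.left_edge) &
>>                           (left_edges < self.right_edge), axis=1)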
>>
>> This also led to separating selection-based data containers from
>> constructed data containers.
>>
>> == Design: Ditching the Hierarchy ==
>>
>> (Hopefully that heading caught some eyes.)  In line with attempting to
>> reduce Enzo-isms in the code, I have been moving away from
>> specifically calling something a "hierarchy," and instead moving
>> toward referring to geometry handlers.  grids, grid_left_edges,
>> grid_right_edges, and so on will no longer be guaranteed to appear
>> hanging off these objects.  The new class structure:
>>
>> GeometryHandler
>> GridGeometryHandler(GeometryHandler)
>> OctreeGeometryHandler(GeometryHandler)
>>
>> and eventually, others.  The concept behind this is to isolate
>> those items that *must* require domain knowledge, mesh knowledge, or
>> particle/fluid distribution knowledge, and place those in the
>> GeometryHandler.  All items specific to patch-based or block-based AMR
>> will go in GridGeometryHandler, and so on.  This will allow for
>> fundamentally different methods of IO for octrees than for grids, for
>> instance.  The geometry handlers are all required to implement the
>> following methods:
>>
>> _read_selection(self, fields, data_object, chunk)
>> _chunk(self, data_object, chunking_style, ngz = 0)
>>
>> I will describe chunks later.  Note that the first routine,
>> _read_selection, accepts the names of the fields and the data object
>> that describes how to select which portions of a dataset participate
>> in a given selection.  For the specific case of Enzo (as I am still in
>> the design phase, I have only been working with Enzo data, although I
>> will move on to FLASH and Boxlib next), _read_selection calls a routine
>> that counts and masks grids, batches operations to files, and then
>> returns only the necessary data for a selection.  The data object does
>> no discarding of out-of-bounds or child-masked data.
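>>
>> A minimal sketch of that base-class contract (illustrative only; the
>> real classes carry much more state):
>>
>>     class GeometryHandler(object):
>>         # Knows about the domain, mesh, and particle/fluid distribution.
>>         def _read_selection(self, fields, data_object, chunk):
>>             # Read only the data in `chunk` that `data_object` selects,
>>             # for the named fields.  Mesh-specific subclasses implement this.
>>             raise NotImplementedError
>>
>>         def _chunk(self, data_object, chunking_style, ngz=0):
>>             # Yield (IO_unit, size) chunks covering the selection.
>>             raise NotImplementedError
>>
>>     class GridGeometryHandler(GeometryHandler):
>>         # Patch/block AMR specifics (grid edges, IO batching) go here.
>>         pass
>>
>>     class OctreeGeometryHandler(GeometryHandler):
>>         # Octree-specific IO and mesh knowledge go here.
>>         pass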
>>
>> == Design: Chunking ==
>>
>> Currently, yt uses de facto chunking of data in the form of grids.
>> When Derived Quantities want to handle individual grids, one at a
>> time, they "preload" the data from whatever grids the
>> ParallelAnalysisInterface thinks they deserve.  These grids are
>> iterated over, and handled individually, then the result is combined
>> at the end.  Profiles do something similar.  However, both of these
>> are de facto, and not really designed.  They rely on calls to
>> semi-private functions on data objects, manual masking of data, and so
>> on.
>>
>> To get around this ugly spot, I have started working on redesigning IO
>> and data handling to function in "chunks."  In this method, data
>> objects will request that the geometry handler supply a set of chunks
>> (the function _chunk, described above, conducts this task.)  Chunks
>> are of the form (IO_unit, size), where IO_unit is only ever managed or
>> handled by _read_selection, above.
>>
>> To start striding over chunks of the output, the code calls
>> data_source.chunks(fields, chunking_style).  Right now only "spatial",
>> "io" and "all" are supported as chunking styles.  These correspond to
>> spatially-oriented division, IO-conserving division, and all-at-once
>> (not usually relevant).  What happens next is kind of fun.  The chunks
>> function looks like this:
>>
>>    def chunks(self, fields, chunking_style, **kwargs):
>>        for chunk in self.hierarchy._chunk(self, chunking_style, **kwargs):
>>            with self._chunked_read(chunk):
>>                self.get_data(fields)
>>                yield self
>>
>> Note what it does here -- it actually yields *itself*.  However,
>> inside the chunked_read function, what happens is that the attributes
>> corresponding to the size, the current data source, and so on, are set
>> by the geometry handler (still called a hierarchy here.)  So, for
>> instance, execution might look like this:
>>
>> for ds in my_obj.chunks(["Density"], "spatial"):
>>  print ds is my_obj
>>  print ds["Density"].size
>>
>> The first line will actually print True, but the results from the
>> second one will be the size of (for instance) the grid it's currently
>> iterating over.  In this way, it becomes much easier to stride over
>> subsets of data.  Derived quantities now look like this:
>>
>>        chunks = self._data_source.chunks([], chunking_style="io")
>>        for ds in parallel_objects(chunks, -1):
>>            rv = self.func(ds, *args, **kwargs)
>>
>> It chunks things, evaluates them, and then stores the intermediate results.
>>
>> This is not meant to replace spatial decomposition in parallel jobs,
>> but it *is* designed to enable much easier and *mesh-neutral* division
>> of labor for parallelism and for IO.  If we were to call chunk on an
>> octree, it no longer has to make things look like grids; it just makes
>> them look like flattened arrays (unless you chunk over spatial, which
>> I haven't gotten into yet.)
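>>
>> As a (hypothetical) illustration, a mesh-neutral reduction over IO
>> chunks could look like the following, whether the data comes from
>> grids or octs (the field names are just examples):
>>
>>     total_mass = 0.0
>>     for ds in my_obj.chunks(["Density", "CellVolume"], "io"):
>>         # Each chunk presents flat arrays; no grid structure is assumed.
>>         total_mass += (ds["Density"] * ds["CellVolume"]).sum()
>>     print total_mass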
>>
>> Essentially, by making the method of subsetting and striding over
>> subsetted data more compartmentalized, the code becomes clearer and
>> more maintainable.
>>
>> = Status =
>>
>> As it stands, I am able to select data very nicely -- this means
>> slices, profiles, and derived quantities all work.  However (see below),
>> particles still need some work, and constructed data
>> objects (smoothed covering grids, projections) have yet to be ported.
>>
>> = Open Questions =
>>
>> However, the main reason I am writing -- when the code is not yet
>> fully functional! -- is to ask for advice and consideration on a few
>> points.
>>
>> == Particle Types ==
>>
>> In the past, we've talked about handling particle types differently
>> depending on the particle.  Because this is destined for yt-3.0, I
>> would like to propose that we break compatibility with old scripts in
>> favor of clarity in our particle types.  Specifically, rather than
>> doing:
>>
>> obj["particle_position_x"][obj["particle_type"] == 1]
>>
>> We do,
>>
>> obj[("particle_positions_x", 1)]
>>
>> (or obj[("particle_positions_x", "PopIICluster")] if you are using a
>> code that supports such expression.)
>>
>> And if a field is found to be a particle field, but is requested
>> without a tuple, we simply raise a KeyError.  We can emulate the old
>> behavior with:
>>
>> obj[("particle_position_x", None)]
>>
>> But I am inclined to say that I would rather *mandate* particle type
>> selection, rather than merely allow it.  This will lead to much
>> greater clarity, I think.
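>>
>> A sketch of how that lookup could be enforced (the helper below is
>> illustrative, not the actual field-access code):
>>
>>     # Normalize a requested field key, mandating a particle type (an
>>     # integer, a name, or None) for particle fields.
>>     def resolve_field_key(key, particle_field_names):
>>         if isinstance(key, tuple):
>>             field, ptype = key
>>             return field, ptype
>>         if key in particle_field_names:
>>             raise KeyError("%s is a particle field; use a "
>>                            "(field, particle_type) tuple" % key)
>>         return key, None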
>>
>> [+-][01] on mandating particle type selection
>>
>> == Particle Access ==
>>
>> Currently, we access particles off a data_object:
>>
>> sphere["particle_position_x"]
>>
>> Do we want to keep this, or move to being much more explicit and
>> require sphere.particles["particle_position_x"]?  I can see both
>> sides to this.  Explicit is good, and would let us be clearer with
>> users about fields (and with our IO methods), but allowing the old
>> behavior means not having to worry about whether something is a fluid
>> or a particle, which I think is a good thing.
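>>
>> For concreteness, the explicit form could be a thin proxy (names are
>> hypothetical, just to show the shape of the API):
>>
>>     class ParticleProxy(object):
>>         def __init__(self, data_object):
>>             self.data_object = data_object
>>
>>         def __getitem__(self, field):
>>             # Route the request through the particle IO path explicitly
>>             # (hypothetical method name).
>>             return self.data_object._get_particle_field(field)
>>
>>     # sphere.particles = ParticleProxy(sphere)
>>     # sphere.particles["particle_position_x"]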
>>
>> == User Survey ==
>>
>> Should we conduct a brief survey (only a handful of questions) polling
>> users for their feelings on the changes or potential changes to
>> nomenclature, particles, and so on?  We could also use it as a time to
>> identify other outstanding issues with yt.
>>
>> = Conclusion =
>>
>> Does this all make a sort of sense?
>>
>> [+-][01] on the design presented so far
>>
>> If you're interested in helping out, please get in touch.  I'm still
>> laying groundwork in some areas, but in other areas I would appreciate
>> help and advice moving forward.
>>
>> Thanks to everyone who made it this far,
>>
>> Matt
>> _______________________________________________
>> yt-dev mailing list
>> yt-dev at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>>
>