[Yt-dev] Help with io.py and GDF

Fri Sep 3 16:33:22 PDT 2010

Hi Chris,

(as a quick note, I've pushed some changes in 9f8fb27e2fc1 that I will
reference here.)

>    I've been working on writing code
> (http://hg.enzotools.org/gadget_infrastructure/summary) that will
> transform gadget snapshots into the newly-minted gridded data format
> (GDF: http://yt.enzotools.org/wiki/GridDataFormat). The last few days
> I've been writing a very skeletal GDF reader in yt, and I've come
> across a few problems. I pushed changes to yt a few hours ago, so they
> should be up on the hg repository; the files I'm referring to are in
> /frontends/gadget/* .

This is awesome!  Congratulations on this, it's a huge step, and would
be an amazing asset to the community.  I am also delighted you're
using the GDF!

>
> 1. When reading, say, an array of particle data like position_x of
> some grid, (shape is just N, the number of particles in that grid) I
> get array shape errors. It tries to multiply the array by a
> weight_data but that's an array of the same shape as active dimensions
> for the grid (which in my case is [2,2,2] since I'm doing an octree
> refinement) - so it cries when it tries to multiply an N array with a
> (3,2)-shape array. To make that shape make sense, it seems like I want
> not the data for that particular grid, but all of its children's
> data. Or something, I'm a bit confused. Check out io.py.

Ah!  Yup, I see what's up here -- it's actually because projecting
particles isn't necessary, unless you're projecting a deposited
particle field.  I've also changed it so that particle_type is set to
true on the relevant fields, to avoid some of these issues.
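
For reference, the shape error itself is easy to reproduce outside of
yt.  This is only a minimal NumPy sketch, not the actual projection
code: the names position_x and weight_data are borrowed from your
description, and the particle count of 5 is made up.

```python
import numpy as np

# Hypothetical stand-ins: 5 particles in one grid, and a weight array
# shaped like the grid's active dimensions ([2,2,2] for an octree).
n_particles = 5
position_x = np.linspace(0.0, 1.0, n_particles)   # shape (5,)
weight_data = np.ones((2, 2, 2))                  # shape (2, 2, 2)

try:
    weighted = position_x * weight_data
    broadcast_ok = True
except ValueError:
    # NumPy cannot broadcast a length-5 axis against a length-2 axis.
    broadcast_ok = False

print(position_x.shape, weight_data.shape, broadcast_ok)
```

Note that a grid holding exactly 1 or 2 particles would broadcast
without complaint, so the error only shows up for some grids.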

> 2. Is there going to be a problem with having parent grids with no
> particle data - that are almost empty? The position_x array for such a
> grid won't exist, so I'm not sure what to do in the _read_data_set()
> method. I get grids like this when I subdivide a grid into another 8
> child grids and then all of the particles belonging to the
> parent grid are shuffled into the appropriate child bins. The parent
> is then left with none.

For the particles, this shouldn't be an issue.  But if you have any
fluid quantities, they have to be present in all the grids.  Particles
are treated independently, and it shouldn't even try to read grids
that have no particles.
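
As a rough illustration of what "skipping empty grids" amounts to,
here is a sketch that uses plain dicts in place of grids.  The helper
gather_particle_field is hypothetical and is not the yt reader; it
just shows that a parent grid with no position_x array can simply be
passed over.

```python
import numpy as np

def gather_particle_field(grids, field):
    """Concatenate `field` over all grids, skipping grids without particles.

    `grids` is a list of dicts mapping field names to 1-D arrays; this
    layout is an assumption made for the example.
    """
    pieces = [np.asarray(g[field], dtype="float64")
              for g in grids
              if field in g and len(g[field]) > 0]
    if not pieces:
        # No grid holds this field at all: return an empty array.
        return np.empty(0, dtype="float64")
    return np.concatenate(pieces)

# A parent grid left empty after its particles moved to two children.
parent = {}
child_a = {"position_x": [0.1, 0.2]}
child_b = {"position_x": [0.7]}
all_x = gather_particle_field([parent, child_a, child_b], "position_x")
```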

> 3. What does _read_data_slice() mean when a field like 'position' is
> already scalar-ified as position_x? I thought the slice 'axis'
> would've picked out the 'x' axis in position, but you start off
> already asking for position_x.

This should also be okay for particles.  :)  For fluid fields, it will
be defined.

> I've included an example GDF data file here
> (http://dl.dropbox.com/u/206140/decay_100.gyt.hdf5 - not guaranteed to
> be alive forever) and a pastebin script
> (http://paste.enzotools.org/show/1145/) that highlights my first
> problem.

Quick question -- are you sure this has been correctly created?  I
looked at the grid_left_index, and it's all 0's and 1's.  I should
have been more explicit in the documentation for GDF, but what this
should be is the global index of the grid's starting position; i.e.,
for a level 0 grid, if it started in the upper left corner, this would
be 0,0,0.  But for a level one grid that occupies the bottom right
octant, this would be 2,2,2...  Essentially, this is calculated by
(left_edge - domain.left_edge) / dx, where dx is the cell width at the
grid's level, and for a given level it can take on values of
0 .. (refine_by^level * top_grid_dimensions).
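
In case it helps, here is a small sketch of that calculation.  The
function name and argument layout are made up for illustration (this
is not code from yt or from the GDF spec), and it assumes a uniform
refinement factor and a top grid that spans the whole domain.

```python
import numpy as np

def grid_left_index(left_edge, domain_left_edge, domain_right_edge,
                    top_grid_dimensions, level, refine_by=2):
    """Global integer index of a grid's starting cell at its own level."""
    left_edge = np.asarray(left_edge, dtype="float64")
    domain_left_edge = np.asarray(domain_left_edge, dtype="float64")
    domain_right_edge = np.asarray(domain_right_edge, dtype="float64")
    top_grid_dimensions = np.asarray(top_grid_dimensions, dtype="int64")

    # Number of cells spanning the whole domain at this level.
    cells = top_grid_dimensions * refine_by**level
    # Cell width at this level.
    dx = (domain_right_edge - domain_left_edge) / cells
    # Round before casting so floating-point noise cannot shift the index.
    return np.rint((left_edge - domain_left_edge) / dx).astype("int64")

# Unit domain with a 2x2x2 top grid, as in the octree case.
level0 = grid_left_index([0.0, 0.0, 0.0], [0, 0, 0], [1, 1, 1], [2, 2, 2], 0)
level1 = grid_left_index([0.5, 0.5, 0.5], [0, 0, 0], [1, 1, 1], [2, 2, 2], 1)
```

With these inputs the level 0 grid starts at 0,0,0, and the level 1
grid occupying the far octant starts at 2,2,2, matching the
description above.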

Again, thanks for your hard work.  I've almost been able to project a
deposited particle field!  This is exciting.

-Matt