[yt-dev] summary of GDF discussion

j s oishi jsoishi at gmail.com
Mon Jan 7 07:00:51 PST 2013


Hi all,

On Friday, 4 Jan 2013, we had a hangout to discuss the next steps for
implementing a C library for GDF. We call this implementation gdfio. For
reference, the GDF standard can be found at

https://bitbucket.org/yt_analysis/yt/src/554d144d9d248c6f70d8c665a5963aa39b2d6bb3/yt/utilities/grid_data_format/docs/gdf_specification.txt?at=yt

One of the biggest issues we discussed was whether or not to rely on
libraries other than HDF5 when writing gdfio. The main argument for doing
so is that none of us are experienced C programmers; thus implementing
things like hashes, linked lists, and so forth might be a barrier to making
progress. The main argument against doing so is that gdfio should be a very
low-level library that can be deployed on many different systems, and
dependencies make this difficult.

In order to best assess what to do, we attempted to identify *what*
non-stdlib C features we would actually need in our implementation. Because
GDF itself does not make any links between grids (only recording their
parents in an optional metadata step), we came to the conclusion that the
only thing we need is a hash table. Even this hash table is optional and
only for reading. For example, a call like
gdfio_read_grid(grid_id, "density") could return an object that includes
both the density data and its associated metadata. We thus decided to
proceed without *any* dependencies aside from HDF5.
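As a sketch of what that read call might look like: the type and function
names below are hypothetical, not an agreed API, and the stub fills in dummy
values where a real gdfio would read from the HDF5 file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical return type: the field data plus the per-grid metadata
 * GDF stores alongside it. All names here are illustrative only. */
typedef struct {
    long    grid_id;
    char    field[64];   /* e.g. "density" */
    int     dims[3];     /* grid dimensions */
    double *data;        /* dims[0]*dims[1]*dims[2] values */
} gdf_grid_t;

/* Stub reader: a real gdfio would open /data/grid_%010i/<field> in the
 * HDF5 file; here we fill a 2x2x2 grid with a constant so only the
 * shape of the API is shown. */
static gdf_grid_t *gdfio_read_grid(long grid_id, const char *field)
{
    gdf_grid_t *g = malloc(sizeof *g);
    if (!g)
        return NULL;
    g->grid_id = grid_id;
    strncpy(g->field, field, sizeof g->field - 1);
    g->field[sizeof g->field - 1] = '\0';
    g->dims[0] = g->dims[1] = g->dims[2] = 2;
    g->data = malloc(8 * sizeof *g->data);
    for (size_t i = 0; i < 8; i++)
        g->data[i] = 1.0;
    return g;
}

static void gdfio_free_grid(gdf_grid_t *g)
{
    if (g) {
        free(g->data);
        free(g);
    }
}
```

Note that no hash table is needed for this call: the grid id maps directly
to the /data/grid_%010i group name, so lookup is just string formatting.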

Kacper pointed out that the most important issue for efficiency is how to
convert native memory structures into GDF's /data/grid_%010i/ structures
without expensive copies. Native data could be 5D (block, quantity, x, y,
z), 4D, or 3D depending on the code. On this point, Sam noted that the
easiest thing might be to require a "buffer" type interface that gives
(pointer, size) and allows gdfio to grab the requisite number of floats or
doubles. We decided to try this buffer approach first. Essentially, this is
a question of how much gdfio provides to users who will be wrapping it to
write their code's data. For now, we decided to keep gdfio's offerings
minimal; this lets us see how the library works in practice and which
additional features would be most useful to add later.
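One way the buffer idea could be sketched: the struct and function names
below are made up for illustration, not part of any agreed API. The caller
hands gdfio a (pointer, size) pair over its native array, and gdfio pulls
out only the values it needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer interface: a (pointer, size) view onto the
 * simulation code's native array, plus a read cursor. */
typedef struct {
    const void *ptr;    /* start of the caller's native array */
    size_t      nbytes; /* total size in bytes                */
    size_t      offset; /* read cursor, advanced by each read */
} gdf_buffer_t;

/* Copy n doubles out of the buffer into dst; returns the number
 * actually copied (fewer if the buffer is exhausted). Only the
 * requested slice is copied, never the caller's whole array. */
static size_t gdf_buffer_read_doubles(gdf_buffer_t *buf, double *dst, size_t n)
{
    size_t avail = (buf->nbytes - buf->offset) / sizeof(double);
    if (n > avail)
        n = avail;
    memcpy(dst, (const char *)buf->ptr + buf->offset, n * sizeof(double));
    buf->offset += n * sizeof(double);
    return n;
}
```

A 5D (block, quantity, x, y, z) array could then be exposed one grid at a
time by pointing ptr at the right slab, avoiding an intermediate copy of
the whole dataset.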

Casey and Sam both brought up issues of how to parallelize. One issue is
that parallel HDF5 can be quite tricky to deal with, so we decided to forgo
using parallel HDF5 for now. We agreed that the simplest path forward is to
use file links, so that each non-root-IO processor writes its data to a
separate data-only file that is linked back to the main HDF5 file. This
means we need to add an API for creating, writing, and reading from
data-only files.

Finally, we came up with the next step: I (Jeff) will draft an API for
gdfio this week. I will submit it to yt-dev for discussion and iteration.
Once we have an API that looks good, we'll begin coding it up.

If I misrepresented anything from the meeting or GDF, please let me know.
Thanks to all who participated!

j