[Yt-dev] precision in io

Matthew Turk matthewturk at gmail.com
Tue Aug 16 07:44:10 PDT 2011


Hi Geoffrey,

On Mon, Aug 15, 2011 at 10:19 PM, Geoffrey So <gsiisg at gmail.com> wrote:
> I am all for binary, especially since storage is a big problem for some of
> the bigger simulations.
>
> I am just curious, though, which binary are the other simulation groups
> pushing for?  Is there a consistent way of storing binary data in YT?  I am
> asking because currently sometimes I pickle my own data or store them in
> hdf5 .h5 binary or in the .yt files, and I didn't realize the different
> binaries are not compatible until recently.

I can take a moment to explain the difference between pickling and the
binary formats in yt, and why the .yt files currently use pickle for
some objects.

The .yt files utilize two different methods of storage.  For
projections, individual fields from the .data attribute of the
projection are stored such that they can be reproduced external to
Python.  However, generic objects are stored as pickles.  A pickle is
a stream of bytecode-like instructions for reconstructing a Python
object's state; it describes the various attributes but, more
importantly, it includes information about how to import the necessary
modules.  This works very well for simple objects; in yt, it is
assumed that unprocessed data persists between calls and is easy to
recreate.
(Projections, which can take a while, are a natural exception to this
rule.  They are not serialized using pickle, but using a manual
reconstruction method defined in the object.)
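
As a rough illustration of the "how to import necessary modules" point
(plain standard-library Python, not yt code; Fraction is only a
convenient importable class chosen for the example), the pickle stream
names the module and class it needs in order to rebuild the object:

```python
import pickle
import pickletools
from fractions import Fraction

# Any importable class would do; Fraction is just a stdlib stand-in.
value = Fraction(3, 4)

# Protocol 2 keeps the stream short and readable when disassembled.
payload = pickle.dumps(value, protocol=2)

# The stream records the module and class names needed for the rebuild.
module_recorded = b"fractions" in payload
class_recorded = b"Fraction" in payload

# Unpickling imports 'fractions' again and reconstructs an equal object.
restored = pickle.loads(payload)

# Print the instruction stream; the GLOBAL opcode carries the import path.
pickletools.dis(payload)
```

This is also why a pickle is hard to read from anything other than
Python: the reader needs the same importable modules available.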

So the question then comes up, when you store a "Sphere" object, what
exactly are you storing if you assume the loaded data is cheap to
recreate?  You store the field parameters, the radius, the center, and
the parameter file off which it hangs.  This naturally falls to a
pickle.  This is described in some more detail inside the ApJS paper.
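
A minimal sketch of that idea, assuming a hypothetical Sphere class
(the class, its attribute names, and the dataset name below are made
up for illustration and are not yt's actual implementation): only the
description of the object is pickled, and the loaded data is dropped
because it is cheap to recreate.

```python
import pickle


class Sphere:
    """Hypothetical stand-in for a data object; not yt's real class."""

    def __init__(self, pf_name, center, radius, field_parameters=None):
        self.pf_name = pf_name      # parameter file the object hangs off
        self.center = tuple(center)
        self.radius = radius
        self.field_parameters = dict(field_parameters or {})
        self._cache = {}            # loaded field data, cheap to recreate

    def __getstate__(self):
        # Pickle the description of the object, never the loaded data.
        state = self.__dict__.copy()
        state["_cache"] = {}
        return state


# "my_dataset_0010" is a made-up parameter file name.
sphere = Sphere("my_dataset_0010", (0.5, 0.5, 0.5), 0.1,
                {"bulk_velocity": (0.0, 0.0, 0.0)})
sphere._cache["Density"] = [1.0] * 100000   # pretend this came from disk

payload = pickle.dumps(sphere)
restored = pickle.loads(payload)
```

The pickle stays small no matter how much data was loaded, and the
restored object carries just enough to reload the data on demand.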

For naturally array-based objects, store_data will pickle them and
then store the resulting string as an HDF5 array.  The result is not
naturally readable outside of yt; however, if there is going to be
some handshake between you and the other user, you would have to
agree on a format anyway, so you should take it upon yourself to store
the data in that format.  To your question
about what standards the other groups are supporting, I hope to have a
better answer later this week.  (Last week I was at a workshop on
large data sets, and we came up with a few ideas which we're still
iterating on.)
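
To make the contrast concrete, here is a small sketch using only the
standard library.  It is not what store_data does internally: the
array module stands in for an HDF5 dataset, and the "agreed format"
(a little-endian count followed by float64 values) is an assumption
invented for the example.

```python
import pickle
import struct
from array import array

centers = [0.123456789012345, 0.5, 0.987654321098765]

# Approach 1: pickle the data, then treat the bytes as a flat uint8
# array.  Only a Python reader can turn these bytes back into values.
pickled = pickle.dumps(centers)
as_uint8 = array("B", pickled)
roundtrip = pickle.loads(as_uint8.tobytes())

# Approach 2: an agreed layout, here a little-endian uint64 count
# followed by that many float64 values.  Any language can parse this.
packed = struct.pack("<Q%dd" % len(centers), len(centers), *centers)
(count,) = struct.unpack_from("<Q", packed, 0)
unpacked = list(struct.unpack_from("<%dd" % count, packed, 8))
```

Both round trips are exact in Python, but only the second layout can be
described to another user in a single sentence.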

For more info on pickle: http://nadiana.com/python-pickle-insecure and
the included links

>
> Are we looking for a universal YT way of storing binaries?  One big
> advantage would be everyone being able to read the binary if we all store
> them the same way, and I'm sure there's many other advantages.  But the
> headache is everyone agreeing to store them whichever way.

I don't understand "storing binaries."  Is there a gigantic set of
objects that we want to store to access external to yt, or is it just
halos and merger trees and (maybe) projections?  (I would argue that
storing adaptive projections for use outside of yt is not necessarily
productive.)  I think an important question to address is, who is our
audience with this?  Others may disagree, but my development time is
somewhat limited, and my priority is not to interoperate with people
who would rather plot images in IDL, for instance.  I'm not strictly
opposed to their ability to do this, I just don't think that we should
focus on that rather than attempt to provide the best analysis
environment possible.

That being said, however, there is a (very early) skunkworks project
going on to create a data sharing service which will require binary
serialization of most yt objects, independent of the parameter file.
It is not being designed for wide interop or for massive sets of
standards, just very simple serve-n-share, where pickles are not
necessarily the best way of passing data.

-Matt

> From
> G.S.
> On Mon, Aug 15, 2011 at 6:02 AM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Hi Stephen and Geoffrey,
>>
>> I would prefer we stick with the longer IO output.  The reason is not
>> as much that we believe that a halo truly does exist with that
>> specified precision, but to do our very best to ensure that we
>> communicate between sessions the precise location.  This may also come
>> into play with very high precision runs.
>>
>> My personal preference would be to utilize an all-binary storage
>> format as our *primary* storage format and then allow ASCII for
>> secondary, caveat emptor purposes.  I believe that both the IRATE
>> group and the Galacticus group are pushing forward with halo
>> cataloging methods that will be binary.
>>
>> -Matt
>>
>> On Sun, Aug 14, 2011 at 9:24 AM, Stephen Skory <s at skory.us> wrote:
>> > Hi all,
>> >
>> >> With the current setting, the halo attributes are written out
>> >> with 9 decimal places, but the ellipsoid parameters determined
>> >> using the particles' positions (when the data is 64 bit) have 16
>> >> decimal places.
>> >
>> > just to clarify, what I've done is to add the option to the
>> > halos.write_out() function (that outputs the HopAnalysis.out file) to
>> > add 5 or so extra columns for the ellipsoid information. So what
>> > Geoffrey is thinking about is increasing the precision of all the
>> > floats in that text file.
>> >
>> > --
>> > Stephen Skory
>> > s at skory.us
>> > http://stephenskory.com/
>> > 510.621.3687 (google voice)
>> > _______________________________________________
>> > Yt-dev mailing list
>> > Yt-dev at lists.spacepope.org
>> > http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>> >


