[yt-dev] Turn data serialization off by default

Wed Jul 24 23:55:40 PDT 2013

Hey David,

I don't think you can modify the ytcfg object after loading up yt, so your
second example won't work.

As for your first example, I think that's possible via pickling:

import cPickle

with open('data.pickle', 'wb') as pkl_file:
    cPickle.dump(proj, pkl_file, protocol=-1)

You can then load it later like so:

with open('data.pickle', 'rb') as pkl_file:
    proj = cPickle.load(pkl_file)
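
As a self-contained sketch of that round trip, here is the same pattern run against a stand-in object. The dictionary below is a hypothetical placeholder for a projection, not real yt output, and I'm assuming the standard-library pickle module (which plays the role of cPickle on Python 3) and a temporary directory for the file:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for an expensive result such as a projection.
result = {"field": "Density", "axis": 0, "values": [1.0, 2.5, 4.0]}

# Write into a throwaway directory so the sketch leaves no files behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), "data.pickle")

# dump() writes to an open file; dumps() would return a byte string instead.
with open(path, "wb") as pkl_file:
    pickle.dump(result, pkl_file, protocol=pickle.HIGHEST_PROTOCOL)

# Load the object back in a later step (or a later session).
with open(path, "rb") as pkl_file:
    restored = pickle.load(pkl_file)
```

Whether a real projection object pickles cleanly is something to check against your own yt version.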

You can do similar things using pf.h.save_object() and pf.h.load_object(),
but, in a bit of a chicken-and-egg situation, you'll need serialization
turned on in your config for that to work.

-Nathan

On Tue, Jul 23, 2013 at 8:03 PM, David Collins <dcollins4096 at gmail.com> wrote:

>
> I'm +1 on changing the default.  Thanks for making an announcement about
> the change.
>
> How hard would it be to make an individual routine get serialized on
> demand?  For instance,
> proj = pf.h.proj( ... serialize = True)
>
> Or, would it work to do
>
> ytcfg['yt', 'serialize'] = 'True'
> do stuff
> ytcfg['yt', 'serialize'] = 'False'
> ?
>
> d.
>
>
>
> On Tue, Jul 23, 2013 at 6:20 PM, j s oishi <jsoishi at gmail.com> wrote:
>
>> Oh god...+100000000000 <sound of coins dinging in 8 bit glory>
>> On Jul 23, 2013 7:08 PM, "Matthew Turk" <matthewturk at gmail.com> wrote:
>>
>>> On Tue, Jul 23, 2013 at 3:27 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>>> wrote:
>>> > Hi all,
>>> >
>>> > I've just issued a PR that will hopefully fix a whole class of buggy
>>> > behavior that both new and experienced yt users commonly run into.
>>> > Specifically, I'd like it if we could turn off data serialization by
>>> > default.  This changes a long-lived default value in yt's
>>> > configuration, so I wanted to bring this change to the attention of
>>> > both the yt user and developer community.
>>> >
>>> > What is data serialization?  Currently, yt will save the result of
>>> > certain expensive calculations, including projections, the structure
>>> > of the grid hierarchy, and the list of fields present in the data.
>>> > While this does have the beneficial effect of saving time when a user
>>> > needs to repetitively calculate these quantities on the same dataset,
>>> > it has a number of features which lead to buggy, annoying behavior.
>>> >
>>> > Specifically, if I am developing my simulation code or repeatedly
>>> > restarting my code, searching for a way to grind past a code crash, I
>>> > will quite often regenerate the same simulation output file over and
>>> > over, changing a line of code or switching out the value of a
>>> > parameter each time.
>>> >
>>> > If yt's data serialization is turned on, it's likely that yt's
>>> > visualizations will correspond to old versions of the data file.
>>> > Since only certain operations are serialized, it's also possible for
>>> > yt to get into an inconsistent state - one operation will show the
>>> > current data file, while another operation will show an old version.
>>> >
>>> > It's possible to fix a bug in your code, but because yt is still
>>> > loading the old data, you won't be able to tell that your bug is fixed
>>> > until you realize that you have .yt and .harrays files littering your
>>> > filesystem.
>>> >
>>> > I've personally wasted a lot of time due to yt's serialization
>>> > 'feature' and denizens of our IRC channel and mailing list can attest
>>> > to how often new users run into this behavior as well.
>>> >
>>> > My pull request only turns off serialization by default; it doesn't
>>> > disable the capability completely.  Once the pull request is merged
>>> > in, you can turn on serialization either by adding an entry to your
>>> > config file:
>>> >
>>> > $ cat ~/.yt/config
>>> >
>>> > [yt]
>>> > serialize = True
>>> >
>>> > Or on a per-script basis:
>>> >
>>> > from yt.config import ytcfg
>>> > ytcfg['yt', 'serialize'] = 'True'
>>> > from yt.mods import *
>>> >
>>> > The pull request is here:
>>> > https://bitbucket.org/yt_analysis/yt/pull-request/558
>>> >
>>> > I know several of you are big fans of this feature, so if you object
>>> > to this change please leave a comment on the pull request so we can
>>> > figure out a way forward.
>>>
>>> I think this is long overdue, for all the reasons you list.
>>> Auto-serialization treated a lot of symptoms that we have since
>>> improved, or that we should address more directly -- speed of
>>> hierarchy construction, saving data that we want to retain, and
>>> detecting fields.
>>>
>>> +1!
>>>
>>> -Matt
>>>
>>> >
>>> > -Nathan
>>> >
>>> > _______________________________________________
>>> > yt-dev mailing list
>>> > yt-dev at lists.spacepope.org
>>> > http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>>> >
>
>
> --
> -- Sent from a computer.
>