[yt-dev] Proposal: Upcast Enzo to 64 bits at IO time
Matthew Turk
matthewturk at gmail.com
Thu Dec 6 11:50:20 PST 2012
On Thu, Dec 6, 2012 at 2:44 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
> Pardon my ignorance, but is it the case that computations done in 64-bit mode in Enzo are normally saved to disk as 32-bit floats? If so, is there a setting I can change to make sure that my Enzo datasets are always written to disk with double precision?
I disabled that particular anti-feature some time ago, with the
"New_Grid_WriteGrid.C" stuff. Now in Enzo, you write to disk exactly
what you store in memory.
>
> Since most enzo calculations are done in 64 bit anyway and this change allows some pretty significant speedups, I'm +1 on this change.
>
> On Dec 6, 2012, at 11:30 AM, Matthew Turk wrote:
>
>> Hi all,
>>
>> I've been doing some benchmarking of various operations in the Enzo
>> frontend in yt 2.x. I don't believe other frontends suffer from this,
>> for the main reason that they're all 64 bit everywhere.
>>
>> The test dataset is about ten gigs, with a bunch of grids. I'm
>> extracting a surface, which means from a practical standpoint that I'm
>> filling ghost zones for every grid inside the region of interest.
>> There are many places in yt that we either upcast to 64-bit floats or
>> that we assume 64-bits. Basically, nearly all yt-defined Cython or C
>> operations assume 64-bit floats.
>>
>> There's a large quantity of Enzo data out there that is float32 on
>> disk, which gets passed into yt, where it gets handed around until it
>> is upcast. There are two problems here: 1) We have a tendency to use
>> "astype" instead of "asarray", which means the data is *always*
>> duplicated. 2) We often do this repeatedly for the same set of grid
>> data; nowhere is this more true than when generating ghost zones.
>>
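A minimal sketch of the copy behaviour described above, using only NumPy. The array names are illustrative and are not taken from yt.

```python
import numpy as np

a64 = np.zeros(8, dtype="float64")
a32 = np.zeros(8, dtype="float32")

# astype copies by default, even when the dtype already matches.
copied = a64.astype("float64")
# asarray hands back the input object itself when no conversion is needed.
same = np.asarray(a64, dtype="float64")
# A float32 input must be converted, so a new array is allocated either way.
upcast = np.asarray(a32, dtype="float64")

astype_shares = np.shares_memory(copied, a64)
asarray_is_input = same is a64
upcast_shares = np.shares_memory(upcast, a32)
```

Here astype_shares and upcast_shares are False while asarray_is_input is True. Newer NumPy releases also accept astype(..., copy=False), which skips the copy when the dtype already matches.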
>> So for the dataset I've been working on, ghost zones are a really
>> intense prospect. And the call to .astype("float64") actually
>> completely dominated the operation. This comes from both copying the
>> data, as well as casting. I found two different solutions.
>>
>> The original code:
>>
>> g_fields = [grid[field].astype("float64") for field in fields]
>>
>> This is bad even if you're using float64 data types, since it will
>> always copy. So it has to go. The total runtime for this dataset was
>> 160s, and the most-expensive function was "astype" at 53 seconds.
>>
>> So as a first step, I inserted a cast to "float64" if the dtype of an
>> array inside the Enzo IO system was "float32". This way, all arrays
>> were upcast automatically. This led me to see zero performance
>> improvement. So I checked further and saw the "always copy" bit in
>> astype, which I was ignorant of. This option:
>>
>> g_fields = [np.asarray(grid[field], "float64") for field in fields]
>>
>> is much faster, and saves a bunch of time. But 7 seconds is still
>> spent inside "np.array", and total runtime is 107.5 seconds. This
>> option is the fastest:
>>
>> g_fields = []
>> for field in fields:
>>     gf = grid[field]
>>     if gf.dtype != "float64":
>>         gf = gf.astype("float64")
>>     g_fields.append(gf)
>>
>> and now total runtime is 95.6 seconds, with the dominant cost *still*
>> in _get_data_from_grid. At this point I am much happier with the
>> performance, although still quite disappointed, and I'll be doing
>> line-by-line profiling next to find any more micro-optimizations.
>>
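The conditional cast above could be wrapped in a small helper, sketched here under stated assumptions: as_float64 is a hypothetical name, and a plain dict stands in for a yt grid object.

```python
import numpy as np

def as_float64(arr):
    """Return arr unchanged if it is already float64, else an upcast copy."""
    if arr.dtype != np.float64:
        arr = arr.astype("float64")
    return arr

# Stand-in for a grid: a dict of field name -> array (hypothetical data).
grid = {
    "Density": np.ones((4, 4, 4), dtype="float32"),
    "Temperature": np.ones((4, 4, 4), dtype="float64"),
}
fields = ["Density", "Temperature"]

# Only the float32 field is copied; the float64 field is passed through.
g_fields = [as_float64(grid[field]) for field in fields]
```

With this helper, data that is already float64 costs nothing extra, and only float32 data pays for one allocation and one cast.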
>> Now, the change to _get_data_from_grid *itself* will greatly improve
>> performance for 64-bit datasets. But also updating io.py to upcast
>> 32-bit datasets on read will help speed things up considerably for
>> 32-bit datasets as well. The downside is that it will be difficult
>> to get back raw, unmodified 32-bit data from the grids, rather than
>> 32-bit data that has been cast to 64 bits.
>>
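A rough sketch of what upcast-on-read might look like, so that the cast happens once per field rather than on every ghost-zone fill. Everything here is an assumption for illustration: on_disk, read_field, get_field and the cache are hypothetical names and do not reflect the actual layout of io.py.

```python
import numpy as np

# Hypothetical stand-in for on-disk data: field name -> float32 array.
on_disk = {"Density": np.arange(8, dtype="float32")}

def read_field(name):
    """Hypothetical reader: upcast float32 to float64 once, at read time."""
    data = on_disk[name]
    if data.dtype == np.float32:
        data = data.astype("float64")
    return data

cache = {}

def get_field(name):
    # Read (and upcast) only on first access; later calls reuse the array.
    if name not in cache:
        cache[name] = read_field(name)
    return cache[name]

first = get_field("Density")
second = get_field("Density")
```

Both calls return the same float64 array, so repeated accesses do not repeat the cast. The cache never holds the raw float32 array, which is the downside mentioned above.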
>> Is this an okay change to make?
>>
>> [+-1][01]
>>
>> -Matt
>> _______________________________________________
>> yt-dev mailing list
>> yt-dev at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org