[yt-dev] Proposal: Upcast Enzo to 64 bits at IO time

Matthew Turk matthewturk at gmail.com
Thu Dec 6 11:30:17 PST 2012


Hi all,

I've been doing some benchmarking of various operations in the Enzo
frontend in yt 2.x.  I don't believe other frontends suffer from this,
mainly because they're 64-bit everywhere.

The test dataset is about ten gigs, with a bunch of grids.  I'm
extracting a surface, which means from a practical standpoint that I'm
filling ghost zones for every grid inside the region of interest.
There are many places in yt where we either upcast to 64-bit floats or
assume 64-bit floats.  Basically, nearly all yt-defined Cython or C
operations assume 64-bit floats.

There's a large quantity of Enzo data out there that is float32 on
disk, which gets passed into yt, where it gets handed around until it
is upcast.  There are two problems here: 1) We have a tendency to use
"astype" instead of "asarray", which means the data is *always*
duplicated.  2) We often do this repeatedly for the same set of grid
data; nowhere is this more true than when generating ghost zones.
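
To make the first point concrete, here's a tiny sketch (not from yt
itself) of the difference: astype always allocates a new array, while
asarray hands back the input untouched when the dtype already matches.

import numpy as np

a = np.ones(10, dtype="float64")
b = a.astype("float64")          # always a fresh copy, even with no dtype change
c = np.asarray(a, "float64")     # same object back, no copy
print(b is a, c is a)            # False True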

So for the dataset I've been working on, ghost zones are a really
intense prospect, and the call to .astype("float64") completely
dominated the operation.  The cost comes from both copying the data and
casting it.  I found two different solutions.

The original code:

g_fields = [grid[field].astype("float64") for field in fields]

This is bad even if you're using float64 data, since astype will always
copy, so it has to go.  The total runtime for this dataset was 160
seconds, and the most expensive function was "astype" at 53 seconds.

So as a first step, I inserted a cast to "float64" in the Enzo IO
system for any array whose dtype was "float32".  This way, all arrays
were upcast automatically.  This yielded zero performance improvement.
So I dug further and saw the "always copy" behavior of astype, which I
hadn't been aware of.  This option:

g_fields = [np.asarray(grid[field], "float64") for field in fields]

is much faster and saves a bunch of time.  But 7 seconds is still
spent inside "np.array", and the total runtime is 107.5 seconds.  This
option is the fastest:

        g_fields = []
        for field in fields:
            gf = grid[field]
            if gf.dtype != "float64": gf = gf.astype("float64")
            g_fields.append(gf)

and now the total runtime is 95.6 seconds, with the dominant cost *still*
in _get_data_from_grid.  At this point I am much happier with the
performance, although still quite disappointed, and I'll be doing
line-by-line profiling next to find any remaining micro-optimizations.
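
For anyone who wants to poke at this without the full dataset, here's a
rough, self-contained harness in the same spirit (not the actual
benchmark script); the grid dict, field name, and array size are made up
for illustration, and the absolute numbers will of course differ from
the ten-gig case above.

import numpy as np
import timeit

fields = ["Density"]
grid32 = {"Density": np.random.random((64, 64, 64)).astype("float32")}
grid64 = {"Density": np.random.random((64, 64, 64))}

def always_astype(grid):
    # original approach: unconditional astype, always copies
    return [grid[f].astype("float64") for f in fields]

def asarray_cast(grid):
    # asarray only copies when the dtype actually needs to change
    return [np.asarray(grid[f], "float64") for f in fields]

def cast_if_needed(grid):
    # explicit dtype check, casting only the float32 arrays
    out = []
    for f in fields:
        gf = grid[f]
        if gf.dtype != "float64": gf = gf.astype("float64")
        out.append(gf)
    return out

for label, grid in (("float32", grid32), ("float64", grid64)):
    for fn in (always_astype, asarray_cast, cast_if_needed):
        t = timeit.timeit(lambda: fn(grid), number=1000)
        print(label, fn.__name__, "%.4f s" % t)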

Now, the change to _get_data_from_grid *itself* will greatly improve
performance for 64-bit datasets.  Updating io.py to upcast 32-bit
datasets on read will speed things up considerably for 32-bit datasets
as well.  The downside is that it will be difficult to get back the raw,
unmodified 32-bit data from the grids, rather than 32-bit data that has
been cast to 64 bits.
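
To be concrete about the io.py side, here's a minimal sketch of what I
mean by upcast-on-read.  This is not the actual reader code;
_read_field_from_disk below is just a stand-in for the real HDF5 read.

import numpy as np

def _read_field_from_disk(grid, field):
    # stand-in for the real HDF5 read; Enzo data is often float32 on disk
    return np.zeros((16, 16, 16), dtype="float32")

def read_field(grid, field):
    data = _read_field_from_disk(grid, field)
    if data.dtype == np.float32:
        data = data.astype("float64")   # one upcast, at IO time
    return data

print(read_field(None, "Density").dtype)   # float64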

Is this an okay change to make?

[+-1][01]

-Matt


