[yt-users] 14-hour load time for Enzo dataset with VR, vs. 30 minutes with ProjectionPlot?
Stuart Levy
salevy at illinois.edu
Thu Mar 3 13:38:28 PST 2016
Hello yt people,
We're trying to render imagery of a pretty large Enzo snapshot (~160GB,
in 330,000 grids in 512 HDF5 domains) with yt-3.3dev.
On a reasonably fast Linux machine, we can do a ProjectionPlot of a few
variables in about 30 minutes, running single-threaded while it scans
the data (which is what takes most of the time). Data access pattern:
we see it reading through each of the HDF5 files in numerical order
(cpu0000, cpu0001, ...), taking a few seconds each, and opening each
file exactly once.
On the same machine and same dataset, using the volume rendering API,
the data-scanning process takes about *14 hours* (not counting any
rendering time). (On Blue Waters, Kalina using a similar dataset
couldn't get it to finish within a 24-hour wall-clock limit.) Data
access pattern: it opens an HDF5 file many times in quick succession,
then opens another, then opens the previous file a bunch more times.
I'm guessing it grabs one AMR grid from each HDF5 open:
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0235", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
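To put numbers on that access pattern, a small script along these lines
could tally how often each file is opened and how often consecutive opens
jump from one file to another. This is only a rough sketch: the name
summarize_opens and the regular expression are illustrative assumptions,
and it expects a saved trace in the same format as the lines above. It
only counts successful read-only opens.

```python
import re
from collections import Counter

# Matches successful read-only opens as they appear in the trace above.
# \s* also spans a line break, so wrapped trace lines are handled too.
OPEN_RE = re.compile(r'open\("([^"]+)",\s*O_RDONLY\)\s*=\s*\d+')


def summarize_opens(trace_text):
    """Return (opens per path, number of file-to-file switches)."""
    paths = OPEN_RE.findall(trace_text)
    counts = Counter(paths)
    # A "switch" is any pair of consecutive opens that hit different files.
    switches = sum(1 for a, b in zip(paths, paths[1:]) if a != b)
    return counts, switches


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_opens.py trace.txt
    with open(sys.argv[1]) as fh:
        counts, switches = summarize_opens(fh.read())
    for path, n in counts.most_common(20):
        print(f"{n:8d}  {path}")
    print(f"{len(counts)} distinct files, {switches} switches between files")
```

If every file were opened exactly once, as in the ProjectionPlot case,
every count would be 1 and the number of switches would be one less than
the number of files.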
This is trouble. Is there anything we can do to make load times less
extravagant when using VR on Enzo? What if we ran "ds.index" before
I tried running cProfile on it, as in
python -m cProfile myscript.py ...
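For anyone who wants to collect a comparable profile, one option is to
sort the report by cumulative time so the expensive call chains come
first. The sketch below uses only the standard-library cProfile and pstats
modules. The names profile_top and fake_load are hypothetical, and
fake_load is just a stand-in for the real data-scanning step, not the
actual script.

```python
import cProfile
import io
import pstats


def profile_top(func, n=10):
    """Run func() under cProfile and return a text report of the
    top-n entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(n)
    return buf.getvalue()


def fake_load():
    # Hypothetical stand-in for the real work (loading and scanning data).
    return sum(i * i for i in range(100000))


if __name__ == "__main__":
    print(profile_top(fake_load))
```

The same ordering is available from the command line with
"python -m cProfile -s cumulative myscript.py", or the raw data can be
saved with "-o" and loaded later through pstats.Stats.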
Happy to point anyone at the dataset on our systems or BW, but at this
scale it's not a very portable problem.