[yt-users] 14-hour load time for Enzo dataset with VR, vs. 30 minutes with ProjectionPlot?

Stuart Levy salevy at illinois.edu
Thu Mar 3 13:38:28 PST 2016


Hello yt people,

We're trying to render imagery of a pretty large Enzo snapshot (~160GB, 
in 330,000 grids in 512 HDF5 domains) with yt-3.3dev.

On a reasonably fast Linux machine, we can do a ProjectionPlot of a few 
variables in about 30 minutes, running single-threaded while it scans 
the data (which is what takes most of the time).   Data access pattern: 
we see it reading through each of the HDF5 files in numerical order 
(cpu0000, cpu0001, ...), taking a few seconds each, and opening each 
file exactly once.

On the same machine and same dataset, using the volume rendering API, 
the data-scanning process takes about *14 hours* (not counting any 
rendering time).   (On Blue Waters, Kalina using a similar dataset 
couldn't get it to finish within a 24-hour wall-clock limit.)   Data 
access pattern: it opens an HDF5 file many times in quick succession, 
then opens another, then opens the previous file a bunch more times.  
I'm guessing it grabs one AMR grid from each HDF5 open:

    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0235",
    O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
    O_RDONLY) = 3
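
To put numbers on that pattern, a trace like the one above can be tallied per file. This is only a minimal sketch: it assumes the trace was captured as text with strace, the function name summarize_opens and the shortened sample paths are made up for illustration, and it counts open() calls only, not how many grids each one reads.

```python
import re
from collections import Counter

# Matches open("path", ...) and openat(AT_FDCWD, "path", ...) lines in
# strace text, even when the flags are wrapped onto the following line.
OPEN_RE = re.compile(r'open(?:at)?\((?:\w+,\s*)?"([^"]+)"')


def summarize_opens(strace_text):
    """Return (opens per path, number of times consecutive opens change file)."""
    paths = OPEN_RE.findall(strace_text)
    counts = Counter(paths)
    switches = sum(1 for prev, cur in zip(paths, paths[1:]) if prev != cur)
    return counts, switches


# Hypothetical shortened paths, in the same order as the excerpt above.
suffixes = ["0074", "0075", "0357", "0357", "0357", "0357",
            "0357", "0357", "0074", "0075", "0235", "0357"]
sample = "\n".join(
    'open("/data/RD0074/RedshiftOutput0074.cpu%s",\nO_RDONLY) = 3' % s
    for s in suffixes
)

counts, switches = summarize_opens(sample)
for path, n in counts.most_common():
    print(n, path)
print("file switches:", switches)
```

If ProjectionPlot really opens each file exactly once, the same tally on its trace should show a count of 1 for every path.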

This is trouble.  Is there anything we can do to make load times less 
extravagant when using VR on Enzo?   What if we ran "ds.index" before

I tried running cProfile on it, as in
    python -m cProfile myscript.py ...
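
For reference, a profile sorted by cumulative time is usually the easiest way to see which call chain dominates (on the command line, "python -m cProfile -s cumulative myscript.py" does the same thing). The sketch below uses only the standard library; profile_call and fake_scan are hypothetical names, and fake_scan merely stands in for whatever call triggers the slow data scan.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the callers that own the slow path come first.
    stats.strip_dirs().sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def fake_scan(n):
    # Stand-in workload (an assumption), not the real yt data scan.
    return sum(i * i for i in range(n))


result, report = profile_call(fake_scan, 10000)
print(report)
```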
Happy to point anyone at the dataset on our systems or BW, but at this 
scale it's not a very portable problem.

