[yt-users] 14-hour load time for Enzo dataset with VR, vs. 30 minutes with ProjectionPlot?

Matthew Turk matthewturk at gmail.com
Sat Mar 5 15:16:25 PST 2016


Hi Stuart,

I've started looking into this, and I've made some progress that may
help for your particular use case.

https://bitbucket.org/yt_analysis/yt/pull-requests/2025

-Matt

On Thu, Mar 3, 2016 at 3:38 PM, Stuart Levy <salevy at illinois.edu> wrote:
> Hello yt people,
>
> We're trying to render imagery of a pretty large Enzo snapshot (~160GB, in
> 330,000 grids in 512 HDF5 domains) with yt-3.3dev.
>
> On a reasonably fast Linux machine, we can do a ProjectionPlot of a few
> variables in about 30 minutes, running single-threaded while it scans the
> data (which is what takes most of the time). Data access pattern: we see
> it reading through each of the HDF5 files in numerical order (cpu0000,
> cpu0001, ...), taking a few seconds each, and opening each file exactly
> once.
>
> On the same machine and same dataset, using the volume rendering API, the
> data-scanning process takes about 14 hours (not counting any rendering
> time). (On Blue Waters, Kalina, using a similar dataset, couldn't get it to
> finish within a 24-hour wall-clock limit.) Data access pattern: it opens
> an HDF5 file many times in quick succession, then opens another, then opens
> the previous file a bunch more times. I'm guessing it grabs one AMR grid
> from each HDF5 open:
>
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0235",
> O_RDONLY) = 3
> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
> O_RDONLY) = 3
>
> This is trouble.  Is there anything we can do to make load times less
> extravagant when using VR on Enzo?   What if we ran "ds.index" before
>
> I tried running cProfile on it, as in
>    python -m cProfile myscript.py ...
> Happy to point anyone at the dataset on our systems or BW, but at this scale
> it's not a very portable problem.
>
>
>
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>
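To put a number on the access pattern in the trace quoted above, something like the following could tally how often each file is opened and how often consecutive opens jump from one file to another. This is only a rough sketch: the function name summarize_opens is made up for illustration, it is not part of yt, and it assumes the trace is plain text containing open("...") lines like the ones shown.

```python
import re
from collections import Counter

# Matches the path argument of an open("...") call as printed in the trace.
OPEN_RE = re.compile(r'open\("([^"]+)"')


def summarize_opens(trace_text):
    """Summarize open() calls in trace text (hypothetical helper, not part of yt).

    Returns (counts, switches):
      counts   -- Counter mapping each path to the number of times it was opened
      switches -- number of consecutive open() pairs that refer to different files
    """
    paths = OPEN_RE.findall(trace_text)
    counts = Counter(paths)
    switches = sum(1 for prev, cur in zip(paths, paths[1:]) if prev != cur)
    return counts, switches


if __name__ == "__main__":
    import sys

    # Usage (assumed): python summarize_opens.py < trace.txt
    counts, switches = summarize_opens(sys.stdin.read())
    total = sum(counts.values())
    print("%d opens of %d distinct files, %d file switches"
          % (total, len(counts), switches))
    for path, n in counts.most_common(10):
        print("%6d  %s" % (n, path))
```

For a scan like the ProjectionPlot one described above, where each file is opened exactly once, the total number of opens would equal the number of distinct files. A total far above that would be consistent with the guess that each open fetches a single grid.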


