<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hello yt people,<br>
<br>
We're trying to render imagery of a pretty large Enzo snapshot
(~160GB, in 330,000 grids in 512 HDF5 domains) with yt-3.3dev.<br>
<br>
On a reasonably fast Linux machine, we can do a ProjectionPlot of a
few variables in about 30 minutes, running single-threaded while it
scans the data (which is what takes most of the time). Data access
pattern: we see it reading through each of the HDF5 files in
numerical order (cpu0000, cpu0001, ...), taking a few seconds each,
and opening each file exactly once.<br>
<br>
On the same machine and same dataset, using the volume rendering
API, the data-scanning process takes about <b>14 hours</b> (not
counting any rendering time). (On Blue Waters, Kalina, using a
similar dataset, couldn't get it to finish within a 24-hour
wall-clock limit.) Data access pattern: it opens an HDF5 file many
times in quick succession, then opens another, then opens the
previous file a bunch more times. I'm guessing it grabs one AMR
grid from each HDF5 open:<br>
<br>
<blockquote><tt>open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0235",
O_RDONLY) = 3<br>
open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
O_RDONLY) = 3<br>
</tt></blockquote>
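To put a number on the re-opening pattern in a trace like the one above, a small script along these lines could tally opens per file. This is only a sketch: <tt>count_opens</tt> and <tt>count_reopens</tt> are hypothetical helper names, and it assumes the trace uses <tt>open("...")</tt> lines as shown (newer strace versions may report <tt>openat</tt> instead).<br>
<br>

```python
import re
from collections import Counter

# Assumption: trace lines look like  open("/path/file.cpu0357", O_RDONLY) = 3
# The path always sits on the same line as open(", even if the rest wraps.
OPEN_RE = re.compile(r'open\("([^"]+)"')


def count_opens(trace_text):
    """Return a Counter mapping each opened path to its number of open() calls."""
    return Counter(OPEN_RE.findall(trace_text))


def count_reopens(trace_text):
    """Count open() calls on a path that already appeared earlier in the trace."""
    seen = set()
    reopens = 0
    for path in OPEN_RE.findall(trace_text):
        if path in seen:
            reopens += 1
        seen.add(path)
    return reopens
```

<br>
Applied to the twelve opens excerpted above, this kind of tally gives seven opens of cpu0357 and two each of cpu0074 and cpu0075, whereas the ProjectionPlot run opens each file exactly once.<br>
<br>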
This is trouble. Is there anything we can do to make load times
less extravagant when using VR on Enzo data? What if we ran "ds.index"
beforehand?<br>
<br>
I tried running cProfile on it, as in<br>
<tt>python -m cProfile myscript.py ...</tt><br>
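<br>
For anyone who wants to profile a smaller case from inside a script rather than from the command line, something like the following sketch should work using only the standard library. <tt>profile_call</tt> is a hypothetical helper, not part of yt, and the function being profiled is whatever scan or render step you want to measure.<br>
<br>

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report).

    The report lists the `top` entries sorted by cumulative time, which
    tends to surface the I/O-heavy call chains first.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

<br>
The command-line form can also save its statistics with <tt>python -m cProfile -o out.prof myscript.py</tt>, and <tt>pstats.Stats("out.prof")</tt> will load them for the same kind of sorting.<br>
<br>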
Happy to point anyone at the dataset on our systems or BW, but at
this scale it's not a very portable problem.<br>
</body>
</html>