<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hello yt people,<br>

    <br>

    We're trying to render imagery of a pretty large Enzo snapshot

    (~160GB, in 330,000 grids in 512 HDF5 domains) with yt-3.3dev.<br>

    <br>

    On a reasonably fast Linux machine, we can do a ProjectionPlot of a

    few variables in about 30 minutes, running single-threaded while it

    scans the data (which is what takes most of the time).   Data access

    pattern: we see it reading through each of the HDF5 files in

    numerical order (cpu0000, cpu0001, ...), taking a few seconds each,

    and opening each file exactly once.<br>

    <br>

    On the same machine and same dataset, using the volume rendering

    API, the data-scanning process takes about <b>14 hours</b> (not

    counting any rendering time).   (On Blue Waters, Kalina, using a

    similar dataset, couldn't get it to finish within a 24-hour

    wall-clock limit.)   Data access pattern: it opens an HDF5 file many

    times in quick succession, then opens another, then opens the

    previous file a bunch more times.  I'm guessing it grabs one AMR

    grid from each HDF5 open:<br>

    <br>

    <blockquote><tt>open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0235",

        O_RDONLY) = 3<br>

        open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",

        O_RDONLY) = 3<br>

      </tt></blockquote>
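    To put numbers on that pattern, a minimal sketch along these lines
    could tally opens per file from a saved trace.  It assumes the trace
    holds one system call per line (as strace writes it); the function
    name <tt>summarize_opens</tt> and the shortened file names in the
    sample are illustrative only, not taken from our actual run.<br>
<pre>
```python
import re
from collections import Counter
from itertools import groupby

# Pulls the path out of lines like: open("/path/file.cpu0357", O_RDONLY) = 3
# (also accepts the openat(AT_FDCWD, "...") form that newer systems emit).
OPEN_RE = re.compile(r'open(?:at)?\([^"]*"([^"]+)"')


def summarize_opens(lines):
    """Return (opens per file, [(file, length of each consecutive run)])."""
    paths = [m.group(1) for m in map(OPEN_RE.search, lines) if m]
    counts = Counter(paths)
    runs = [(path, sum(1 for _ in group)) for path, group in groupby(paths)]
    return counts, runs


# Shortened stand-in for the excerpt above (same order of cpu files).
order = ["0074", "0075"] + ["0357"] * 6 + ["0074", "0075", "0235", "0357"]
sample = [f'open("RedshiftOutput0074.cpu{n}", O_RDONLY) = 3' for n in order]

counts, runs = summarize_opens(sample)
for path, n in counts.most_common():
    print(n, path)
print("longest consecutive run:", max(runs, key=lambda r: r[1]))
```
</pre>
    On the excerpt above this gives seven opens of cpu0357, six of them
    back to back.  For scale, 14 hours spread over 330,000 grids is
    roughly 0.15 seconds per grid, which is at least consistent with a
    per-grid open-and-read cost dominating, though that remains a guess.<br>
    <br>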

    This is trouble.  Is there anything we can do to reduce load times

    when using the volume rendering API on Enzo data?   What if we ran "ds.index"

    before <br>

    <br>

    I tried running cProfile on it, as in<br>

    <tt>python -m cProfile myscript.py ...</tt> <br>
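    For anyone who wants to reproduce the profiling step, here is a
    minimal sketch of collecting a profile and listing the entries with
    the largest cumulative time using the standard-library <tt>cProfile</tt>
    and <tt>pstats</tt> modules.  The <tt>scan_data</tt> function is a
    hypothetical stand-in for the real volume-rendering setup, not code
    from our script.<br>
<pre>
```python
import cProfile
import io
import pstats


def scan_data():
    # Hypothetical stand-in workload; the real script would build the
    # volume-rendering scene here and trigger the data scan.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
result = scan_data()
profiler.disable()

# Sort by cumulative time so that callers that wrap the slow I/O rise to the top.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```
</pre>
    The command-line form accepts the same ordering via
    <tt>python -m cProfile -s cumulative myscript.py</tt>, or can save the
    raw profile for later inspection with <tt>-o</tt>.<br>
    <br>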

    Happy to point anyone at the dataset on our systems or BW, but at

    this scale it's not a very portable problem.<br>


  </body>

</html>