[yt-users] 14-hour load time for Enzo dataset with VR, vs. 30 minutes with ProjectionPlot?

Stuart Levy salevy at illinois.edu
Sun Mar 6 21:38:31 PST 2016


Thank you so much, Matt!

I've been running some tests today with your new code on the full
dataset and will let you know how they turn out.
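
For comparing the old and new code paths, a profile sorted by cumulative
time is usually the quickest way to see where the scan spends its time.
What follows is only a minimal sketch using the standard-library cProfile
and pstats modules; the helper name profile_call and the stand-in workload
are hypothetical, and a real test would pass in whatever function builds
the volume rendering source.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the top entries into a string instead of printing to stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def stand_in_workload(n):
    """Hypothetical placeholder for the real data-scanning step."""
    return sum(i * i for i in range(n))


result, report = profile_call(stand_in_workload, 100_000, top=5)
print(report)
```

The command-line equivalent is "python -m cProfile -s cumulative myscript.py".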

On 3/5/16 5:16 PM, Matthew Turk wrote:
> Hi Stuart,
>
> I've started looking into this, and I've made some progress that may
> help for your particular use case.
>
> https://bitbucket.org/yt_analysis/yt/pull-requests/2025
>
> -Matt
>
> On Thu, Mar 3, 2016 at 3:38 PM, Stuart Levy <salevy at illinois.edu> wrote:
>> Hello yt people,
>>
>> We're trying to render imagery of a pretty large Enzo snapshot (~160GB, in
>> 330,000 grids in 512 HDF5 domains) with yt-3.3dev.
>>
>> On a reasonably fast Linux machine, we can do a ProjectionPlot of a few
>> variables in about 30 minutes, running single-threaded while it scans the
>> data (which is what takes most of the time).   Data access pattern: we see
>> it reading through each of the HDF5 files in numerical order (cpu0000,
>> cpu0001, ...), taking a few seconds each, and opening each file exactly
>> once.
>>
>> On the same machine and same dataset, using the volume rendering API, the
>> data-scanning process takes about 14 hours (not counting any rendering
>> time).  (On Blue Waters, Kalina, using a similar dataset, couldn't get it
>> to finish within a 24-hour wall-clock limit.)  Data access pattern: it
>> opens an HDF5 file many times in quick succession, then opens another, then
>> reopens the previous file many more times.  I'm guessing it grabs one AMR
>> grid from each HDF5 open:
>>
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0235",
>> O_RDONLY) = 3
>> open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357",
>> O_RDONLY) = 3
>>
>> This is trouble.  Is there anything we can do to make load times less
>> extravagant when using VR on Enzo?   What if we ran "ds.index" before
>>
>> I tried running cProfile on it, as in
>>     python -m cProfile myscript.py ...
>> Happy to point anyone at the dataset on our systems or BW, but at this scale
>> it's not a very portable problem.
>>
>>
>>
>> _______________________________________________
>> yt-users mailing list
>> yt-users at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
>>

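The reopen pattern in the quoted strace excerpt can be quantified by
tallying open() calls per file and counting how often consecutive opens
switch files. This is a minimal sketch and not part of yt: the helper name
tally_opens is made up for illustration, and the sample uses shortened
paths that only mimic the ordering of the excerpt.

```python
import re
from collections import Counter

# Matches the path argument of an open() line in strace output.
# Only the quoted path is needed, so wrapped or ">>"-prefixed lines still work.
OPEN_RE = re.compile(r'open\("([^"]+)"')


def tally_opens(trace_text):
    """Return (Counter of opens per path, number of file switches)."""
    paths = OPEN_RE.findall(trace_text)
    counts = Counter(paths)
    # A "switch" is any consecutive pair of opens that hit different files.
    switches = sum(1 for a, b in zip(paths, paths[1:]) if a != b)
    return counts, switches


# Same file order as the quoted excerpt, with hypothetical shortened paths.
order = ["0074", "0075"] + ["0357"] * 6 + ["0074", "0075", "0235", "0357"]
sample = "\n".join(
    f'open("RedshiftOutput0074.cpu{cpu}", O_RDONLY) = 3' for cpu in order
)

counts, switches = tally_opens(sample)
for path, n in counts.most_common():
    print(n, path)
print("file switches:", switches)
```

With a one-pass reader such as the ProjectionPlot scan described above,
every file would show a count of 1; counts far above 1 per file point to
per-grid reopening.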


