[yt-dev] A set of benchmarks

Tue May 3 20:48:46 PDT 2016

Hi yt-dev,

I am developing a yt frontend for the Einstein Toolkit, and in the 
process I generated some crude, preliminary benchmarks that I thought I 
would share in case anybody is interested.

I performed three tests:

 1. I just load the dataset and calculate (say) the maximum of some
    quantity on each grid.
 2. I load in the dataset and calculate the maximum of the magnitude of
    a gradient on each grid. This requires the generation of ghost zones
    at grid boundaries.
 3. I load in the dataset and perform a volume rendering. I make a
    "movie" with 4 frames where I rotate around the volume.

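To illustrate why test 2 needs ghost zones, here is a minimal 1D sketch in plain Python. It is not yt code: the function names, the two-grid split, and the extrapolation at the outer domain edges are all illustrative assumptions. A centered difference at a grid's edge cell needs a value from the neighboring grid, so each grid is padded with one ghost cell per side before differentiating.

```python
def centered_gradient(padded, dx):
    """Centered difference on the interior cells of a padded 1D list."""
    return [(padded[i + 1] - padded[i - 1]) / (2.0 * dx)
            for i in range(1, len(padded) - 1)]


def with_ghost_zones(grids, index):
    """Pad grid `index` with one ghost cell per side, taken from its
    neighbors. Where there is no neighbor (outer domain edge) the edge
    value is linearly extrapolated, which is only one possible choice."""
    g = grids[index]
    if index > 0:
        left = grids[index - 1][-1]
    else:
        left = 2.0 * g[0] - g[1]
    if index < len(grids) - 1:
        right = grids[index + 1][0]
    else:
        right = 2.0 * g[-1] - g[-2]
    return [left] + g + [right]


# Hypothetical data: f(x) = x^2 sampled at x = 0..7 with dx = 1,
# split across two grids of four cells each.
dx = 1.0
grids = [[float(x * x) for x in range(0, 4)],
         [float(x * x) for x in range(4, 8)]]

# Maximum of the gradient magnitude on each grid, as in test 2.
max_grad = []
for i in range(len(grids)):
    padded = with_ghost_zones(grids, i)
    grad = centered_gradient(padded, dx)
    max_grad.append(max(abs(v) for v in grad))

print(max_grad)
```

With a single grid there are no neighbors to consult, which is consistent with ghost zone generation being cheap in that case.
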
Salient details:

  * I performed these tests with the attached scripts. I ran them
    serially: no MPI, and OpenMP limited to one thread
    (OMP_NUM_THREADS=1).
  * The domain sizes were: 64^3, 128^3, 256^3, and 512^3.
  * All four datasets have only one refinement level. However the 512^3
    dataset has multiple grids on that refinement level.
  * The dataset sizes range from about 10MB for the 64^3 set to about
    10GB for the 512^3 set.
  * All tests were run on a single modern workstation. Details:
    2.6GHz clock with AVX2 instructions and a 60MB L3 cache.
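
As a rough sanity check on the sizes above, the cell counts and the memory footprint of a single field can be worked out directly. This sketch assumes 8-byte (double-precision) values, which is my assumption and not something stated about the datasets; the helper names are illustrative.

```python
BYTES_PER_VALUE = 8  # assumption: double-precision storage


def cells(n):
    """Number of cells in an n^3 domain."""
    return n ** 3


def field_bytes(n):
    """Bytes needed to hold one scalar field on an n^3 domain."""
    return cells(n) * BYTES_PER_VALUE


for n in (64, 128, 256, 512):
    mib = field_bytes(n) / 2**20
    print(f"{n}^3: {cells(n):>11,d} cells, {mib:8.1f} MiB per field")
```

Under that assumption one field occupies 2 MiB at 64^3 and 1 GiB at 512^3, and the cell count grows by a factor of 512 between the smallest and largest domains.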

Results:

  * Reading in the data is (for the datasets I tested) extremely fast.
    The 512^3 dataset takes only about 20 seconds to read in its
    entirety.
  * Generating the first frame in a volume render is extremely slow: on
    the order of 4 minutes for the 512^3 dataset. After the first frame
    is produced, new frames are fast, even with a single OpenMP thread:
    on the order of tens of seconds per frame.
  * Generating ghost zones is very fast for datasets with only one grid.
    It is incredibly slow for datasets with multiple grids, dominating
    the run time.
  * I attach a plot comparing the three tests.

Naively, it seems to me that there must be several sources of overhead 
when I perform volume rendering that are significantly more costly than 
simply reading in the data (which seems to be fast). Clearly one of 
these is the generation of the ghost zones, and another is the actual 
ray tracing (although the ray tracing itself seems to be quite fast). 
However, I'm not sure that these two operations alone explain the cost 
of the volume rendering.
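
One way to narrow this down would be to time each stage separately. The sketch below is a generic stage timer in plain Python; the stage names and the placeholder workload are hypothetical stand-ins, not the attached scripts and not yt calls.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name):
    """Accumulate wall-clock time spent inside the block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed


def placeholder_work(n):
    """Stand-in for a real stage such as I/O or ray tracing."""
    return sum(i * i for i in range(n))


with stage("read data"):
    placeholder_work(10_000)
with stage("ghost zones"):
    placeholder_work(20_000)
for frame in range(4):
    # Timing each frame separately exposes any first-frame overhead.
    with stage("render frame %d" % frame):
        placeholder_work(5_000)

for name, seconds in timings.items():
    print("%-16s %.6f s" % (name, seconds))
```

For a finer breakdown by function, a script can also be run under the standard profiler with `python -m cProfile -s cumulative script.py`.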

I'd love to know other people's experience with benchmarking. Do these 
costs seem normal, up to an order of magnitude or so? Would you have any 
insight into what contributes to the cost of generating the first frame 
when volume rendering?

Thanks very much!

Best,
Jonah Miller

-------------- next part --------------
A non-text attachment was scrubbed...
Name: scaling-hot-cache-simple-1.pdf
Type: application/pdf
Size: 14190 bytes
Desc: not available
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20160503/a1a59fd5/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: yt-test-ghost-zones.py
Type: text/x-python
Size: 2564 bytes
Desc: not available
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20160503/a1a59fd5/attachment-0003.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: yt-test-io.py
Type: text/x-python
Size: 2387 bytes
Desc: not available
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20160503/a1a59fd5/attachment-0004.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: yt-test-vr.py
Type: text/x-python
Size: 2612 bytes
Desc: not available
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20160503/a1a59fd5/attachment-0005.py>