[yt-dev] A set of benchmarks

Matthew Turk matthewturk at gmail.com
Fri May 6 08:58:10 PDT 2016


Hi Jonah,

Thanks for supplying these.  I've reviewed them but am still digesting the
results.

On Tue, May 3, 2016 at 10:48 PM, Jonah Miller <
jonah.maxwell.miller at gmail.com> wrote:

> Hi yt-dev,
>
> I am developing a yt frontend for the Einstein Toolkit, and in the process
> I generated some crude, preliminary benchmarks which I thought I would
> share in case anybody is interested.
>
> I performed three tests (sketched in code below):
>
>    1. I just load the dataset and calculate (say) the maximum of some
>    quantity on each grid.
>    2. I load in the dataset and calculate the maximum of the magnitude of
>    a gradient on each grid. This requires the generation of ghost zones at
>    grid boundaries.
>    3. I load in the dataset and perform a volume rendering. I make a
>    "movie" with 4 frames in which I rotate the camera around the volume.
>
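Just to make sure I follow, here is a minimal sketch of what tests 1 and 2
might look like in yt.  The dataset path and the ("gas", "density") field
name are placeholders rather than anything from the actual scripts, and
this assumes a yt recent enough to have add_gradient_fields:

    import yt

    ds = yt.load("et_output/data0512.h5")  # placeholder path

    # Test 1: maximum of some quantity, taken grid by grid.
    max_val = max(float(g["gas", "density"].max()) for g in ds.index.grids)

    # Test 2: maximum of a gradient magnitude.  The gradient fields need
    # ghost zones at the grid boundaries, which is where the cost appears.
    ds.add_gradient_fields(("gas", "density"))
    ad = ds.all_data()
    max_grad = ad.quantities.extrema(("gas", "density_gradient_magnitude"))[1]
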
> Salient details:
>
>    - I performed these tests with the attached scripts. I turned OpenMP
>    and MPI off by setting OMP_NUM_THREADS=1 (see the snippet below).
>    - The domain sizes were: 64^3, 128^3, 256^3, and 512^3.
>    - All four datasets have only one refinement level. However, the 512^3
>    dataset has multiple grids on that refinement level.
>    - The dataset sizes range from about 10 MB for the 64^3 set to about
>    10 GB for the 512^3 set.
>    - All tests were performed on a single modern workstation: 2.6 GHz
>    clock with AVX2 instructions and a 60 MB L3 cache.
>
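One note on that first item: OMP_NUM_THREADS only takes effect if it is in
the environment before the threaded libraries are loaded, so the safe
ordering in a script looks like:

    import os
    os.environ["OMP_NUM_THREADS"] = "1"  # set before yt/numpy are imported

    import yt  # threaded code imported afterward sees a single thread
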
> Results:
>
>    - Reading in the data is (for the datasets I tested) extremely fast.
>    The whole 512^3 dataset takes only about 20 seconds to read in.
>
>
That's good news!


>
>    - Generating the first frame in a volume render is extremely slow: on
>    the order of 4 minutes for the 512^3 dataset. After the first frame is
>    produced, new frames are fast, even with OpenMP off. With 1 OpenMP
>    thread, a new frame takes on the order of tens of seconds.
>
>
That sounds about right; my guess is that this is the kd-tree building,
which gathers the vertex-centered data.
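
Concretely, keeping one scene alive across all the frames means the
kd-tree (and the vertex-centered data it gathers) is built once, on the
first render, and reused afterward.  A sketch, assuming the Scene-based
rendering API; the dataset path and field name are placeholders:

    import numpy as np
    import yt

    ds = yt.load("et_output/data0512.h5")  # placeholder path
    sc = yt.create_scene(ds, field=("gas", "density"))

    for i in range(4):
        sc.camera.rotate(np.pi / 2)    # quarter turn per frame
        sc.render()                    # first render pays the kd-tree cost
        sc.save("frame_%04d.png" % i)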


>
>    - Generating ghost zones is very fast for datasets with only one grid.
>    It is incredibly slow for datasets with multiple grids, dominating the
>    run time.
>
>
Sounds about right.
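
If you want to isolate that cost, something along these lines (again with
a placeholder path and field name) times just the ghost-zone generation,
one grid at a time:

    import time
    import yt

    ds = yt.load("et_output/data0512.h5")  # placeholder path

    t0 = time.time()
    for g in ds.index.grids:
        # One layer of ghost zones per grid, filled from neighboring
        # grids; this is the step that gets expensive with many grids.
        g.retrieve_ghost_zones(1, ("gas", "density"))
    print("ghost zones: %.1f s for %d grids"
          % (time.time() - t0, len(ds.index.grids)))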


>
>    - I attach a plot comparing the three tests.
>
> Naively, it seems to me that there must be several sources of overhead
> in volume rendering that are significantly more costly than simply
> reading in the data (which seems to be fast). Clearly one of these is
> the generation of the ghost zones, and another is the actual ray tracing
> (although the ray tracing itself seems to be quite fast). However, I'm
> not sure that these two operations alone explain the cost of the volume
> rendering.
>

I think that the cost is likely dominated by the vertex-centering.  It may
be possible to overload get_vertex_centered_data for your subclass of
AMRGridPatch to make this faster based on your data format.  I am not sure
that the ray casting itself is a huge cost; there are likely Python
operations, like writing the image, that are a non-negligible fraction of
that time.
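
To be concrete about the overload, a skeletal version is below.  The base
class and method names follow yt's grid-patch machinery (the signature
matches the yt of this era; check your version), but EinsteinToolkitGrid,
_can_read_vertex_data, and _read_vertex_data are invented placeholders for
whatever fast path your file format allows:

    from yt.data_objects.grid_patch import AMRGridPatch

    class EinsteinToolkitGrid(AMRGridPatch):
        def get_vertex_centered_data(self, field, smoothed=True,
                                     no_ghost=False):
            if self._can_read_vertex_data(field):
                # Fast path: read vertex-centered values straight from
                # the file instead of building ghost zones and averaging
                # cell-centered values to the corners.
                return self._read_vertex_data(field)
            # Otherwise fall back to yt's generic implementation.
            return super(EinsteinToolkitGrid, self).get_vertex_centered_data(
                field, smoothed=smoothed, no_ghost=no_ghost)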


> I'd love to know other people's experience with benchmarking. Do these
> costs seem normal, up to an order of magnitude or so? Would you have any
> insight into what contributes to the cost of generating the first frame
> when volume rendering?
>
> Thanks very much!
>
> Best,
> Jonah Miller
>