<div dir="ltr">Hi Jonah,<br><div><br></div><div>Thanks for supplying these.  I've reviewed them but am still processing.</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 3, 2016 at 10:48 PM, Jonah Miller <span dir="ltr">&lt;<a href="mailto:jonah.maxwell.miller@gmail.com" target="_blank">jonah.maxwell.miller@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div text="#000000" bgcolor="#FFFFFF">

    <div lang="x-unicode"> Hi yt-dev,<br>

      <br>

      I am developing a frontend for Einstein Toolkit for yt and in the

      process I generated some crude, preliminary benchmarks which I

      thought I would share in case anybody is interested.<br>

      <br>

      I performed three tests:<br>

      <ol>

        <li>I just load the dataset and calculate (say) the maximum of

          some quantity on each grid.<br>

        </li>

        <li>I load in the dataset and calculate the maximum of the

          magnitude of a gradient on each grid. This requires the

          generation of ghost zones at grid boundaries. <br>

        </li>

        <li>I load in the dataset and perform a volume rendering. I make

          a "movie" with 4 frames where I rotate around the volume.</li>

      </ol>

      <p>Salient details:<br>

      </p>

      <ul>

        <li>I performed these tests with the attached scripts. I turned

          OpenMP and MPI off (set OMP_NUM_THREADS=1). <br>

        </li>

        <li>The domain sizes were: 64^3, 128^3, 256^3, and 512^3. <br>

        </li>

        <li>All four datasets have only one refinement level. However,

          the 512^3 dataset has multiple grids on that refinement level.</li>

        <li>The dataset sizes range from about 10MB for the 64^3

          set to 10GB for the 512^3 set.</li>

        <li>All tests were run on a single modern workstation.

          Details: 2.6GHz clock with AVX2 instructions and a 60MB L3

          cache.<br>

        </li>

      </ul>

      <p>Results:<br>

      </p>

      <ul>

        <li>Reading in the data is (for the datasets I tested) extremely

          fast. The 512^3 dataset takes only about 20 seconds to read

          the whole thing in.</li></ul></div></div></blockquote><div><br></div><div>That's good news!</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><div lang="x-unicode"><ul>

        <li>Generating the first frame in a volume render is extremely

          slow. On the order of 4 minutes for the 512^3 data set. After

          the first frame is produced, new frames are fast, even with

          OpenMP off. With one OpenMP thread, it takes on the order of tens

          of seconds for a new frame.<br></li></ul></div></div></blockquote><div><br></div><div>That sounds about right; my guess is that this is the kd-tree build, which has to get the vertex-centered data.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><div lang="x-unicode"><ul>


        <li>Generating ghost zones is very fast for datasets with only

          one grid. It is incredibly slow for datasets with multiple

          grids, dominating the run time.</li></ul></div></div></blockquote><div><br></div><div>Sounds about right.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><div lang="x-unicode"><ul>

        <li>I attach a plot comparing the three tests.<br>

        </li>

      </ul>

      <p>Naively, it seems to me that there must be several sources of

        overhead when I perform volume rendering that are significantly

        more costly than simply reading in the data (which seems to be

        fast). Clearly one of these is the generation of the ghost

        zones, and another is the actual ray tracing (although the ray

        tracing itself seems to be quite fast). However, I'm not sure

        that these two operations alone explain the cost of the volume

        rendering.<br></p></div></div></blockquote><div><br></div><div>I think the cost is likely dominated by the vertex centering.  It may be possible to override get_vertex_centered_data in your subclass of AMRGridPatch to make this faster for your data format.  I doubt that the ray casting itself is a large cost; Python-side operations such as writing the image are likely a non-negligible fraction of that time.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><div lang="x-unicode"><p>

      </p>

      <p>I'd love to know other people's experience with benchmarking.

        Do these costs seem normal, up to an order of magnitude or so?

        Would you have any insight into what contributes to the cost of

        generating the first frame when volume rendering?<br>

      </p>

      <p>Thanks very much!<br>

      </p>

      <p>Best,<br>

        Jonah Miller<br>

      </p>

    </div>

  </div>

<br>_______________________________________________<br>

yt-dev mailing list<br>

<a href="mailto:yt-dev@lists.spacepope.org">yt-dev@lists.spacepope.org</a><br>

<a href="http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org" rel="noreferrer" target="_blank">http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org</a><br>

<br></blockquote></div><br></div></div>
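<div><br></div>
<div>Since the kd-tree build and vertex centering are only my guess at where the first-frame time goes, profiling the first frame would settle it. Below is a minimal sketch using only the standard library's cProfile and pstats. The functions first_frame, build_tree, and cast_rays are hypothetical stand-ins that I made up for illustration; they are not yt functions. In a real run you would pass whatever call in your script produces the first rendered image.</div>
<pre>
```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Hypothetical stand-ins for the stages of a first frame (NOT yt calls).
def build_tree():
    # Pretend one-time setup work, e.g. a tree build.
    return sum(i * i for i in range(200000))


def cast_rays():
    # Pretend per-frame work.
    return sum(range(1000))


def first_frame():
    build_tree()
    return cast_rays()


result, report = profile_call(first_frame)
print(report)
```
</pre>
<div>Sorting by cumulative time puts the call that contains the most total work at the top, so a slow setup stage should stand out from the per-frame work.</div>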