[Yt-dev] Projection speed improvement patch

Mon Nov 2 22:13:27 PST 2009

Hi Matt,

This is awesome.  I don't think anyone could expect much better
performance on that dataset.  I remember running projections on this data
just a year or so ago, and it took far longer (just reading in the data
took ages).  What machine did you run this on?  I'm mostly curious about
how much memory it used, or had available to it.

In any case, I'd say this is a pretty big success, and the binary
hierarchies are a great idea.

Cheers,
Sam

On Mon, Nov 2, 2009 at 8:47 PM, Matthew Turk <matthewturk at gmail.com> wrote:

> Hi guys,
>
> (For all of these performance indicators, I've used the 512^3 L7
> amr-everywhere run called the "LightCone."  This particular dataset
> has ~380,000 grids and is a great place to find the )
>
> Last weekend I did a little bit of benchmarking and saw that the
> parallel projections (and likely several other parallel operations)
> all sat inside an MPI_Barrier for far too long.  I converted (I
> think!) this process to an MPI_Alltoallv operation, preceded by an
> MPI_Allreduce that gets the final array size and the offsets into an
> ordered array, and I think it is working.  I saw pretty good
> performance improvements, but they're tough to quantify right now
> -- for projecting "Ones" (no disk access) it sped things up by ~15%.
>
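
The bookkeeping described in the paragraph above can be sketched roughly
as follows.  This is a pure-Python simulation under stated assumptions,
not the yt code and not real MPI: the "ranks" are just list indices, and
the helper names (allreduce_sum, gather_ordered) are hypothetical.  It
only shows how a summed count vector yields the final array size and the
per-rank offsets into an ordered array.

```python
# Simulation only: no MPI library is used, and every name here is made up
# for illustration.  Each entry of local_chunks stands for one rank's data.

def allreduce_sum(vectors):
    """Element-wise sum of one vector per rank, which is what a sum-type
    allreduce would hand back to every rank."""
    return [sum(column) for column in zip(*vectors)]


def gather_ordered(local_chunks):
    nranks = len(local_chunks)

    # Each rank contributes a vector that is zero except for its own count.
    contributions = []
    for rank, chunk in enumerate(local_chunks):
        vec = [0] * nranks
        vec[rank] = len(chunk)
        contributions.append(vec)

    # After the reduction, every rank knows every other rank's count.
    counts = allreduce_sum(contributions)

    # Offsets are the exclusive prefix sum of the counts; the running
    # total at the end is the final array size.
    offsets = []
    total = 0
    for count in counts:
        offsets.append(total)
        total += count

    # With counts and offsets known, each rank's data has a fixed slot in
    # the ordered array, so a variable-count exchange can fill it in one
    # step instead of ranks waiting on each other.
    result = [None] * total
    for rank, chunk in enumerate(local_chunks):
        start = offsets[rank]
        result[start:start + counts[rank]] = chunk
    return counts, offsets, result
```

For example, gather_ordered([[1, 2], [], [3, 4, 5]]) gives counts
[2, 0, 3], offsets [0, 2, 2] and the ordered array [1, 2, 3, 4, 5].
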
> I've also added a new binary hierarchy method to devel enzo, and it
> provides everything that is necessary for yt to analyze the data.  As
> such, if a %(basename)s.harrays file exists, it will be used, and yt
> will not need to open the .hierarchy file at all.  This sped things up
> by 100 seconds.  I've written a script to create these
> (http://www.slac.stanford.edu/~mturk/create_harrays.py), but
> outputting them inline in Enzo is the fastest.
>
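
The fallback rule described above (use the .harrays file when it exists,
otherwise read the .hierarchy file) might look something like this.  It
is a minimal sketch: choose_hierarchy_source is a hypothetical helper,
not a function in yt or Enzo, and it only picks a path without parsing
either file.

```python
import os


def choose_hierarchy_source(basename):
    """Return the path of the file to read the hierarchy from.

    Hypothetical helper: prefers the binary %(basename)s.harrays file and
    falls back to the plain-text %(basename)s.hierarchy file.
    """
    binary_path = "%(basename)s.harrays" % {"basename": basename}
    if os.path.exists(binary_path):
        # The binary arrays are present, so the text file is never opened.
        return binary_path
    return "%(basename)s.hierarchy" % {"basename": basename}
```
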
> To top this all off, I ran a projection -- start to finish, including
> all overhead -- on 16 processors.  Projecting the fields "Density"
> (native), "Temperature" (native) and "VelocityMagnitude" (derived,
> requires x-, y- and z-velocity) to the finest resolution (adaptive
> projection -- to L7) takes 140 seconds, or roughly 2:20.
>
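
The reason the derived field above needs all three velocity components
is that it is their Euclidean norm, computed cell by cell.  A minimal
sketch, assuming the components arrive as equal-length sequences;
velocity_magnitude is a hypothetical name, not yt's field definition:

```python
import math


def velocity_magnitude(vx, vy, vz):
    """Per-cell sqrt(vx^2 + vy^2 + vz^2) for equal-length sequences."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(vx, vy, vz)]
```

So a cell with components (3.0, 4.0, 0.0) has magnitude 5.0, and all
three native fields must be read from disk before the derived one can be
projected.
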
> I've looked at the profiling outputs, and it seems to me that there
> are still some places performance could be squeezed out.  That being
> said, I'm pretty pleased with these results.
>
> These are all in the named branch hierarchy-opt in mercurial.  They
> rely on some rearrangement of the hierarchy parsing and whatnot that
> has lived in hg for a little while; it will go into the trunk as soon
> as I get the all clear about moving to a proper stable/less-stable dev
> environment.  I also have some other test suites to run on them, and I
> want to make sure the memory usage is not excessive.
>
> Best,
>
> Matt

-- 
Samuel W. Skillman
DOE Computational Science Graduate Fellow
Center for Astrophysics and Space Astronomy
University of Colorado at Boulder
samuel.skillman[at]colorado.edu