[Yt-dev] Projection speed improvement patch
matthewturk at gmail.com
Tue Nov 3 11:43:45 PST 2009
I just wanted to report one more benchmark which might be interesting
for a couple of you. I ran the same test, with *one additional
projected field* on Triton, and it takes 3:45 to project the entire
512^3 L7 amr-everywhere dataset to the finest resolution, projecting
Density, Temperature, VelocityMagnitude (requires 3 fields) and
Gravitational_Potential. This is over Ethernet, rather than shared
memory (I did not use the Myrinet interconnect for this test) and it's
with an additional field -- so pretty good, I think.
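
To put the two timings in this thread side by side, here is a rough sketch of the arithmetic. The helper names `to_seconds` and `to_mmss` are made up for illustration, and since the two runs were on different machines and interconnects the per-field numbers are only a loose comparison, not a proper benchmark.

```python
def to_seconds(mmss):
    """Convert a "minutes:seconds" string such as "3:45" to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Format a whole number of seconds as "m:ss"."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return "%d:%02d" % (minutes, seconds)


# Timings quoted in this thread: (wall time in seconds, projected fields).
shared_memory_run = (140, 3)             # earlier run, shared memory
ethernet_run = (to_seconds("3:45"), 4)   # Triton run, over ethernet

for label, (seconds, nfields) in [("shared memory", shared_memory_run),
                                  ("ethernet", ethernet_run)]:
    # Per-field cost is a crude average; VelocityMagnitude alone reads 3 fields.
    print("%s: %s total, %.1f s per projected field"
          % (label, to_mmss(seconds), seconds / nfields))
```
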
There are some issues with processors lagging, but I think they are
not a big deal anymore!
On Mon, Nov 2, 2009 at 10:25 PM, Matthew Turk <matthewturk at gmail.com> wrote:
> Hi Sam,
> I guess you're right, without the info about the machine, this doesn't
> help much!
> This was running on a new machine at SLAC called 'orange-bigmem' --
> it's a 32-node machine with a ton of memory available to all the
> processors. I checked memory usage at the end of the run, and after
> the projection had been saved out a few times it was around 1.5 gigs
> per node. I'm threading some outputs of the total memory usage
> through the projection code, and hopefully that will give us an idea
> of the peak memory usage.
> The file system is Lustre, which works well with the preloading of the
> data, and I ran it a couple times beforehand to make sure that the
> files were in local cache or whatever.
> So the communication was via shared memory, which, while still an MPI
> interface, is much closer to ideal. I will be giving it a go on a
> cluster tomorrow, after I work out some kinks with data storage. I've
> moved the generation of the binary hierarchies into yt -- so if you
> don't have one, rather than dumping the hierarchy into the .yt file,
> it will dump it into the .harrays file. This way if anyone else
> writes an interface for the binary hierarchy method, we can all share
> it. (I think it would be a bad idea to have Enzo output a .yt file.
> ;-) The .yt file will now exist solely to store objects, not any of
> the hierarchy info.
> On Mon, Nov 2, 2009 at 10:13 PM, Sam Skillman <72Nova at gmail.com> wrote:
>> Hi Matt,
>> This is awesome. I don't think anyone can expect much faster for that
>> dataset. I remember running projections just a year or so ago on this data
>> and it taking a whole lot more time (just reading in the data took ages).
>> What machine were you able to do this on? I'm mostly curious about the
>> memory it used, or had available to it.
>> In any case, I'd say this is a pretty big success, and the binary
>> hierarchies are a great idea.
>> On Mon, Nov 2, 2009 at 8:47 PM, Matthew Turk <matthewturk at gmail.com> wrote:
>>> Hi guys,
>>> (For all of these performance indicators, I've used the 512^3 L7
>>> amr-everywhere run called the "LightCone." This particular dataset
>>> has ~380,000 grids and is a great place to find the )
>>> Last weekend I did a little bit of benchmarking and saw that the
>>> parallel projections (and likely several other parallel operations)
>>> all sat inside an MPI_Barrier for far too long. I converted (I
>>> think!) this process to be an MPI_Alltoallv operation, following on an
>>> MPI_Allreduce to get the final array size and the offsets into an
>>> ordered array, and I think it is working. I saw pretty good
>>> performance improvements, but it's tough to quantify those right now
>>> -- for projecting "Ones" (no disk-access) it sped things up by ~15%.
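
The size-and-offset bookkeeping described above can be sketched in plain Python, with no MPI involved. This is only an illustration under assumptions: it supposes each rank writes its local element count into its own slot of a zero vector, so that a sum reduction (the role MPI_Allreduce would play) gives every rank the full count vector, and the offsets are then an exclusive prefix sum. The names `allreduce_sum`, `offsets_from_counts` and `gather_ordered` are hypothetical and are not taken from the yt source.

```python
from itertools import accumulate


def allreduce_sum(contributions):
    """Simulate MPI_Allreduce(SUM): element-wise sum of one vector per rank."""
    return [sum(column) for column in zip(*contributions)]


def offsets_from_counts(counts):
    """Exclusive prefix sum: where each rank's data starts in the ordered array."""
    return [0] + list(accumulate(counts))[:-1]


def gather_ordered(local_data):
    """Place every rank's local values into one rank-ordered array."""
    nprocs = len(local_data)
    # Each rank fills only its own slot; all other slots stay zero.
    contributions = [
        [len(data) if slot == rank else 0 for slot in range(nprocs)]
        for rank, data in enumerate(local_data)
    ]
    counts = allreduce_sum(contributions)
    offsets = offsets_from_counts(counts)
    total = sum(counts)  # final array size, known to every rank
    ordered = [None] * total
    # In a real parallel run this copy would be done by the MPI_Alltoallv exchange.
    for rank, data in enumerate(local_data):
        start = offsets[rank]
        ordered[start:start + counts[rank]] = data
    return counts, offsets, ordered
```

For example, `gather_ordered([[1, 2], [], [3, 4, 5]])` simulates three ranks, one of which has nothing to contribute. Because every rank knows the total size and its own offset up front, no rank has to wait on a barrier to learn where its data goes.
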
>>> I've also added a new binary hierarchy method to devel enzo, and it
>>> provides everything that is necessary for yt to analyze the data. As
>>> such, if a %(basename)s.harrays file exists, it will be used, and yt
>>> will not need to open the .hierarchy file at all. This sped things up
>>> by 100 seconds. I've written a script to create these
>>> (http://www.slac.stanford.edu/~mturk/create_harrays.py), but
>>> outputting them inline in Enzo is the fastest.
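
The "use the .harrays file if it exists" rule above amounts to a simple existence check. A minimal sketch, assuming the file sits next to the dataset and shares its basename; `pick_hierarchy_file` is a hypothetical helper, not the function yt actually uses:

```python
import os


def pick_hierarchy_file(basename):
    """Return the path of the hierarchy description to read for `basename`.

    Prefers the binary `<basename>.harrays` file when it exists and falls
    back to the `<basename>.hierarchy` file otherwise.
    """
    harrays_path = "%(basename)s.harrays" % {"basename": basename}
    if os.path.exists(harrays_path):
        return harrays_path
    return "%(basename)s.hierarchy" % {"basename": basename}
```
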
>>> To top this all off, I ran a projection -- start to finish, including
>>> all overhead -- on 16 processors. To project the fields "Density"
>>> (native), "Temperature" (native) and "VelocityMagnitude" (derived,
>>> requires x-, y- and z-velocity) on 16 processors to the finest
>>> resolution (adaptive projection -- to L7) takes 140 seconds, or
>>> roughly 2:20.
>>> I've looked at the profiling outputs, and it seems to me that there
>>> are still some places performance could be squeezed out. That being
>>> said, I'm pretty pleased with these results.
>>> These are all in the named branch hierarchy-opt in mercurial. They
>>> rely on some rearrangement of the hierarchy parsing and whatnot that
>>> has lived in hg for a little while; it will go into the trunk as soon
>>> as I get the all clear about moving to a proper stable/less-stable dev
>>> environment. I also have some other test suites to run on them, and I
>>> want to make sure the memory usage is not excessive.
>>> Yt-dev mailing list
>>> Yt-dev at lists.spacepope.org
>> Samuel W. Skillman
>> DOE Computational Science Graduate Fellow
>> Center for Astrophysics and Space Astronomy
>> University of Colorado at Boulder