[Yt-dev] Projection speed improvement patch

Matthew Turk matthewturk at gmail.com
Tue Nov 3 19:07:20 PST 2009


Wow, John -- that's simply unacceptable performance on yt's part, from
both a memory and a timing standpoint.

I'd love to take a look at this data, so if you want to toss it my
way, please do so!

On Tue, Nov 3, 2009 at 7:02 PM, John Wise <jwise at astro.princeton.edu> wrote:
> Hi Matt,
>
> That is great news!  About two months ago, I tried doing full projections
> on a 768^3 AMR-everywhere dataset (10 levels) on Ranger, but I kept running
> out of memory (I never checked the actual memory usage, because to my
> knowledge you can't get interactive jobs there).  I was running with 256
> cores (512 GB of RAM should be enough...).  The I/O was also taking
> forever.  I ended up just doing projections of subvolumes.
>
> But I'll be sure to test your improved version (along with the .harrays
> file) and report back to the list!
>
> Thanks!
> John
>
>
> On 3 Nov 2009, at 15:43, Matthew Turk wrote:
>
>> Hi guys,
>>
>> I just wanted to report one more benchmark that might be interesting
>> to a couple of you.  I ran the same test on Triton, with *one
>> additional projected field*: it takes 3:45 to project the entire 512^3
>> L7 amr-everywhere dataset to the finest resolution, projecting Density,
>> Temperature, VelocityMagnitude (which requires three fields) and
>> Gravitational_Potential.  This is over Ethernet rather than shared
>> memory (I did not use the Myrinet interconnect for this test), and with
>> the extra field -- so pretty good, I think.
>>
>> There are some issues with processors lagging, but I think they are
>> not a big deal anymore!
>>
>> -Matt
>>
>> On Mon, Nov 2, 2009 at 10:25 PM, Matthew Turk <matthewturk at gmail.com>
>> wrote:
>>>
>>> Hi Sam,
>>>
>>> I guess you're right -- without info about the machine, this doesn't
>>> help much!
>>>
>>> This was running on a new machine at SLAC called 'orange-bigmem' --
>>> it's a 32-node machine with a ton of memory available to all the
>>> processors.  I checked memory usage at the end of the run; after the
>>> projection had been saved out a few times, it was around 1.5 GB per
>>> node.  I'm threading some outputs of the total memory usage through
>>> the projection code, and hopefully that will give us an idea of the
>>> peak memory usage.
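>>>
>>> For reference, here's a minimal sketch of that kind of instrumentation
>>> (purely illustrative; not the code that went into yt) -- it reports the
>>> process's peak resident set size at a given point:
>>>
>>>     import resource
>>>
>>>     def report_peak_memory(tag=""):
>>>         # ru_maxrss is the peak resident set size of this process;
>>>         # on Linux it is reported in kilobytes.
>>>         peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>>         print("%s peak RSS: %d KB" % (tag, peak_kb))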
>>>
>>> The file system is Lustre, which works well with the preloading of
>>> the data, and I ran it a couple of times beforehand to make sure the
>>> files were in the local cache.
>>>
>>> So the communication was via shared memory, which, while still going
>>> through an MPI interface, is much closer to ideal.  I will be giving
>>> it a go on a cluster tomorrow, after I work out some kinks with data
>>> storage.  I've moved the generation of the binary hierarchies into yt:
>>> if you don't have one, rather than dumping the hierarchy into the .yt
>>> file, yt will dump it into the .harrays file.  This way, if anyone
>>> else writes an interface for the binary hierarchy method, we can all
>>> share it.  (I think it would be a bad idea to have Enzo output a .yt
>>> file.  ;-)  The .yt file will now exist solely to store objects, not
>>> any of the hierarchy info.
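>>>
>>> The fallback works roughly like this (function names and the pickle
>>> stand-in are illustrative; the real .harrays format is different):
>>>
>>>     import os
>>>     import pickle
>>>
>>>     def load_hierarchy(basename):
>>>         # Prefer the fast binary cache if it already exists.
>>>         cache = basename + ".harrays"
>>>         if os.path.exists(cache):
>>>             with open(cache, "rb") as f:
>>>                 return pickle.load(f)
>>>         # Otherwise parse the slow ASCII .hierarchy once, then write
>>>         # the binary cache for future runs.
>>>         hierarchy = parse_ascii_hierarchy(basename + ".hierarchy")
>>>         with open(cache, "wb") as f:
>>>             pickle.dump(hierarchy, f)
>>>         return hierarchy
>>>
>>>     def parse_ascii_hierarchy(path):
>>>         # Stand-in for the line-by-line parse of Enzo's .hierarchy.
>>>         with open(path) as f:
>>>             return [line.split() for line in f if line.strip()]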
>>>
>>> -Matt
>>>
>>> On Mon, Nov 2, 2009 at 10:13 PM, Sam Skillman <72Nova at gmail.com> wrote:
>>>>
>>>> Hi Matt,
>>>>
>>>> This is awesome.  I don't think anyone can expect much faster than
>>>> that for this dataset.  I remember running projections on this data
>>>> just a year or so ago, and it took a whole lot more time (just reading
>>>> in the data took ages).  What machine were you able to do this on?
>>>> I'm mostly curious about the memory it used, or had available to it.
>>>>
>>>> In any case, I'd say this is a pretty big success, and the binary
>>>> hierarchies are a great idea.
>>>>
>>>> Cheers,
>>>> Sam
>>>>
>>>> On Mon, Nov 2, 2009 at 8:47 PM, Matthew Turk <matthewturk at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> (For all of these performance indicators, I've used the 512^3 L7
>>>>> amr-everywhere run called the "LightCone."  This particular dataset
>>>>> has ~380,000 grids and is a great place to find bottlenecks.)
>>>>>
>>>>> Last weekend I did a little bit of benchmarking and saw that the
>>>>> parallel projections (and likely several other parallel operations)
>>>>> all sat inside an MPI_Barrier for far too long.  I converted (I
>>>>> think!) this process to an MPI_Alltoallv operation, following an
>>>>> MPI_Allreduce to get the final array size and the offsets into an
>>>>> ordered array, and I think it is working.  I saw pretty good
>>>>> performance improvements, but they're tough to quantify right now --
>>>>> for projecting "Ones" (no disk access) it sped things up by ~15%.
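>>>>>
>>>>> In mpi4py, the exchange pattern looks roughly like this (a sketch,
>>>>> not yt's actual code; it shares the counts with an allgather where
>>>>> the description above uses an Allreduce):
>>>>>
>>>>>     from mpi4py import MPI
>>>>>     import numpy as np
>>>>>
>>>>>     comm = MPI.COMM_WORLD
>>>>>
>>>>>     def join_arrays(local):
>>>>>         # Every rank contributes a variable-length numpy array and
>>>>>         # gets back the ordered concatenation from all ranks.
>>>>>         size = comm.Get_size()
>>>>>         counts = np.array(comm.allgather(local.size), dtype='i')
>>>>>         displs = np.zeros(size, dtype='i')
>>>>>         displs[1:] = np.cumsum(counts[:-1])
>>>>>         recv = np.empty(counts.sum(), dtype=local.dtype)
>>>>>         # Each rank sends its whole chunk to every other rank (all
>>>>>         # send displacements are zero) instead of waiting at a
>>>>>         # barrier for serialized point-to-point sends.
>>>>>         scounts = np.full(size, local.size, dtype='i')
>>>>>         sdispls = np.zeros(size, dtype='i')
>>>>>         comm.Alltoallv([local, (scounts, sdispls)],
>>>>>                        [recv, (counts, displs)])
>>>>>         return recv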
>>>>>
>>>>> I've also added a new binary hierarchy method to devel Enzo, and it
>>>>> provides everything that yt needs to analyze the data.  If a
>>>>> %(basename)s.harrays file exists, it will be used, and yt will not
>>>>> need to open the .hierarchy file at all.  This sped things up by 100
>>>>> seconds.  I've written a script to create these files
>>>>> (http://www.slac.stanford.edu/~mturk/create_harrays.py), but
>>>>> outputting them inline from Enzo is fastest.
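>>>>>
>>>>> For a sense of what such a file holds, here's an illustrative reader
>>>>> (the dataset names below are guesses for illustration, not the
>>>>> actual .harrays schema):
>>>>>
>>>>>     import h5py
>>>>>
>>>>>     def read_harrays(filename):
>>>>>         # Flat, per-grid arrays replace the slow line-by-line
>>>>>         # parse of the ASCII .hierarchy file.
>>>>>         with h5py.File(filename, "r") as f:
>>>>>             left_edges = f["/LeftEdges"][:]    # (n_grids, 3)
>>>>>             right_edges = f["/RightEdges"][:]  # (n_grids, 3)
>>>>>             dimensions = f["/Dimensions"][:]   # (n_grids, 3)
>>>>>             levels = f["/Level"][:]            # (n_grids,)
>>>>>         return left_edges, right_edges, dimensions, levels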
>>>>>
>>>>> To top this all off, I ran a projection -- start to finish, including
>>>>> all overhead -- on 16 processors.  Projecting the fields "Density"
>>>>> (native), "Temperature" (native) and "VelocityMagnitude" (derived;
>>>>> requires x-, y- and z-velocity) on 16 processors to the finest
>>>>> resolution (an adaptive projection, down to L7) takes 140 seconds,
>>>>> or 2:20.
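>>>>>
>>>>> For context, a run like that looked roughly as follows in 2009-era
>>>>> yt (the dataset path is made up, and API details varied between
>>>>> versions -- treat this as a sketch, not a recipe):
>>>>>
>>>>>     # run with: mpirun -np 16 python project.py --parallel
>>>>>     from yt.mods import *
>>>>>
>>>>>     pf = EnzoStaticOutput("DD0010/DD0010")  # hypothetical dataset
>>>>>     pc = PlotCollection(pf, center=[0.5, 0.5, 0.5])
>>>>>     for field in ["Density", "Temperature", "VelocityMagnitude"]:
>>>>>         # axis 0; the projection adapts down to the finest level
>>>>>         pc.add_projection(field, 0)
>>>>>     pc.save("LightCone")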
>>>>>
>>>>> I've looked at the profiling output, and it seems to me that there
>>>>> are still some places where performance could be squeezed out.  That
>>>>> being said, I'm pretty pleased with these results.
>>>>>
>>>>> These all live in the named branch hierarchy-opt in Mercurial.  They
>>>>> rely on some rearrangement of the hierarchy parsing that has lived in
>>>>> hg for a little while; it will go into the trunk as soon as I get the
>>>>> all-clear about moving to a proper stable/less-stable development
>>>>> environment.  I also have some other test suites to run on them, and
>>>>> I want to make sure the memory usage is not excessive.
>>>>>
>>>>> Best,
>>>>>
>>>>> Matt
>>>>
>>>> --
>>>> Samuel W. Skillman
>>>> DOE Computational Science Graduate Fellow
>>>> Center for Astrophysics and Space Astronomy
>>>> University of Colorado at Boulder
>>>> samuel.skillman[at]colorado.edu
>>>>
>>>