[yt-users] Projection Performance

Richard P Wagner rpwagner at sdsc.edu
Wed May 2 15:51:48 PDT 2012


Hi Matt,

On May 2, 2012, at 5:05 AM, Matthew Turk wrote:

> Hi Rick,
> 
> On Tue, May 1, 2012 at 8:15 PM, Richard P Wagner <rpwagner at sdsc.edu> wrote:
>> Hi,
>> 
>> I wanted to build a sequence of projections using various color maps along each axis. The data set I'm using is the z = 0 one from the L7 simulation some of you are familiar with. Here's the paste of my current script:
>>  http://paste.yt-project.org/show/2335/
>> 
>> (The early sys.exit is deliberate.)
>> 
>> Does anyone have an estimate of how long this projection should take when run serially? After two hours it was still going without having produced the first image. The same plots done on the z = 2.75 data took about 15 minutes (that data has about 1/5 the number of grids, though).
> 
> This should definitely not take two hours.
> 
>> 
>> I will also gladly take advice on better methods for the projections, or if the benefit to doing this in parallel is worth it.
> 
> As Sam noted, the quadtree-based projection carries with it an
> expensive reduction step.  We have a new algorithm for the reduction
> which should reduce the time taken (and we have been using it on the
> inline runs on Blue Waters, where it's been quite successful).
> However, the quadtree projection in serial is still likely to be the
> optimal method, unless the memory on your machine is less than the
> number of fields * the max size of a given level.  (In particular,
> quad proj in parallel will correctly load-balance subregions, like you
> use, but overlap_proj will not.)
> 
> There are a couple things to investigate.
> 
> 0) Can you try running in serial?
> 1) Where does it spend its time?
> 1a) Where does the code report that it is in the process?  In reading
> data?  Is it constantly putting something out?  Is there a level it
> seems to be stuck at?
> 1b) Have you tried using parallel_profile (which may only be in the
> development tip, although we're so close to a release I recommend
> running on that) to examine the overall time spent?
> 1c) If you send the apparently hung process SIGUSR1, where does the
> stack trace report it is?
> 2) Can you try a simple projection of Ones?  I don't have L7 z=0 on my
> machine here, but I do have z=0.1 (RD0035) and this script:
> 
> http://paste.yt-project.org/show/2336/
> 
> reports a time of 235 seconds, with a maximum memory usage of 2.5 gb.
> 3) As a silly question, does this data by any chance have the modified
> hierarchy?  yt is fragile for Enzo hierarchy formats -- like Enzo! --
> and it may be stuck in an unhaltable parse operation.
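
On 1c, for anyone following along: below is a minimal sketch in plain Python of how a SIGUSR1 handler can report where a process currently is. It only illustrates the mechanism and is not yt's own handler; `install_stack_dump` and `slow_projection_step` are made-up names, and SIGUSR1 exists only on POSIX systems.

```python
import os
import signal
import traceback


def install_stack_dump(sink):
    """Register a SIGUSR1 handler that appends the current stack to `sink`."""
    def handler(signum, frame):
        # `frame` is wherever the main thread was when the signal arrived.
        sink.append("".join(traceback.format_stack(frame)))
    signal.signal(signal.SIGUSR1, handler)


def slow_projection_step(sink):
    # Stand-in for the apparently hung code (hypothetical): signal this
    # process, then spin until the handler has recorded a stack trace.
    os.kill(os.getpid(), signal.SIGUSR1)
    while not sink:
        pass


dumps = []
install_stack_dump(dumps)
slow_projection_step(dumps)
print(dumps[0])
```

For a real job the signal would come from the shell instead, with `kill -USR1 <pid>`.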

Last night I ran on 64 cores using overlap_proj, and it was much faster, on the order of a few minutes per field. There were some hiccups doing multiple fields, but I'm willing to blame that on my script, rather than a bug.

> As for your script, there are a couple places you could definitely
> speed it up.  Since you're saving out absolutely enormous figures,
> you're calling on matplotlib to make humongous things -- this isn't
> where you're stuck, but I'd recommend skipping the entire plot
> collection interface and manually creating both the images and the
> projections.  This lets you conduct projections of multiple fields in
> a single pass, as long as they have the same "weight" field:
> 
> field_names = sorted(fields.keys())
> for ax in range(3):
>    # project along the current axis, all fields in one pass
>    proj = pf.h.proj(ax, field_names)
>    frb = proj.to_frb( ... )

Thanks, this is exactly what I needed for the next step of saving the projections to an HDF5 file. Reusing the pattern for the images is great. Regarding the size, I think making two versions is a good idea: a "smaller" one with a color bar, and a "larger" one without a border.
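
To check my understanding of the single-pass idea, here is a toy sketch in plain Python with no yt involved. It does a simple unweighted sum along one axis, and each cell is visited once no matter how many fields are requested. The `project_fields` helper, the cell layout, and the field values are all invented for illustration.

```python
def project_fields(cells, axis, field_names):
    """Sum every field in `field_names` along `axis` in one walk over `cells`.

    `cells` maps an (i, j, k) index to a dict of field values.
    Returns {field name: {(u, v) pixel: summed value}}.
    """
    images = {name: {} for name in field_names}
    for index, values in cells.items():
        # Dropping the projected axis gives the 2D pixel this cell lands on.
        pixel = tuple(c for n, c in enumerate(index) if n != axis)
        for name in field_names:
            image = images[name]
            image[pixel] = image.get(pixel, 0.0) + values[name]
    return images


# A tiny 2 x 2 x 3 "data set" with two made-up fields.
cells = {
    (i, j, k): {"Density": 1.0, "Temperature": float(i + j + k)}
    for i in range(2) for j in range(2) for k in range(3)
}

field_names = sorted(["Temperature", "Density"])
for ax in range(3):
    images = project_fields(cells, ax, field_names)
    print(ax, {name: len(image) for name, image in images.items()})
```

The real projection also handles weighting and refinement levels, which this sketch ignores.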

> This will project in a single pass / walk of the hierarchy, and it
> will also batch the IO -- the fields will all get read in a single
> call to H5Fopen.  If you don't want to use FRBs manually, this will
> still save them so the next call to add_projection will grab them from
> the file.  If you're doing sub-regions there's probably a way to pass
> node_name through to save them, or you can use FRBs and manually make
> the matplotlib image.  FRBs can be written out to disk with
> write_image, and I believe Cameron or Stella have some code that can
> even stick annotated colorbars on them.
> 
> Anyway, I don't think projections should be so slow -- and I'd be
> interested to hear where exactly it's getting hung up.

I should be able to set up three scripts: the one I want; the one that crashes; and the one that's slow. I'll post what I find, and we can go from there.

--Rick

> 
> -Matt
> 
> 
>> 
>> Thanks,
>> Rick
>> 
>> _______________________________________________
>> yt-users mailing list
>> yt-users at lists.spacepope.org
>> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org