[yt-users] Projection Performance

Wed May 2 05:05:54 PDT 2012

Hi Rick,

On Tue, May 1, 2012 at 8:15 PM, Richard P Wagner <rpwagner at sdsc.edu> wrote:
> Hi,
>
> I wanted to build a sequence of projections using various color maps along each axis. The data set I'm using is the z = 0 one from the L7 simulation some of you are familiar with. Here's the paste of my current script:
>  http://paste.yt-project.org/show/2335/
>
> (The early sys.exit is deliberate.)
>
> Does anyone have an estimate of the time doing this projection serially should take? After two hours it was still going without having produced the first image. The same plots done on the z = 2.75 data took about 15 minutes (this data has about 1/5 the number of grids, though).

This should definitely not take two hours.

>
> I will also gladly take advice on better methods for the projections, or on whether doing this in parallel is worth it.

As Sam noted, the quadtree-based projection carries with it an
expensive reduction step.  We have a new algorithm for the reduction
which should reduce the time taken (and we have been using it on the
inline runs on Blue Waters, where it's been quite successful).
However, the quadtree projection in serial is still likely to be the
optimal method, unless your machine has less memory than the
number of fields times the maximum size of a given level.  (In
particular, the quadtree projection in parallel will correctly
load-balance subregions like the ones you use, but overlap_proj will not.)
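For a rough sense of that memory condition, here is a back-of-the-envelope
sketch. It assumes (my assumptions, not yt's documented accounting) that
every projected value is an 8-byte float and that "size of a level" means
the number of cells that level contributes to the projected image;
estimate_quadtree_bytes and fits_in_memory are hypothetical helpers.

```python
# Hypothetical helpers: rough memory needed by a serial quadtree projection,
# following the rule of thumb "number of fields * max size of a level".
# ASSUMPTIONS (not from yt): each projected value is a float64 (8 bytes),
# and a level's "size" is the number of cells it adds to the projected image.

BYTES_PER_VALUE = 8  # assumed float64 storage


def estimate_quadtree_bytes(n_fields, cells_per_level):
    """Return n_fields * (largest per-level cell count) * bytes per value."""
    max_cells = max(cells_per_level)
    return n_fields * max_cells * BYTES_PER_VALUE


def fits_in_memory(n_fields, cells_per_level, available_bytes):
    """True if the rough estimate fits within the given memory budget."""
    return estimate_quadtree_bytes(n_fields, cells_per_level) <= available_bytes
```

For example, 4 fields whose largest level contributes 4096^2 projected cells
come out to 4 * 16777216 * 8 = 536870912 bytes, or 512 MiB; these numbers are
purely illustrative, not measurements of L7.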

There are a couple of things to investigate.

0) Can you try running in serial?
1) Where does it spend its time?
 1a) Where does the code report that it is in the process?  In reading
data?  Is it constantly printing output?  Is there a level it
seems to be stuck at?
 1b) Have you tried using parallel_profile (which may only be in the
development tip, although we're so close to a release I recommend
running on that) to examine the overall time spent?
 1c) If you send the apparently hung process SIGUSR1, where does the
stack trace report it is?
2) Can you try a simple projection of Ones?  I don't have L7 z=0 on my
machine here, but I do have z=0.1 (RD0035) and this script:

http://paste.yt-project.org/show/2336/

reports a time of 235 seconds, with a maximum memory usage of 2.5 GB.
3) As a silly question, does this data by any chance have the modified
hierarchy?  yt is fragile with respect to Enzo hierarchy formats -- like Enzo! --
and it may be stuck in an unhaltable parse operation.
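On 1c: the SIGUSR1 stack trace comes from a signal handler installed in the
running process. The sketch below is a generic, POSIX-only illustration of
that mechanism using only the standard library; it is not yt's own handler,
and install_stack_dump is a hypothetical name.

```python
import signal
import sys
import traceback


def install_stack_dump(signum=signal.SIGUSR1, stream=None):
    """Print the current Python stack whenever `signum` arrives.

    Generic sketch (POSIX only, main thread only); not yt's own handler.
    """
    def _handler(sig, frame):
        out = stream if stream is not None else sys.stderr
        out.write("Received signal %d; current stack:\n" % sig)
        # `frame` is the frame that was executing when the signal arrived,
        # so the printed stack shows where a "hung" process really is.
        traceback.print_stack(frame, file=out)
        out.flush()

    signal.signal(signum, _handler)
    return _handler
```

With this installed, running kill -USR1 <pid> from another shell prints where
the process currently is. On Python 3.3+, faulthandler.register(signal.SIGUSR1)
from the standard library does much the same thing and can also dump other
threads.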

As for your script, there are a couple places you could definitely
speed it up.  Since you're saving out absolutely enormous figures,
you're calling on matplotlib to make humongous things -- this isn't
where you're stuck, but I'd recommend skipping the entire plot
collection interface and manually creating both the images and the
projections.  This lets you conduct projections of multiple fields in
a single pass, as long as they have the same "weight" field:

# One projection per axis, covering all fields in a single pass.
field_names = sorted(fields.keys())
for ax in range(3):
    proj = pf.h.proj(ax, field_names)
    frb = proj.to_frb( ... )

This will project in a single pass / walk of the hierarchy, and it
will also batch the IO -- the fields will all get read in a single
call to H5Fopen.  If you don't want to use FRBs manually, this will
still save the projections, so the next call to add_projection will grab
them from the file.  If you're doing sub-regions there's probably a way to pass
node_name through to save them, or you can use FRBs and manually make
the matplotlib image.  FRBs can be written out to disk with
write_image, and I believe Cameron or Stella have some code that can
even stick annotated colorbars on them.
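To make the "single pass" point concrete, here is a toy model of why
batching matters. It is not how yt implements projections: ToyGrid,
project_single_pass and project_per_field are hypothetical, every grid covers
the same footprint, and the projection is a plain unweighted sum with no cell
widths. The point is only the read count: walking the grids once for all
fields reads each grid once, while projecting field by field reads each grid
once per field.

```python
import numpy as np


class ToyGrid:
    """Hypothetical stand-in for one AMR grid stored on disk."""

    def __init__(self, data):
        self._data = data  # dict: field name -> 3D array
        self.opens = 0     # counts simulated file opens

    def read(self, field_names):
        """Simulate one file open that reads every requested field."""
        self.opens += 1
        return {name: self._data[name] for name in field_names}


def project_single_pass(grids, field_names, axis):
    """Sum every field along `axis`, visiting each grid exactly once."""
    result = {}
    for grid in grids:
        batch = grid.read(field_names)  # one "open" per grid, all fields
        for name, arr in batch.items():
            result[name] = result.get(name, 0) + arr.sum(axis=axis)
    return result


def project_per_field(grids, field_names, axis):
    """Naive alternative: one full walk (one read per grid) per field."""
    result = {}
    for name in field_names:
        result.update(project_single_pass(grids, [name], axis))
    return result
```

With two fields, project_per_field opens every grid twice, whereas
project_single_pass opens each one once; with real data on disk, that
difference would show up as extra file opens and reads.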

Anyway, I don't think projections should be so slow -- and I'd be
interested to hear where exactly it's getting hung up.

-Matt

>
> Thanks,
> Rick
>
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org