[yt-users] Projection Performance

Matthew Turk matthewturk at gmail.com
Wed May 2 16:07:30 PDT 2012


Hi Rick,

On Wed, May 2, 2012 at 6:51 PM, Richard P Wagner <rpwagner at sdsc.edu> wrote:
> Hi Matt,
>
> On May 2, 2012, at 5:05 AM, Matthew Turk wrote:
>
>> Hi Rick,
>>
>> On Tue, May 1, 2012 at 8:15 PM, Richard P Wagner <rpwagner at sdsc.edu> wrote:
>>> Hi,
>>>
>>> I wanted to build a sequence of projections using various color maps along each axis. The data set I'm using is the z = 0 one from the L7 simulation that some of you are familiar with. Here's the paste of my current script:
>>>  http://paste.yt-project.org/show/2335/
>>>
>>> (The early sys.exit is deliberate.)
>>>
>>> Does anyone have an estimate of how long this projection should take serially? After two hours it was still going without having produced the first image. The same plots done on the z = 2.75 data took about 15 minutes (though that data set has only about 1/5 the number of grids).
>>
>> This should definitely not take two hours.
>>
>>>
>>> I will also gladly take advice on better methods for the projections, or if the benefit to doing this in parallel is worth it.
>>
>> As Sam noted, the quadtree-based projection carries with it an
>> expensive reduction step.  We have a new algorithm for the reduction
>> which should reduce the time taken (and we have been using it on the
>> inline runs on Blue Waters, where it's been quite successful).
>> However, the quadtree projection in serial is still likely to be the
>> optimal method, unless the memory on your machine is less than the
>> number of fields * the max size of a given level.  (In particular,
>> quad proj in parallel will correctly load-balance subregions, like you
>> use, but overlap_proj will not.)
>>
>> There are a couple things to investigate.
>>
>> 0) Can you try running in serial?
>> 1) Where does it spend its time?
>> 1a) Where in the process does the code report itself to be?  Reading
>> data?  Is it constantly producing output?  Is there a level it seems
>> to be stuck at?
>> 1b) Have you tried using parallel_profile (which may only be in the
>> development tip, although we're so close to a release I recommend
>> running on that) to examine the overall time spent?
>> 1c) If you send the apparently hung process SIGUSR1, where does the
>> stack trace report it is?  (See the snippet just after this list.)
>> 2) Can you try a simple projection of Ones?  I don't have L7 z=0 on my
>> machine here, but I do have z=0.1 (RD0035) and this script:
>>
>> http://paste.yt-project.org/show/2336/
>>
>> reports a time of 235 seconds, with a maximum memory usage of 2.5 GB.
>> 3) As a silly question, does this data by any chance have the modified
>> hierarchy?  yt is fragile about Enzo hierarchy formats -- like Enzo
>> itself! -- and it may be stuck in a parse operation that never
>> terminates.
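>>
>> Regarding (1c): you can send the signal from another Python session
>> on the same machine along these lines (a sketch; the PID is just an
>> example, substitute the hung process's actual PID):
>>
>> import os, signal
>>
>> os.kill(12345, signal.SIGUSR1)  # 12345: PID of the hung yt process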
>
> Last night I ran on 64 cores using overlap_proj, and it was much faster, on the order of a few minutes per field. There were some hiccups doing multiple fields, but I'm willing to blame that on my script, rather than a bug.

Interesting.  I'm still very curious why the quad tree is hanging.
Today I implemented the tree-extension code for the quad tree, so it
should now scale in parallel.  (This reduces the operations to a
single reduction, rather than pairwise collapse of the data
structure.)  Once it's been vetted it will go into development, in
anticipation of the 2.4 release coming up.

>
>> As for your script, there are a couple places you could definitely
>> speed it up.  Since you're saving out absolutely enormous figures,
>> you're calling on matplotlib to make humongous things -- this isn't
>> where you're stuck, but I'd recommend skipping the entire plot
>> collection interface and manually creating both the images and the
>> projections.  This lets you conduct projections of multiple fields in
>> a single pass, as long as they have the same "weight" field:
>>
>> field_names = sorted(fields.keys())  # all fields share one weight
>> for ax in range(3):
>>    # one hierarchy walk per axis, covering every field at once
>>    proj = pf.h.proj(ax, field_names)
>>    frb = proj.to_frb( ... )
>
> Thanks, this is exactly what I needed for the next step of saving the projections to an HDF5 file. Reusing the pattern for the images is great. Regarding the size, I think two sizes is a good idea: a "smaller" one with a color bar, and a "larger" one without a border.

Awesome.  Any call to the projection object's initializer will require
a full walk of the hierarchy, so if you can batch the fields you'll
save tons of time.
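
For instance (a minimal sketch -- the field names here are just
placeholders for whatever you're projecting):

# One hierarchy walk per field -- N fields cost N walks:
for field in ["Density", "Temperature"]:
    proj = pf.h.proj(0, field)

# One hierarchy walk total -- all fields read and projected together:
proj = pf.h.proj(0, ["Density", "Temperature"])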

If I may, I'd recommend using yt's native projection format when
saving to HDF5; I ran a comparison on light cone data, and a single
full-resolution image is substantially smaller in the
variable-resolution format yt uses than as a fixed-resolution 2D
buffer.  Section 3.2
in our method paper describes both the format and how that format is
translated into an image.  The format is pretty simple -- positions,
half-widths and values -- so it's not subject to bit-rot like normal
serialization of Python objects.  These are located in the .yt file
that yt creates, but you can also force them with the node_name
argument Britton mentioned.  The values are also held in the
projection, and can be accessed like a dict:

proj["px"]
proj["py"]
proj["pdx"]
proj["pdy"]
proj["Density"]
...
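
If you'd rather dump those into your own HDF5 file than rely on the
.yt store, a few lines of h5py will do it (a sketch; the filename and
field list are just examples):

import h5py

f = h5py.File("L7_projections.h5", "w")
for key in ["px", "py", "pdx", "pdy", "Density"]:
    f.create_dataset(key, data=proj[key])  # equal-length 1D arrays
f.close()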

>
>> This will project in a single pass / walk of the hierarchy, and it
>> will also batch the IO -- the fields will all get read in a single
>> call to H5Fopen.  If you don't want to use FRBs manually, the
>> projection data will still be saved, so the next call to
>> add_projection will grab it from the file.  If you're doing
>> sub-regions, there's probably a way to pass
>> node_name through to save them, or you can use FRBs and manually make
>> the matplotlib image.  FRBs can be written out to disk with
>> write_image, and I believe Cameron or Stella have some code that can
>> even stick annotated colorbars on them.
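>>
>> For instance, something along these lines (a sketch: the width and
>> resolution are placeholder values, and I'm assuming write_image is
>> exported by yt.mods):
>>
>> import numpy as np
>> from yt.mods import *
>>
>> frb = proj.to_frb((25.0, 'mpc'), 1024)  # width, buffer resolution
>> write_image(np.log10(frb["Density"]), "Density_proj.png")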
>>
>> Anyway, I don't think projections should be so slow -- and I'd be
>> interested to hear where exactly it's getting hung up.
>
> I should be able to set up three scripts: the one I want; the one that crashes; and the one that's slow. I'll post what I find, and we can go from there.

Great!  As a quick note, I think I alluded to the parallel_profile
machinery, but you can invoke it like:

with parallel_profile("some_prefix"):
    some_expensive_operation()

It'll save out .cprof files with the processor number appended for
everything inside the context manager.  These would definitely help us
track down where the quad tree is getting stuck.  My fear is that it's
in the reduction operation, and that the turnover point was where you
reached that number of grids.  (In serial at that point it would
probably be faster!)  The new method for quadtree stuff (in my yt
fork, http://bitbucket.org/MatthewTurk/yt/ ) should not be subject to
that issue.
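
Once you have the .cprof files, the standard library's pstats module
will show where the time went (a sketch; the exact filename depends on
your prefix and processor count):

import pstats

p = pstats.Stats("some_prefix_000.cprof")
p.sort_stats("cumulative").print_stats(15)  # top 15 by cumulative time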

Also, not sure if I mentioned it in the other email, but this curation
project is pretty cool.  Nice work!

-Matt

>
> --Rick
>
>>
>> -Matt
>>
>>
>>>
>>> Thanks,
>>> Rick
>>>


