[yt-dev] Parallelization & the ART frontend

Matthew Turk matthewturk at gmail.com
Thu Feb 9 07:50:08 PST 2012


Hi Chris,

On Wed, Feb 8, 2012 at 5:20 PM, Christopher Moody <cemoody at ucsc.edu> wrote:
> Hi Matt,
> Sounds good! See ya at 1pm EST.
>
> I've played around with lots of gridding mechanisms now, with lots of
> fiddling of parameters. I now don't think growing octs along a sparse
> Hilbert curve (what I've been playing around with) is particularly more
> efficient than splitting clumps on a coarse curve (the default). It's also
> tough to diagnose how to speed stuff up; an efficient hierarchy with many
> small grids (800+ grids on a level) is easy on memory but takes 100x longer
> to project. It's hard to guess how stuff scales with ncells, # of grids on
> a level, and memory efficiency all being independent variables.  In the
> end, all I've done is make a few very, very small changes to some of the
> grid patch recursive splitting code, which has had a dramatic effect (~5x
> speedup) for my simulations, but I don't think it will be too helpful
> outside of that.

The 100x performance hit you're seeing is, I think, almost entirely due
to the IO.  I bet if you ran with cProfile you'd see that nearly all of
the time is spent reading data from disk (reads that are largely
redundant in the ART frontend, if I remember correctly) and then
throwing most of that data away.
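
For reference, here's roughly how I'd check that.  This is only a sketch:
the dataset path and field name are placeholders, and the exact proj call
depends on your yt version.

import cProfile
import pstats

from yt.mods import load   # yt 2.x-style import

# Hypothetical path to an ART output; substitute your own.
pf = load("my_art_output/10MpcBox.d")

def do_projection():
    # An on-axis projection of the density field seen in your logs;
    # this is where all of the IO happens.
    return pf.h.proj(0, "density")

# Profile the projection and dump the stats to a file.
cProfile.runctx("do_projection()", globals(), locals(), "proj_profile.out")

# Show the 20 most expensive calls by cumulative time.  If the ART IO
# routines dominate, the bottleneck really is (redundant) reading from disk.
pstats.Stats("proj_profile.out").sort_stats("cumulative").print_stats(20)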

There are several steps to improving this.  The most far-reaching is to
rethink the entire way we do geometric selection.  We could then
construct extremely coarse bounding boxes and do a progressive sweep
over the data to identify which pieces are available where.  This is
something I have been working on recently; it's quite a ways from
usability (a few months out), but preliminary tests, even with
patch-based AMR, suggest a 2-5x speedup in many geometry-selection
routines.  I suspect this number will come down as the code becomes more
fully featured, but it should still be a gigantic improvement for
oct-based codes.  The trouble with those codes, like RAMSES and ART, is
keeping a minimal set of information in memory that still allows data to
be loaded from disk.  This is why in the past we've used patches, but
with the new geometry machinery we should be able to handle the full set
of octs (until data gets extremely large, but that should hold us over
until we can write a distributed geometry).
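
To give a flavor of the coarse pre-selection idea (purely illustrative;
none of these names or structures are from the actual geometry code):

import numpy as np

def boxes_intersect(left_a, right_a, left_b, right_b):
    # Axis-aligned bounding boxes overlap iff they overlap on every axis.
    return np.all(left_a < right_b) and np.all(right_a > left_b)

def coarse_select(subsets, region_left, region_right):
    # Keep only the oct/patch subsets whose coarse bounding box touches
    # the selection region, so everything else is skipped before any IO.
    return [s for s in subsets
            if boxes_intersect(s["left_edge"], s["right_edge"],
                               region_left, region_right)]

# Example: two coarse subsets covering halves of the domain; a small
# region in the lower corner only touches the first, so only its data
# would ever be read and refined further.
subsets = [
    {"name": "subset_0", "left_edge": np.zeros(3), "right_edge": 0.5 * np.ones(3)},
    {"name": "subset_1", "left_edge": 0.5 * np.ones(3), "right_edge": np.ones(3)},
]
hits = coarse_select(subsets, np.array([0.1] * 3), np.array([0.2] * 3))
print([s["name"] for s in hits])   # -> ['subset_0']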

The nearer-term step is simply to divvy up the IO in advance by
assigning some kind of load-balancing ID to subsets of the hierarchy of
octs/patches.  This is what we do with RAMSES, although I should note
that RAMSES (unlike the cevART data format, IIRC) is actually split up
into multiple files, and that splitting is what we use to partition the
regridded octs.  If we do this, then we can, for instance, load balance
level-by-level rather than grid-by-grid, which will help with the speed
problems you're seeing.
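
As a toy sketch of what assigning those IDs could look like (again,
purely illustrative; nothing here is real yt API, and the greedy scheme
is just one possibility):

def assign_loadbalance_ids(subsets, n_procs):
    # Tag each oct/patch subset with the rank that will read it, spreading
    # the estimated work (here, cell counts) greedily over processors.
    work = [0] * n_procs
    for subset in sorted(subsets, key=lambda s: s["n_cells"], reverse=True):
        rank = work.index(min(work))   # least-loaded rank so far
        subset["lb_id"] = rank
        work[rank] += subset["n_cells"]
    return subsets

# Example: a handful of subsets dominated by the finest level.  With the
# greedy assignment the per-rank totals end up roughly even, instead of
# one rank getting nearly all of the work as in your log.
subsets = [{"name": "chunk_%d" % i, "n_cells": n, "lb_id": None}
           for i, n in enumerate([50000, 40000, 3000, 2000, 500, 100])]
assign_loadbalance_ids(subsets, n_procs=2)
print([(s["name"], s["lb_id"]) for s in subsets])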

Anyway, see you at 1.

-Matt

>
> chris
>
> On Wed, Feb 8, 2012 at 1:47 PM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Hi Chris,
>>
>> This does help.  My suspicion is that the load balancing is giving all
>> the lower-level grids to one processor and all the upper-level grids to
>> another.  If you have time tomorrow, let's work through this together on
>> IRC.  I'm free 1-2 PM EST and 3-5 PM EST.  I think it should be a pretty
>> straightforward fix.
>>
>> It would also be cool to see your new method of accumulating octs into
>> patches.
>>
>> -Matt
>>
>> On Wed, Feb 8, 2012 at 4:09 PM, Christopher Moody <cemoody at ucsc.edu>
>> wrote:
>> > Hi Matt,
>> >
>> > pf.h.proj is of type <class 'yt.data_objects.hierarchy.AMRQuadTreeProj'>
>> > and refine_by is 2.
>> >
>> > Does this help? I'm not sure what you mean by overlaps - doesn't the
>> > RAMSES grid-patching mechanism produce non-overlapping grids from the
>> > octs? Is the quadtree proj checking for overlapping grids?
>> >
>> > chris
>> >
>> > On Wed, Feb 8, 2012 at 12:44 PM, Matthew Turk <matthewturk at gmail.com>
>> > wrote:
>> >>
>> >> Hi Chris,
>> >>
>> >> Yeah, that's weird.  My guess is that load balancing is going haywire
>> >> for some reason, likely due to the overlap proj versus the quadtree
>> >> proj.  Can you tell me what type of object pf.h.proj is?  i.e., what's
>> >> the output of "print pf.h.proj"?  And then, what's pf.refine_by?
>> >>
>> >> -Matt
>> >>
>> >> On Wed, Feb 8, 2012 at 3:00 PM, Christopher Moody <cemoody at ucsc.edu>
>> >> wrote:
>> >> > Hi Matt,
>> >> >
>> >> > I've got the log output here: http://paste.yt-project.org/show/2153/
>> >> > with the serial version here: http://paste.yt-project.org/show/2154/ .
>> >> >
>> >> > The most interesting tidbit is below, where it looks like core 0
>> >> > projects levels 0-5 and core 1 projects level 6 (which takes up
>> >> > something like 99% of the projection time).
>> >> >
>> >> > chris
>> >> >
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,403 Going to obtain []
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,406 Preloading ['density'] from 0 grids
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,406 End of projecting level level 0, memory usage 3.545e-01
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,406 Preloading ['density'] from 0 grids
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,406 End of projecting level level 1, memory usage 3.545e-01
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,407 Preloading ['density'] from 0 grids
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,407 End of projecting level level 2, memory usage 3.545e-01
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,408 Preloading ['density'] from 0 grids
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,408 End of projecting level level 3, memory usage 3.545e-01
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,408 Preloading ['density'] from 0 grids
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,408 End of projecting level level 4, memory usage 3.545e-01
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,409 Preloading ['density'] from 0 grids
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,409 End of projecting level level 5, memory usage 3.545e-01
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:39:53,409 Preloading ['density'] from 6 grids
>> >> > P001 yt : [INFO     ] 2012-02-08 11:39:53,410 Starting 'Projecting  level  6 /  6 '
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,057 Finishing 'Projecting  level  0 /  6 '
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,057 End of projecting level level 0, memory usage 4.482e-01
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,057 Preloading ['density'] from 1 grids
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,058 Starting 'Projecting  level  1 /  6 '
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,070 Finishing 'Projecting  level  1 /  6 '
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,070 End of projecting level level 1, memory usage 4.482e-01
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,070 Preloading ['density'] from 1 grids
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,071 Starting 'Projecting  level  2 /  6 '
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,130 Finishing 'Projecting  level  2 /  6 '
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,130 End of projecting level level 2, memory usage 4.482e-01
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,130 Preloading ['density'] from 1 grids
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,131 Starting 'Projecting  level  3 /  6 '
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,783 Finishing 'Projecting  level  3 /  6 '
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,784 End of projecting level level 3, memory usage 4.482e-01
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:54,784 Preloading ['density'] from 1 grids
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:54,784 Starting 'Projecting  level  4 /  6 '
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:59,389 Finishing 'Projecting  level  4 /  6 '
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:59,389 End of projecting level level 4, memory usage 5.918e-01
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:39:59,389 Preloading ['density'] from 1 grids
>> >> > P000 yt : [INFO     ] 2012-02-08 11:39:59,389 Starting 'Projecting  level  5 /  6 '
>> >> > P000 yt : [INFO     ] 2012-02-08 11:40:17,735 Finishing 'Projecting  level  5 /  6 '
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:40:17,735 End of projecting level level 5, memory usage 1.569e+00
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:40:17,735 Preloading ['density'] from 0 grids
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:40:17,736 End of projecting level level 6, memory usage 1.569e+00
>> >> > P001 yt : [INFO     ] 2012-02-08 11:41:31,681 Finishing 'Projecting  level  6 /  6 '
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:41:31,681 End of projecting level level 6, memory usage 2.113e+00
>> >> > P000 yt : [DEBUG    ] 2012-02-08 11:41:33,807 Opening MPI Barrier on 0
>> >> > P000 yt : [INFO     ] 2012-02-08 11:41:34,502 Projection completed
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:41:34,502 Opening MPI Barrier on 1
>> >> > P001 yt : [INFO     ] 2012-02-08 11:41:34,502 Projection completed
>> >> > P001 yt : [DEBUG    ] 2012-02-08 11:41:34,579 Opening MPI Barrier on 1
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Feb 8, 2012 at 6:46 AM, Matthew Turk <matthewturk at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Chris,
>> >> >>
>> >> >> On Tue, Feb 7, 2012 at 9:30 PM, Christopher Moody <cemoody at ucsc.edu>
>> >> >> wrote:
>> >> >> > Hi guys,
>> >> >> >
>> >> >> > I've been working hard on the ART frontend. Lately, I'm at the point
>> >> >> > where I'm playing around with more complex datasets that are taking
>> >> >> > much longer to project, so I'd really like to start using the
>> >> >> > parallelization engines. I've tried Sam's workshop parallelization
>> >> >> > demos, and they all work. But launching with the ART frontend
>> >> >> > (http://paste.yt-project.org/show/2152/) spawns many independent
>> >> >> > processes which evidently are not actually splitting up the
>> >> >> > projection job, but are still taking up lots of processors.
>> >> >>
>> >> >> My guess is that parallelism is not enabled for the ART frontend
>> >> >> simply as a matter of how the IO is conducted.  To make it really
>> >> >> work
>> >> >> in parallel, the IO needs to be split up so that when process 1
>> >> >> reads
>> >> >> a given grid patch, the rest of the processors don't also need to
>> >> >> read
>> >> >> all the data for that grid patch.
>> >> >>
>> >> >> Can you lower your loglevel (by setting loglevel = 1 in ~/.yt/config
>> >> >> or by --config yt.loglevel=1 on the command line) and report back
>> >> >> with
>> >> >> what it says during a projection job there?
>> >> >>
>> >> >> -Matt
>> >> >>
>> >> >> >
>> >> >> > My MPI installation works:
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,207 Global parallel computation enabled: 0 / 8
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,207 Global parallel computation enabled: 2 / 8
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 1 / 8
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 6 / 8
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 3 / 8
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 4 / 8
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 5 / 8
>> >> >> > yt : [INFO     ] 2012-02-07 18:12:28,209 Global parallel computation enabled: 7 / 8
>> >> >> >
>> >> >> > But the script is just run 8 times, not any faster.
>> >> >> >
>> >> >> > What am I missing here?
>> >> >> >
>> >> >> > Many thanks!
>> >> >> > chris
>> >> >> >