[yt-users] timing parallel runs

Dave Semeraro semeraro at ncsa.illinois.edu
Tue Mar 8 06:40:17 PST 2011


Thanks Matt, 

Now that I think of it, timing slices may not be the ideal thing to test. I will have a look at your pointers. How is the in-situ work on Enzo going? Is there a section in the yt docs that deals with using yt in this fashion? 

Dave
----- Original Message -----
> Hi Dave,
> 
> On Mon, Mar 7, 2011 at 11:55 AM, Dave Semeraro
> <semeraro at ncsa.illinois.edu> wrote:
> >
> > Hi there,
> >
> > I am trying to get a feel for how YT scales in parallel. I am using
> > the time() function of python to wrap individual parts of a yt
> > script. For example, I do this:
> >
> > import time
> >
> > start = time.time()
> > pc.add_slice("Density", 0)
> > end = time.time()
> > slicetime = end - start
> > print "slice took %f seconds" % slicetime
> >
> > Each rank does this and I get a variety of times across ranks. I am
> > not seeing any difference in the max time as the number of processes
> > grows, however. For example, if the max across ranks for the slice
> > time is .8 seconds with 8 processors, it is still .8 seconds with 16
> > processors. So I must be doing something wrong. Has anybody done
> > this before?
> 
> I'm also actually working on some more detailed scaling studies, but
> I've been stymied lately by some issues with a few supercomputer
> centers. My repository for these studies is here:
> 
> https://bitbucket.org/MatthewTurk/yt_profiling/
> 
> (My plan is to include the scaling results into the answer testing
> suite.)
> 
> I am wondering if perhaps there's just not enough work to distribute
> across the processors, and if the dominant cost is the generation of
> the objects here. Can you tell us a bit about the simulation, and how
> many processors are running it? I believe this operation should be
> conducted in parallel.
> 
> Initially, my suspicion was that the slice was load-on-demand, but
> having just examined this I think it should in fact be touching the
> disk and communicating between processors. There's just not a lot of
> work to be done with slicing, I suppose.
> 
> An alternate method that would be more effective would be to try 2D
> profiling. (1D profiling is, interestingly enough, *slower* than 2D
> profiling.) This ensures that every grid will be touched by the
> computation.
> 
> For timing, you could also look into the timing_counters Stephen Skory
> has put in. There's quite a bit of usage of them in the Parallel HOP
> code, under analysis_modules/halo_finding, which shows how to set up
> nested counters and so on.
> 
> -Matt
> 
> >
> > Dave
> > _______________________________________________
> > yt-users mailing list
> > yt-users at lists.spacepope.org
> > http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
> >
> _______________________________________________
> yt-users mailing list
> yt-users at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org



More information about the yt-users mailing list