[yt-dev] rockstar issues

Stephen Skory s at skory.us
Fri Nov 9 16:10:13 PST 2012


Hi Chris & Matt,

I think I've found the source of the duplicate halo issue. The change
is that readers now allocate arrays only as large as the number of
particles each one will actually read in, rather than the full
particle count.
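In outline, the idea is something like the following. This is a
minimal numpy sketch with made-up names (allocate_reader_buffers and
its argument are hypothetical, not the actual reader code):

import numpy as np

def allocate_reader_buffers(particle_counts_for_my_grids):
    # Size the buffers to just the particles this reader will handle,
    # instead of the full particle count of the whole dataset.
    n_local = int(np.sum(particle_counts_for_my_grids))
    positions = np.empty((n_local, 3), dtype="float64")
    velocities = np.empty((n_local, 3), dtype="float64")
    return positions, velocities

# e.g. a reader assigned three grids with these particle counts:
positions, velocities = allocate_reader_buffers([1024, 512, 2048])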

With this change, using multiple readers substantially speeds things
up. With 12 total tasks and 1 reader on 256**3 particles in 49,000
grids (Enzo), it took 8m42s from start to finish, but with 12 tasks
and 8 readers it took only 2m28s. Of course, this only helps on a
parallel disk system.

Another small change I've made is that grids are now assigned
cyclically to readers using a strided slice ([block::NUM_BLOCKS]),
rather than with np.array_split(), which assigns them in contiguous
blocks. This sped up a 12 task/8 reader run on the same Enzo dataset
from 3m40s to the 2m28s quoted above. Cyclic assignment avoids
handing all the root grids to a single reader and spreads the
particles assigned to each reader much more evenly, speeding up IO.
My guess is that this will not negatively affect halo finding on
other kinds of data, and could speed it up there as well. A toy
comparison of the two schemes follows below.
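Here is that comparison; the grid list and reader count are made up,
but the slicing is the same as in the change:

import numpy as np

grids = list(range(12))   # stand-in for 12 grids, root grids first
NUM_BLOCKS = 4            # stand-in for 4 readers

# Old: contiguous blocks -- reader 0 gets all the early (root) grids.
blocked = np.array_split(grids, NUM_BLOCKS)
# -> [0 1 2], [3 4 5], [6 7 8], [9 10 11]

# New: cyclic (strided) assignment, one grid per reader in turn.
cyclic = [grids[block::NUM_BLOCKS] for block in range(NUM_BLOCKS)]
# -> [0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]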

For the curious, this link shows the changes for both items in my
branch: http://tinyurl.com/bg8b3ee (goes to Bitbucket).

-- 
Stephen Skory
s at skory.us
http://stephenskory.com/
510.621.3687 (google voice)


