[Yt-dev] Parallelism

Fri Aug 20 14:01:23 PDT 2010

Hi Stephen,

> >As you know (since we discussed it off-list), I'm the reason for this being
> >mentioned to you.  I had some pretty horrible problems with the various
> >incarnations of HOP in yt being excruciatingly slow and consuming huge amounts
> >of memory for a 1024^3 unigrid dataset, to the point where my grad student and I
> >ended up just using P-GroupFinder, the standalone halo finder that comes with
> >week-of-code enzo.  Note that when I say "excruciatingly slow" and "consuming
> >huge amounts of memory", I mean that when we used 256 nodes on Ranger, with 2
> >cores/node (so 512 cores total) for the 1024^3 dataset, it still ran Ranger out
> >of memory, or, alternately, didn't finish in 24 hours.
>
> A few notes in response:
>
> - Recently I ran a 2048^3 dataset on 264 cores; it took about 2 hours and
> averaged about 8.5 GB per task, with a peak of 10 GB on a single task. Your
> job is 1/8 the size and should have run, and I don't know why it didn't.
>

On Ranger, Kraken, or another machine?   Regardless, that's far, far less
time than it took us to NOT find halos on our dataset.  I'd be happy to
point you towards this dataset, if you'd like (I may have already done this
in an off-list email), so you can try it yourself.  I'd be VERY curious to
see if you encounter similar problems to us on Ranger and/or Kraken for our
1024^3 dataset.
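For reference, a rough sketch of the memory arithmetic behind the "1/8 the
size" remark. It rests on assumptions not established in this thread: memory
grows linearly with particle count, it is spread evenly over tasks, and "GB"
in the reported figures means GiB (2**30 bytes). The helper names are
hypothetical, for illustration only.

```python
# Back-of-the-envelope memory scaling for the HOP runs discussed above.
# ASSUMPTIONS (not measured): memory grows linearly with particle count,
# is spread evenly over tasks, and "GB" means GiB (2**30 bytes).

GIB = 2**30

def bytes_per_particle(n_side, n_tasks, gib_per_task):
    """Average memory cost per particle implied by a completed run."""
    total_bytes = n_tasks * gib_per_task * GIB
    return total_bytes / n_side**3

def predicted_gib_per_task(n_side, n_tasks, bpp):
    """Average per-task memory for a new run at the same cost per particle."""
    return bpp * n_side**3 / n_tasks / GIB

# Reported run: 2048^3 particles, 264 tasks, ~8.5 GiB average per task.
bpp = bytes_per_particle(2048, 264, 8.5)

# Our run: 1024^3 particles on 512 tasks.
per_task = predicted_gib_per_task(1024, 512, bpp)
print(f"{bpp:.1f} bytes/particle -> {per_task:.2f} GiB per task")
```

Under those assumptions the 1024^3 run would average well under 1 GiB per
task, which is consistent with the remark that the job should have run.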

> - If I wasn't trying to graduate I would have had more time to assist when your
> student (Brian) asked me for help. I'm sorry so much of your time was wasted.
>

It's more human time than computer time at this point: we spent a big
chunk of the summer simply trying to find the halos in a box, which was
meant to be step 1 of the project.  Very frustrating for a new grad student.

> - My tool as a public tool is not any good unless other people can use it too.
> Clearly I need to do some work on that.
>
> - It *does* use much more memory than it needs to, you are right. I know where
> the problems are, and whoo-boy they are there, but they are not easy to fix.
>
> - Speed could be better, but some of this has to do with how HOP itself works.
> For example, it needs to run the kD tree twice, unlike FOF, which only needs to
> run it once. The final group-building step is a "global" operation, so that's
> slow as well. On 128^3 particles, (normal) HOP takes about 75 seconds, and FOF
> about 25. The C HOP and FOF in yt both use the same kD tree and the same data
> I/O methods, so that ratio fairly reflects HOP's increased workload.
>

This is interesting, and puzzling. We have a 256^3 version of the simulation
that I was talking about earlier, and saw numbers that would be comparable
to those you mention above.  Scaled up to a much larger calculation,
however, it took way longer than one might think based on a
back-of-the-envelope estimate.  Again, I really do think that, once you
finish your thesis, it'd potentially be very useful for you to take a look
at our dataset. It may simply be that our very small box is pathological in
some way compared to the simulations you've been testing on.
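One way to make that back-of-the-envelope estimate concrete is to scale the
quoted 128^3 HOP timing up to 1024^3. This sketch assumes, without
measurement, that HOP's cost grows like N log N (typical of kD-tree methods),
that the 75-second figure is a single-process time, and that the work splits
perfectly over 512 cores. The global group-building step mentioned above will
not parallelize that cleanly, so treat the result as a lower bound.

```python
import math

# ASSUMPTIONS: cost ~ N log N, the 75 s reference time is serial,
# and the work splits perfectly across cores.

def nlogn_scaled_time(t_ref, n_ref, n_target):
    """Scale a reference runtime to a new particle count assuming N log N cost."""
    return t_ref * (n_target * math.log(n_target)) / (n_ref * math.log(n_ref))

t_hop_128 = 75.0  # seconds for HOP on 128^3 particles, as reported
t_serial = nlogn_scaled_time(t_hop_128, 128**3, 1024**3)
t_ideal = t_serial / 512  # perfect scaling over 512 cores

print(f"serial ~{t_serial / 3600:.1f} h, ideal on 512 cores ~{t_ideal:.0f} s")
```

Even if the real code were an order of magnitude worse than this ideal
figure, the estimate lands far short of 24 hours.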

--Brian