[yt-dev] halo finding overhaul

Stephen Skory s at skory.us
Fri Nov 9 16:27:06 PST 2012


Hi Matt,

> So something like this actually probably won't work at all:
>
> halos = HaloFinder(pf)
> for halo in parallel_objects(halos):
>     mass = halo.total_mass()
>
> ...because what's going on is that Proc1 might be waiting for a
> message from Proc2 for the mass of Halo403, but it won't show up,
> because Proc2 is busy waiting for Proc3 to send Halo343.

What I should have said in my first message in this thread is that I
have been doing exactly this with halo objects for some time, but with
halos I've loaded off disk (with LoadHaloes, for example). These
halos, I think, are much closer to what you are envisioning, in that
when you call LoadHaloes() in parallel, every task reads in, and knows
about, all the halos. The halos are lightweight: they don't have
particles attached to them unless you ask for them.
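A minimal sketch of why this avoids the deadlock Matt describes, assuming a hypothetical `LightweightHalo` class and `my_share` helper (neither is yt's actual LoadedHalo or parallel_objects API). Summary data is held on every task, particles are read lazily, and because every task holds the same complete list, each one can pick its subset of halos without sending any messages.

```python
class LightweightHalo:
    """Hypothetical stand-in for a LoadedHalo-style object: it carries
    summary quantities only and reads particles the first time they are
    requested."""

    def __init__(self, halo_id, total_mass, particle_loader):
        self.id = halo_id
        self._total_mass = total_mass
        # Hypothetical callable: halo_id -> list of particle IDs read from disk.
        self._loader = particle_loader
        self._particles = None

    def total_mass(self):
        # Summary data is already in memory on every task; no communication.
        return self._total_mass

    def particle_ids(self):
        # Read on demand and cached on this task only.
        if self._particles is None:
            self._particles = self._loader(self.id)
        return self._particles


def my_share(halos, rank, size):
    # Every task holds the identical halo list, so a deterministic
    # round-robin split needs no messages between tasks.
    return [h for i, h in enumerate(halos) if i % size == rank]


halos = [LightweightHalo(i, 10.0 * i, lambda hid: [hid, hid + 1]) for i in range(7)]
for rank in range(3):
    print(rank, [h.id for h in my_share(halos, rank, 3)])
```

Each simulated rank gets a disjoint slice (0, 3, 6 / 1, 4 / 2, 5), and calling `total_mass()` never touches the loader.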

> I think we want to discourage particles access.  In fact, I think we
> should discard particles from memory after the halos have been
> identified.

I disagree, but I think we are disagreeing about slightly different
things. If our new halo objects were more like lightweight
LoadedHaloes, I think particle access could be maintained by storing
the particles on disk, which is how a LoadedHalo works now: particles
are loaded on demand. Here's what I think is a good way to achieve your
goals, Matt, without sacrificing functionality:

- Run a halo finder; as part of the run call (or the included
callbacks), one may specify that particles should be saved to disk.
I've written machinery that only needs particle IDs when reading data
in, so all the other fields can be excluded from being written out,
saving disk space.
- As part of the halo finder run, finish by making lightweight,
LoadedHalo-type halos that don't have particles attached to them. The
list is complete (in both information and membership) and identical
across tasks.
- If you want particles for some new analysis not covered by the
callbacks, any task can pull them off disk independently of the other
tasks, which is functionally identical to what we have right now.
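The save-IDs / load-on-demand steps in the list above can be sketched as follows. The one-file-per-halo binary layout and the function names `save_halo_particle_ids` and `load_halo_particle_ids` are hypothetical illustrations, not yt's on-disk format. Only particle IDs are written; other fields would be re-read from the dataset by matching IDs, as the first bullet describes.

```python
import os
import tempfile
from array import array


def save_halo_particle_ids(out_dir, halos_particle_ids):
    """Write only particle IDs, one binary file per halo (hypothetical layout),
    and return lightweight summary records with no particles attached."""
    os.makedirs(out_dir, exist_ok=True)
    summaries = []
    for halo_id, ids in enumerate(halos_particle_ids):
        path = os.path.join(out_dir, f"halo_{halo_id:05d}.ids")
        with open(path, "wb") as f:
            array("q", ids).tofile(f)  # 64-bit signed integer IDs
        summaries.append({"id": halo_id, "n_particles": len(ids), "path": path})
    # In a parallel run this list would be made identical on every task.
    return summaries


def load_halo_particle_ids(summary):
    """Any task can call this for any halo; no other task is involved."""
    ids = array("q")
    with open(summary["path"], "rb") as f:
        ids.frombytes(f.read())
    return ids.tolist()


with tempfile.TemporaryDirectory() as tmp:
    records = save_halo_particle_ids(tmp, [[101, 102, 103], [7, 8]])
    print(records[1]["n_particles"], load_halo_particle_ids(records[1]))
```

Because each halo's IDs live in their own file, reading one halo never requires coordination with other tasks.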

What do you think?

> halos = HaloFinder(pf, ..., callbacks = [center_of_mass, ...])

> Does this make sense?

I think that this is feasible, if I'm understanding things correctly.
Just to be clear, these are functions that would operate on the
particles before they are discarded from memory (and possibly also
written to disk)? If so, I think that this is a fine way to go
ahead.
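A minimal sketch of that reading of the callback idea, assuming callbacks are plain functions of a halo's particle masses and positions whose return values are stored on the lightweight halo record. `finish_halos` and this calling convention are hypothetical; the quoted `HaloFinder(..., callbacks=[...])` signature is not confirmed beyond Matt's example.

```python
def center_of_mass(masses, positions):
    # Mass-weighted mean position along each of the three axes.
    total = sum(masses)
    return tuple(
        sum(m * p[axis] for m, p in zip(masses, positions)) / total
        for axis in range(3)
    )


def total_mass(masses, positions):
    return sum(masses)


def finish_halos(groups, callbacks):
    """groups: list of (masses, positions) per halo, as a halo finder might
    hold them in memory. Each callback runs while the particles are still
    available; only its return value is kept on the halo record."""
    halos = []
    for halo_id, (masses, positions) in enumerate(groups):
        record = {"id": halo_id}
        for cb in callbacks:
            record[cb.__name__] = cb(masses, positions)
        halos.append(record)
    # The records hold no references to the particle lists, so the
    # particles can be freed once the caller drops `groups`.
    return halos


groups = [([1.0, 3.0], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])]
print(finish_halos(groups, [center_of_mass, total_mass]))
```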

Have a nice weekend!

-- 
Stephen Skory
s at skory.us
http://stephenskory.com/
510.621.3687 (google voice)


