[yt-dev] halo finding overhaul

Matthew Turk matthewturk at gmail.com
Mon Nov 12 14:18:53 PST 2012


Hi Stephen,

Sorry that this slipped through the cracks.

On Fri, Nov 9, 2012 at 7:27 PM, Stephen Skory <s at skory.us> wrote:
> Hi Matt,
>
>> So something like this actually probably won't work at all:
>>
>> halos = HaloFinder(pf)
>> for halo in parallel_objects(halos):
>>     mass = halo.total_mass()
>>
>> ...because what's going on is that Proc1 might be waiting for a
>> message from Proc2 for the mass of Halo403, but it won't show up,
>> because Proc2 is busy waiting for Proc3 to send Halo343.
>
> What I should have said in my first message in this thread is that I
> have been doing exactly this with halo objects for some time, but with
> halos I've loaded off disk (with LoadHaloes, for example). These
> halos, I think, are much closer to what Matt is envisioning, in that
> when you LoadHaloes() in parallel, every task reads in, and knows
> about, all the halos. The halos are lightweight, in that they don't
> have particles attached to them unless you ask for them.

Yes, this is what I meant, and what I like.  What I would like to
transition to is treating the halos produced by LoadHaloes as the
primary halo objects, and having them carry particles only when
explicitly asked for.  In fact, I'd like to get rid of the distinction
between loaded and non-loaded halos entirely.
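
To make that concrete, here is roughly the workflow I'm picturing,
written with the current yt-2.x names (HaloFinder, dump, LoadHaloes).
Treat it as a sketch of the usage, not a spec:

    from yt.mods import load, HaloFinder, LoadHaloes

    # Finding pass: identify the halos and persist the catalog plus
    # the particle membership to disk.
    pf = load("DD0010/moving7_0010")
    halos = HaloFinder(pf)
    halos.dump("MergerHalos")   # catalog, particle lists, halo -> file map

    # Analysis pass, possibly a separate script run in parallel: every
    # task reads the same catalog, so the halo list is identical
    # everywhere, and no halo carries particles until you explicitly
    # ask for them.
    halos = LoadHaloes(pf, "MergerHalos")
    print(halos[0].center_of_mass())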

>
>> I think we want to discourage particle access.  In fact, I think we
>> should discard particles from memory after the halos have been
>> identified.
>
> I disagree, but I think we are disagreeing about slightly different
> things. If our new halo objects were more like lightweight
> LoadedHaloes, I think particle access could be maintained by storing
> the particles on disk, which is how a LoadedHalo works now: they are
> loaded on demand. Here's what I think is a good way to achieve your
> goals, Matt, without sacrificing functionality:

Yes, that's precisely what I meant: as soon as the halo finder
terminates and the particle catalogs are (optionally) written to disk,
clear the memory.
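
So, schematically, using today's method names (the final "drop the
particles" step is the part that would be new):

    from yt.mods import load, HaloFinder

    pf = load("DD0010/moving7_0010")
    halos = HaloFinder(pf)
    halos.write_out("HopAnalysis.out")             # the halo catalog itself
    halos.write_particle_lists("MergerHalos")      # particle IDs, one HDF5 file per task
    halos.write_particle_lists_txt("MergerHalos")  # the halo -> file map
    # ...and only at this point drop the in-memory particle arrays;
    # anything that needs particles later goes back through the files
    # just written.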

>
> - Run a halo finder, and as part of the run call (or via included
> callbacks) specify that particles should be saved to disk. I've
> written machinery that needs only particle IDs when reading data back
> in, so all the other fields can be excluded from being written out,
> saving disk space.

+1
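
Just to illustrate what I imagine ending up on disk under that scheme
(the file and group names here are made up; the point is only that
nothing beyond particle_index needs to be stored):

    import h5py
    import numpy as np

    # Illustrative layout: one file per task, one group per halo,
    # holding just the particle IDs; every other particle field can be
    # re-read from the dataset itself when somebody actually wants it.
    particle_ids = np.array([1052, 88311, 902101], dtype="int64")
    with h5py.File("MergerHalos_0000.h5", "w") as f:
        f.create_dataset("Halo_00000042/particle_index", data=particle_ids)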

> - As part of the halo finder run, finish by making lightweight,
> LoadedHalo-type halos that don't have particles attached to them. The
> list is complete (information-wise and membership-wise) and identical
> across tasks.

I'd prefer we simply join the two classes and get rid of the
distinction, but I like this.  And I think halos (since they are
potentially heavyweight) should raise KeyErrors if particle fields are
queried without a load_particles call.  This also helps with the
parallel decomposition.
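
Something along these lines, where load_particles is the explicit
opt-in mentioned above and everything else is invented purely to show
the behavior I mean:

    class LightweightHalo(object):
        """Sketch only: a halo that knows its summary quantities, not
        its particles."""

        def __init__(self, halo_id, quantities, particle_reader):
            self.id = halo_id
            self.quantities = quantities             # e.g. CoM, total mass, radius
            self._particle_reader = particle_reader  # callable that hits the disk
            self._particles = None

        def load_particles(self):
            # Explicit opt-in: only now do we touch the particle files on disk.
            self._particles = self._particle_reader(self.id)

        def __getitem__(self, field):
            if field in self.quantities:
                return self.quantities[field]
            if self._particles is None:
                raise KeyError("'%s' requires a load_particles() call first" % field)
            return self._particles[field]

    # Toy usage:
    halo = LightweightHalo(42, {"total_mass": 1.0e12},
                           lambda hid: {"particle_index": [1, 2, 3]})
    print(halo["total_mass"])       # always available
    halo.load_particles()
    print(halo["particle_index"])   # only works after the explicit load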

> - If you want particles for some new analysis not covered in the
> callbacks, any task can pull them off disk independently of other
> tasks, functionally identical to what we have right now.

Yup.
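
And because every task holds the full lightweight halo list, the
pattern from the top of the thread stops being a deadlock hazard: each
task pulls the particles for its own halos straight off disk.  This is
sketched with the current yt-2.x names, and assumes dict-style particle
access on loaded halos keeps working the way it does now:

    from yt.mods import load, LoadHaloes, parallel_objects

    pf = load("DD0010/moving7_0010")
    halos = LoadHaloes(pf, "MergerHalos")

    for halo in parallel_objects(halos):
        # Each task reads particles only for the halos it was handed,
        # straight from the files on disk, so no task ever waits on
        # another one.
        ids = halo["particle_index"]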

>
> What do you think?
>
>> halos = HaloFinder(pf, ..., callbacks = [center_of_mass, ...])
>
>> Does this make sense?
>
> I think that this is feasible, if I'm understanding things correctly.
> Just to be clear, these are functions that would operate on the
> particles before they are thrown away from memory, and possibly also
> written to disk? If so, I think that this is a fine way of going
> ahead.

Yup, that's right.
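
To pin the idea down: the callbacks keyword is the proposal from above,
not something HaloFinder accepts today, and the callback signature is
just one possibility.

    import numpy as np
    from yt.mods import load, HaloFinder

    def center_of_mass(halo):
        # Runs inside the finder, while the halo's particles are still
        # in memory.
        m = halo["particle_mass"]
        pos = np.array([halo["particle_position_%s" % ax] for ax in "xyz"])
        return (pos * m).sum(axis=1) / m.sum()

    def total_mass(halo):
        return halo["particle_mass"].sum()

    pf = load("DD0010/moving7_0010")
    # The finder would call each callback per halo before discarding
    # the particles, and stash the results on the lightweight halo it
    # hands back.
    halos = HaloFinder(pf, callbacks=[center_of_mass, total_mass])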

Thanks for writing back!

-Matt

>
> Have a nice weekend!
>
> --
> Stephen Skory
> s at skory.us
> http://stephenskory.com/
> 510.621.3687 (google voice)


