[Yt-dev] Parallel Hop

Matthew Turk matthewturk at gmail.com
Wed Feb 25 08:19:53 PST 2009


> When it comes to this, I think there should be the choice to write either in 'parallel' to multiple HDF5 files, which is very fast but less convenient, or in 'serial' to one HDF5 file which is very slow, but handy. As an example, writing out HOP data from a 512^3 particle dataset (L7) takes less than 10 minutes in parallel mode, but in serial mode takes over five hours. At least it takes that long on Ranger, and some of that may be due to pytables and very large datasets.

I'd like each halo to have a single write method that receives a file
handle, and the finder to have a single method as well.  When run in
parallel, the finder will hand out one file handle per processor; when
run in serial, a single handle will be passed to each halo in turn.
The finder method will be a parallel transaction, blocking across
processors at the end; the halo method will not.
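A rough sketch of that division of labor, with hypothetical names (Halo.write_particles, HaloFinder.write_out are illustrative, not the actual yt API) and a plain text handle standing in for the HDF5 handle that pytables/h5py would provide:

```python
import io

class Halo:
    def __init__(self, halo_id, particle_ids):
        self.id = halo_id
        self.particle_ids = particle_ids

    def write_particles(self, handle):
        # The halo only writes through whatever handle it is given;
        # it neither opens files nor blocks on other processors.
        handle.write("# halo %d\n" % self.id)
        for pid in self.particle_ids:
            handle.write("%d\n" % pid)

class HaloFinder:
    def __init__(self, halos, mpi_rank=0, parallel=False):
        self.halos = halos
        self.mpi_rank = mpi_rank
        self.parallel = parallel

    def write_out(self, basename):
        # In parallel, every processor opens its own .cpuNNNN file;
        # in serial, one handle is shared by all halos in turn.
        if self.parallel:
            fn = "%s.cpu%04d" % (basename, self.mpi_rank)
        else:
            fn = basename
        with open(fn, "w") as handle:
            for halo in self.halos:
                halo.write_particles(handle)
        # A real parallel implementation would barrier here, so the
        # transaction blocks across processors at the end.

# Quick serial demo with an in-memory handle standing in for the file:
buf = io.StringIO()
for halo in [Halo(0, [10, 11]), Halo(1, [12])]:
    halo.write_particles(buf)
output = buf.getvalue()
```

In serial the shared handle means every halo lands in one file; in parallel each processor's halos land in its own .cpu file with no cross-processor coordination needed per halo.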

> Do we think it's a good idea to include a text file with a parallel write, like for packed AMR enzo, that lists the location of each halo dataset in the .cpu files? The situations I can think of when one would only want a subset of the haloes are when one wants a specific halo or haloes from a physical region of the box. Since the box is spatially decomposed, it may also be useful to record the boundaries of each .cpu file in a simple way.

Writing out the cpu file is not a bad idea.  Two-column ASCII, written
out by the root processor, listing halo ID and filename.  The
boundaries file would probably be seven columns: cpu, left edge, and
right edge (without padding).
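A minimal sketch of those two index files, assuming the layouts above; the exact file names and formats are placeholders, not the final yt convention:

```python
import io

def write_halo_map(handle, halo_to_file):
    # Two-column ASCII written by the root processor: halo id, filename.
    for halo_id in sorted(halo_to_file):
        handle.write("%d %s\n" % (halo_id, halo_to_file[halo_id]))

def write_boundaries(handle, regions):
    # Seven columns per line: cpu, left edge (x y z), right edge (x y z),
    # with no padding included in the recorded edges.
    for cpu, left, right in regions:
        handle.write("%d %g %g %g %g %g %g\n"
                     % ((cpu,) + tuple(left) + tuple(right)))

# Demo with in-memory handles and made-up filenames:
map_buf = io.StringIO()
write_halo_map(map_buf, {0: "HopAnalysis.out.cpu0000",
                         1: "HopAnalysis.out.cpu0001"})
bounds_buf = io.StringIO()
write_boundaries(bounds_buf, [(0, (0.0, 0.0, 0.0), (0.5, 1.0, 1.0)),
                              (1, (0.5, 0.0, 0.0), (1.0, 1.0, 1.0))])
```

With the boundaries file in hand, a reader wanting halos from one region can open only the .cpu files whose recorded edges overlap it.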

I'll implement these methods today.

-Matt
