[yt-users] YT help with clumps

Thu Sep 2 19:57:52 PDT 2010

Hi YT users, over the past week I've spoken to a couple of YT developers and
got some great feedback and suggestions, but I'd like to outline my
plans, strategies, and troubles to see if I can get more help from the
community.

I'm currently trying to use YT to analyze ionized clumps of gas in the
Enzo runs I made with FLD (flux-limited diffusion) radiation transport.

My plan is to (if possible):
1) Create a hierarchy of clumps based on their level of ionization,
instead of on topology as in the cookbook's find_clumps()
2) Calculate a global clumping factor from regions inside these clumps
3) Find the location (x,y,z) of the peaks of HI_Density and of HI/H (the
neutral fraction), and the value of Density at those peaks
4) Find the volume inside each clump region
5) From the hierarchy of clumps, create their merger history
6) Save the clump information in such a way that I can come back to it if
I find something else that's interesting to analyze about them
7) Volume render the clumps separately and/or together in the same image

My strategies, and the troubles I'm running into, for the corresponding
points are:

1) I want to avoid reinventing the wheel and use the existing machinery
inside YT to create the hierarchy of clumps.  I think I can use Clump()
to find a master clump as in the cookbook, but instead of calling
find_clumps(), call find_children() directly.  That way I can supply a
different minimum ionization value at each level, since the levels I
want may not be equally spaced or multiples of each other.  So for
level 0 I could call find_children(min[0], max), for level 1
find_children(min[1], max), and so on.  The problems I'm running into
are:

- I haven't experimented with this enough to know whether it works even
one level down from the master clump.
- If I don't use find_clumps(), I'll need another way to make the
process recursive.  It might be as simple as copying what find_clumps()
does without the validity check, but I'm not sure.
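A minimal sketch of the recursion in point 1, assuming the only interface needed is a find_children(min_val, max_val) method that fills a children list. ToyClump is a hypothetical stand-in (it splits a 1D array into contiguous runs) so the example runs without YT, and find_clumps_by_levels is likewise a made-up name. Unlike the cookbook's find_clumps(), it uses one explicit threshold per level and does no validity checking.

```python
import numpy as np


class ToyClump:
    """Hypothetical stand-in for YT's Clump, for illustration only.

    Holds a 1D array of field values.  find_children() splits it into
    contiguous runs of cells whose values lie in [min_val, max_val].
    """

    def __init__(self, values, offset=0):
        self.values = np.asarray(values, dtype=float)
        self.offset = offset  # index of this clump's first cell in the master array
        self.children = None

    def find_children(self, min_val, max_val=None):
        if max_val is None:
            max_val = self.values.max()
        inside = (self.values >= min_val) & (self.values <= max_val)
        self.children = []
        start = None
        # The appended False closes a run that reaches the end of the array.
        for i, flag in enumerate(np.append(inside, False)):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                self.children.append(
                    ToyClump(self.values[start:i], self.offset + start))
                start = None


def find_clumps_by_levels(clump, levels, max_val, depth=0):
    """Recursively build a hierarchy, using levels[depth] as the minimum
    threshold at each depth instead of a fixed multiplicative step."""
    if depth >= len(levels) or levels[depth] >= max_val:
        return
    clump.find_children(levels[depth], max_val)
    for child in clump.children:
        find_clumps_by_levels(child, levels, max_val, depth + 1)


# Usage: two unevenly spaced ionization thresholds, one per level.
master = ToyClump([0.1, 0.5, 0.9, 0.6, 0.2, 0.7, 0.95, 0.3])
find_clumps_by_levels(master, levels=[0.4, 0.8], max_val=1.0)
for child in master.children:
    print(child.offset, [grandchild.offset for grandchild in child.children])
```

Every threshold produces its own level here, which keeps each level of the tree tied to one ionization value.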

2) I think a logical way of doing this is to calculate the local clumping
factor for each clump individually, then get the global one for the
ionized clumps with a volume-weighted average.

- Should it be weighted by mass or something else?
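A NumPy sketch for point 2, assuming the common definition C = <rho^2> / <rho>^2 with volume-weighted averages, and assuming each clump is available as a pair of arrays (cell densities, cell volumes). The helper names are hypothetical. It illustrates one caution: a volume-weighted mean of per-clump factors is not the same as the clumping factor of all clump cells pooled together, because it ignores density differences between clumps. Which one is appropriate depends on what C is used for (the usual motivation is that recombinations scale as n^2).

```python
import numpy as np


def clumping_factor(density, cell_volume):
    """C = <rho^2> / <rho>^2 with volume-weighted averages over the cells."""
    rho = np.asarray(density, dtype=float)
    vol = np.asarray(cell_volume, dtype=float)
    mean_rho = np.sum(rho * vol) / np.sum(vol)
    mean_rho2 = np.sum(rho**2 * vol) / np.sum(vol)
    return mean_rho2 / mean_rho**2


def pooled_clumping_factor(clumps):
    """Clumping factor of all clump cells taken together."""
    rho = np.concatenate([np.asarray(r, dtype=float) for r, _ in clumps])
    vol = np.concatenate([np.asarray(v, dtype=float) for _, v in clumps])
    return clumping_factor(rho, vol)


def volume_weighted_local_factor(clumps):
    """Volume-weighted mean of each clump's own clumping factor."""
    local = np.array([clumping_factor(r, v) for r, v in clumps])
    volumes = np.array([np.sum(v) for _, v in clumps])
    return np.sum(local * volumes) / np.sum(volumes)


# Two uniform clumps with different densities: each has C = 1 on its own.
clumps = [([1.0, 1.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
local_mean = volume_weighted_local_factor(clumps)  # 1.0
pooled = pooled_clumping_factor(clumps)            # 1.25
```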

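3) I can get the index of the peak cell, and the Density there, with
something like:

import numpy as na  # NumPy, under the 'na' alias used below

for child in master_clump.children:
    # index of the cell with the highest HI density in this clump
    HI_peak_index = na.argmax(child["HI_Density"])
    HI_peak_x = child["x"][HI_peak_index]
    HI_peak_y = child["y"][HI_peak_index]
    HI_peak_z = child["z"][HI_peak_index]
    # gas density in that same cell
    Density_at_Peak_HI = child["Density"][HI_peak_index]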

- Don't think I'll run into trouble here.
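To treat both peaks from item 3 the same way, the lookup can be wrapped in a small function. This sketch works on a plain dict of NumPy arrays standing in for a clump's fields; peak_properties is a hypothetical helper, and the neutral fraction is computed as HI_Density / (HI_Density + HII_Density), assuming those two Enzo field names are present. Note that argmax returns the first index if several cells tie for the maximum.

```python
import numpy as np


def peak_properties(fields, peak_field, report=("x", "y", "z", "Density")):
    """Return the values of the `report` fields at the cell where
    `peak_field` is largest.  `fields` maps names to equal-length arrays."""
    idx = int(np.argmax(fields[peak_field]))
    return {name: float(fields[name][idx]) for name in report}


# Toy data for one clump (illustrative numbers, not simulation output).
clump = {
    "x": np.array([0.10, 0.20, 0.30]),
    "y": np.array([0.50, 0.50, 0.50]),
    "z": np.array([0.70, 0.80, 0.90]),
    "Density": np.array([10.0, 20.0, 30.0]),
    "HI_Density": np.array([1.0, 5.0, 2.0]),
    "HII_Density": np.array([1.0, 15.0, 28.0]),
}
# Neutral fraction HI/H, assuming H = HI + HII.
clump["HI_fraction"] = clump["HI_Density"] / (
    clump["HI_Density"] + clump["HII_Density"])

hi_peak = peak_properties(clump, "HI_Density")     # peak in cell 1
frac_peak = peak_properties(clump, "HI_fraction")  # peak in cell 0
```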

4) I know the regular Clump's write_info() writes the number of cells
(from set_default_clump_info()), but can it report the physical volume,
say in Mpc or comoving Mpc?

- That would be easy for the unigrid simulations I'm running now: I can
just take the cell count and multiply it by the volume per cell.  With
AMR it's harder, since cell sizes differ between refinement levels, so if
there's a more general way of getting this information I'd like to know.

5) I know Stephen was able to create a halo merger tree; I think it was
either built with an SQLite database or displayed from one.  I was
wondering if something similar can be done for ionized clumps.  I think
Matt mentioned it was doable using just the YT objects, without a
database.  SQLite seems to have gotten more attention lately (I saw an
email from Irina a couple of days ago), but I'm not sure whether that
issue was resolved.  I didn't encounter that problem at all; I just had
to "import sqlite3" and everything worked on my Mac (OS X 10.6.4),
Kraken, and Triton.

My questions:
- Is one way more straightforward than the other (with or without the
database)?
- I'm having a hard time coming up with a unique identifier that keeps
track of a clump.  Do I define it by having its peak at a certain cell?
Dave suggested this, but I considered it previously and thought I'd get
into trouble if the peak moved from one data output to another; maybe he
has a workaround.  Or do I identify a clump by its containing a certain
star particle?  Star particles each have a unique ID number, so there's
no ambiguity that way.  But this assumes the star particle won't move
out of the clump, which may or may not be a valid assumption for all
levels of ionization throughout the entire simulation.  Or is there a
much simpler built-in way for Python to identify each object that I
don't know about?
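On the built-in question: Python's id() is only guaranteed unique among objects that exist at the same time in one running process, so it can't identify a clump across data outputs or sessions. Matching by shared membership is a common alternative (halo merger trees are typically built by matching shared particles). A hypothetical sketch using star particle IDs: each clump is a set of member IDs, each earlier clump is linked to the later clump sharing the most IDs, and two or more earlier clumps linking to the same later clump are flagged as a merger. Because it uses many members instead of one, a single particle leaving the clump matters less; the same code also works with sets of cell indices on a fixed unigrid mesh. The function names and data are made up for illustration.

```python
from collections import defaultdict


def link_clumps(earlier, later, min_shared=1):
    """Link each earlier clump to the later clump sharing the most members.

    earlier, later: dicts mapping a clump label to a set of member IDs
    (e.g. star particle IDs).  Returns {earlier_label: later_label or None}.
    """
    links = {}
    for e_label, e_ids in earlier.items():
        best, best_count = None, 0
        for l_label, l_ids in later.items():
            shared = len(e_ids & l_ids)
            if shared > best_count:  # ties keep the first match found
                best, best_count = l_label, shared
        links[e_label] = best if best_count >= min_shared else None
    return links


def find_mergers(links):
    """Return {later_label: sorted earlier labels} where 2+ clumps merged."""
    progenitors = defaultdict(list)
    for e_label, l_label in links.items():
        if l_label is not None:
            progenitors[l_label].append(e_label)
    return {l: sorted(es) for l, es in progenitors.items() if len(es) > 1}


# Usage with made-up particle IDs for two consecutive outputs.
output_1 = {"a": {1, 2, 3}, "b": {4, 5}, "c": {9}}
output_2 = {"X": {1, 2, 4, 5}, "Y": {3, 7}}
links = link_clumps(output_1, output_2)  # c has no descendant
mergers = find_mergers(links)            # a and b merge into X
```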

6) There's write_info() for Clump(), but I don't think it is adequate
for what I need.  Dave and Matt suggested cPickle, which saves a
reference to where the object's data lives, so I can reload it later as
long as the data is still at its original location.  An alternative is
to save the data I want in a database, as mentioned above.

pros of cPickle:  little data is duplicated, and everything stays in Python
cons of cPickle:  the original simulation data has to be available, and
when the scratch disk fills up, big simulation outputs are usually the
first to go.
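One way to soften that con is to pickle a plain summary of each clump (numbers and arrays) instead of the YT object itself, so the file stays useful after the raw outputs are deleted. A minimal sketch; the record layout and the "DD0010" output name are hypothetical. In Python 2, cPickle is the faster C implementation of the pickle interface; in Python 3 the plain pickle module uses its C implementation automatically.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical per-clump summary: plain Python/NumPy values only,
# so it can be reloaded without the original simulation data.
summary = [
    {"output": "DD0010", "clump_id": 0, "level": 0, "parent_id": None,
     "n_cells": 512, "peak_HI_xyz": np.array([0.21, 0.48, 0.77])},
    {"output": "DD0010", "clump_id": 1, "level": 1, "parent_id": 0,
     "n_cells": 64, "peak_HI_xyz": np.array([0.22, 0.47, 0.78])},
]

path = os.path.join(tempfile.mkdtemp(), "clump_summary.pkl")
with open(path, "wb") as f:
    pickle.dump(summary, f, protocol=pickle.HIGHEST_PROTOCOL)

# Later (possibly in another session): reload the summary.
with open(path, "rb") as f:
    restored = pickle.load(f)
```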

pros of database:  many kinds of retrieval are already built in, and I
can access the saved clump data even if the big simulation is only on
archive.
cons of database:  no information about anything I didn't save
beforehand, and some data is stored redundantly.
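For the database route, the standard-library sqlite3 module is enough to store a clump table whose parent column also encodes the hierarchy. A sketch with a hypothetical schema and made-up values, using an in-memory database; passing a filename to sqlite3.connect() keeps it on disk instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a filename to persist to disk
conn.execute("""
    CREATE TABLE clumps (
        output        TEXT,
        clump_id      INTEGER,
        level         INTEGER,
        parent_id     INTEGER,  -- NULL for the master clump
        n_cells       INTEGER,
        volume_cmpc3  REAL,
        PRIMARY KEY (output, clump_id)
    )""")

rows = [
    ("DD0010", 0, 0, None, 512, 0.50),
    ("DD0010", 1, 1, 0, 64, 0.06),
    ("DD0010", 2, 1, 0, 32, 0.03),
]
conn.executemany("INSERT INTO clumps VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Children of the master clump in one output, largest first.
children = conn.execute(
    "SELECT clump_id, n_cells FROM clumps "
    "WHERE output = ? AND parent_id = ? ORDER BY n_cells DESC",
    ("DD0010", 0)).fetchall()
```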

7) I see that for slices I can use the .clump() callback to overplot the
clump contours, but I was wondering if it's just as simple to show
clumps in a volume rendering.  Sometimes I'd want contours of several
ionization levels in the same image, and sometimes only the specific
level I want.
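A common way to get contour-like surfaces in a volume rendering is a transfer function that is opaque only in narrow bands around chosen field values; the set of values chosen decides which ionization levels appear in the image. I believe YT's transfer functions support adding Gaussians like this, but the sketch below is plain NumPy and only illustrates the idea; band_opacity and its parameters are hypothetical.

```python
import numpy as np


def band_opacity(values, levels, width, peak=1.0):
    """Opacity that is high near each chosen level and ~0 elsewhere.

    values: field values (e.g. log ionized fraction) to evaluate.
    levels: field values whose 'contour surfaces' should be visible.
    width:  Gaussian width of each band, in the same units as values.
    """
    values = np.asarray(values, dtype=float)
    alpha = np.zeros_like(values)
    for level in levels:
        alpha += peak * np.exp(-0.5 * ((values - level) / width) ** 2)
    return np.clip(alpha, 0.0, 1.0)


samples = np.linspace(-4.0, 0.0, 9)  # -4.0, -3.5, ..., 0.0
both_levels = band_opacity(samples, levels=[-3.0, -1.0], width=0.1)
one_level = band_opacity(samples, levels=[-1.0], width=0.1)
```

Rendering with both_levels would show two nested surfaces; one_level keeps only the -1.0 surface.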

I apologize for the long and wordy email, but any help or suggestions on
any of these points are appreciated.  Thanks :-)

From
G.S.