[Yt-dev] Halo Merger Tree

Mon Feb 1 09:22:49 PST 2010

Hi Stephen,

This all looks great.  I fully support keeping things in the SQLite
format, as long as there's also a supplemental plain text format that
can be output as well -- i.e., having a write_out method on the
MergerTree object, as well as write_to_graphviz.

Very nice work!

-Matt

On Sun, Jan 31, 2010 at 3:11 PM, Stephen Skory <stephenskory at yahoo.com> wrote:
> Hi All,
>
> partly as an expanded answer to Eric's questions from last week, here are some more details on how the merger tree works, with special attention to how the database is laid out. If creating a merger tree is something you can picture yourself doing, please let me know if what I've written will do what you need, and if not, I'd like suggestions on how to improve things for your interests. As it stands the MergerTree class isn't very complicated and feature-full, so it would be best to make major conceptual changes now before things get bigger and more difficult.
>
> The first step is to find the halos from all the datasets the user wants. Unfortunately, it is likely that unless halos were found with the merger tree in mind (and I'm thinking about the light cone here), halo finding will have to be done again. This is because not only does the class depend on the HopAnalysis.out file, it needs the HDF5 files that contain the particle data for the halos (from .write_particle_lists()), and the particle_index field, in particular.
>
> After all the halos have been found, a SQLite database file is opened and if it doesn't already contain the 'Halos' table, the table is created. This is how the table is created, which shows the columns in the database.
>
> CREATE TABLE Halos (GlobalHaloID INTEGER PRIMARY KEY,\
>                    SnapCurrentTimeIdentifier INTEGER, SnapZ FLOAT, SnapHaloID INTEGER, \
>                    DarkMatterMass FLOAT,\
>                    NumPart INTEGER, CenMassX FLOAT, CenMassY FLOAT,\
>                    CenMassZ FLOAT, BulkVelX FLOAT, BulkVelY FLOAT, BulkVelZ FLOAT,\
>                    MaxRad FLOAT,\
>                    ChildHaloID0 INTEGER, ChildHaloFrac0 FLOAT, \
>                    ChildHaloID1 INTEGER, ChildHaloFrac1 FLOAT, \
>                    ChildHaloID2 INTEGER, ChildHaloFrac2 FLOAT, \
>                    ChildHaloID3 INTEGER, ChildHaloFrac3 FLOAT, \
>                    ChildHaloID4 INTEGER, ChildHaloFrac4 FLOAT);"
>
> The 'GlobalHaloID' is an automatically increasing unique index value that is assigned by the database when a new entry is inserted. All the other columns must be specified with either factual or placeholder values when a new halo is inserted into the database. SnapHaloID is the index of the halo at that particular snapshot, which is not unique over time. The ChildHaloID values point to child halos of this halo (future halos this halo contributes particles to), and the ChildHaloFrac is what fraction of this halo goes to the child. This should be clearer once I explain how I make the tree.
>
> Once the database is set up, the HopAnalysis.out files (I name them MergerTree.out in the code) are parsed by the HaloProfiler methods, and their particulars are added into the database, with placeholder values for the ChildHalo columns.
>
> Next, stepping forward in time, pairs of snapshots are compared. First, I create a kd-tree of the halos from the second (future) snapshot. Then I loop over the halos in the first (past) snapshot, identifying the 'likely' children of the parent halos based on geometric locality to child halos. I then attach this list of likely children to the parent halo.
>
> With the list of likely children, the fractional contribution of a parents particles to each likely child is compared by reading in the particle_index fields from their respective HDF5 files. The amounts are sorted and entered into the database, such that ChildHaloID0/Frac0 is the child halo from that parent that receives the most of the parents particles. In every merger tree method I've ever seen, there is only ever one child because typically ChildHaloFrac0 will be well over 90%, so this is kind of an experiment on my part. But halos sometimes take a while to merge together, and if a halo is slowly being consumed by a larger one, this kind of multiple-children record keeping may show it.
>
> Once every single pair of consecutive snapshots are examined, the merger tree is complete. The only halos without factual ChildHalo records should be the z=0 ones (or whenever the user chooses to stop it).
>
> The reason why I am using a SQL database is once the tree is calculated, it is trivial to pull information out of it. So if you want to find only the main linage of halo0 from the last snapshot, you would write SQL queries like this:
>
> halo0globalID = 123456
>
> lineage = {}
> def findParent(haloID):
>    line = "SELECT GlobalHaloID from Halos where ChildHaloID0=?;"
>    values = (haloID,)
>    cursor.execute(line, values)
>    parentID = cursor.fetchone()[0]
>    lineage[parentID] = haloID
>    # Now we recurse back in time.
>    findParent(parentID)
>
> then lineage would be a dict of parent haloIDs pointing to the children IDs. This may not be the best way to store the lineage, but it shows how simply one can do this kind of operation. 'cursor' is the SQLite file pointer, and 'execute()' is the way you enter or retrieve data from or into the database.
>
> Say you want to look for halos nearby a particular halo.
>
> halo_cm = [0.26, 0.31, 0.54]
> halo_z = 0.0
> line = "SELECT GlobalHaloID, CenMassX, CenMassY, CenMassZ from Halos where CenMassX>? and CenMassX<? and CenMassY>? and CennMassY<? and CenMassZ>? and CenMassZ<? and SnapZ=?;"
> values = (halo_cm[0]-0.05, halo_cm[0]+0.05, halo_cm[1]-0.05, halo_cm[1]+0.05, halo_cm[2]-0.05, halo_cm[2]+0.05, halo_z)
> cursor.execute(line, values)
> neighbors = []
> result = cursor.fetchone()
> while result:
>   neighbors.append([result[0], result[1], result[2], result[3]])
>   result = cursor.fetchone()
>
> and each element of neighbors would have the GlobalHaloID and the 3-position of that halo in a four element list. In this way you can do very powerful searches over all halos.
>
> I've written a fairly simple class to output the halo merger tree for a given subset of halos in Graphviz format. As I wrote before (I think) Graphviz makes it really easy to make a simple text file that then can be visually represented in a sensical way. An improvement on the output would have to be in concert with modifying or adding SQL queries, so they go hand-in-hand. But using the merger tree shouldn't be only to make visualizations, and that's why the merger tree and graphviz output are different classes. The SQL database should be created, and then you can analyze the data in many more ways very cheaply.
>
> Does this make sense to people? Is there a feature missing? I'm sure there is! And thanks for reading this far! _______________________________________________________
> sskory at physics.ucsd.edu           o__  Stephen Skory
> http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
> ________________________________(_)_\(_)_______________
>
> _______________________________________________
> Yt-dev mailing list
> Yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>