[Yt-dev] Quad Tree projections

Matthew Turk matthewturk at gmail.com
Wed Apr 28 19:44:32 PDT 2010


Hi guys,

On Monday I rewrote the (adaptive) projection backend to use Quad
Trees for point placement instead of the current system.  As it
stands, the current system for projecting works pretty well, except
for a few caveats:

- It has high peak memory usage
- It can only be decomposed in the image plane, so inline projections
can't be made and load balancing for AMR data is poor
- It's complicated, but the complication comes from wanting things to be fast

Essentially, it's fast by wall-clock standards, but for big datasets
you need to run in parallel, and you probably need to fully occupy
nodes because of the peak memory usage.

Anyway, I was getting very frustrated with points #1 and #3 (but not
#2, oddly), so I wrote a new backend using Quad Trees.  The old code
did everything using integer math anyway, but because joining grids
was O(NxM), things got a bit slow in some regimes.  The conversion
wasn't too bad.
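
To make that concrete, here's a minimal sketch of the data structure,
not the actual yt implementation (all names below are hypothetical):
cells live at integer (level, i, j) positions in the image plane, and
inserting a cell is a walk from the root, so combining the cells of
two grids never needs a pairwise O(NxM) join:

class QuadTreeNode:
    def __init__(self, level, i, j):
        self.level = level      # refinement level of this node
        self.i, self.j = i, j   # integer position in the image plane
        self.value = 0.0        # accumulated field * path length
        self.weight = 0.0       # accumulated weight
        self.children = None    # four children, allocated on demand

    def refine(self):
        # Split into four children at level + 1.
        self.children = [
            QuadTreeNode(self.level + 1, 2 * self.i + di, 2 * self.j + dj)
            for di in (0, 1) for dj in (0, 1)
        ]

class QuadTree:
    def __init__(self):
        self.root = QuadTreeNode(0, 0, 0)

    def add(self, level, i, j, value, weight):
        # Walk from the root to (level, i, j), refining on demand.
        # This is O(level) per point -- integer shifts pick the child
        # at each step -- so no grid-vs-grid join ever happens.
        node = self.root
        while node.level < level:
            if node.children is None:
                node.refine()
            shift = level - node.level - 1
            ci = (i >> shift) & 1
            cj = (j >> shift) & 1
            node = node.children[ci * 2 + cj]
        node.value += value
        node.weight += weight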

I ran some simple benchmarks.  These were all conducted on the Triton
resource, in serial, on a single node.  The timings do not include
disk I/O, but they should roughly reflect the real memory load.

 * For my 31-level PopIII binary star run (875 million cells, 8,000
grids), it took 9 seconds with peak memory usage of 176 MB.
 * For a large 768^3 run with (peak) 12 levels of refinement (874
million cells in 125,000 grids), it took 81 seconds with a peak memory
load of 840 MB.
 * For the 512^3 L7 lightcone run, at a time when the dataset had
300,000 grids, it took 170 seconds with peak memory usage of 2.5 GB.

In all cases, the results are correct according to the standard check
of projecting "Ones".  These are crazy improvements, especially
considering that these are times for serial projections.
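
For reference, the invariant behind the "Ones" check, sketched with
the hypothetical tree above (and assuming the projection accumulates
field * dl per cell): a field that is 1.0 everywhere should integrate
to the same value -- the depth of the domain -- in every image column,
no matter how the cells are refined.

def column_value(tree, level, i, j):
    # A pixel at (level, i, j) is the sum of the contributions stored
    # on the path from the root down to that position.
    node = tree.root
    total = node.value
    while node.level < level and node.children is not None:
        shift = level - node.level - 1
        node = node.children[((i >> shift) & 1) * 2 + ((j >> shift) & 1)]
        total += node.value
    return total

# Toy example with domain depth 1.0: a coarse cell contributes
# dl = 0.5 to every column, and level-1 cells contribute the
# remaining 0.5 under each refined pixel.
tree = QuadTree()
tree.add(0, 0, 0, 0.5, 0.5)
for i in (0, 1):
    for j in (0, 1):
        tree.add(1, i, j, 0.5, 0.5)
for i in (0, 1):
    for j in (0, 1):
        assert abs(column_value(tree, 1, i, j) - 1.0) < 1e-12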

An important note here is that this *will* parallelize independently
of the image plane, although I have yet to see the need to
parallelize.  The load balancing may be better with image-plane
decomposition, but the algorithm works with independent joins.
Additionally, in parallel, the final join at the end of the
calculation may have higher peak memory usage if the decomposition is
not in the image plane, but it should still work.
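
To sketch why the join is decomposition-independent (again with the
hypothetical tree above, not yt's actual code): each processor builds
its own tree over whatever grids it owns, and merging two trees is a
single recursive walk over matching (level, i, j) nodes:

def merge(a, b):
    # Fold node/tree b into a; values at matching positions just add.
    a.value += b.value
    a.weight += b.weight
    if b.children is None:
        return
    if a.children is None:
        a.refine()
    for child_a, child_b in zip(a.children, b.children):
        merge(child_a, child_b)

# e.g., after each processor projects its own grids:
#   merge(local_tree.root, remote_tree.root)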

Unfortunately, right now my time is very constrained, so I am going
to put this out there: if someone is willing to help me break apart
the existing code and insert this new code, we can swap out the
engines.  In the absence of that, I'm not sure it'll make it into the
mainline before the summer, or at the earliest late spring.  As it is,
I can definitely provide a very simple, manual interface to use it
(and I intend to do so), but integrating it into the mainline is not
going to be my priority right now.  Again, though, I will provide *an*
interface, even if it's not a *final* interface and likely not a
*nice* one.

Anyway, I'm pretty excited about this -- definitely a huge improvement
over the previous algorithm, and we still retain the *adaptive* nature
of the projections: project once, plot many.
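
To illustrate "project once, plot many" with the same hypothetical
tree: because contributions are stored at their native refinement
level, a single projection can be rasterized afterwards at whatever
resolution you like.  The to_image() below is illustrative and, for
simplicity, assumes the requested image level is at least as deep as
the tree:

import numpy as np

def to_image(tree, level):
    # Rasterize onto a uniform 2**level x 2**level image.  Each node
    # spreads its stored value over every pixel it covers, so summing
    # down the tree reproduces the adaptive column integrals.
    n = 1 << level
    image = np.zeros((n, n))

    def deposit(node):
        if node.level > level:
            return  # assumes the image is at least as fine as the tree
        span = 1 << (level - node.level)   # pixels per node side
        i0, j0 = node.i * span, node.j * span
        image[i0:i0 + span, j0:j0 + span] += node.value
        if node.children is not None:
            for child in node.children:
                deposit(child)

    deposit(tree.root)
    return image

# The same projection at two resolutions:
#   coarse = to_image(tree, 6)    # 64 x 64
#   fine = to_image(tree, 10)     # 1024 x 1024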

-Matt


