[Yt-dev] Quad Tree projections

Thu May 13 19:07:28 PDT 2010

Hi all,

Another update!  Today I was sent a 1024^3, 7 level, 1.6e9 cell
dataset.  The stats were:

Level    Grids        Cells
    0      512   1073741824
    1   288126    433602800
    2    83777     92854272
    3    29655     23459488
    4    14328      7726208
    5     6733      4637416
    6     3512      3355336
    7     1647      1997864
---------------------------
        428290   1641375208

That's all total cells, not effective cells.  I ran the quadtree
projection code on it, including data IO, but no weighting fields, in
serial, on a low-memory node.  The results made me very, very happy.
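
For anyone who wants to double-check the totals in the table, here is a
quick sketch that sums the per-level counts.  The numbers are copied
from the table; the variable names are just for illustration.

```python
# Per-level (grids, cells) counts copied from the table above, levels 0-7.
stats = [
    (512, 1073741824),
    (288126, 433602800),
    (83777, 92854272),
    (29655, 23459488),
    (14328, 7726208),
    (6733, 4637416),
    (3512, 3355336),
    (1647, 1997864),
]

total_grids = sum(grids for grids, _ in stats)
total_cells = sum(cells for _, cells in stats)
print(total_grids, total_cells)

# The level-0 cell count is exactly the 1024^3 root grid.
print(stats[0][1] == 1024 ** 3)
```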

To project the entire dataset to finest resolution, it took 2340
seconds.  The peak memory usage was 2.7 gigs.  I'm *really* happy with
how low those two numbers are.  This would fit in a single core on
Kraken.  I'm substantially more motivated to wrap this into the
existing projection machinery now.  Conversely, with these
numbers the way they are, I'm much less motivated to parallelize it.
:)

With the image pan-n-scanner, it's actually completely interactive to
pan through it, zoom in however you like, all of that -- I was doing
it here on my machine with frame rates of about 1-5 frames per second,
but I think on a faster machine it might work much better.

Anyway, in case anybody wants to try it at home, I've uploaded a zip
file with the projection, the necessary info and a couple scripts that
let you either pan interactively if you have Chaco installed (if you
used the Snow Leopard install script, you should!) or, if you don't
like the whole GUI thing, a pan controller that saves out an image
every time you make a move.

To run the pan_controller, do something like:

ipython -q4thread pan_controller.py

To run the not-as-interactive script, do this:

python2.6 -i pan_saver.py

Now you have access to an object, ip, that saves out a new image
called "wimage_000.png" every time you call .zoom(factor) or any of
the other methods on the object that change the bounding box of its
image.  This includes zoom, pan(delx, dely), pan_rel(reldelx, reldely)
and some others, but those are the big ones.  If you open
wimage_000.png in Preview, every time it saves Preview should update.
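
To make the bounding-box idea concrete, here is a minimal sketch of how
an object like ip could behave.  This is not the code in the zip file:
the class name BoundsPanner, the on_change hook, and the exact meaning
of each argument (zoom about the center, pan in absolute units, pan_rel
as a fraction of the current width) are all assumptions for
illustration.  Where the real object saves an image, the sketch just
calls a hook.

```python
class BoundsPanner:
    """Hypothetical stand-in for `ip`: tracks the bounding box of an image."""

    def __init__(self, xlim, ylim, on_change=None):
        self.xlim = tuple(xlim)
        self.ylim = tuple(ylim)
        self.on_change = on_change  # called after every move (assumed hook)
        self.count = 0

    def _set(self, xlim, ylim):
        self.xlim, self.ylim = xlim, ylim
        if self.on_change is not None:
            # The real object would write out an image here.
            self.on_change(self.count, xlim, ylim)
        self.count += 1

    def zoom(self, factor):
        # Shrink the box by `factor` about its center (factor > 1 zooms in).
        cx = 0.5 * (self.xlim[0] + self.xlim[1])
        cy = 0.5 * (self.ylim[0] + self.ylim[1])
        hx = 0.5 * (self.xlim[1] - self.xlim[0]) / factor
        hy = 0.5 * (self.ylim[1] - self.ylim[0]) / factor
        self._set((cx - hx, cx + hx), (cy - hy, cy + hy))

    def pan(self, delx, dely):
        # Shift the box by an absolute offset.
        self._set((self.xlim[0] + delx, self.xlim[1] + delx),
                  (self.ylim[0] + dely, self.ylim[1] + dely))

    def pan_rel(self, reldelx, reldely):
        # Shift the box by a fraction of its current width and height.
        self.pan(reldelx * (self.xlim[1] - self.xlim[0]),
                 reldely * (self.ylim[1] - self.ylim[0]))


moves = []
ip = BoundsPanner((0.0, 1.0), (0.0, 1.0),
                  on_change=lambda n, x, y: moves.append((n, x, y)))
ip.zoom(2.0)
ip.pan(0.25, 0.0)
ip.pan_rel(-0.5, 0.5)
print(ip.xlim, ip.ylim, len(moves))
```

Every method funnels through one place that updates the box, which is
where a "save an image on every move" step would naturally live.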

The zip file is available here, but I'll likely delete it in a few days:

http://yt.enzotools.org/files/rd0027_panner.zip

It's about 92 megs compressed, but uncompresses to around 900 megs.  Let
me know if it breaks, but I think otherwise this is a really cool way
to share images and data -- this 92 meg bundle lets you zoom all the
way up in a humongous AMR dataset!  I'll be writing up how to share
data like this for regular projections for 1.7's docs.

-Matt

On Wed, Apr 28, 2010 at 7:44 PM, Matthew Turk <matthewturk at gmail.com> wrote:
> Hi guys,
>
> On Monday I rewrote the (adaptive) projection backend to use Quad
> Trees for point placement instead of the current system.  As it
> stands, the current system for projecting works pretty well, except
> for a couple caveats:
>
> - It has high peak memory usage
> - It can only be decomposed in the image plane, so inline projections
> can't be made, and load balancing for AMR is poor
> - It's complicated, but that complication comes from wanting things to be fast
>
> Essentially, it's fast by wall-clock standards, but you need to run in
> parallel for big datasets.  Plus, if you want to run with big
> datasets, you probably need to fully occupy nodes, because of the peak
> memory usage.
>
> Anyway, I was getting very frustrated with points #1 and #3 (but not
> #2, oddly) and so I wrote a new backend using Quad Trees.  We did
> everything using int math anyway, but because joining grids was
> O(NxM), things got a bit slow in some domains.  The conversion wasn't
> too bad.
>
> I ran some simple benchmarks.  These were all conducted on the Triton
> resource, in serial, on a single node.  They do not reflect time for
> disk IO, but they should roughly reflect real memory load.
>
>  * For my 31 level PopIII binary star run (875 million cells, 8000
> grids), it took 9 seconds with peak memory usage of 176 megs.
>  * For a large, 768^3 with (peak) 12 levels of refinement (874 million
> cells in 125,000 grids), it takes 81 seconds with peak memory load of
> 840 megs.
>  * For the 512^3 L7 lightcone run, at a time when the data had 300,000
> grids, it took 170 seconds with peak memory usage of 2.5 gigs.
>
> In all cases, the results are correct according to the standard check
> of projecting "Ones".  These are crazy improvements, especially
> considering that these are times for serial projections.
>
> An important note here is that this *will* parallelize independent of
> the image plane, although I have yet to see the need to parallelize
> just yet.  The load balancing may be better optimized based on image
> plane decomposition, but the algorithm should work with independent
> joins.  Additionally in parallel the final join at the end of the
> calculation may have higher peak memory usage if the decomposition is
> not via image plane decomposition, but it should still work.
>
> Unfortunately, right now my time is very constrained; so I am going to
> put out there that if someone would be willing to help me break apart
> the existing code and insert this code, then we can swap out the
> engines.  But in the absence of that, I'm not sure that it'll get
> inserted into the main line before the Summer, or at least the late
> Spring.  As is, I can definitely provide a very simple and manual
> interface to use it (and I intend to do so) but integrating it into
> the mainline is not going to be my priority right now.  But, again, I
> will provide *an* interface, even if it's not a *final* interface and
> likely not a *nice* interface.
>
> Anyway, I'm pretty excited about this -- definitely a huge improvement
> over the previous algorithm, and we still retain the *adaptive* nature
> of the projections: project once, plot many.
>
> -Matt
>
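
For readers who want a feel for the quadtree idea in the quoted message,
here is a minimal pure-Python sketch.  It is not the yt backend: the
class names, the refinement factor of 2, and the integer (level, i, j)
addressing are assumptions for illustration.  Each cell deposits
value * path length at the tree node matching its own level, and the
image is built by summing node values from root to leaf.  The example
at the end is the "Ones" check: projecting a field of ones through a
unit-width domain should give 1.0 in every pixel.

```python
class QuadNode:
    __slots__ = ("value", "children")

    def __init__(self):
        self.value = 0.0      # contributions deposited at exactly this node
        self.children = None  # list of four child nodes once refined


class QuadTreeProjection:
    """Hypothetical image-plane quadtree with an nx-by-ny grid of root nodes."""

    def __init__(self, nx, ny):
        self.nx, self.ny = nx, ny
        self.roots = [[QuadNode() for _ in range(ny)] for _ in range(nx)]

    def add(self, level, i, j, value):
        # (i, j) is the integer image-plane index at `level` (int math only).
        node = self.roots[i >> level][j >> level]
        for shift in range(level - 1, -1, -1):
            if node.children is None:
                node.children = [QuadNode() for _ in range(4)]
            node = node.children[2 * ((i >> shift) & 1) + ((j >> shift) & 1)]
        node.value += value

    def to_image(self, max_level):
        # Flatten to a fixed-resolution image; nodes deeper than max_level
        # are ignored in this sketch.
        n = 1 << max_level
        image = [[0.0] * (self.ny * n) for _ in range(self.nx * n)]
        for i in range(self.nx):
            for j in range(self.ny):
                self._fill(self.roots[i][j], 0, i, j, 0.0, max_level, image)
        return image

    def _fill(self, node, level, i, j, acc, max_level, image):
        acc += node.value  # running sum along the path from the root
        if node.children is None or level == max_level:
            n = 1 << (max_level - level)
            for a in range(i * n, (i + 1) * n):
                for b in range(j * n, (j + 1) * n):
                    image[a][b] = acc
        else:
            for ci in (0, 1):
                for cj in (0, 1):
                    self._fill(node.children[2 * ci + cj], level + 1,
                               2 * i + ci, 2 * j + cj, acc, max_level, image)


# "Ones" check: a 2x2x2 root grid (dx = 0.5) where root cell (0, 0, 0) is
# refined into eight cells with dx = 0.25.  Project along the third axis.
tree = QuadTreeProjection(2, 2)
for i in range(2):
    for j in range(2):
        for k in range(2):
            if (i, j, k) != (0, 0, 0):   # the refined cell is covered
                tree.add(0, i, j, 1.0 * 0.5)
            tree.add(1, i, j, 1.0 * 0.25)  # the eight fine cells
image = tree.to_image(1)
print(all(pixel == 1.0 for row in image for pixel in row))
```

Because each deposit only touches the nodes along one root-to-leaf path,
grids can be added in any order, which is what makes joins independent
of an image-plane decomposition plausible.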