[Yt-dev] Embedded yt and IPython

Matthew Turk matthewturk at gmail.com
Sat Jun 6 09:57:49 PDT 2009


Hi everyone,

I'm writing to share with you something I'm kind of excited about, and
to ask for some input on the next couple steps.

yt now runs inside of Enzo, via an embedded Python interpreter.  (This
has been accomplished in both the LCA devel trunk and Stanford Enzo,
but I'm now only focusing on LCA devel trunk from here on out.)  I
finished this a couple months ago, but I've recently brought it back
up to speed and started using it -- it can do any of the operations
that both run in parallel and do not require spatial decomposition,
which means essentially everything except clump finding, projections
and halo finding.  I'm working on a fixed resolution
projection that will not require spatial decomposition.  (I do not
want *any* distribution of data beyond what Enzo does.)

What this leaves is profiles, derived quantities, slices, cutting
planes and some other stuff.

So you can run yt scripts inside enzo, and I'm testing this out via a
PopIII run that I started.  It could be faster -- the hierarchy gets
reinstantiated every time yt gets called -- but addressing that is part
of a broader optimization effort I'm going to start on yt.  It works.  I've
been outputting slices and some profiles every 10 subcycles of the
highest level, and it's working fine and not leaking memory (as far as
I can tell.)
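
For concreteness, here's a minimal sketch of the kind of in situ script
I mean, written against the ordinary yt.mods interface.  How Enzo hands
the in-memory hierarchy to the script, and the entry point it calls,
are glossed over here, so treat the function below as hypothetical:

    # Hypothetical entry point; the embedded interpreter would call
    # something like this with an in-memory parameter file every N
    # subcycles of the finest level.
    from yt.mods import *

    def analyze(pf, cycle):
        center = [0.5, 0.5, 0.5]                  # made-up center
        pc = PlotCollection(pf, center=center)
        # Slices and profiles run in parallel and need no spatial
        # decomposition, so they're safe to do in situ.
        pc.add_slice("Density", 0)
        sphere = pf.h.sphere(center, 0.1)         # radius in code units
        pc.add_profile_object(sphere, ["Density", "Temperature"])
        pc.save("cycle_%06i" % cycle)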

Anyway, this is cool, I'm proud of it, but this morning I finished the
second major aspect that I wanted to get going -- direct interaction
with a running Enzo instance via IPython.  You can open up an IPython
session and connect to remotely running Enzo processes.  They
communicate via SSL sockets.  I've tested this at SLAC across compute
nodes and hosts as well as on my laptop.

However, the proxying methods, while functional, can be slightly
tedious -- so I'd like to solicit input on which interaction functions
would be the most useful.  To restate the
architecture:

Compute Nodes: these run remotely with Enzo.  Some signaling process
causes them to spawn Python interpreters that act as MultiEngines in
IPython.

Controller Node: This is an IPython instance that runs under the
user's command.  Commands can be distributed to the nodes.  This uses
the MultiEngineClient.  (more info:
http://ipython.scipy.org/doc/manual/html/parallel/parallel_multiengine.html
)

So to interact, you use things like
mec.execute('print sphere.quantities["AngularMomentumVector"]()')
and whatnot.  Commands
can go to all the nodes or a subset of the nodes.  Results can be
displayed or pulled back across the network, so arrays can be passed
around.
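
To make that concrete, a controller-side session looks roughly like the
following.  This is just the stock MultiEngineClient interface from the
page linked above; the "sphere" object is assumed to already exist on
the compute nodes, and the connection setup is whatever your controller
wrote out:

    # Rough sketch of a controller-side session.
    from IPython.kernel import client

    mec = client.MultiEngineClient()   # connects to the running controller
    print mec.get_ids()                # one engine id per Enzo process

    # Run a command on every engine:
    mec.execute('amv = sphere.quantities["AngularMomentumVector"]()')

    # ... or only on a subset of them:
    mec.execute('print sphere["Density"].min()', targets=[0, 1])

    # Pull results (plain NumPy arrays pickle fine) back locally:
    vectors = mec.pull('amv')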

However, getting at the raw grid data -- rather than derived values
and so on -- can be tedious.  I'm going to implement commands for the
following (a rough usage sketch follows the list):

get_grid_data(grid_id, field)  - this returns CGS data, including
derived fields, standard yt fields, etc

get_grid_raw_data(grid_id, field, ghost_zones=False)  - this would be
the *raw* array values, possibly including ghost zones.

push_grid_raw_data(grid_id, field, new_values)  - this would let you
push *back* grid values into the arrays.  This is dangerous, but an
important aspect of what I'd like to do.
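
None of these exist yet, but here's roughly how I picture driving them
from the controller once they do.  The grid id, field name and targets
below are made up:

    # Hypothetical usage of the three commands above from the controller.
    mec.execute('dens = get_grid_data(42, "Density")')
    mec.execute('raw = get_grid_raw_data(42, "Density", ghost_zones=True)')

    # Pull the raw zones back, modify them locally, and push them into
    # the running Enzo instance.  This is the dangerous part.
    raw = mec.pull('raw', targets=[0])[0]
    raw *= 1.01                        # e.g. perturb the density slightly
    mec.push(dict(new_raw=raw), targets=[0])
    mec.execute('push_grid_raw_data(42, "Density", new_raw)', targets=[0])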

So I guess that's all I could come up with.  I'll be writing proxy
classes for slice and cutting plane plots so that they can pull up
GUIs and whatnot on the main node.  Right now slices can't be pulled
across the network because of the pickling process, which means they
also can't be plotted interactively locally.  I'm working on that.
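
In the meantime, a workable stopgap is to pull the slice's flat arrays
(which are plain NumPy and pickle fine) and plot them locally; the "slc"
object on the engines and the field names here are assumptions:

    # Stopgap: ship the slice's flat arrays instead of the slice object.
    import pylab

    mec.execute('px, py, d = slc["px"], slc["py"], slc["Density"]')
    px, py, d = [mec.pull(k, targets=[0])[0] for k in ("px", "py", "d")]
    pylab.scatter(px, py, c=pylab.log10(d), s=1, linewidths=0)
    pylab.savefig("slice_density.png")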

Anyway, thoughts?  I'm pretty excited about using this for debugging
and development.  What other commands would be useful for this?  Dave,
I feel like you've got a good handle on some interesting aspects of
debugging -- what would you like to be able to do?  I feel pretty
confident we can do a LOT of fun stuff with this, particularly by
avoiding the disk.

-Matt


