[Yt-dev] Simulation Database

Tue Sep 6 11:49:05 PDT 2011

Matt,

> Thanks for the pointer -- this is super cool.  How would you see this
> working, in practice?

I don't think that we should abandon a local database. Enough
machines sit behind firewalls that would block SimpleDB access that a
local copy is still useful. I think SimpleDB should be complementary
to the local database(s). I'm also not sure that yt should read from
the SimpleDB copy at run time the way it reads from the local copy.

In practice, I think we should simply push whatever would go into the
local database to SimpleDB, along with a machine identifier (*). This
should be transparent to the user once they've entered their AWS
credentials (**).
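
A rough sketch of the push, to show how little is involved. Everything
here is an assumption for illustration: mirror_record is a made-up
name, the local store is just a dict, and the record fields are
invented. The only thing borrowed from a real library is the shape of
the remote call; as far as I know, Boto's SimpleDB Domain object has a
put_attributes(item_name, attributes) method, so anything with that
method can be passed in.

```python
def mirror_record(record, machine_name, local_store, remote_domain=None):
    """Write one dataset record locally and, if possible, remotely.

    Hypothetical helper, not existing yt code. Returns True only if
    the remote write succeeded.
    """
    # Copy the record so the caller's dict is not modified.
    tagged = dict(record, machine=machine_name)
    key = "%s:%s" % (machine_name, record["path"])

    # The local copy is always written, firewall or not.
    local_store[key] = tagged

    if remote_domain is None:
        return False
    try:
        remote_domain.put_attributes(key, tagged)
    except Exception:
        # Firewalled machine or bad credentials: carry on with the
        # local copy only instead of breaking the user's session.
        return False
    return True
```

Prefixing the key with the machine name keeps the same path on two
different machines from colliding in the shared copy.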

I envision a webpage (say, mydb.yt-project.org) where users can enter
their AWS creds to do simple queries or just see all their datasets.
There are lots of web-based tools out there already, but it may not be
difficult to write our own if the available tools are not suitable.

It would be cool to add a command like "yt crawl DD????/DD????" that
would automatically load all the dataset info into the databases
without having to load() all of them manually. Another command like
"yt dbclean" would delete database entries for datasets that are no
longer on disk (or mark them as stale or broken), and do the same
online.
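
The core of both commands is small. A minimal sketch, assuming the
database can hand back a list of dataset paths; crawl and find_stale
are hypothetical names, not existing yt functions, and the step that
actually registers each dataset is left out.

```python
import glob
import os


def crawl(pattern):
    """Return sorted absolute paths matching a shell-style pattern
    such as 'DD????/DD????'."""
    return sorted(os.path.abspath(p) for p in glob.glob(pattern))


def find_stale(db_paths):
    """Return the recorded paths that no longer exist on disk."""
    return [p for p in db_paths if not os.path.exists(p)]
```

The first would feed each path it finds to whatever adds a dataset to
the database; the second gives "yt dbclean" its list of entries to
delete or mark as stale.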

I don't think we need to overthink this. Just mirror the local
database to SimpleDB, and possibly write a simple Python webapp,
hosted on yt-project.org, that uses Boto and whatever else to look at
the datasets in a manageable way. Any fancier features will come if
people need them, but I think this alone is pretty killer.

I'm willing to contribute time to this stuff.

(*) I think that this may prove harder to automate than one might
think, so perhaps we shouldn't even try. The first place one would
look for the name is 'hostname', but that could be as useful as
'login03.machine.institution.edu' or as useless as 'node0284' for an
internal 10.x.x.x address. I think the best way would be to ask the
user for their preferred name for this machine when yt is installed,
and store it in a plain text file, similar to the hdf5.cfg file.
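
Something along these lines would do, as a sketch only. The file
layout (a [machine] section with a name option) and the function name
get_machine_name are assumptions of mine, not anything yt defines. The
hostname is offered only as a default the user can override.

```python
import configparser
import socket


def get_machine_name(cfg_path, ask=input):
    """Return the stored machine name, asking the user once if unset."""
    # interpolation=None so a '%' in the name is stored literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(cfg_path)  # a missing file is silently ignored
    if cfg.has_option("machine", "name"):
        return cfg.get("machine", "name")

    # Offer the hostname as a default, but let the user pick.
    default = socket.gethostname()
    answer = ask("Name for this machine [%s]: " % default).strip()
    name = answer or default

    if not cfg.has_section("machine"):
        cfg.add_section("machine")
    cfg.set("machine", "name", name)
    with open(cfg_path, "w") as fh:
        cfg.write(fh)
    return name
```

The prompt only appears the first time; after that the name comes
straight from the file.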

(**) This raises the question of whether we want yt to store AWS
credentials or not. The simplest solution is to store them in the
same file as the machine name, with 'chmod 0600'. More secure
solutions involving hashes get more complicated past that...
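
For the simple version, a sketch assuming a Unix machine (os.fchmod is
not available on Windows). The names write_private and is_private are
made up for illustration. The file is created with mode 0600 from the
start rather than written and chmod'ed afterwards, so the credentials
are never briefly readable by others.

```python
import os
import stat


def write_private(path, text):
    """Write text to path, leaving the file readable by the owner only."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode argument only applies when the file is newly created.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w") as fh:
        # Tighten permissions on a file that already existed.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(text)


def is_private(path):
    """True if neither group nor others have any access to path."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0
```

yt could call is_private before reading the file and refuse to use
credentials that other users can read.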

-- 
Stephen Skory
s at skory.us
http://stephenskory.com/
510.621.3687 (google voice)