[yt-dev] Testing Infrastructure: Datasets for ART, Orion, FLASH, etc ...

Matthew Turk matthewturk at gmail.com
Fri Oct 12 14:54:20 PDT 2012


Hi all,

Today at UCSC, Nathan, Chris (Moody) and I sat down and went through
what we wanted to accomplish with testing.  This comes back to the
age-old dichotomy between unit testing and answer testing.  But what
it really comes down to, now that we've had the opportunity to think
about it, is the difference between testing components and
functionality versus testing frontends.

So the idea here is:

Unit tests => Cover individual units of the code, using either
manually inserted data values or randomly generated "parameter
files".  Stephen and I have written a bunch in the last couple of
days.  We have nearly 500, and they take < 1 minute to run.
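To give a flavor, here is a minimal sketch of what one of these looks
like in the numpy/nose style.  It assumes a helper along the lines of
fake_random_pf in yt.testing that builds a small in-memory parameter
file, so the exact name, fields, and signature may differ:

    # Minimal unit-test sketch; fake_random_pf is assumed to generate a
    # small in-memory parameter file with a random "Density" field, so
    # no data needs to live on disk.
    from numpy.testing import assert_equal
    from yt.testing import fake_random_pf

    def test_all_data_cell_count():
        pf = fake_random_pf(16)          # 16^3 cells, randomly filled
        dd = pf.h.all_data()
        # Every cell should appear exactly once in the container.
        assert_equal(dd["Density"].size, 16**3)
        # Randomly generated densities should never be negative.
        assert (dd["Density"] >= 0).all()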

Frontend/Answer tests => Cover a large portion of high-level
functionality that touches a lot of the code, but do so by running
things like projections, profiles, etc. on actual data from actual
simulation codes, which then get compared to reference values that
are stored somewhere.  Currently we have ~550 answer tests; they run
every 30 minutes on moving7_0010 (which comes with yt) and once a day
on JHK-DD0030 (on yt-project.org/data/ as IsolatedGalaxy).  We do not
have automated FLASH testing.
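The pattern behind these is roughly: run a high-level operation on a
real dataset, reduce the result to something small, and compare it to
a stored gold-standard value.  A sketch of that pattern (the paths,
the hashing reduction, and the stored reference are illustrative, not
the actual answer_testing API):

    # Illustrative sketch of the answer-test pattern, not the real
    # answer_testing code: project a field, reduce the image to a
    # hash, and compare against a previously stored reference.
    import hashlib
    from yt.mods import load

    def project_and_hash(fn, field="Density", axis=0):
        pf = load(fn)
        proj = pf.h.proj(axis, field)
        frb = proj.to_frb(1.0, 256)      # fixed-resolution buffer
        return hashlib.md5(frb[field].tostring()).hexdigest()

    def test_density_projection():
        # The reference would come from wherever gold-standard results
        # end up being stored; hard-coded here purely as a placeholder.
        reference = "<stored reference hash>"
        assert project_and_hash("path/to/IsolatedGalaxy") == reference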

The next step is:

1) Getting a bunch of non-proprietary sets of data that are small
*and* medium, for each code base we want to test.  This data must be
non-proprietary!  For small, I would say they can be trivially small.
For medium, I'd prefer the 0.5-5 GB range for size-on-disk.  I
would think that GasSloshing and WindTunnel could work for FLASH.  But
we still need ART data (from Chris Moody), GDF or Piernik data (from
Kacper), Orion data (if possible), Nyx data (if possible).  I will
handle adding RAMSES data in the 3.0 branch.
2) Getting a mechanism to run answer tests that isn't "Matt's
desktop."  I've emailed Shining Panda about this, but if they don't
have the ability to provide us with a FLOSS license, I think we can
identify some funding to do this.
3) Have a mechanism to display and collate results.  ShiningPanda
would do this if we were on their systems.
4) Make it much easier to flag individual tests as needing updates.  I
think the Data Hub will be the end place for this, but this is lower
priority.
5) Migrate answer testing to use the unit testing framework, as most
of what we've done there re-implements things that already exist in
the unit testing frameworks.  This will mean we can much more easily
handle test discovery, which is a huge plus; a rough sketch of what
that could look like follows this list.
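Here is that rough sketch of discovery-driven answer tests once we're
on nose.  The dataset paths and the reference store are placeholders,
not existing yt code:

    # Sketch of nose-style generator tests driving answer tests over
    # several datasets; nose's discovery picks up test_field_sums and
    # runs one test per dataset.  Paths and REFERENCES are placeholders.
    from numpy.testing import assert_allclose
    from yt.mods import load

    DATASETS = [
        "path/to/flash/GasSloshing",
        "path/to/flash/WindTunnel",
        "path/to/enzo/IsolatedGalaxy",
    ]

    # Placeholder mapping of (dataset, field) -> stored reference value.
    REFERENCES = {}

    def check_field_sum(fn, field):
        pf = load(fn)
        dd = pf.h.all_data()
        assert_allclose(dd[field].sum(), REFERENCES[(fn, field)],
                        rtol=1e-7)

    def test_field_sums():
        for fn in DATASETS:
            yield check_field_sum, fn, "Density"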

Ultimately, the end product of all of this is a single set of tests
with test discovery: it loads up a bunch of different data outputs,
runs answer tests on all of them, runs the unit tests, and so on.  I
think the infrastructure just needs the last 25% to finish it up.
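Concretely, that could be as small as a single entry point that hands
nose both suites at once (the paths here are just illustrative):

    # Illustrative driver: let nose discover and run the fast unit
    # tests and the data-backed answer tests in one pass.
    import nose

    if __name__ == "__main__":
        nose.run(argv=["nosetests", "-v",
                       "yt/tests",             # unit tests
                       "yt/answer_testing"])   # answer tests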

So: those of you out there who have access to any datasets of types
other than FLASH or Enzo, can you provide non-proprietary, medium-size
and small-size datasets?  I'd like to have two for every code base, at
least.

So: those of you who want to help out, would you be interested in
looking at the answer_testing framework with me?  I am happy to
discuss it over email or IRC as we convert it to the numpy testing
format, which will be much easier to maintain in the long run and
will give us a single testing system that works for everything.

-Matt
