[yt-dev] Testing Infrastructure: Datasets for ART, Orion, FLASH, etc ...

Casey W. Stark caseywstark at gmail.com
Fri Oct 12 15:08:15 PDT 2012


Thanks for the clarification, Nathan. I see how that can build up very
quickly.

- Casey


On Fri, Oct 12, 2012 at 3:06 PM, Britton Smith <brittonsmith at gmail.com> wrote:

> What is the number of tests of unique functionality?
>
>
> On Fri, Oct 12, 2012 at 6:04 PM, Nathan Goldbaum <nathan12343 at gmail.com> wrote:
>
>> In this terminology, each assert statement is a test.  It's quite easy to
>> make dozens of new tests inside a couple of nested for loops.
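>>
>> For example, something like this (a minimal sketch using nose-style
>> yield generators; the loop values and check are illustrative) turns two
>> small loops into eight individually reported tests:
>>
>>     import numpy as np
>>     from numpy.testing import assert_equal
>>
>>     def test_many_combinations():
>>         # nose collects each yielded tuple as its own test, so the two
>>         # loops below already produce 4 x 2 = 8 separate tests.
>>         for size in [8, 16, 32, 64]:
>>             for dtype in [np.float32, np.float64]:
>>                 data = np.ones(size, dtype=dtype)
>>                 yield assert_equal, data.sum(), size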
>>
>> On Oct 12, 2012, at 3:02 PM, Casey W. Stark wrote:
>>
>> Hey Matt.
>>
>> I would like to provide the data for Nyx. Not sure what sort of output
>> would be useful though.
>>
>> So I knew of some of the tests you and Anthony added, but there are 500
>> unit tests now? Isn't that a bit strange?
>>
>> - Casey
>>
>>
>> On Fri, Oct 12, 2012 at 2:54 PM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Today at UCSC, Nathan, Chris (Moody) and I sat down and went through
>>> what we wanted to accomplish with testing.  This comes back to the
>>> age-old dichotomy between unit testing and answer testing.  But what
>>> this really comes back to, now that we had the opportunity to think
>>> about it, is the difference between testing components and
>>> functionality versus testing frontends.
>>>
>>> So the idea here is:
>>>
>>> Unit tests => Cover individual units of the code, using either
>>> manually inserted data values or randomly generated "parameter files".
>>> Stephen and I have written a bunch in the last couple days.  We have
>>> nearly 500, and they take < 1 minute to run.
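>>>
>>> As a rough sketch of the flavor of these (assuming the fake_random_pf
>>> helper in yt.testing; the exact field and size are illustrative):
>>>
>>>     from yt.testing import fake_random_pf, assert_equal
>>>
>>>     def test_all_data_size():
>>>         # Build a small, randomly generated in-memory "parameter
>>>         # file", so no on-disk dataset is needed.
>>>         pf = fake_random_pf(16)
>>>         dd = pf.h.all_data()
>>>         # One cheap check of a single unit of functionality.
>>>         yield assert_equal, dd["Density"].size, 16**3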
>>>
>>> Frontend/Answer tests => Cover a large portion of high-level
>>> functionality that touches a lot of the code, but do so by running
>>> things like projections, profiles, etc. on actual data from actual
>>> simulation codes; the results are then compared to stored reference
>>> values.  Currently we have ~550 answer tests, and they run every 30
>>> minutes on moving7_0010 (which comes with yt) and once a day on
>>> JHK-DD0030 (available on yt-project.org/data/ as IsolatedGalaxy).  We
>>> do not have automated FLASH testing.
>>>
>>> The next step is:
>>>
>>> 1) Getting a bunch of non-proprietary datasets, both small *and*
>>> medium, for each code base we want to test.  This data must be
>>> non-proprietary!  The small ones can be trivially small; for medium,
>>> I'd prefer the 0.5-5 GB range for size on disk.  I would think that
>>> GasSloshing and WindTunnel could work for FLASH.  But we still need
>>> ART data (from Chris Moody), GDF or Piernik data (from Kacper), Orion
>>> data (if possible), and Nyx data (if possible).  I will handle adding
>>> RAMSES data in the 3.0 branch.
>>> 2) Getting a mechanism to run answer tests that isn't "Matt's
>>> desktop."  I've emailed ShiningPanda about this, but if they can't
>>> provide us with a FLOSS license, I think we can identify some funding
>>> to do this.
>>> 3) Have a mechanism to display and collate results.  ShiningPanda
>>> would do this if we were on their systems.
>>> 4) Make it much easier to flag individual tests as needing updates.  I
>>> think the Data Hub will be the end place for this, but this is lower
>>> priority.
>>> 5) Migrate answer testing to use the unit testing framework, as most
>>> of what we've done there re-implements functionality that already
>>> exists in unit testing frameworks.  This will mean we can much more
>>> easily handle test discovery, which is a huge plus.
>>>
>>> Ultimately, the end product of all of this is a single set of tests:
>>> test discovery loads up a bunch of different data outputs, runs the
>>> answer tests on all of them, runs the unit tests, and so on.  I think
>>> it just needs the last 25% of work to finish up the infrastructure.
>>>
>>> So: those of you out there who have access to datasets of types other
>>> than FLASH or Enzo, can you provide non-proprietary, medium-size and
>>> small-size datasets?  I'd like to have at least two for every code
>>> base.
>>>
>>> So: those of you who want to help out, would you be interested in
>>> looking at the answer_testing framework with me?  I am happy to
>>> discuss over email or IRC how to convert it to the numpy testing
>>> format, which will be much easier to maintain in the long run and will
>>> give us a single testing system that works for everything.
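>>>
>>> To give a flavor of the target format (a sketch only; the function
>>> name and tolerance are placeholders, not the actual answer_testing
>>> API):
>>>
>>>     from numpy.testing import assert_allclose
>>>
>>>     def compare_to_reference(new_result, stored_result):
>>>         # An answer test reduces to: recompute a value (projection,
>>>         # profile, ...) on a known dataset and compare it to the
>>>         # stored reference within a tolerance, so nose can discover
>>>         # and report it like any other test.
>>>         assert_allclose(new_result, stored_result, rtol=1e-7)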
>>>
>>> -Matt

