[yt-dev] Testing Infrastructure: Datasets for ART, Orion, FLASH, etc ...

Matthew Turk matthewturk at gmail.com
Fri Oct 12 15:09:48 PDT 2012


Hi Britton,

On Fri, Oct 12, 2012 at 3:06 PM, Britton Smith <brittonsmith at gmail.com> wrote:
> What is the number of tests of unique functionality?

I'm not sure I know what you mean.  For the most part, we're testing
many aspects of individual components.  As an example, we now have a
whole bunch of tests that address different aspects of covering
grids, projections, profiles, and so on, each of which is relatively
tricky and sensitive to changes in the code base.  So I guess in a
sense, we're really thoroughly testing about 5 different pieces of
the code in the unit tests.  In the answer tests we cover a much
broader section of the code, but they take longer to run and require
reference data.

-Matt

>
>
> On Fri, Oct 12, 2012 at 6:04 PM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
>>
>> In this terminology, each assert statement is a test.  It's quite easy to
>> make dozens of new tests inside a couple of nested for loops.
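>>
>> For example, something along these lines (just an illustrative sketch
>> using nose's yield-style test generators and numpy.testing, not code
>> lifted from the yt suite):
>>
>>     import numpy as np
>>     from numpy.testing import assert_equal
>>
>>     def test_array_invariants():
>>         # nose runs (and counts) each yielded tuple as a separate
>>         # test, so two small loops produce dozens of tests.
>>         for dims in [(16, 16, 16), (32, 32, 32), (64, 64, 64)]:
>>             for dtype in ["float32", "float64"]:
>>                 data = np.ones(dims, dtype=dtype)
>>                 yield assert_equal, data.shape, dims
>>                 yield assert_equal, data.sum(), np.prod(dims)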
>>
>> On Oct 12, 2012, at 3:02 PM, Casey W. Stark wrote:
>>
>> Hey Matt.
>>
>> I would like to provide the data for Nyx. Not sure what sort of output
>> would be useful though.
>>
>> So I knew of some of the tests you and Anthony added, but there are 500
>> unit tests now? Isn't that a bit strange?
>>
>> - Casey
>>
>>
>> On Fri, Oct 12, 2012 at 2:54 PM, Matthew Turk <matthewturk at gmail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> Today at UCSC, Nathan, Chris (Moody) and I sat down and went through
>>> what we wanted to accomplish with testing.  This comes back to the
>>> age-old dichotomy between unit testing and answer testing.  What it
>>> really comes down to, now that we've had the opportunity to think
>>> about it, is the difference between testing components and
>>> functionality versus testing frontends.
>>>
>>> So the idea here is:
>>>
>>> Unit tests => Cover individual units of the code, using either
>>> manually inserted data values or randomly generated "parameter files".
>>> Stephen and I have written a bunch in the last couple days.  We have
>>> nearly 500, and they take < 1 minute to run.
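>>>
>>> As a rough sketch of the flavor of these (hand-rolled here for
>>> illustration, not copied from the suite; the real tests also use yt
>>> helpers to generate fake "parameter files"):
>>>
>>>     import numpy as np
>>>     from numpy.testing import assert_allclose
>>>
>>>     def weighted_average(field, weight):
>>>         # The "unit" under test: a simple weighted average.
>>>         return (field * weight).sum() / weight.sum()
>>>
>>>     def test_weighted_average():
>>>         # Manually inserted data values with a hand-computed answer.
>>>         field = np.array([1.0, 2.0, 3.0])
>>>         weight = np.array([1.0, 1.0, 2.0])
>>>         assert_allclose(weighted_average(field, weight), 2.25)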
>>>
>>> Frontend/Answer tests => Cover a large portion of high-level
>>> functionality that touches a lot of the code, but do so by running
>>> things like projections, profiles, etc. on actual data from actual
>>> simulation codes, and comparing the results to reference values that
>>> are stored somewhere.  Currently we have ~550 answer tests, and they
>>> run every 30 minutes on moving7_0010 (which comes with yt) and once a
>>> day on JHK-DD0030 (on yt-project.org/data/ as IsolatedGalaxy).  We do
>>> not have automated FLASH testing.
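>>>
>>> In outline, each answer test boils down to something like this (a
>>> schematic sketch only; the dataset path, the exact 2.x calls, and the
>>> way the reference is stored are placeholders rather than the actual
>>> answer_testing machinery):
>>>
>>>     import numpy as np
>>>     from numpy.testing import assert_allclose
>>>     from yt.mods import load
>>>
>>>     def test_projected_density():
>>>         # Load an actual simulation output (placeholder path).
>>>         pf = load("moving7_0010")
>>>         result = pf.h.proj(0, "Density")["Density"].sum()
>>>         # Compare against a previously stored reference value.
>>>         reference = np.load("moving7_0010_proj_density.npy")
>>>         assert_allclose(result, reference, rtol=1e-7)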
>>>
>>> The next step is:
>>>
>>> 1) Getting a bunch of non-proprietary sets of data that are small
>>> *and* medium, for each code base we want to test.  This data must be
>>> non-proprietary!  For small, I would say they can be trivially small.
>>> For medium, I'd prefer something in the 0.5 - 5 GB range for size-on-disk.  I
>>> would think that GasSloshing and WindTunnel could work for FLASH.  But
>>> we still need ART data (from Chris Moody), GDF or Piernik data (from
>>> Kacper), Orion data (if possible), Nyx data (if possible).  I will
>>> handle adding RAMSES data in the 3.0 branch.
>>> 2) Getting a mechanism to run answer tests that isn't "Matt's
>>> desktop."  I've emailed Shining Panda about this, but if they don't
>>> have the ability to provide us with a FLOSS license, I think we can
>>> identify some funding to do this.
>>> 3) Have a mechanism to display and collate results.  ShiningPanda
>>> would do this if we were on their systems.
>>> 4) Make it much easier to flag individual tests as needing updates.  I
>>> think the Data Hub will be the end place for this, but this is lower
>>> priority.
>>> 5) Migrate answer testing to use the unit testing framework, as most of
>>> what we've done there re-implements stuff that is in the unit testing
>>> frameworks.  This will mean we can much more easily handle
>>> test-discovery, which is a huge plus.
>>>
>>> Ultimately, the end product of all of this is that we should
>>> eventually have a single test run that does test discovery, loads up
>>> a bunch of different data outputs, runs answer tests on all of them,
>>> runs the unit tests, and so on.  I think it just needs the last 25%
>>> of the work to finish up the infrastructure.
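>>>
>>> Concretely, I picture something along these lines (a sketch only; the
>>> dataset names, fields, and the check itself are placeholders for the
>>> real answer comparisons):
>>>
>>>     from yt.mods import load
>>>
>>>     # Placeholder list of the small/medium datasets discussed above.
>>>     DATASETS = ["moving7_0010", "IsolatedGalaxy"]
>>>     FIELDS = ["Density", "Temperature"]
>>>
>>>     def test_all_frontends():
>>>         # nose discovers this generator and runs one test per
>>>         # (dataset, field) combination.
>>>         for fn in DATASETS:
>>>             pf = load(fn)
>>>             for field in FIELDS:
>>>                 yield check_field_minmax, pf, field
>>>
>>>     def check_field_minmax(pf, field):
>>>         # Stand-in check; the real tests would compare projections,
>>>         # profiles, etc. against stored reference results.
>>>         dd = pf.h.all_data()
>>>         assert dd[field].min() <= dd[field].max()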
>>>
>>> So: those of you out there who have access to any datasets of types
>>> other than FLASH or Enzo, can you provide non-proprietary, medium-size
>>> and small-size datasets?  I'd like to have two for every code base, at
>>> least.
>>>
>>> So: those of you who want to help out, would you be interested in
>>> looking at the answer_testing framework with me?  I am happy to
>>> discuss over email or IRC how to convert it to the numpy testing
>>> format, which will be much easier to maintain in the long run and make
>>> it much easier to have a single testing system that works for
>>> everything.
>>>
>>> -Matt
>>> _______________________________________________
>>> yt-dev mailing list
>>> yt-dev at lists.spacepope.org
>>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>>
>>
>
>
>


