[yt-dev] Testing Intervention

Mon Sep 24 19:01:57 PDT 2012

Hey Casey,

On Mon, Sep 24, 2012 at 5:39 PM, Casey W. Stark <caseywstark at gmail.com> wrote:
> Hi Matt.
>
> Glad my example was useful in some way. I guess knowing exactly which Cython
> routines to test for what is what I meant about where to start.

I completely understand -- and that's a failure on my part in the
code.  So I think what might help is if we start looking at the
routines that could be tested, and then I can jump in when it looks
like they're buried.  I will also volunteer to try to refactor this
code, once it's tested, to make it clearer what is where and why.

>
> Thanks for the tip about the stream frontend.

No prob, it should be documented better.

In terms of generating data, here's an example of what I meant:

https://hub.yt-project.org/nb/ompxtg

This sets up a random (but small-valued) background density and then
applies a couple top hat spheres on top of it.  I think this could be
a good starting point for unit testing.  I looked at the code that
exists for flagging cells and refining them, and it's currently a bit
too specific for this, as it was designed to take RAMSES data and
partition it, which operates slightly differently.  I'll take a pass
at extracting it and making it work in this case.
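
For anyone who can't open the notebook, a rough NumPy-only sketch of
that kind of setup might look like the following.  The function name
make_tophat_density, the grid size, and the sphere centers, radii and
amplitudes are all illustrative assumptions here, not what the notebook
itself does; the nose-style test function at the end just shows how
such data could feed a unit test.

```python
import numpy as np

def make_tophat_density(n=32, spheres=None, background=1e-2, seed=0):
    """Return an (n, n, n) density array on the unit cube.

    Hypothetical helper: a small random background plus top hat spheres.
    `spheres` is a list of (center, radius, overdensity) tuples.
    """
    if spheres is None:
        # Illustrative values only; the two spheres do not overlap.
        spheres = [((0.3, 0.3, 0.3), 0.15, 10.0),
                   ((0.7, 0.6, 0.5), 0.10, 50.0)]
    rng = np.random.RandomState(seed)
    # Small-valued random background in [0, background).
    density = background * rng.random_sample((n, n, n))
    # Cell-centered coordinates along each axis.
    x = (np.arange(n) + 0.5) / n
    xx, yy, zz = np.meshgrid(x, x, x, indexing="ij")
    for (cx, cy, cz), radius, value in spheres:
        r2 = (xx - cx) ** 2 + (yy - cy) ** 2 + (zz - cz) ** 2
        # Top hat: add a constant overdensity inside the sphere.
        density[r2 <= radius ** 2] += value
    return density

def test_tophat_density():
    # nose collects functions named test_*; plain asserts are enough.
    rho = make_tophat_density()
    assert rho.shape == (32, 32, 32)
    # A corner cell is outside both spheres: background only.
    assert rho[0, 0, 0] < 1e-2
    # A cell near the first sphere's center carries its overdensity.
    assert 10.0 <= rho[9, 9, 9] < 10.0 + 1e-2
```

Because the seed is fixed, the array is reproducible from run to run,
which is what makes it usable as test input without shipping binary
files.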

The way I see it working is that one would set up the operators, as is
done here, and then a progressive routine would be applied, something
like this:

while grids.flag() > 0:
    grids.refine()
    for operator in operators:
        operator.apply(grids)
pf = grids.convert_to_dataset()

The missing step in the code base is mostly just the refinement, as
for the smoothed covering grids we have the necessary machinery to
interpolate from one level to the next.  Adding more types of
operators to this library would be beneficial as well.  The
finalization step at the end would then convert the collection of
grids into a Stream dataset.  So with only a couple lines of code we
would in all likelihood be able to generate in-memory datasets.
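
To make the shape of that loop concrete, here is a toy, runnable
sketch.  Everything in it is an assumption for illustration: ToyGrids
and TopHatSphere are hypothetical classes, not existing yt objects, and
"refinement" here doubles the resolution of the whole domain rather
than splitting out patches the way real AMR would.  The operators are
also applied once before the loop so that there is something to flag,
and the conversion to a Stream dataset is left out.

```python
import numpy as np

class ToyGrids:
    """Hypothetical stand-in for the `grids` object in the loop.

    Assumption: the whole domain is refined by a factor of two per
    level, which is much cruder than patch-based AMR.
    """
    def __init__(self, n, threshold, max_level):
        self.data = np.zeros((n, n, n))
        self.threshold = threshold
        self.max_level = max_level
        self.level = 0

    def flag(self):
        # Number of cells that still ask for refinement.
        if self.level >= self.max_level:
            return 0
        return int((self.data > self.threshold).sum())

    def refine(self):
        # Piecewise-constant prolongation: each cell becomes 2x2x2 cells.
        for axis in range(3):
            self.data = np.repeat(self.data, 2, axis=axis)
        self.level += 1

class TopHatSphere:
    """Hypothetical operator: set a constant density inside a sphere."""
    def __init__(self, center, radius, value):
        self.center = center
        self.radius = radius
        self.value = value

    def apply(self, grids):
        n = grids.data.shape[0]
        x = (np.arange(n) + 0.5) / n  # cell centers on the unit cube
        xx, yy, zz = np.meshgrid(x, x, x, indexing="ij")
        cx, cy, cz = self.center
        r2 = (xx - cx) ** 2 + (yy - cy) ** 2 + (zz - cz) ** 2
        grids.data[r2 <= self.radius ** 2] = self.value

operators = [TopHatSphere((0.5, 0.5, 0.5), 0.25, 10.0)]
grids = ToyGrids(n=8, threshold=5.0, max_level=2)

# Initial application so that flag() has something to find.
for operator in operators:
    operator.apply(grids)

while grids.flag() > 0:
    grids.refine()
    # Re-apply so the analytic shape is resolved at the new level.
    for operator in operators:
        operator.apply(grids)
```

Re-applying the operators after each refinement is the point of the
design: the data is defined analytically, so each new level samples the
sphere edge more finely instead of just interpolating the coarse
values.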

(This underscores in my mind why it would be awesome to have GDF in
more places, as with this kind of machinery we've just written initial
conditions generation, especially since we have a GDF writer
already...)

Anyway, what might be really helpful is if interested people would
volunteer to address a few of the testable areas?  Then we can start
pushing and identifying problem areas.  I'll volunteer to handle at
least the data container stuff, and whatever else slips through the
cracks; although my time will be somewhat limited in the very near
future, I will try to make this a priority.

-Matt

>
> - Casey
>
>
> On Mon, Sep 24, 2012 at 1:52 PM, Matthew Turk <matthewturk at gmail.com> wrote:
>>
>> Hey Casey and Anthony,
>>
>> On Mon, Sep 24, 2012 at 4:20 PM, Casey W. Stark <caseywstark at gmail.com>
>> wrote:
>> > Hi Anthony.
>> >
>> > I completely agree that we should target the level of functions actually
>> > performing the projection rather than yt's organization. The mock
>> > frontend
>> > suggestion was just a hack to get there. I don't know if there's a way
>> > around it though...
>> >
>> > Here's an example of what I sorted through to get to projections:
>> > - Load a test plotfile, check pf.h.proj to find its source.
>> > - Read through data_objects/hierarchy.py and
>> > utilities/parallel_tools/parallel_analysis_interface.py to find where
>> > proj
>> > is attached, can't find it.
>> > - The proj docstring says it is a reference to AMRQuadProj. Can't find a
>> > class by that name.
>> > - Search data_objects sources for "proj", find AMRProjBase.
>> >
>> > So it looks like the functionality is wrapped up in the __project_level
>> > and
>> > _project_grid methods. I can't think of a way to test those without
>> > creating
>> > an AMRProjBase, and that requires a staticoutput object.
>>
>> You're right, the projection stuff as *projections* is not easy to
>> test.  But in terms of testing the underlying code, which is wrapped
>> up in a Cython class called QuadTree, I think it could be done.  The
>> steps you're describing are actually all part of the existing answer
>> testing machinery, which performs a couple things and verifies that
>> they don't change over time:
>>
>> 1) Project some fields from the disk
>> 2) Project a couple derived fields
>> 3) Project a derived field that requires spatial derivatives
>> 4) Project the "Ones" field, which should be 1.0 everywhere.
>>
>> So these things are done, but it is also possible that the specific
>> quadtree functionality could be tested, in isolation from the
>> projection.  I think this may be one of the things Anthony is talking
>> about -- answer testing can handle the big, complex items, and by
>> breaking down to the fundamentals we can address isolated items from a
>> unit testing perspective.
>>
>> >
>> > So unfortunately, I think it would still come down to having a fake
>> > frontend. It's not ideal, but it seems like any more isolation would
>> > require
>> > big rewrites to yt.
>>
>> One fun thing that is not usually known is that we have a fake
>> frontend already, it just doesn't get used much.  It's called the
>> "Stream" frontend and it was designed originally to be used in
>> ParaView, but now gets used by the (new, not-yet-documented/released)
>> load_uniform_grid function as well as by Hyperion, the RT code by Tom
>> R.  It can set up AMR as well as static mesh.  It's not terribly well
>> documented, but there are examples on the wiki.
>>
>> One thing I've been thinking about is actually creating a couple fake
>> outputs, which could be defined analytically with spheres of
>> overdensity inside them.  In principle, if we added refinement
>> criteria, we could make this relatively complex data that was defined
>> with only a few lines of code, but spun up a big in-memory dataset.
>>
>> (This exact thing is on my list of things to do and then to output in
>> GDF, by the way...)
>>
>> That I think could come, down the road a bit.  The refinement criteria
>> wouldn't be too bad to implement, especially since we already have the
>> grid splitting routines.  I just don't think we should focus on it at
>> the moment.  But the uniform grid creation and loading works already
>> -- I used it this morning.  You can do it with:
>>
>> from yt.frontends.stream.api import load_uniform_grid
>> ug = load_uniform_grid({"VelocityNorm":data1, "Density":data2},
>>                        [359, 359, 359], 1.0)
>>
>> The list gives the dimensions of the data and the final value is the
>> to-cm conversion.
>>
>> >
>> > Of course, I could be missing something. Matt, can you think of a better
>> > way?
>>
>> I think for this specific example (and your damningly complex tracing
>> of things through the source ...) the easiest thing to do is isolate
>> the Cython routine, which it seems I was able to do only because I
>> wrote it and which seems quite buried in the code, and to also provide
>> high-level machinery for faking a frontend.
>>
>> -Matt
>>
>> >
>> > - Casey
>> >
>> >
>> > On Mon, Sep 24, 2012 at 11:02 AM, Anthony Scopatz <scopatz at gmail.com>
>> > wrote:
>> >>
>> >> Hello Casey,
>> >>
>> >> Sorry for taking the whole weekend to respond.
>> >>
>> >>>> I would like to help with this, but it's difficult to figure out
>> >>>> where
>> >>>> to start.
>> >>
>> >>
>> >> Not to worry. I think that any of the items listed at the bottom of
>> >> Matt's
>> >> original email
>> >> would be a great place to start.
>> >>
>> >>>>
>> >>>>
>> >>>> Say I want to test projections. I make a fake 3D density field, maybe
>> >>>> something as simple as np.arange(4**3).reshape((4, 4, 4)). I write
>> >>>> down the
>> >>>> answer to the x-projection. Now all I need to do is call
>> >>>> assert_allclose(yt_result, answer, rtol=1e-15), but I don't know what
>> >>>> pieces
>> >>>> of low-level yt stuff to call to get to `yt_result`. Hopefully that's
>> >>>> clear...
>> >>>>
>> >>>> Maybe this comes down to creating a fake frontend we can attach
>> >>>> fields
>> >>>> to?
>> >>
>> >>
>> >> Actually, I disagree with this strategy, as I told Matt when we spoke
>> >> last
>> >> week.
>> >> What is important is that we test the science and math parts of the
>> >> code
>> >> before, if ever, dealing with the software architecture that surrounds
>> >> them.
>> >>
>> >> Let's take your example of projections.  What we need to test is the
>> >> actual function
>> >> or method which actually slogs through the projection calculation.  In
>> >> many cases in
>> >> yt these functions are not directly attached to the front end but live
>> >> in
>> >> analysis, visualization
>> >> or utilities subpackages.   It is these packages that we should
>> >> worry
>> >> about testing.
>> >> We can easily create routines to feed them sample data.
>> >>
>> >> On the other hand, testing or mocking things like frontends should be a
>> >> very low priority.
>> >> At the end of the day what you are testing here is pulling in data from
>> >> disk or other
>> >> sources.  Effectively, this is just re-testing functionality present in
>> >> h5py, etc.  That is not
>> >> really our job.  Yes, in a perfect world, front ends would be tested
>> >> too.
>> >> But I think that the
>> >> priority should be placed on things like the KDTree.
>> >>
>> >> Be Well
>> >> Anthony
>> >>
>> >>>>
>> >>>>
>> >>>> - Casey
>> >>>>
>> >>>>
>> >>>> On Fri, Sep 21, 2012 at 2:42 PM, Matthew Turk <matthewturk at gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> As some of you have seen (at least Stephen), I filed a ticket this
>> >>>>> morning about increasing testing coverage.  The other night Anthony
>> >>>>> and I met up in NYC and he had something of an "intervention" about
>> >>>>> the sufficiency of answer testing for yt; it didn't take too much
>> >>>>> work
>> >>>>> on his part to convince me that we should be testing not just
>> >>>>> against
>> >>>>> a gold standard, but also performing unit tests.  In the past I had
>> >>>>> eschewed unit testing simply because the task of mocking data was
>> >>>>> quite tricky, and by adding tests that use smaller bits we could
>> >>>>> cover
>> >>>>> unit testable areas with answer testing.
>> >>>>>
>> >>>>> But, this isn't really a good strategy.  Let's move to having both.
>> >>>>> The testing infrastructure he recommends is the nearly-omnipresent
>> >>>>> nose:
>> >>>>>
>> >>>>> http://nose.readthedocs.org/en/latest/
>> >>>>>
>> >>>>> The ticket to track this is here:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> https://bitbucket.org/yt_analysis/yt/issue/426/increase-unit-test-coverage
>> >>>>>
>> >>>>> There are a couple sub-items here:
>> >>>>>
>> >>>>> 1) NumPy's nose test plugins provide a lot of necessary
>> >>>>> functionality
>> >>>>> that we have reimplemented in the answer testing utilities.  I'd
>> >>>>> like
>> >>>>> to start using the numpy plugins, which include things like
>> >>>>> conditional test execution, array comparisons, "slow" tests, etc
>> >>>>> etc.
>> >>>>> 2) We can evaluate, using conditional test execution, moving to nose
>> >>>>> for answer testing.  But that's not on the agenda now.
>> >>>>> 3) Writing tests for nose is super easy, and running them is too.
>> >>>>> Just
>> >>>>> do:
>> >>>>>
>> >>>>> nosetests -w yt/
>> >>>>>
>> >>>>> when in your source directory.
>> >>>>>
>> >>>>> 4) I've written a simple sample here:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> https://bitbucket.org/yt_analysis/yt-3.0/src/da10ffc17f6d/yt/utilities/tests/test_interpolators.py
>> >>>>>
>> >>>>> 5) I'll handle writing up some mock data that doesn't require
>> >>>>> shipping
>> >>>>> lots of binary files, which can then be used for checking things
>> >>>>> that
>> >>>>> absolutely require hierarchies.
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> The way to organize tests is easy.  Inside each directory with
>> >>>>> testable items create a new directory called "tests", and in here
>> >>>>> toss
>> >>>>> some scripts.  You can stick a bunch of functions in those scripts.
>> >>>>>
>> >>>>> Anyway, I'm going to start writing more of these (in the main yt
>> >>>>> repo,
>> >>>>> and this change will be grafted there as well) and I'll write back
>> >>>>> once the data mocking is ready.  I'd like it if we started
>> >>>>> encouraging
>> >>>>> or even mandating simple tests (and/or answer tests) for
>> >>>>> functionality
>> >>>>> that gets added, but that's a discussion that should be held
>> >>>>> separately.
>> >>>>>
>> >>>>> The items on the ticket:
>> >>>>>
>> >>>>>  * kD-tree for nearest neighbor
>> >>>>>  * Geometric selection routines
>> >>>>>  * Profiles
>> >>>>>  * Projections -- underlying quadtree
>> >>>>>  * Data object selection of data containers
>> >>>>>  * Data object selection of points
>> >>>>>  * Orientation class
>> >>>>>  * Pixelization
>> >>>>>  * Color maps
>> >>>>>  * PNG writing
>> >>>>>
>> >>>>> Is anyone willing to claim any additional items that they will help
>> >>>>> write unit tests for?
>> >>>>>
>> >>>>> -Matt
>> >>>>> _______________________________________________
>> >>>>> yt-dev mailing list
>> >>>>> yt-dev at lists.spacepope.org
>> >>>>> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org