[yt-dev] Should we stop "yield"ing tests?

Thu Oct 8 09:42:01 PDT 2015

On 10/08/2015 11:11 AM, Nathan Goldbaum wrote:
> Hi all,
> 
> Sorry for the novel of an e-mail! I wanted to be as detailed as possible, I
> hope it isn't too much.
> 
> Right now our test suite does a lot of test yielding. Under this paradigm,
> every yielded test counts as a "dot" in the nose testing output and
> contributes to the number of tests printed at the end of the test suite. In
> addition, yielding tests makes it so that if a test function has a failing
> test, all of the tests yielded by that function will be run despite the
> fact that one or more tests failed.
> 
> It's definitely nice to see that we're running thousands of tests, and I'd
> happy with continuing to do it, except I've learned that this approach adds
> some technical hurdles for people who actually need to deal with the test
> suite.
> 
> Personally, I've noticed that it makes debugging failing tests much more
> annoying than it needs to be. Since a test is yielded, the traceback
> generated by a failing or erroring test ends up somewhere in nose rather
> than in the yt test suite function that actually yielded the test. This can
> make it difficult to determine where a failing test is coming from in the
> test suite, particularly if the test that gets yielded is just an assert.
> 
> In addition, Kacper tells me that our practice of yielding tests makes it
> difficult to simplify our jenkins setup since yielded tests are not
> parallel-safe.

Just a small clarification: test generators and parellism in nose is
currently completely broken[1]. AFAIK this issue was never fixed.
Thread safety is a slightly different thing. Some test we have, which
aren't generators, would also fail with parallel nose.

[1]
https://groups.google.com/forum/#!msg/nose-users/PHcGXlGQZMg/XKgUsDcyf7cJ

> 
> I'd like to propose a modification to the yt codebase and developer guide.
> Rather than encouraging that all asserts and test classes be yielded, we
> should instead *not* yield them and just call them as regular functions or
> classes. To make that concrete, the following set of tests from the NMSU
> ART answer tests would go from looking like this:
> 
> http://paste.yt-project.org/show/5944/
> 
> to looking like this:
> 
> http://paste.yt-project.org/show/5946/
> 
> Each individual assert and test class instantiation would no longer count
> as an individual test in nose's test statistics.  Instead, referring to the
> example above, the test_d9p function would be the only test reported by
> nose, even though the test function does many asserts and instantiates many
> test class instances.
> 
> For me, the main win would be that it would be easier to determine which
> exact test failed, because the traceback reported by nose due to the test
> failure would include the line in the test file that produced a failing
> assert or caused an unhandled exception to be raised.
> 
> To make that concrete, I've just made the following modification to the
> `test_dimensionless` function to force a test to fail:
> 
> http://paste.yt-project.org/show/5947/
> 
> Running `nosetests units` in the root of the repository, I get the
> following traceback:
> 
> http://paste.yt-project.org/show/5948/
> 
> Note how the test traceback does *not* include the `test_dimensionless`
> function.  If I instead make it so the failing test is not yielded:
> 
> http://paste.yt-project.org/show/5949/
> 
> I get a much nicer traceback:
> 
> http://paste.yt-project.org/show/5950/
> 
> I can see a few reasons why we might not want to do this.
> 
> 1. This is an invasive, large-scale change. It might be automatable (I
> think I could write an emacs macro to do this, for example), but it would
> in the end be difficult to review.
> 
> 2. Since test functions would terminate on the first failure, it might lead
> to annoying debug cycles where one assert gets fixed but then the next
> assert fails, forcing people to wait for the full test suite to be rerun
> just to get to the next failing test.
> 
> For the second point, I think we can remedy this just by improving our
> testing docs to encourage people to run tests locally as much as possible
> and also explain better how to run only a specific test function from the
> command line.
> 
> If people are interested in making this modification globally, I think I
> could take it on as part of my general efforts to clean up the codebase.
> 
> -Nathan
> 
> 
> 
> _______________________________________________
> yt-dev mailing list
> yt-dev at lists.spacepope.org
> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20151008/3e220c3d/attachment.sig>