[yt-dev] Should we stop "yield"ing tests?

Thu Oct 8 09:11:16 PDT 2015

Hi all,

Sorry for the novel of an e-mail! I wanted to be as detailed as possible, I
hope it isn't too much.

Right now our test suite does a lot of test yielding. Under this paradigm,
every yielded test counts as a "dot" in the nose testing output and
contributes to the number of tests printed at the end of the test suite. In
addition, yielding tests makes it so that if a test function has a failing
test, all of the tests yielded by that function will be run despite the
fact that one or more tests failed.

It's definitely nice to see that we're running thousands of tests, and I'd
happy with continuing to do it, except I've learned that this approach adds
some technical hurdles for people who actually need to deal with the test
suite.

Personally, I've noticed that it makes debugging failing tests much more
annoying than it needs to be. Since a test is yielded, the traceback
generated by a failing or erroring test ends up somewhere in nose rather
than in the yt test suite function that actually yielded the test. This can
make it difficult to determine where a failing test is coming from in the
test suite, particularly if the test that gets yielded is just an assert.

In addition, Kacper tells me that our practice of yielding tests makes it
difficult to simplify our jenkins setup since yielded tests are not
parallel-safe.

I'd like to propose a modification to the yt codebase and developer guide.
Rather than encouraging that all asserts and test classes be yielded, we
should instead *not* yield them and just call them as regular functions or
classes. To make that concrete, the following set of tests from the NMSU
ART answer tests would go from looking like this:

http://paste.yt-project.org/show/5944/

to looking like this:

http://paste.yt-project.org/show/5946/

Each individual assert and test class instantiation would no longer count
as an individual test in nose's test statistics.  Instead, referring to the
example above, the test_d9p function would be the only test reported by
nose, even though the test function does many asserts and instantiates many
test class instances.

For me, the main win would be that it would be easier to determine which
exact test failed, because the traceback reported by nose due to the test
failure would include the line in the test file that produced a failing
assert or caused an unhandled exception to be raised.

To make that concrete, I've just made the following modification to the
`test_dimensionless` function to force a test to fail:

http://paste.yt-project.org/show/5947/

Running `nosetests units` in the root of the repository, I get the
following traceback:

http://paste.yt-project.org/show/5948/

Note how the test traceback does *not* include the `test_dimensionless`
function.  If I instead make it so the failing test is not yielded:

http://paste.yt-project.org/show/5949/

I get a much nicer traceback:

http://paste.yt-project.org/show/5950/

I can see a few reasons why we might not want to do this.

1. This is an invasive, large-scale change. It might be automatable (I
think I could write an emacs macro to do this, for example), but it would
in the end be difficult to review.

2. Since test functions would terminate on the first failure, it might lead
to annoying debug cycles where one assert gets fixed but then the next
assert fails, forcing people to wait for the full test suite to be rerun
just to get to the next failing test.

For the second point, I think we can remedy this just by improving our
testing docs to encourage people to run tests locally as much as possible
and also explain better how to run only a specific test function from the
command line.

If people are interested in making this modification globally, I think I
could take it on as part of my general efforts to clean up the codebase.

-Nathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/attachments/20151008/470a7221/attachment.htm>