[Yt-dev] yt documentation, standards, implementation

Sun May 30 15:18:58 PDT 2010

Hi all,

This last Friday I had a chance to talk to Tom Abel and Oliver Hahn
(both CC'd on this message) about their experiences with using yt, and
they brought up some points which I've now had a chance to think
about, and which I find very interesting, certainly as something to
discuss.  Here are my notes on it, along with a proposal for moving
forward.

As a quick note, what really hit home that we need better
documentation was trying to make a thin projection.  The definition of
what a 'source' could be wasn't there, there were no examples, and I
had to go look at the source to figure out what the parameters were
even called.  I think that's not ... good.

Python Inline Documentation
===========================

One of the coolest things about Python is the help() function, which
prints out the function signature and the contents of the doc string.
In the source code, the docstring is inline in the function, like so:

def some_function(a, b, c):
    """
    This function does something.
    """
    return a+b+c

The output of help(some_function) would look like this:

>>> help(some_function)
Help on function some_function in module __main__:

some_function(a, b, c)
    This function does something.
>>>

Generated Documentation
=======================

The yt docs are generated using an extension to Sphinx called autodoc.
At documentation build time, autodoc pulls all the docstrings from the
source and renders them in the document; you can see this by going to
the API docs and clicking "view source" (which, counterintuitively,
displays the doc source and not the source code of the functions).
Ideally, we would want something that renders nicely as well as looks
good in the inline help, and that maximizes the detail without
becoming cumbersome.
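
As a rough sketch of the idea (this is not how autodoc itself is
implemented, and collect_docstrings is a made-up name), the standard
library's inspect module can pull docstrings out of live objects in
much the same spirit:

```python
import inspect


def some_function(a, b, c):
    """
    This function does something.
    """
    return a + b + c


def collect_docstrings(namespace):
    """Map each public function name in `namespace` to its cleaned docstring."""
    docs = {}
    for name, obj in namespace.items():
        # Skip private names and anything that is not a plain function.
        if name.startswith("_") or not inspect.isfunction(obj):
            continue
        # inspect.getdoc strips the common indentation and the blank
        # lines at the start and end of the docstring.
        docs[name] = inspect.getdoc(obj) or ""
    return docs


docs = collect_docstrings({"some_function": some_function})
```

Here docs maps "some_function" to the text "This function does
something."; a real doc build would then hand that text to a renderer.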

Most of the functions in yt that have docstrings have them written in
a narrative style, with parameters inside asterisks, so that they
render nicely in the API docs:

http://yt.enzotools.org/doc/modules/amrcode.html#yt-lagos-outputtypes-output-types

But, it's becoming clear that perhaps this is not the best approach.
I think a combination of narrative and explicit parameter declaration
would be better.  The NumPy/SciPy projects have a CodingStandards
description:

http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines

that covers docstrings, with a very detailed example of a completely
filled out docstring here:

http://svn.scipy.org/svn/numpy/trunk/doc/example.py

As an example, the 'tensorsolve' function is defined here:

http://svn.scipy.org/svn/numpy/trunk/numpy/linalg/linalg.py

and the API docs are here:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.tensorsolve.html

This looks great, I think.  yt is a bit more class-oriented than
NumPy, but I believe that we should strive for a similar level of
detail as well as a similar style: presenting parameters, what those
parameters can be, and a brief word on the return type.
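
To make the target concrete, here is a minimal sketch of a docstring
laid out in the NumPy style.  The function slab_edges is hypothetical
and is not part of yt; it only exists to give the section headings
something to describe.

```python
def slab_edges(center, width):
    """
    Return the edges of a slab of a given width around a center.

    Parameters
    ----------
    center : float
        Coordinate of the slab midplane along the chosen axis.
    width : float
        Full thickness of the slab.  Must be non-negative.

    Returns
    -------
    left, right : tuple of float
        Lower and upper edges of the slab.

    Raises
    ------
    ValueError
        If `width` is negative.

    Examples
    --------
    >>> slab_edges(0.5, 0.25)
    (0.375, 0.625)
    """
    if width < 0:
        raise ValueError("width must be non-negative")
    half = width / 2.0
    return center - half, center + half
```

Both help(slab_edges) and a generated API page would show the same
parameter list, return description, and example.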

Ideal Type Of Documentation
===========================

A few weeks ago, Tom and I were chatting and he mentioned to me a
Pascal manual.  In this manual, there was a single function on every
page: a description, parameters (often repeated between functions, but
explicitly listed for each), and an example.  My first Unix manual was
exactly like this, and I remember it being one of the best sets of
documentation I've ever used.  I believe this is the model NumPy and
SciPy are striving for, as well.

I think this is what yt should strive for, too.  One page per class or
function, with a description, parameters, and examples -- just as
described above.  In doing so, I think that the online help -- which
right now is sort of helpful, but not amazingly helpful -- would
become much more useful.

The fact that on the mailing lists we get questions asking us about
fundamental operations in yt is, I think, an indictment of the way
it's presented.  As the Enzo Workshop revs up, a couple of us will be
writing talks about using Enzo, using yt, etc, and I think this is a
time to harness that momentum to reorganize and rewrite some of the
doc strings.  Of course, I would take the lead on the initial rewrite,
as I'm the one who wrote all the bad docstrings.

What does everyone think about this?

Action Items
============

(It wouldn't be a long email about procedures if we didn't use a
buzzword like 'action items' :)

Firstly: a vote and a request for comments.

Do we want to agree on the NumPy standard for docstrings?  What does
everyone think about this idea, of a set of docstring guidelines, and
trying to focus on a better set of API documentation, to be used both
in generated form and inline via help()?

If we can agree on the NumPy standard, I believe that I should be able
to convert most of the docstrings with some relative ease; it's mostly
going to be a matter of typing, copy/pasting, etc.  I will copy a
style guide into doc/, which will be largely taken from the NumPy
style guide, but I will additionally add a document with examples for
common strings: I would prefer we have a single, consistent manner for
referring to things like AMR3DData as a source, for instance.  I will
then go through and convert all the doc strings that I am familiar
with.  This would leave us with three files:

    * Example docstring, which can be read in verbatim and edited.
    * List of yt idioms for cross-referencing and describing things.
    * File describing this standard, largely pulling from the NumPy standard.

The next thing will be, going forward, how do we ensure that doc
strings are correctly included with new code?  I am more guilty of
skipping them than I would care to admit (I sometimes fall into the
camp of thinking that functions with well-named parameters are
self-documenting, which is probably a mistake!) but I think we should
have someone agree to review incoming changesets for documentation
updates, and to email the committer if they do not have a sufficient
docstring.  My inclination is to suggest that someone who already
reviews incoming changesets do this, which I think means either me,
Sam or Stephen.  Sam, would you be willing to take this on?  It should
be relatively straightforward.
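
Part of that review could be automated.  The following is only a
sketch under assumptions: missing_docstrings is a hypothetical helper,
not an existing yt tool, and it only checks that a docstring is
present, not that it is sufficient.  It uses the standard library's
ast module so the code under review never has to be imported.

```python
import ast


def missing_docstrings(source):
    """Return sorted names of public functions and classes without docstrings."""
    tree = ast.parse(source)
    missing = []
    # ast.walk also visits methods and nested functions.
    for node in ast.walk(tree):
        if not isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        ):
            continue
        if node.name.startswith("_"):
            continue  # private names are left to the author's judgment
        if not ast.get_docstring(node):
            missing.append(node.name)
    return sorted(missing)


example_source = '''
def documented(a):
    """Return a unchanged."""
    return a

def undocumented(a):
    return a
'''

flagged = missing_docstrings(example_source)
```

With the example source above, flagged contains only "undocumented".
A reviewer would still need to read the docstrings that do exist.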

Additionally, would anyone volunteer to help me out with rewriting
some of the existing docstrings?  In particular, for code you have
contributed?

The End
=======

I think that if we really take the docstrings seriously, then the
documentation on the whole will vastly improve.  I am in the process
of rewriting some sections, removing the old-style tutorial and trying
to better walk the user through the process of getting up and running.
The current documentation has a lot of information, but it's not very
good at getting people up and running in anything other than the
simplest manner.  I think that getting started on improving the
docstrings will also help refocus efforts toward better documentation
on the whole.  And, I'd like to end by admitting culpability for the
sorry state of the docstrings we currently have.  But I think this
effort will be good, in the long run, because it'll help get us on
track for better code that's much easier to use!

And finally, thanks to Tom and Oliver for taking the time to chat with
me -- I really appreciate their thoughtful feedback on this.

Best,

Matt