[yt-dev] Arguments to scripts

Fri Dec 2 08:30:05 PST 2011

Hi all,

I have prepared a Pull Request to change how yt processes arguments to
scripts.  I just issued it, but I am emailing because I think
discussion of what it does warrants a bit more public hashing out.
The PR is not done yet, for the reasons I outline below, so please
don't anybody accept it yet.

https://bitbucket.org/yt_analysis/yt/pull-request/38/overhaul-configuration-system

This will directly affect you if you have:

1) Ever written "from yt.config import ytcfg; ytcfg[...."
2) Ever put your *own* command-line parser into yt.
3) Gotten annoyed with configuration files.

What I've done is create a new file, startup_tasks.py, that gets
imported whenever yt.mods gets imported, and only the first time that
happens.  It sets up an argument parser (using argparse, which is in
the standard library only as of Python 2.7) that looks for:

--parallel
--paste
--paste-detailed
--detailed
--rpdb
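
As a rough sketch, a parser for the flags listed above could be set up with argparse along these lines. The function name build_startup_parser is made up for illustration and the real startup_tasks.py may be organized differently; since the meaning of each flag is not spelled out here, the sketch leaves out help text.

```python
import argparse


def build_startup_parser():
    """Build a parser that knows the flags listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="yt startup options (sketch)")
    # Each flag is a plain on/off switch in this sketch.
    for flag in ("--parallel", "--paste", "--paste-detailed",
                 "--detailed", "--rpdb"):
        parser.add_argument(flag, action="store_true")
    return parser


# argparse adds -h/--help automatically, so the options are discoverable.
opts = build_startup_parser().parse_args(["--parallel", "--paste-detailed"])
```

argparse turns dashes into underscores, so --paste-detailed ends up as opts.paste_detailed.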

A side benefit is that this also provides --help, so you can see what
is available.  Furthermore, I've added a --config option, so that from
the command line you can set configuration options.  For instance:

--config serialize=False

and so on.  This is pretty cool I think and will go a long way toward
making things nicer.  However, exactly how this should work still
raises a few problems.  There are basically two ways the parsing can
work:

 * Parse the entirety of sys.argv and accept all arguments that yt
recognizes, rejecting and throwing an error on unrecognized ones (i.e.,
typos or things you might pass on the command line to a script you
write).  This will be an exclusive operation.
 * Parse *non-exclusively*, allowing unrecognized arguments to pass
through.  However, the old arguments will still be there: so any
script that has issues with things like --parallel and whatnot will
now see them, whereas it did not before because yt (totally un-cool!)
stripped them out of the sys.argv variable.  I don't want yt to strip
them anymore.
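
In argparse terms, the two modes above correspond roughly to parse_args() and parse_known_args(). The sketch below assumes a cut-down parser with just --parallel and --config; build_parser, config_overrides, and --my-script-option are made-up names, and how yt actually applies the KEY=VALUE pairs to its configuration is not shown.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for yt's parser; only two of its options are shown.
    parser = argparse.ArgumentParser(description="sketch")
    parser.add_argument("--parallel", action="store_true")
    parser.add_argument("--config", action="append", default=[],
                        metavar="KEY=VALUE")
    return parser


def config_overrides(pairs):
    """Split KEY=VALUE strings into a dict; values stay strings in this sketch."""
    return dict(pair.split("=", 1) for pair in pairs)


argv = ["--parallel", "--config", "serialize=False", "--my-script-option", "3"]

# Exclusive: parse_args() exits with an error on anything it does not recognize.
try:
    build_parser().parse_args(argv)
    strict_ok = True
except SystemExit:
    strict_ok = False

# Non-exclusive: parse_known_args() hands back what it did not recognize,
# and the list it was given is left untouched.
opts, leftover = build_parser().parse_known_args(argv)
overrides = config_overrides(opts.config)
```

Note that argparse accepts unambiguous prefixes of long options by default, so some near-misses (such as --para) would still be treated as --parallel in either mode.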

The way I have implemented this for the yt command line tool is to set
a flag that says, "We're also inside the command line, so don't parse
anything, we'll handle adding new options to the parser and then we'll
parse everything at the end."  This way you can pass both --parallel
and whatever option the yt command line utility wants.  This works
because startup_tasks creates a "parser" object, adds arguments to
that parser object, then delays actually conducting the parsing until
all the arguments from the command line tool have been added.
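
A minimal sketch of that deferred-parsing pattern, assuming a shared parser object and a flag the command line tool sets before anything gets parsed. The names state, maybe_parse, run_command_line_tool, and the --width option are all hypothetical and not taken from the pull request.

```python
import argparse

# Hypothetical stand-ins for what startup_tasks.py might hold.
parser = argparse.ArgumentParser(description="yt startup options (sketch)")
parser.add_argument("--parallel", action="store_true")

# The "yt" command line tool would flip this before yt.mods is imported.
state = {"inside_command_line": False}


def maybe_parse(argv):
    """Parse right away unless the command line tool asked us to wait."""
    if state["inside_command_line"]:
        return None
    return parser.parse_args(argv)


def run_command_line_tool(argv):
    """What the tool might do: add its own options, then parse once."""
    parser.add_argument("--width", type=float, default=1.0)  # made-up option
    return parser.parse_args(argv)
```

With the flag set, maybe_parse() does nothing, and a single later call to run_command_line_tool() sees both --parallel and the tool's own options.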

More broadly, there are four ways yt could handle arguments.  I have
presented them in order of my increasing preference.  (Coincidentally,
on the astropy mailing list they discussed this very topic this week,
as I was thinking about my feelings on it as well, and they are moving
away from parsing args in the library; I think that works for them
because AstroPy is designed to be used much more inside larger
frameworks, whereas yt is somewhat more insular.)

1) Don't do any argument parsing if not called through a yt-specific
script runner.  This means if you want to pass --parallel, you have to
run with something like "yt run my_script.py --parallel".  Same for
--config and so on.
2) Parse all arguments any time yt.mods is imported, do not allow for
additional arguments.  This breaks scripts that have their own
parsing.
3) Parse only the arguments yt recognizes and ignore the rest.  Typos
would then be silently accepted, which could confuse the user.
4) Provide a yt-specific mechanism for adding new arguments.  So if
you want to add new arguments, you do it at the top of your script,
rather than the bottom, and at the bottom inside the construction "if
__name__ == '__main__'" you'd inspect the values.

Anyway, I'm inclined to go for #4, simply because it would be the
simplest mechanism for ensuring an explicit method of getting
arguments into user-written scripts.
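
To make #4 concrete, here is one way such a mechanism could look. This is only a sketch: add_script_argument, parse_script_arguments, and the --nframes option are invented for illustration and are not an existing yt API.

```python
import argparse

# Hypothetical registry; in yt this would wrap the parser startup_tasks builds.
_parser = argparse.ArgumentParser(description="sketch of option 4")
_parser.add_argument("--parallel", action="store_true")


def add_script_argument(*args, **kwargs):
    """Register a script-specific argument (same signature as add_argument)."""
    return _parser.add_argument(*args, **kwargs)


def parse_script_arguments(argv=None):
    """Parse strictly: yt's flags plus the script's flags, nothing else."""
    return _parser.parse_args(argv)


# Top of the user's script: declare what the script accepts.
add_script_argument("--nframes", type=int, default=10)  # made-up script option


def main(argv=None):
    # Bottom of the user's script: inspect the parsed values.
    opts = parse_script_arguments(argv)
    return opts.parallel, opts.nframes
```

In a real script the call to main() would sit under "if __name__ == '__main__'", with argv left as None so that argparse reads the command line.  Because parsing is strict, a typo such as --nframse is rejected instead of slipping through.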

Thoughts?

-Matt