Introduction - Command Line Arguments

From our first steps with the command line, we use it to pass arguments to programs. Even the most basic actions of moving files or navigating the file system require passing arguments. And yet, it took me years to realize that I also wanted my scripts to take command line arguments in this way.

Of course I had taken input through the command line before. Every beginner python programmer encounters input(), and Java's Scanner class was happily within my comfort zone. Sure, Java made it clear that it could take command line arguments (public static void main(String[] args) after all). But I didn't write programs with command line options in mind.

This realization only hit when I was playing around with bash scripts for the efaPi project. I thought it would be fun to automate testing. Since I was thinking in terms of how I normally interact with the command line, I started to wonder how options and arguments from the command line result in different behavior from programs. And so began my foray into command line interfaces.

In this post I will give a short overview of the basic handling of command line arguments with bash, followed by examples for how similar results can be achieved with two modules in the python standard library: getopt and argparse.

bash arguments

While writing the bash scripts for efaPi, I decided to add an optional argument to run the scripts in a test mode.

First, it should be clearly stated that handling a single option with bash does not actually require any fancy footwork. I could have simply checked whether the first argument ($1) matched some option keyword and set a variable accordingly:


  testMode='false'
  if [[ $1 == 'optionKeyword' ]]; then
      testMode='true'
  fi
                        

But what if there were multiple options that could be given in any order? And what about multiple single character options with no spaces between them (e.g. tar -xzvf file.tar.gz)? It seemed like a common enough problem to expect a built-in solution.

And bash does indeed deliver. It offers a builtin called getopts, the shell's counterpart to the external system tool getopt. Because it is built into the shell, it behaves the same from system to system, but it is limited to "short" single character option names. This was fine by me, so I gave it a spin.

bash getopts

When a bash script is run with additional arguments, everything on the command line except the script name is stored by bash in the $@ variable. Demonstrating with a dummyScript.sh that echoes $@:


$ ./dummyScript.sh -a arg1 -b arg2
-a arg1 -b arg2
                        

getopts parses the positional parameters in $@, and compares the values to those defined in an "optstring" to identify options and their arguments.

If the first character in the optstring is a colon, then getopts will use silent error handling. If a colon follows an option character, then that option takes an argument. So the optstring ":vf:" will use silent handling, have an option "-v" that does not take an argument, and an option "-f" that does take an argument.

getopts is intended to be run in a while loop and will return false to break out of the loop once it has exhausted the options on the command line. If an option from the optstring is found, then the response is selected using a case statement. So for my simple case of checking for the '-t' option to decide if the script should run in testing mode:


testFlag='false'
while getopts ":t" flag; do
    case "${flag}" in
        t) testFlag='true';;
    esac
done
                        

Now that we've seen the optstring in action, we can switch to python's rather similar getopt module.

python getopt

Python's getopt module is inspired by the getopt() function from C and is a comfortable switch from getopts in bash.

I used getopt for a different project, which needed a different set of options: help (-h, --help) to display help text; make (-m, --make) to create tables from a schema defined in a file; add (-a, --addFromFile) to add entries to a table from a file; and suggestions (-s, --suggestions) to get related titles for a given title and language.

My first approach with getopt is certainly reminiscent of the bash script:


import getopt, sys

def main(argv):
    try:
        opts, args = getopt.getopt(argv, "hm:a:s:", ["help", "make=",
                                    "addFromFile=", "suggestions="])
    except getopt.GetoptError:
        print("Error: Invalid argument")
        sys.exit(1)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            helpText()
        elif opt in ("-m", "--make"):
            makeTables(arg)
        elif opt in ("-a", "--addFromFile"):
            addMultiple(arg)
        elif opt in ("-s", "--suggestions"):
            flagPos = argv.index("-s")
            title = argv[flagPos+1]   # could use arg
            language = argv[flagPos+2] # could use args[0]
            getSuggestions(title, language)

if __name__ == "__main__":
    main(sys.argv[1:])
                        

The functions called by the if/elif statements are not shown for simplicity.

The getopt.getopt() function takes the argument list to be parsed (argv, analogous to $@ in bash), the string of short (single character name) options, and an optional list of long (multi-character name) options. The string of short options has a strong resemblance to the optstring in bash. In the list of long options, a trailing "=" plays the same role as the colon: it marks an option that takes an argument.
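To make the option specification concrete, here is a small sketch that calls getopt.getopt() on hand-written argument lists instead of the real command line. The option names match the script above; the file names are placeholders chosen for illustration.

```python
import getopt

SHORT_OPTS = "hm:a:s:"  # a colon marks an option that takes an argument
LONG_OPTS = ["help", "make=", "addFromFile=", "suggestions="]  # "=" does the same

# Short and long spellings can be mixed in a single call.
opts, args = getopt.getopt(["-h", "--make", "schema.sql"], SHORT_OPTS, LONG_OPTS)
print(opts)  # [('-h', ''), ('--make', 'schema.sql')]
print(args)  # []

# Short options can be clustered, as in `tar -xzvf file.tar.gz`.
clustered, rest = getopt.getopt(["-xzvf", "file.tar.gz"], "xzvf:")
print(clustered)  # [('-x', ''), ('-z', ''), ('-v', ''), ('-f', 'file.tar.gz')]
print(rest)       # []
```

Passing a literal list like this is also a convenient way to experiment with an option string before wiring it up to sys.argv.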

We can get the list of arguments from the command line using sys.argv. Unlike bash, this list also includes the name of the running program. For this reason, we exclude the first member of the list from sys.argv when passing it.
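A common way to package this slicing, sketched here under the assumption that main() should also be callable from tests, is to let the argument list default to sys.argv[1:]. The function body below is a stand-in that only returns what it would parse.

```python
import sys

def main(argv=None):
    # Fall back to the real command line, minus the program name in sys.argv[0].
    if argv is None:
        argv = sys.argv[1:]
    return argv

# Passing a list explicitly exercises the function without a shell.
print(main(["-a", "exampleFile.csv"]))  # ['-a', 'exampleFile.csv']
```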

The getopt.getopt() function returns two lists. The first is a list of matched (option, argument) pairs, and the second is a list that holds any "leftover" arguments. Printing both for a few invocations:


$ ./scriptName.py -h
opts = [('-h', '')] args = []
$ ./scriptName.py -a exampleFile.csv
opts = [('-a', 'exampleFile.csv')] args = []
$ ./scriptName.py -s "Glennkill" "de"
opts = [('-s', 'Glennkill')] args = ['de']
                        

So in the case where the option required two arguments, I could get the second argument either from args or by finding it relative to the option position in the original list. I took the second approach to avoid any issues with future options that take two arguments. This is the purpose of these lines:


flagPos = argv.index("-s")
title = argv[flagPos+1]    # could use arg
language = argv[flagPos+2] # could use args[0]
getSuggestions(title, language)
                        

which finds the index of the -s option in argv and assumes that the next two arguments are the title and language values, in that order.
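As the code comments hint, the same values are also available from the parsed results. The sketch below shows what that alternative might look like; parse_suggestion is a hypothetical helper name, not part of the original script. One difference worth noting is that argv.index("-s") raises a ValueError when the long form --suggestions is used, while reading arg and args[0] works for both spellings.

```python
import getopt

def parse_suggestion(argv):
    """Return (title, language) for -s/--suggestions, or None if absent."""
    opts, args = getopt.getopt(argv, "hm:a:s:",
                               ["help", "make=", "addFromFile=", "suggestions="])
    for opt, arg in opts:
        if opt in ("-s", "--suggestions"):
            if not args:
                raise SystemExit("Error: suggestions needs a title and a language")
            # arg is the title; the language is the first leftover argument.
            return arg, args[0]
    return None

print(parse_suggestion(["-s", "Glennkill", "de"]))             # ('Glennkill', 'de')
print(parse_suggestion(["--suggestions", "Glennkill", "de"]))  # ('Glennkill', 'de')
```

The trade-off is the one described above: this version relies on the leftover list, which becomes ambiguous as soon as a second option also wants two values.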

The getopt module was sufficient for what I wanted to do. I found it to be comfortably easy to use after a look at the docs. The help text was obviously not formatted out of the box, so I wrote a simple formatter to keep words from being split over multiple lines and to make a two column format in the terminal.

Figure: Help message after writing a simple formatting function. The option names and their required arguments are in the left column; the description of what each option does is in the right column.
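The formatter itself is not reproduced in this post. As a rough idea of what such a function can look like, here is a minimal sketch built on the standard textwrap module; the function name, column widths, and help strings are all assumptions made for illustration.

```python
import textwrap

def format_help(entries, left_width=28, total_width=72):
    """Lay out (option, description) pairs in two columns without splitting words."""
    lines = []
    right_width = total_width - left_width
    for option, description in entries:
        wrapped = textwrap.wrap(description, width=right_width,
                                break_long_words=False) or [""]
        # The first line carries the option; continuation lines are indented.
        lines.append(option.ljust(left_width) + wrapped[0])
        for extra in wrapped[1:]:
            lines.append(" " * left_width + extra)
    return "\n".join(lines)

print(format_help([
    ("-h, --help", "display this help text"),
    ("-m, --make FILE", "create tables from the schema defined in FILE"),
]))
```

An option string longer than left_width would push its description out of alignment; a fuller version might start the description on the next line in that case.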

Although I was content with getopt, I could not ignore that it is not actually the recommended way to handle command line interfaces in python. That honor goes to the argparse module. As such, I felt compelled to recreate what I had done with getopt using argparse.

python argparse

The inclusion of argparse in the standard library, along with the decision to deprecate its predecessor optparse while retaining getopt, is the topic of PEP 389, for the interested reader.

argparse is certainly more powerful and seems to be more 'object oriented' than getopt, but this naturally comes at the cost of complexity. A few minutes with the docs for getopt will have your program working. A few minutes with the docs for argparse... well... it's a different experience. However, there is no denying that argparse is worth the effort.

I wound up writing the code in argparse in two different ways. First I directly mimicked what I had done with getopt and took the command line arguments as options:


import argparse
import sys

def cli_parse():
    parser = argparse.ArgumentParser()

    make_text = "..."
    parser.add_argument('-m', '--make', type=str, help=make_text,
                        metavar="SQL FILE")

    add_text = "..."
    parser.add_argument('-a', '--addFromFile', type=str,
                        help=add_text, metavar="FILE")

    sugg_text = "..."
    parser.add_argument('-s', '--suggestions', type=str, nargs=2,
                        help=sugg_text, metavar=("TITLE", "LANGUAGE"))

    # Options that were not given on the command line are set to None.
    args = parser.parse_args()
    if args.make:
        makeFunction(args.make)
    if args.addFromFile:
        addFunction(args.addFromFile)
    if args.suggestions:
        # nargs=2 collects the values into a list: [TITLE, LANGUAGE]
        suggestionFunction(args.suggestions)


if __name__ == "__main__":
    cli_parse()
                        

This works, but I wasn't pleased about the series of if statements. It is possible to avoid this construct when calling functions by options with argparse, but I found myself running into difficulties because I also wanted to pass arguments to the functions.
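One hedged way to shorten such a chain while still passing each option's value along is a dispatch table that maps the attribute names on the parsed result to handler functions. This is a sketch of the idea rather than what I ended up using; the handlers below are stand-ins that only return a description, and the file names are placeholders.

```python
import argparse

def make_function(path):
    return "make tables from {}".format(path)

def add_function(path):
    return "add entries from {}".format(path)

def suggestion_function(pair):
    title, language = pair
    return "suggestions for {} ({})".format(title, language)

# Keys are the attribute names argparse derives from the long option names.
HANDLERS = {
    "make": make_function,
    "addFromFile": add_function,
    "suggestions": suggestion_function,
}

def cli_parse(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--make", metavar="FILE")
    parser.add_argument("-a", "--addFromFile", metavar="FILE")
    parser.add_argument("-s", "--suggestions", nargs=2,
                        metavar=("TITLE", "LANGUAGE"))
    args = parser.parse_args(argv)
    # Call the handler for every option that was actually supplied.
    return [handler(getattr(args, name))
            for name, handler in HANDLERS.items()
            if getattr(args, name) is not None]

print(cli_parse(["-m", "schema.sql", "-s", "Glennkill", "de"]))
# ['make tables from schema.sql', 'suggestions for Glennkill (de)']
```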

This is probably because I was abusing options when I really wanted 'sub-commands'. I did not want small adjustments to the behavior of the program (options), but rather wanted totally different behaviors depending on the command line arguments (sub-commands). Sub-commands are given without a leading dash:


$ ./someScript.py -m testFile.sql     # short option
$ ./someScript.py --make testFile.sql # long option
$ ./someScript.py make testFile.sql   # sub-command
                        

Rewriting the code using sub-commands instead of options:


import argparse
import sys

def makeFunction(args):
    print("called make function with {}".format(args.sqlFile))

def cli_parse():
    parser = argparse.ArgumentParser()
    subparser = parser.add_subparsers()

    make_text = "..."
    parser_make = subparser.add_parser("make", help=make_text)
    parser_make.add_argument("sqlFile")
    parser_make.set_defaults(func=makeFunction)

    add_text = "..."
    parser_add = subparser.add_parser("add", help=add_text)
    parser_add.add_argument("file")
    parser_add.set_defaults(func=addFunction)

    sugg_text = "..."
    parser_sugg = subparser.add_parser("suggestions", help=sugg_text)
    parser_sugg.add_argument("title")
    parser_sugg.add_argument("language")
    parser_sugg.set_defaults(func=suggestionFunction)

    args = parser.parse_args()
    args.func(args)

if __name__ == "__main__":
    cli_parse()
                        

This version includes an explicit example, makeFunction, of how to access the arguments in the called function. The other two handler functions are again not shown.

parser.parse_args() returns a Namespace object holding attributes that are defined by the .add_argument() and .set_defaults() methods. To demonstrate, we can run the script with the 'make' subcommand and print the Namespace object assigned to args:


$ ./exampleScript.py make testFile.sql
Namespace(func=<function makeFunction at 0x7faec74c4d40>, sqlFile='testFile.sql')
called make function with testFile.sql
                        

So args.func(args) calls the function stored in args.func, in this case makeFunction, with args passed as its argument.
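The same dispatch can be tried without a shell by handing parse_args() an explicit list. The sketch below also passes dest and required=True to add_subparsers() (the required keyword needs python 3.7 or later), so that running with no sub-command produces a usage error rather than an AttributeError on args.func in python 3. Those two arguments, and the snake_case names, are additions for illustration and not part of the script above.

```python
import argparse

def make_function(args):
    return "called make function with {}".format(args.sqlFile)

def build_parser():
    parser = argparse.ArgumentParser()
    # dest records which sub-command was chosen; required rejects an empty call.
    subparsers = parser.add_subparsers(dest="command", required=True)
    parser_make = subparsers.add_parser("make", help="...")
    parser_make.add_argument("sqlFile")
    parser_make.set_defaults(func=make_function)
    return parser

args = build_parser().parse_args(["make", "testFile.sql"])
print(args.command, args.sqlFile)  # make testFile.sql
print(args.func(args))             # called make function with testFile.sql
```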

Both of these approaches generated the same nicely formatted help text. This is an advantage of argparse over getopt: no additional code was needed to produce the following:

Figure: Help message text generated automatically by the argparse module, similar in style to a man page.

The text is the original contents of the strings, which were replaced by ellipses in the code snippets for readability.
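To see the generated text without running a script with -h, the parser's format_help() method returns it as a string. The sketch below uses a single option with a placeholder help string, since the original strings are not reproduced here; the exact layout of the output varies a little between python versions.

```python
import argparse

parser = argparse.ArgumentParser(prog="exampleScript.py")
parser.add_argument("-m", "--make", metavar="FILE",
                    help="create tables from the schema defined in FILE")

# format_help() returns the same text that -h/--help would print.
help_text = parser.format_help()
print(help_text)
```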

Parting thoughts

It was interesting to take a closer look at something I had taken for granted for so long, and to get a small glimpse of the many different approaches to taking command line arguments for python.