Parse command-line arguments

Several R packages already parse command-line arguments, such as getopt, optparse, argparse and docopt. GetoptLong is another command-line argument parser (it was actually developed quite early; the first CRAN version was released in 2013) that wraps the powerful Perl module Getopt::Long. The GetoptLong package also provides some adaptations for easier use in R.

Using GetoptLong is simple, especially for users with Perl experience (Oops, age exposed :)), because the specification is almost the same as in Perl. The original website of Getopt::Long is always your best reference.

The documentation of the package is at http://jokergoo.github.io/GetoptLong/articles/GetoptLong.html. This post is a simplified version.

The GetoptLong package has not been updated on CRAN yet, so you need to install it from GitHub:

devtools::install_github("jokergoo/GetoptLong")

A quick example

Specify as a vector

The following example gives you a feel for using the GetoptLong package. The code is saved in an R script named foo.R.

library(GetoptLong)

cutoff = 0.05
GetoptLong(
    "number=i", "Number of items.",
    "cutoff=f", "Cutoff for filtering results.",
    "verbose",  "Print message."
)

The R script can be executed as:

~> Rscript foo.R --number 4 --cutoff 0.01 --verbose
~> Rscript foo.R --number=4 --cutoff=0.01 --verbose
~> Rscript foo.R -n 4 -c 0.01 -v
~> Rscript foo.R -n 4 --verbose

In this example, number is a mandatory option and must be an integer (it has the tag i). cutoff is numeric (tag f) and optional because it already has a default value of 0.05. verbose is a logical option. If parsing is successful, two variables, number and verbose, will be imported into the working environment with the specified values. The value of cutoff will be updated if it is specified on the command line.

Data types are automatically checked. E.g., if cutoff is specified with a character, an error will be thrown.
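
The way defaults and mandatory options interact can be pictured with a small base R sketch. This is only an illustration of the behaviour described above, not how GetoptLong is implemented: apply_options, given and value_opts are hypothetical names, and given simply stands for the values already parsed from the command line.

```r
# Hypothetical helper (not part of GetoptLong): copy parsed values into an
# environment, keeping any variable defined beforehand as the default.
apply_options = function(given, value_opts, envir) {
    for (nm in names(given)) {
        assign(nm, given[[nm]], envir = envir)
    }
    # An option that takes a value is mandatory if no default was defined for it.
    has_value = vapply(value_opts, exists, logical(1), envir = envir, inherits = FALSE)
    if (!all(has_value)) {
        stop("mandatory option(s) missing: ",
             paste(value_opts[!has_value], collapse = ", "))
    }
    invisible(envir)
}

env = new.env()
env$cutoff = 0.05    # default, defined before parsing

# as if the script was called with: --number 4 --verbose
apply_options(list(number = 4L, verbose = TRUE), c("number", "cutoff"), env)
env$number    # 4, taken from the command line
env$cutoff    # 0.05, the default is kept
```

Calling the helper without a value for number stops with an error, which mirrors number being mandatory in foo.R.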

The option usage triggered by --help is automatically generated. There are two styles:

The one-column style:

Usage: Rscript foo.R [options]

Options:
  --number, -n integer
    Number of items.
 
  --cutoff, -c numeric
    Cutoff for filtering results.
    [default: 0.05]
 
  --verbose
    Print message.
 
  --help, -h
    Print help message and exit.
 
  --version
    Print version information and exit.
 

Or the two-column style:

Usage: Rscript foo.R [options]

Options:
  --number, -n     Number of items.
    [type: int] 
  --cutoff, -c     Cutoff for filtering results.                 
    [type: num]    [default: 0.05] 
  --verbose        Print message. 
  --help, -h       Print help message and exit. 
  --version        Print version information and exit. 

You can see that the short option names (single letters, e.g., -n, -c, -h) are automatically added. The default values are added as well (e.g., [default: 0.05] for the cutoff option).
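
The pattern visible in the usage text (a short name appears only when no other option starts with the same letter) can be sketched in base R. short_names is a hypothetical helper written for this post, not a GetoptLong function, and it only handles options with a single long name.

```r
# Hypothetical helper (not part of GetoptLong): an option gets a single-letter
# name only if no other option starts with the same letter.
short_names = function(opts) {
    first = substr(opts, 1, 1)
    shared = first %in% first[duplicated(first)]
    setNames(ifelse(shared, NA_character_, paste0("-", first)), opts)
}

# the three options of foo.R plus the automatically added --help and --version
short_names(c("number", "cutoff", "verbose", "help", "version"))
```

With these five names the sketch yields -n, -c and -h, and NA for verbose and version because both start with v, which agrees with the usage text above where --verbose and --version are listed without a short name.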

Specify as a template

The specification can also be given as a template in which the option specifications are marked by <>.

library(GetoptLong)
spec = "
This is an example of using template to specify options.

Usage: Rscript foo.R [options]

Options:
  <number=i> Number of items.
  <cutoff=f> Cutoff for filtering results.               
  <verbose> Print messages.

Contact: name@address
"

GetoptLong(spec, template_control = list(opt_width = 23))

The parameter opt_width controls the maximal width of the option description (i.e., --number, -n integer, --cutoff, -c numeric and --verbose).

Calling Rscript foo.R --help generates the following message:

This is an example of using template to specify options.

Usage: Rscript foo.R [options]

Options:
  --number, -n integer    Number of items.
  --cutoff, -c numeric    Cutoff for filtering results.               
  --verbose               Print messages.

Contact: name@address 

Advantages

There are several advantages compared to other command-line argument parser packages. The major advantage comes from the Getopt::Long Perl module which actually parses the options. The Getopt::Long module provides a flexible, smart and compact way for specifying command-line arguments. The major features are:

  1. Various formats of specifying options with values, such as

-s 24 -s24

or

--size 24  --size=24 -size 24  -size=24

  2. Single-letter options can be bundled:

-a -b -c  -abc

  3. Options with multiple names. With the following specification, --length and --height are the same.

length|height=f

  4. Automatic support for single-letter options. If the first letter of an option is unique among all options, it can be used as an optional option name. For example, if l and h are unique, --length, --height, -l and -h set the same option.

length|height=f  --length --height -l -h

  5. Rich option data types, including scalar, vector (array in Perl), list (hash in Perl). For example:

length=i     a single integer scalar
name=s       a single character scalar

can be specified as:

--length 1 --name a

or

length=i@       an integer vector
name=s@         a character vector
length=i{2,}    an integer vector, at least two elements
name=s{2,}      a character vector, at least two elements

can be specified as:

--length 1 2 3 --name a b c

or

length=i%    name-value pair, values should be integers
name=s%      name-value pair, values should be characters

can be specified as:

--length foo=1 bar=3 --name foo=a bar=b
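
To make the specification syntax listed above more concrete, here is a rough base R sketch that splits a specification string into its parts. parse_spec is a hypothetical helper that covers only the forms shown in this post (names separated by |, the tags i, f and s, the suffixes @, % and {n,m}, and a trailing !); the real parsing is done by the Perl module and supports more than this.

```r
# Hypothetical helper (not part of GetoptLong): decompose a specification
# such as "length|height=f" or "name=s%" into names, type and container.
parse_spec = function(spec) {
    negatable = grepl("!$", spec)
    parts = strsplit(sub("!$", "", spec), "=", fixed = TRUE)[[1]]
    opt_names = strsplit(parts[1], "|", fixed = TRUE)[[1]]
    if (length(parts) == 1) {
        # no "=" part: a logical option
        return(list(names = opt_names, type = "logical",
                    container = "scalar", negatable = negatable))
    }
    type = c(i = "integer", f = "numeric", s = "character")[[substr(parts[2], 1, 1)]]
    suffix = substring(parts[2], 2)
    container = if (suffix == "") "scalar" else if (suffix == "%") "list" else "vector"
    list(names = opt_names, type = type, container = container, negatable = FALSE)
}

str(parse_spec("length|height=f"))
str(parse_spec("length=i{2,}"))
str(parse_spec("name=s%"))
str(parse_spec("verbose!"))
```

The type labels (integer, numeric, character) are chosen to match the wording used in the generated usage text.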

The features from the R part are:

  1. It automatically generates option usage in two styles. The data type and default value of options are automatically detected and included.

  2. It supports specifying the usage in a template which allows more complex text of option usage.

  3. It allows grouping options.

  4. It provides a natural and convenient way to specify defaults.

Help option

Option usage is automatically generated and can be retrieved by setting --help in the command. In the following example, I create an option specification that contains all types of options (with long descriptions):

library(GetoptLong)
GetoptLong(
    "count=i",  paste("This is a count. This is a count. This is a count.",
                      "This is a count.  This is a count. This is a count."),
    "number=f", paste("This is a number. This is a number. This is a number.",
                      "This is a number. This is a number. This is a number."),
    "array=f@", paste("This is an array. This is an array. This is an array.",
                      "This is an array. This is an array. This is an array."),
    "hash=s%",  paste("This is a hash. This is a hash. This is a hash.",
                      "This is a hash. This is a hash. This is a hash."),
    "verbose!", "Whether show messages",
    "flag",     "a non-sense option"
)

The option usage is as follows. Here, for example, the single-letter option -c for --count is automatically derived, but there is none for --help, because h is the first letter of two options (hash and help).

Usage: Rscript foo.R [options]

Options:
  --count, -c integer
    This is a count. This is a count. This is a count. This is a count.  This is
    a count. This is a count.
 
  --number, -n numeric
    This is a number. This is a number. This is a number. This is a number. This
    is a number. This is a number.
 
  --array, -a [numeric, ...]
    This is an array. This is an array. This is an array. This is an array. This
    is an array. This is an array.
 
  --hash {name=character, ...}
    This is a hash. This is a hash. This is a hash. This is a hash. This is a
    hash. This is a hash.
 
  --verbose, -no-verbose
    Whether show messages
    [default: off]
 
  --flag, -f
    a non-sense option
 
  --help
    Print help message and exit.
 
  --version
    Print version information and exit.
 

If default values for options are provided, they are properly inserted into the usage message.

library(GetoptLong)
count = 1
number = 0.1
array = c(1, 2)
hash = list("foo" = "a", "bar" = "b")
verbose = TRUE

GetoptLong(
  ...
)

Usage: Rscript foo.R [options]

Options:
  --count, -c integer
    This is a count. This is a count. This is a count. This is a count.  This is
    a count. This is a count.
    [default: 1]
 
  --number, -n numeric
    This is a number. This is a number. This is a number. This is a number. This
    is a number. This is a number.
    [default: 0.1]
 
  --array, -a [numeric, ...]
    This is an array. This is an array. This is an array. This is an array. This
    is an array. This is an array.
    [default: 1, 2]
 
  --hash {name=character, ...}
    This is a hash. This is a hash. This is a hash. This is a hash. This is a
    hash. This is a hash.
    [default: foo=a, bar=b]
 
  --verbose, -no-verbose
    Whether show messages
    [default: on]
 
  --flag, -f
    a non-sense option
 
  --help
    Print help message and exit.
 
  --version
    Print version information and exit.
 

The global parameter help_style can be set to two-column to switch to the other style:

library(GetoptLong)
GetoptLong.options(help_style = "two-column")
# specifying the defaults
...

GetoptLong(
    ...
)

Usage: Rscript foo.R [options]

Options:
  --count, -c                  This is a count. This is a count. This is a count.
    [type: int]                This is a count.  This is a count. This is a count.                             
                               [default: 1] 
  --number, -n                 This is a number. This is a number. This is a
    [type: num]                number. This is a number. This is a number. This is
                               a number.                             
                               [default: 0.1] 
  --array, -a                  This is an array. This is an array. This is an
    [type: [num, ...]]         array. This is an array. This is an array. This is
                               an array.                             
                               [default: 1, 2] 
  --hash                       This is a hash. This is a hash. This is a hash. This
    [type: {name=chr, ...}]    is a hash. This is a hash. This is a hash.                             
                               [default: foo=a, bar=b] 
  --verbose, -no-verbose       Whether show messages                             
                               [default: on] 
  --flag, -f                   a non-sense option 
  --help                       Print help message and exit. 
  --version                    Print version information and exit. 

The options in the usage text can be grouped by setting separator lines. The separator line should contain two elements: the separator and the description. The separator can be any character in -+=#% with any length.

library(GetoptLong)
count = 1
array = c(0.1, 0.2)
GetoptLong(
    "--------", "Binary options:",
    "verbose!", "Whether show messages",
    "flag",     "a non-sense option",

    "-------", "Single-value options:",
    "count=i",  paste("This is a count. This is a count. This is a count.",
                      "This is a count.  This is a count. This is a count."),
    "number=f", paste("This is a number. This is a number. This is a number.",
                      "This is a number. This is a number. This is a number."),
    
    "--------", paste("Multiple-value options: long text long text long text",
                      " long text long text long text long text long text"),
    "array=f@", paste("This is an array. This is an array. This is an array.",
                      "This is an array. This is an array. This is an array."),
    "hash=s%",  paste("This is a hash. This is a hash. This is a hash.",
                      "This is a hash. This is a hash. This is a hash."),
    
    "-------", "Other options:"
)

Usage: Rscript foo.R [options]

Binary options:
  --verbose, -no-verbose
    Whether show messages
    [default: off]
 
  --flag, -f
    a non-sense option
 
Single-value options:
  --count, -c integer
    This is a count. This is a count. This is a count. This is a count.  This is
    a count. This is a count.
    [default: 1]
 
  --number, -n numeric
    This is a number. This is a number. This is a number. This is a number. This
    is a number. This is a number.
 
Multiple-value options: long text long text long text long text long text long
text long text long text
  --array, -a [numeric, ...]
    This is an array. This is an array. This is an array. This is an array. This
    is an array. This is an array.
    [default: 0.1, 0.2]
 
  --hash {name=character, ...}
    This is a hash. This is a hash. This is a hash. This is a hash. This is a
    hash. This is a hash.
 
Other options:
  --help
    Print help message and exit.
 
  --version
    Print version information and exit.
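
The grouping rule can be pictured with a short base R sketch. It reads "any character in -+=#% with any length" as "a string made only of those characters", which is my interpretation rather than a documented rule, and it assumes the specification starts with a separator line. is_separator is a hypothetical helper, not a GetoptLong function.

```r
# Hypothetical helper (not part of GetoptLong): a separator is a string that
# consists only of the characters - + = # %
is_separator = function(x) grepl("^[-+=#%]+$", x)

# a shortened version of the specification above, as (spec, description) pairs
spec = c(
    "--------", "Binary options:",
    "verbose!", "Whether show messages",
    "flag",     "a non-sense option",
    "-------",  "Single-value options:",
    "count=i",  "This is a count."
)
opt  = spec[c(TRUE, FALSE)]    # 1st, 3rd, ... element: specs and separators
desc = spec[c(FALSE, TRUE)]    # 2nd, 4th, ... element: descriptions

sep = is_separator(opt)
group = desc[sep][cumsum(sep)] # group title that each entry falls under
groups = split(opt[!sep], group[!sep])
groups
```

In this sketch verbose! and flag end up under "Binary options:" and count=i under "Single-value options:", the same arrangement as in the usage text above.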
 

And here is the two-column style for the grouped options:

library(GetoptLong)
GetoptLong.options(help_style = "two-column")
GetoptLong(
    ...
)

Usage: Rscript foo.R [options]

Binary options:
  --verbose, -no-verbose    Whether show messages                          
                            [default: off] 
  --flag, -f                a non-sense option 

Single-value options:
  --count, -c      This is a count. This is a count. This is a count. This is a
    [type: int]    count.  This is a count. This is a count.                 
                   [default: 1] 
  --number, -n     This is a number. This is a number. This is a number. This is
    [type: num]    a number. This is a number. This is a number. 

Multiple-value options: long text long text long text long text long text long
text long text long text
  --array, -a                  This is an array. This is an array. This is an
    [type: [num, ...]]         array. This is an array. This is an array. This is
                               an array.                             
                               [default: 0.1, 0.2] 
  --hash                       This is a hash. This is a hash. This is a hash. This
    [type: {name=chr, ...}]    is a hash. This is a hash. This is a hash. 

Other options:
  --help       Print help message and exit. 
  --version    Print version information and exit. 

For more detailed explanations, please go to the website of the GetoptLong package.

Session info

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GetoptLong_1.0.0 knitr_1.28      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6        bookdown_0.19       digest_0.6.25      
 [4] crayon_1.3.4        magrittr_1.5        evaluate_0.14      
 [7] blogdown_0.19       GlobalOptions_0.1.3 rlang_0.4.6        
[10] stringi_1.4.6       rmarkdown_2.1       rjson_0.2.20       
[13] tools_4.0.0         stringr_1.4.0       xfun_0.14          
[16] yaml_2.2.1          compiler_4.0.0      htmltools_0.4.0