The bsub package is only broadly tested on DKFZ ODCF cluster and the default configurations work fine there. It is also possible for other clusters that use LSF system to use bsub package. The following global options should be properly configured.

If you have difficulties to configure bsub package for your cluster, please feel free to contact me. I am glad to help.

How to call Rscript command

You need to configure how to call the Rscript command, especially when you have different versions of R installed. Following is how we call Rscript on our cluster (we use module to manage the bash environment).

bsub_opt$call_Rscript = function(version) {
    qq("module load gcc/7.2.0; module load java/1.8.0_131; module load R/@{version}; Rscript")
}

Then the different version of Rscript can be switched by setting different bsub_opt$R_version option (The value of bsub_opt$R_version will be passed to the call_Rscript() function.

Similarlly, if you use conda for managing different versions of software, you can also choose R with different versions by setting a proper bsub_opt$call_Rscript. Let assume you have conda environments for different R versions with the naming schema R_$version (e.g. R_3.6.0), then you can set bsub_opt$call_Rscript as:

bsub_opt$call_Rscript = function(version) {
    qq("conda activate R_@{version}; Rscript")
}

Submission node

The names of the nodes where the jobs are submitted can be set by bsub_opt$submission_node option. The value can be a vector of node names. bsub will randomly select one to connect.

bsub_opt$submission_node = ...

Login node

If you want to submit jobs, the login node should be the same as the submission node. By default, the value for login_node is automatically copied from submission_node. But there are cases when the login node is different from the submission node. For example, If you are working on your own laptop, to connect to your institute’s cluster, you need to first connect to server A to enter your institute’s network, then on server A, you continue to connect to server B which is the submission node. In this case, server A is the login node. When you explicitly set login_node, you should also set submission_node to NULL because in this circumstance, you cannot submit jobs:

bsub_opt$login_node = "server A"
bsub_opt$submission_node = NULL

You can still query job status. Please refer to two_ssh.html to see the instructions of how to configure bsub packages when you need to establish two connections to the submission node.

Username on the submission node

Your username on the submission node can be set by bsub_opt$user. The default value is Sys.info()['user']. If the username on the submission node is different, you need to explicitly set it.

bsub_opt$user = ...

Bash environment

bsub_opt$ssh_envir should be properly set so that LSF binaries such as bsub or bjobs can be found. There are some environment variables initialized when logging in the bash terminal while they are not initialized with the ssh connection. Thus, some environment variables should be manually set.

An example for bsub_opt$ssh_envir is as follows. The LSF_ENVDIR and LSF_SERVERDIR should be defined and exported.

bsub_opt$ssh_envir = c("source /etc/profile",
                       "export LSF_ENVDIR=/opt/lsf/conf",
                       "export LSF_SERVERDIR=/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/etc")

The values of these two variables can be obtained by entering following commands in your bash terminal (on the submission node):

echo $LSF_ENVDIR
echo $LSF_SERVERDIR

If you have already configured bsub_opt$user and bsub_opt$submission_node, the values of the two environment variables can also be obtains in R:

LSF_ENVDIR = ssh_exec("echo $LSF_ENVDIR")
LSF_SERVERDIR = ssh_exec("echo $LSF_SERVERDIR")
bsub_opt$ssh_envir = c("source /etc/profile",
                       paste0("export LSF_ENVDIR=", LSF_ENVDIR, "}"),
                       paste0("export LSF_SERVERDIR=", LSF_SERVERDIR, "}"))

bsub template

You need to define the template for calling the bsub command by bsub_opt$bsub_template option. The self-defined function should accepts following arguments:

  • name job name.
  • hours running time.
  • memory memory, in GB.
  • cores number of cores to use.
  • output path of output file.
  • ... should be added as the last argument of the function.

The temporary bash script to submit is automatically appended as the last argument of bsub.

E.g., the template on our cluster is defined as:

bsub_opt$bsub_template = function(name, hours, memory, cores, output, ...) {
    glue::glue("bsub -J '{name}' -W '{hours}:00' -n {cores} -R 'rusage[mem={memory}GB]' -o '{output}'")
}

You can use glue::glue() or GetoptLong::qq() to construct a complex string with interpolating multiple variables.

How to parse the time strings

The time strings by LSF bjobs command might be different for different installations. The bsub package needs to convert the time strings to POSIXlt objects for calculating the time difference in the bjobs() function. Thus, if the default time string parsing fails, users need to provide a user-defined function and set with bsub_opt$parse_time option in bsub_opt. The function accepts a vector of time strings and returns a POSIXlt object. For example, if the time string returned from bjobs command is in a form of Dec 1 18:00:00 2019, the parsing function can be defined as:

bsub_opt$parse_time = function(x) {
    as.POSIXlt(x, format = "%b %d %H:%M:%S %Y")
}

bsub package automatically recognizes several formats of time strings.