Send R code/R scripts/shell commands to LSF cluster

Author: Zuguang Gu ( z.gu@dkfz.de )

Date: 2021-06-30


Load the library:

library(bsub)

Note: the bsub package needs to be configured properly before use. On the DKFZ ODCF cluster, bsub is already configured and the configuration is loaded automatically. For other institutes, please refer to configure_bsub_package.html.

We suggest using bsub directly on a node that shares the file system with the computing nodes. If the file system differs from that of the computing nodes, you can only monitor job status; you cannot submit jobs.

The bsub package can submit R code (with bsub_chunk()), R scripts (with bsub_script()) and shell commands (with bsub_cmd()) to the LSF cluster entirely from within an R session. We suggest that jobs save their output to permanent files rather than having the results retrieved directly on the fly.

Send R code

bsub_chunk() submits an R code chunk. The code chunk should be enclosed in {...}. For example, NMF::nmf() normally takes a very long time to run, so we submit the NMF analysis to the cluster and save the result as an RDS file.

bsub_chunk(name = "example", memory = 10, hours = 10, cores = 4, 
{
    fit = NMF::nmf(...)
    # better save `fit` to a permanent file with an absolute path
    saveRDS(fit, file = "/path/to/fit.rds")
})

In the following examples, we use Sys.sleep(5) to simulate a chunk of code which runs for a short time.

bsub_chunk(
{
    Sys.sleep(5)
})
## - job: 'R_code_c3481a47' from a code chunk
## bsub -J 'R_code_c3481a47' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/R_code_c3481a47.out' \
##     '/home/guz/.bsub_temp/R_code_c3481a47_3e8c3bb061c3.sh'
## [1] "6838599"

bsub_chunk() prints the bsub command, and its return value is the job ID assigned by the LSF cluster.

Job settings

The job name, memory, running time and number of cores are set with the name, memory, hours and cores arguments.
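The sketch below matches the printed bsub command that follows (10 GB of memory, 10 hours, 4 cores). Sys.sleep(5) stands in for real work.

```r
# Request 10 GB of memory, a 10-hour run-time limit and 4 cores for this job.
bsub_chunk(name = "example", memory = 10, hours = 10, cores = 4,
{
    Sys.sleep(5)
})
```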

## - job: 'example' from a code chunk
## bsub -J 'example' -W '10:0' -n 4 -R 'rusage[mem=10240]' \
##      -o '/home/guz/.bsub_temp/example.out' \
##     '/home/guz/.bsub_temp/example_3e8c5681661.sh'
## [1] "6838600"

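If name is not specified, an internal name computed by digest::digest() on the chunk is assigned automatically. memory is given in GB and hours in hours. As the first example shows, a job without these settings requests 1 GB of memory, 1 hour and 1 core.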
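To make the units concrete, the hypothetical helper lsf_resource_flags() below (not part of bsub) rebuilds the resource flags seen in the printed bsub commands. It assumes that memory in GB is converted to MB for rusage[mem=...] and that hours is written as the H:0 run limit of -W, which is what the printed commands above show.

```r
# Hypothetical helper, NOT a bsub function: rebuild the LSF resource flags
# printed by bsub_chunk() from memory (GB), hours and cores.
lsf_resource_flags = function(memory = 1, hours = 1, cores = 1) {
    mem_mb = round(memory * 1024)   # GB -> MB, e.g. 10 -> 10240
    # whole hours only in this sketch
    sprintf("-W '%d:0' -n %d -R 'rusage[mem=%d]'",
        as.integer(hours), as.integer(cores), as.integer(mem_mb))
}

lsf_resource_flags(memory = 10, hours = 10, cores = 4)
```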

Call Rscript

The R chunk is saved into a temporary R script, which is run with the Rscript command when the job is executed on the cluster. Many LSF clusters have a customized installation of R, so the way Rscript is called differs from cluster to cluster and you need to configure it. By default, bsub simply calls Rscript, i.e. the default R version installed on the cluster.

To call Rscript for a specific R version or from a specific path, configure the bsub_opt$call_Rscript option. Its value should be a user-defined function whose only argument is the R version. The default value ignores the R version and simply calls Rscript. To use Rscript from a specific path, bsub_opt$call_Rscript can instead be set to a function that returns that path.
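Here is a minimal sketch of both variants as plain R functions. /path/to/R/bin/Rscript is a hypothetical location. Each function takes the version argument, as required, even when it ignores it.

```r
# Mirrors the default behaviour described above: ignore the version, call Rscript.
default_rscript = function(version) "Rscript"

# Always call Rscript from a fixed (hypothetical) location.
fixed_rscript = function(version) "/path/to/R/bin/Rscript"

# Assign one of them to the option (requires the bsub package):
# bsub_opt$call_Rscript = fixed_rscript
```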

For more flexibility, the R version can be used when defining how to call Rscript. R is often installed into a folder named after its version, e.g. /.../3.6/..., so if several R versions are installed on your cluster, bsub_opt$call_Rscript can be set to a function that inserts the version into the path.
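The following sketch assumes a hypothetical layout /path/to/R/<version>/bin/Rscript on the cluster:

```r
library(GetoptLong)

# Insert the requested R version into the (hypothetical) installation path.
versioned_rscript = function(version) {
    qq("/path/to/R/@{version}/bin/Rscript")
}

# bsub_opt$call_Rscript = versioned_rscript
```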

Here qq() from the GetoptLong package performs variable interpolation. Similar packages such as glue can be used instead.

The R version can then be switched easily, either per job with the R_version argument of bsub_chunk() or globally with bsub_opt$R_version. The value of R_version is passed to the call_Rscript function.
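For example, with a version-aware call_Rscript function in place, the two ways of selecting R might look like this. 3.6.0 is only an illustrative version.

```r
# Per job: the Rscript command is obtained from call_Rscript("3.6.0").
bsub_chunk(name = "example", R_version = "3.6.0",
{
    Sys.sleep(5)
})

# Globally: set the default R version for subsequently submitted jobs.
bsub_opt$R_version = "3.6.0"
```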

On the DKFZ ODCF cluster, different software versions are managed by Environment Modules, and bsub_opt$call_Rscript is set to load the required modules before calling Rscript.
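The function below is based on the command that bconf prints later in this document. It is written as a plain R function so that the generated command can be checked before it is assigned:

```r
library(GetoptLong)

# Load the gcc and java modules, then the R module for the requested version.
odcf_rscript = function(version) {
    qq("module load gcc/7.2.0; module load java/1.8.0_131; module load R/@{version}; Rscript")
}

# bsub_opt$call_Rscript = odcf_rscript
```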

Loading the gcc/7.2.0 and java/1.8.0_131 modules ensures that R packages depending on specific C/Java libraries can be loaded successfully. So if R_version is set to 4.0.0, the Rscript call becomes

module load gcc/7.2.0; module load java/1.8.0_131; module load R/4.0.0; Rscript

which makes sure the Rscript from R-4.0.0 is used.

Similarly, if you use conda to manage different versions of software, you can choose between R versions by setting a suitable bsub_opt$call_Rscript. Assuming you have conda environments for different R versions named with the schema R_$version (e.g. R_3.6.0), bsub_opt$call_Rscript can activate the matching environment before calling Rscript.
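The sketch below uses that naming assumption. It also assumes that conda activate works in the non-interactive shell that runs the job, for example because conda has been initialized for bash. This may need extra setup on your cluster.

```r
library(GetoptLong)

# Activate the conda environment R_<version>, then call its Rscript.
# Assumes `conda activate` is available in the job's non-interactive shell.
conda_rscript = function(version) {
    qq("conda activate R_@{version}; Rscript")
}

# bsub_opt$call_Rscript = conda_rscript
```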

Bash environment

In the previous examples, the gcc/7.2.0 and java/1.8.0_131 modules are loaded, or the conda environment is activated, as part of the command that calls Rscript. Such bash-level initialization can also be set with sh_head, which adds shell commands as a header to the bash script used for job submission. sh_head can be given per job or set as a global option.
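For example, the module loading can be split the other way. In the sketch below, call_Rscript only loads the R module, while the gcc and java modules are loaded in sh_head. The header is passed as a single shell line to keep the sketch simple.

```r
library(GetoptLong)

# call_Rscript now only selects the R module.
bsub_opt$call_Rscript = function(version) qq("module load R/@{version}; Rscript")

# Per job: the header is added to the job's bash script ahead of the Rscript call.
bsub_chunk(name = "example", sh_head = "module load gcc/7.2.0; module load java/1.8.0_131",
{
    Sys.sleep(5)
})

# Or globally for all jobs:
bsub_opt$sh_head = "module load gcc/7.2.0; module load java/1.8.0_131"
```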

One use of this is to load the pandoc module when rmarkdown is used in the code chunk (on the DKFZ ODCF cluster).
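The sketch below assumes the module is simply called pandoc and uses a hypothetical report path. The exact module name and version on your cluster may differ.

```r
# rmarkdown needs pandoc on the computing node, so load its module first
# (module name "pandoc" is an assumption).
bsub_chunk(name = "render_report", sh_head = "module load pandoc",
{
    rmarkdown::render("/path/to/report.Rmd")
})
```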

Load other packages

The packages that are needed can be loaded directly inside the code chunk, assigned with the packages argument, or set globally with bsub_opt$packages.

There is a special value _in_session_ for the packages argument, which loads all packages in the current R session.
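The options are shown side by side below, using NMF (from the first example) as the package and Sys.sleep(5) as a placeholder. In practice, each job needs only one of them.

```r
# Load the package inside the chunk ...
bsub_chunk(name = "example", { library(NMF); Sys.sleep(5) })

# ... or pass it with the packages argument ...
bsub_chunk(name = "example", packages = "NMF", { Sys.sleep(5) })

# ... or set it globally for all jobs.
bsub_opt$packages = "NMF"

# Load all packages of the current R session inside the job.
bsub_chunk(name = "example", packages = "_in_session_", { Sys.sleep(5) })
```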

Other R variables

R variables that are defined outside the code chunk but used inside it can be specified with the variables argument.

The variables argument has a special value _all_functions_ that loads all functions defined in the global environment.
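The sketch below assumes that variables takes the names of the objects as a character vector. mat, n_iter and my_summary() are hypothetical objects used only for illustration.

```r
mat = matrix(rnorm(100), nrow = 10)
n_iter = 50

# mat and n_iter are exported to the job by name.
bsub_chunk(name = "example", variables = c("mat", "n_iter"),
{
    res = apply(mat, 1, function(x) mean(sample(x, n_iter, replace = TRUE)))
    saveRDS(res, file = "/path/to/res.rds")
})

# Export all functions defined in the global environment instead.
my_summary = function(x) c(mean = mean(x), sd = sd(x))
bsub_chunk(name = "example_fun", variables = "_all_functions_",
{
    saveRDS(my_summary(1:10), file = "/path/to/res2.rds")
})
```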

R variables shared between jobs

If multiple jobs use the same variables, they can be specified with the share argument. In this case, the shared variables are saved into temporary files only once. Note that these temporary files are not deleted automatically, since bsub cannot know whether all jobs that rely on them have finished. Users need to delete them manually when all jobs are done.
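The sketch below submits several jobs that all read the same large object. big_mat is hypothetical and, as with variables, share is assumed to take object names.

```r
big_mat = matrix(rnorm(1e6), ncol = 100)

for(i in 1:4) {
    # big_mat is written to a temporary file once and reused by all four jobs;
    # i differs per job, so it is passed through variables.
    bsub_chunk(name = paste0("col_sum_", i), share = "big_mat", variables = "i",
    {
        res = sum(big_mat[, i])
        saveRDS(res, file = paste0("/path/to/col_sum_", i, ".rds"))
    })
}
```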

The workspace image

If the code chunk uses many external variables, or the same variables are used in multiple jobs, you can save the workspace or selected objects as an image file and specify it with the image argument, either per job or as a global parameter. Absolute paths should be used instead of relative paths.
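The sketch below uses a hypothetical image path. Base R save() (or save.image() for the whole workspace) writes the file.

```r
# Save selected objects (or the whole workspace with save.image()).
mat = matrix(rnorm(100), nrow = 10)
save(mat, file = "/path/to/image.RData")

# Per job: the image is loaded before the chunk runs.
bsub_chunk(name = "example", image = "/path/to/image.RData",
{
    saveRDS(colMeans(mat), file = "/path/to/res.rds")
})

# Or globally:
bsub_opt$image = "/path/to/image.RData"
```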

Note that image files can be shared between jobs and are not deleted after the jobs finish. In contrast, variables are saved into separate temporary files for each job, even when the variable names are the same, and these files are deleted after the jobs finish.

The working directory

If the code chunk relies on the working directory, it can be specified with the working_dir argument, or set as a global parameter.
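The sketch below uses a hypothetical project directory /path/to/project.

```r
# Per job: the relative path "data/input.txt" resolves against working_dir.
bsub_chunk(name = "example", working_dir = "/path/to/project",
{
    tb = read.table("data/input.txt")
    saveRDS(tb, file = "/path/to/project/output/tb.rds")
})

# Or globally:
bsub_opt$working_dir = "/path/to/project"
```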

Note that it is not recommended to let file paths in the jobs be relative or depend on the working directory. It is recommended to use absolute paths everywhere in the job.

Retrieve the last variable

The last variable in the code chunk can be saved by setting save_var = TRUE and retrieved with retrieve_var() by specifying the job name. Since the variable is looked up by job name, no other job with the same name should be submitted before the variable is retrieved; otherwise, only the newest job with that name is considered.

retrieve_var() waits until the job is finished.
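The call below matches the output that follows. The value of the last expression, 1 + 1, is saved and read back.

```r
# Save the value of the last expression in the chunk.
bsub_chunk(name = "example2", save_var = TRUE,
{
    Sys.sleep(5)
    1 + 1
})

# Waits until job 'example2' is finished, then returns the saved value.
retrieve_var("example2")
```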

## - job: 'example2' from a code chunk
## bsub -J 'example2' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/example2.out' \
##     '/home/guz/.bsub_temp/example2_3e8c226afe7a.sh'
## [1] "6838601"
## job is running or pending, retry in 30 seconds.
## [1] 2

However, it is not recommended to retrieve the returned value from the code chunk directly. A better choice is to save the variable to a permanent file in the code chunk, so that the code, which normally has a very long running time, does not need to be rerun in the future.
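Here is a minimal version of that pattern with a hypothetical file path:

```r
bsub_chunk(name = "example2",
{
    x = 1 + 1
    saveRDS(x, file = "/path/to/x.rds")   # permanent file with an absolute path
})

# Later, in any R session, once the job has finished:
x = readRDS("/path/to/x.rds")
```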

Rerun the job

A flag file marks whether a job finished successfully. The enforce argument controls how jobs with the same names are rerun. If it is set to TRUE, jobs are rerun regardless of whether they are done. If it is set to FALSE, a job whose name matches a successfully finished job is skipped.
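Output like the following appears when the already finished job 'example' is resubmitted with enforce = FALSE. The call below shows this:

```r
# 'example' finished successfully before, so this submission is skipped.
bsub_chunk(name = "example", enforce = FALSE,
{
    Sys.sleep(5)
})
```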

## - job: 'example' from a code chunk
## Job 'example' is already done, skip.

enforce can also be set as a global parameter, e.g. bsub_opt$enforce = FALSE.

Job dependency

Since bsub_chunk() returns the job ID, it can be used to specify dependencies of other jobs with the dependency argument. The value for dependency can be a vector of job IDs.
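In this example, the third job depends on the first two:

```r
# Two independent jobs; bsub_chunk() returns their LSF job IDs.
job1 = bsub_chunk(name = "step1", { Sys.sleep(5) })
job2 = bsub_chunk(name = "step2", { Sys.sleep(5) })

# step3 is submitted with a dependency on both job IDs.
bsub_chunk(name = "step3", dependency = c(job1, job2),
{
    Sys.sleep(5)
})
```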

Temporary and output directory

bsub_chunk() has two arguments, temp_dir and output_dir. temp_dir holds the temporary R scripts and sh files, and output_dir holds the flag files and the output files from the LSF cluster.

Both can be set as global parameters. By default, output_dir is the same as temp_dir.

To remove the temporary files in temp_dir, run clear_temp_dir().
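The settings below use hypothetical directory paths:

```r
# Keep temporary scripts and LSF output in separate (hypothetical) directories.
bsub_opt$temp_dir = "/path/to/bsub_temp"
bsub_opt$output_dir = "/path/to/bsub_output"

# Remove the temporary files in temp_dir once they are no longer needed.
clear_temp_dir()
```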

Run code chunk from a script

You can also run a code chunk from a script by specifying its starting and ending line numbers. The R script is specified with the script argument, and the starting and ending line numbers with the start and end arguments. (Note this functionality has not been tested yet.)
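For example, the call below runs a fixed line range. The line numbers are illustrative.

```r
# Run lines 10 to 20 of foo.R as a job.
bsub_chunk(name = "example", script = "/path/of/foo.R", start = 10, end = 20)
```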

If you edit foo.R very often and the line numbers of the code you want to run keep changing, you can add tags to the R script around the code chunk to be run. You can then specify start and end with regular expressions that match these tags.
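The sketch below uses hypothetical tag comments. The tags are plain comment lines, so they do no harm if they are included in the chunk. The chunk is kept self-contained because only the tagged part of the script is run.

```r
## Excerpt of /path/of/foo.R
## bsub-start
x = 1:10
saveRDS(sum(x), file = "/path/to/res.rds")
## bsub-end
```

```r
# Locate the chunk by the tag lines instead of fixed line numbers.
bsub_chunk(name = "example", script = "/path/of/foo.R",
    start = "^## bsub-start", end = "^## bsub-end")
```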

Run jobs locally

Setting local = TRUE runs the job on the local machine instead of submitting it to the cluster. The generated bash script is executed directly, as the output below shows.
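The call below matches that output. Running locally is handy for a quick test before submitting to LSF.

```r
# Run the chunk locally instead of submitting it to the cluster.
bsub_chunk(name = "example", local = TRUE,
{
    Sys.sleep(5)
})
```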

## - job: 'example' from a code chunk
## bash /home/guz/.bsub_temp/example_3e8c14ea7c1c.sh

Submit jobs over different parameters

A nice feature of the bsub package is that many jobs can be submitted programmatically. Assume the sample IDs are stored in the sample_id variable and the parameters to test in the parameters variable, and we want to apply the analysis implemented by the analyze() function to each sample with each parameter, one job per combination. All the jobs can then be submitted from a nested loop.
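Here is a sketch of such a loop. sample_id, parameters and analyze() come from the description above, and the output path is hypothetical. It assumes each parameter is a single value and that analyze() is defined in the global environment, so it is exported through variables.

```r
for(sid in sample_id) {
    for(p in parameters) {
        # One job per (sample, parameter) pair, with a unique job name.
        bsub_chunk(name = paste0("analyze_", sid, "_", p),
            variables = c("sid", "p", "analyze"),
        {
            res = analyze(sid, p)
            saveRDS(res, file = paste0("/path/to/result_", sid, "_", p, ".rds"))
        })
    }
}
```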

Send R script

bsub_script() submits the job from R scripts. The major arguments are the same as in bsub_chunk().

bsub_script("/path/of/foo.R", name = ..., memory = ..., cores = ..., ...)

If the R script needs command-line arguments, they can be specified by argv.

bsub_script("/path/of/foo.R", argv = "--a 1 --b 3", ...)

When you have a list of jobs with the same argument names but different argument values, you can construct the argv string with glue::glue() or GetoptLong::qq():

library(GetoptLong)
for(a in 1:10) {
    for(b in 11:20) {
        bsub_script("/path/foo.R", argv = qq("--a @{a} --b @{b}"), ...)
    }
}
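Without glue or GetoptLong, base R sprintf() can build the same 100 argv strings. In this sketch, expand.grid() generates all (a, b) combinations, and each string would then be passed to bsub_script() as above.

```r
# All (a, b) combinations; expand.grid() varies a fastest.
grid = expand.grid(a = 1:10, b = 11:20)

# One argv string per combination, in the same "--a 1 --b 11" form.
argv_list = sprintf("--a %d --b %d", grid$a, grid$b)

head(argv_list, 3)
# Each element would then be submitted with, e.g.:
# bsub_script("/path/foo.R", argv = argv_list[i], ...)
```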

The command-line arguments of your R script can also be specified as arguments of bsub_script(), prefixed with a dot (.).

bsub_script("/path/foo.R", .a = 1, .b = 3, ...)

The earlier example of submitting a list of jobs can then be written as:

for(a in 1:10) {
    for(b in 11:20) {
        bsub_script("/path/foo.R", .a = a, .b = b, ...)
    }
}

R scripts should be specified with absolute paths.

Note that the bash environment can be initialized with the sh_head option.

Send other shell commands

bsub_cmd() submits shell commands. It works similarly to bsub_script():

bsub_cmd("samtools sort ...", name = ..., memory = ..., cores = ..., ...)
bsub_cmd(c("cmd1", "cmd2", ...), name = ..., memory = ..., cores = ..., ...)

The binary and its arguments should all be given in the first argument of bsub_cmd(). Remember to use glue::glue() or GetoptLong::qq() to construct the commands if they contain variable arguments, e.g.:

for(bam in bam_file_list) {
    bsub_cmd(qq("samtools sort @{bam} ... "), name = qq("sort_@{basename(bam)}"), 
        memory = ..., cores = ..., ...)
}

Job Summary

bjobs(), or simply typing bjobs, gives a summary of running and pending jobs. The job status to show (RUN and PEND by default) is controlled by the status argument, the number of most recent jobs by the max argument, and filtering on the job name by the filter argument. In the following example, we submit four tiny jobs.

for(i in 1:4) {
    bsub_chunk(name = paste0("example_", i),
    { 
        Sys.sleep(5)
    })
}
## - job: 'example_1' from a code chunk
## bsub -J 'example_1' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/example_1.out' \
##     '/home/guz/.bsub_temp/example_1_3e8c7f313c93.sh' 
## - job: 'example_2' from a code chunk
## bsub -J 'example_2' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/example_2.out' \
##     '/home/guz/.bsub_temp/example_2_3e8c1f768130.sh' 
## - job: 'example_3' from a code chunk
## bsub -J 'example_3' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/example_3.out' \
##     '/home/guz/.bsub_temp/example_3_3e8c27425d6e.sh' 
## - job: 'example_4' from a code chunk
## bsub -J 'example_4' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/example_4.out' \
##     '/home/guz/.bsub_temp/example_4_3e8c725bb088.sh'
bjobs
## ================================================================================================== 
##  JOBID   STAT JOB_NAME              SUBMIT_TIME         TIME_PASSED TIME_LEFT SLOTS MEM    MAX_MEM
##  6838393 RUN  GSE62193_smooth_chr11 2021-06-30 12:08:14 1:26        38:33     2     50.3Gb 50.3Gb 
##  6838394 RUN  GSE62193_smooth_chr12 2021-06-30 12:08:15 1:26        38:33     2     35.4Gb 53.6Gb 
##  6838604 PEND example_1             2021-06-30 13:35:02 -           -         -     -      -      
##  6838605 PEND example_2             2021-06-30 13:35:03 -           -         -     -      -      
##  6838606 PEND example_3             2021-06-30 13:35:03 -           -         -     -      -      
##  6838607 PEND example_4             2021-06-30 13:35:03 -           -         -     -      -      
## ================================================================================================== 
##  34 DONE jobs, 48 EXIT jobs, 4 PEND jobs, 2 RUN jobs within one week.
##  34 DONE jobs, 0 EXIT jobs, 4 PEND jobs, 2 RUN jobs in the last 24 hours.
##  You can have more controls by `bjobs(status = ..., max = ..., filter = ...)`.
##  Use `brecent` to retrieve recent jobs from all status.

There is one additional column RECENT in the summary table which shows the order of the jobs with the same job name. The most recent job has the value 1.

for(i in 1:2) {
    bsub_chunk(name = "example",
    { 
        Sys.sleep(5)
    })
}
## - job: 'example' from a code chunk
## bsub -J 'example' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/example.out' \
##     '/home/guz/.bsub_temp/example_3e8c1ababc7e.sh' 
## - job: 'example' from a code chunk
## bsub -J 'example' -W '1:0' -n 1 -R 'rusage[mem=1024]' \
##      -o '/home/guz/.bsub_temp/example.out' \
##     '/home/guz/.bsub_temp/example_3e8c547a4f87.sh'
bjobs(status = "all", filter = "example")
## ========================================================================================== 
##  JOBID   STAT JOB_NAME  RECENT SUBMIT_TIME         TIME_PASSED TIME_LEFT SLOTS MEM MAX_MEM
##  6838508 DONE example   6      2021-06-30 12:54:25 0:00        -         4     -   12Mb   
##  6838509 DONE example2  2      2021-06-30 12:54:25 0:00        -         1     -   42Mb   
##  6838512 DONE example_1 2      2021-06-30 12:55:07 0:00        -         1     -   42Mb   
##  6838514 DONE example_2 2      2021-06-30 12:55:07 0:00        -         1     -   42Mb   
##  6838515 DONE example_3 2      2021-06-30 12:55:07 0:00        -         1     -   42Mb   
##  6838516 DONE example_4 2      2021-06-30 12:55:08 0:00        -         1     -   42Mb   
##  6838517 DONE example   5      2021-06-30 12:55:08 0:00        -         1     -   42Mb   
##  6838518 DONE example   4      2021-06-30 12:55:08 0:00        -         1     -   42Mb   
##  6838600 DONE example   3      2021-06-30 13:34:20 0:00        -         4     -   37Mb   
##  6838601 DONE example2  1      2021-06-30 13:34:20 0:00        -         1     -   42Mb   
##  6838604 PEND example_1 1      2021-06-30 13:35:02 -           -         -     -   -      
##  6838605 PEND example_2 1      2021-06-30 13:35:03 -           -         -     -   -      
##  6838606 PEND example_3 1      2021-06-30 13:35:03 -           -         -     -   -      
##  6838607 PEND example_4 1      2021-06-30 13:35:03 -           -         -     -   -      
##  6838608 PEND example   2      2021-06-30 13:35:04 -           -         -     -   -      
##  6838609 PEND example   1      2021-06-30 13:35:04 -           -         -     -   -      
## ========================================================================================== 
##  34 DONE jobs, 48 EXIT jobs, 6 PEND jobs, 2 RUN jobs within one week.
##  34 DONE jobs, 0 EXIT jobs, 6 PEND jobs, 2 RUN jobs in the last 24 hours.
##  You can have more controls by `bjobs(status = ..., max = ..., filter = ...)`.
##  Use `brecent` to retrieve recent jobs from all status.

By default, brecent() returns the 20 most recent jobs of all statuses. You can simply type brecent without the parentheses.

brecent
## ===================================================================================================== 
##  JOBID   STAT JOB_NAME             RECENT SUBMIT_TIME         TIME_PASSED TIME_LEFT SLOTS MEM MAX_MEM
##  6838405 DONE GSE62193_smooth_chrX 1      2021-06-30 12:08:18 0:27        -         2     -   53.6Gb 
##  6838406 DONE GSE62193_smooth_chrY 1      2021-06-30 12:08:18 0:21        -         2     -   53.4Gb 
##  6838507 DONE R_code_c3481a47      2      2021-06-30 12:54:24 0:00        -         1     -   37Mb   
##  6838508 DONE example              6      2021-06-30 12:54:25 0:00        -         4     -   12Mb   
##  6838509 DONE example2             2      2021-06-30 12:54:25 0:00        -         1     -   42Mb   
##  6838512 DONE example_1            2      2021-06-30 12:55:07 0:00        -         1     -   42Mb   
##  6838514 DONE example_2            2      2021-06-30 12:55:07 0:00        -         1     -   42Mb   
##  6838515 DONE example_3            2      2021-06-30 12:55:07 0:00        -         1     -   42Mb   
##  6838516 DONE example_4            2      2021-06-30 12:55:08 0:00        -         1     -   42Mb   
##  6838517 DONE example              5      2021-06-30 12:55:08 0:00        -         1     -   42Mb   
##  6838518 DONE example              4      2021-06-30 12:55:08 0:00        -         1     -   42Mb   
##  6838599 DONE R_code_c3481a47      1      2021-06-30 13:34:19 0:00        -         1     -   38Mb   
##  6838600 DONE example              3      2021-06-30 13:34:20 0:00        -         4     -   37Mb   
##  6838601 DONE example2             1      2021-06-30 13:34:20 0:00        -         1     -   42Mb   
##  6838604 PEND example_1            1      2021-06-30 13:35:02 -           -         -     -   -      
##  6838605 PEND example_2            1      2021-06-30 13:35:03 -           -         -     -   -      
##  6838606 PEND example_3            1      2021-06-30 13:35:03 -           -         -     -   -      
##  6838607 PEND example_4            1      2021-06-30 13:35:03 -           -         -     -   -      
##  6838608 PEND example              2      2021-06-30 13:35:04 -           -         -     -   -      
##  6838609 PEND example              1      2021-06-30 13:35:04 -           -         -     -   -      
## ===================================================================================================== 
##  34 DONE jobs, 48 EXIT jobs, 6 PEND jobs, 2 RUN jobs within one week.
##  34 DONE jobs, 0 EXIT jobs, 6 PEND jobs, 2 RUN jobs in the last 24 hours.
##  You can have more controls by `bjobs(status = ..., max = ..., filter = ...)`.
##  Use `brecent` to retrieve recent jobs from all status.

There are also helper functions that list only running, pending, done or failed jobs.
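The helper names below are an assumption based on bsub releases and may differ in your version. Check them with ls("package:bsub"). They are called like bjobs():

```r
bjobs_running()   # jobs with status RUN
bjobs_pending()   # jobs with status PEND
bjobs_done()      # jobs with status DONE
bjobs_exit()      # failed jobs, status EXIT
```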

bjobs_barplot() makes a barplot of numbers of jobs per day.

bjobs_barplot()

[Figure: bjobs_barplot]

bjobs_timeline() draws the duration of each job. In the plot, each segment represents a job and the width corresponds to its duration.

bjobs_timeline()

[Figure: bjobs_timeline]

Other functions

Global Parameters

Typing bsub_opt prints a list of global options. Values can be set in the form bsub_opt$opt = value, and all values can be reset with bsub_opt(RESET = TRUE).

bsub_opt
##  Option          Value                                                                                                                  
##  ---------------:---------------------------------------------------------------
##  packages        NULL                                                                                                                   
##  image           NULL                                                                                                                   
##  temp_dir        /home/guz/.bsub_temp                                                                                                   
##  output_dir      /home/guz/.bsub_temp                                                                                                   
##  enforce         TRUE                                                                                                                   
##  R_version       4.0.0                                                                                                                  
##  working_dir     ""                                                                                                                     
##  wd              ""                                                                                                                     
##  ignore          FALSE                                                                                                                  
##  local           FALSE                                                                                                                  
##  call_Rscript    a user-defined function                                                                                                
##  submission_node odcf-worker01, odcf-cn34u03s10, odcf-cn34u03s12                                                                        
##  login_node      odcf-worker01, odcf-cn34u03s10, odcf-cn34u03s12                                                                        
##  sh_head         ""                                                                                                                     
##  user            guz                                                                                                                    
##  group           NULL                                                                                                                   
##  ssh_envir       source /etc/profile, export LSF_ENVDIR=/opt/lsf/conf, export LSF_SERVERDIR=/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/etc
##  bsub_template   a user-defined function                                                                                                
##  parse_time      NULL                                                                                                                   
##  verbose         FALSE

Or, as more readable text:

bconf
## Configurations for bsub:
##   * user for connecting submission node: guz
##   * submission node: odcf-worker01, odcf-cn34u03s10, odcf-cn34u03s12
##   * global R version: 4.0.0
##   * command to call `Rscript`:
##      qq("module load gcc/7.2.0; module load java/1.8.0_131; module load R/@{version}; Rscript") foo.R
##   * temporary directory: /home/guz/.bsub_temp
## 
## Configurations can be modified by `bsub_opt()` function

Interactive job monitor

Running monitor() opens a Shiny app where you can query and manage jobs.

monitor()

The following are screenshots of the job monitor.

The job summary table:

[Screenshot: monitor]

Job log:

[Screenshot: job_log]

Job dependency tree:

[Screenshot: dependency_tree]

Kill jobs:

[Screenshot: kill_jobs]

Session Info

sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS:   /usr/lib64/libblas.so.3.4.2
## LAPACK: /usr/lib64/liblapack.so.3.4.2
## 
## locale:
##  [1] LC_CTYPE=C                 LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] GetoptLong_1.0.4 bsub_1.1.0       rmarkdown_2.1   
## 
## loaded via a namespace (and not attached):
##  [1] clisymbols_1.2.0    digest_0.6.27       crayon_1.3.4       
##  [4] magrittr_2.0.1      evaluate_0.14       rlang_0.4.7        
##  [7] stringi_1.5.3       GlobalOptions_0.1.2 rjson_0.2.20       
## [10] tools_4.0.0         stringr_1.4.0       xfun_0.19          
## [13] yaml_2.2.1          compiler_4.0.0      htmltools_0.5.1    
## [16] knitr_1.30