Biowulf High Performance Computing at the NIH
R/Bioconductor on Biowulf

R is a language and environment for statistical computing and graphics. It can be considered an open source descendant of the S language, which was developed by Chambers and colleagues at Bell Laboratories in the 1970s.

R is highly extensible and provides a wide variety of modern statistical analysis methods combined with excellent graphical visualization capabilities, embedded in a programming language that supports procedural, functional, and object-oriented programming styles. R natively provides operators for calculations on arrays and matrices.

While individual single-threaded R code is not expected to run any faster in an interactive session on a compute node than it would on a modern desktop, Biowulf allows users to run many R jobs concurrently or to parallelize R code to a much greater degree than is possible on a single workstation.

On Biowulf, R modules are available for the minor releases (e.g. 3.5), each of which contains the newest patch-level release (e.g. 3.5.3).

Changelog

June 2018: Cluster update from RHEL6 to RHEL7

Common pitfalls
Implicit multithreading
R can make use of implicit multithreading via two different mechanisms. One of them is regulated by the OMP_NUM_THREADS environment variable, which the R modules set to 1 because leaving this variable unset can lead to R using as many threads as there are CPUs on a compute node, thus overloading jobs. If you know your code can make effective use of those threads, you can explicitly set OMP_NUM_THREADS to a value greater than 1 after loading the module. However, only a subset of code will be able to take advantage of this, so don't expect an automatic speed increase.
parallel::detectCores() always detects all CPUs on a node
R code using one of the parallel packages (parallel, doParallel, ...) often overloads its job allocation because it uses the detectCores() function from the parallel package to determine how many worker processes to start. However, this function returns the number of CPUs on the compute node irrespective of how many have been allocated to the job. Therefore, unless all CPUs on the node are allocated to the job, the job will be overloaded and perform poorly. See the section on the parallel package for more detail.
BiocParallel by default tries to use most CPUs on a node
BiocParallel is not aware of Slurm and by default tries to use most of the CPUs on a node irrespective of the Slurm allocation. This can lead to overloaded jobs. See the section on the BiocParallel package for more information on how to avoid this.
Poor scaling of parallel code
Don't assume that you should allocate as many CPUs as possible to a parallel workload. Parallel efficiency often drops as more CPUs are added, and in some cases allocating more CPUs may actually extend runtimes. If you use or implement parallel algorithms, please measure scaling before submitting large numbers of such jobs.
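
The overloading pitfalls above have a common remedy: take thread counts from the Slurm allocation rather than from the node. A minimal sketch for a job script follows. It assumes the job was submitted with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK. The fallback to 1 and the variable name ncpus are illustrative choices, not something the R module provides.

```shell
#!/bin/bash
# In a real job script, load the R module first: it sets OMP_NUM_THREADS=1,
# so any override has to come afterwards.
# module load R/3.5

# SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested;
# fall back to a single thread otherwise.
ncpus="${SLURM_CPUS_PER_TASK:-1}"

# Only raise this if the R code is known to benefit from implicit threading.
export OMP_NUM_THREADS="$ncpus"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Inside R, the same value can be read with Sys.getenv("SLURM_CPUS_PER_TASK") and used as the worker count instead of the result of detectCores().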

R will automatically use lscratch for temporary files if it has been allocated. We therefore highly recommend always allocating at least 1 GB of lscratch, plus whatever lscratch storage your code requires.

Interactive R

Allocate an interactive session for interactive R work. Note that R sessions are not allowed on the login node or on Helix.

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

There may be multiple versions of R available. An easy way of selecting the version is to use modules. To see the modules available, type

[user@cn3144 ~]$ module -r avail '^R$'

--------------- /usr/local/lmod/modulefiles ------------------
   R/3.5

Set up your environment and start up an R session

[user@cn3144 ~]$ module load R/3.5
[user@cn3144 ~]$ R
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ purrr   0.2.4
✔ tibble  1.4.2     ✔ dplyr   0.7.4
✔ tidyr   0.8.0     ✔ stringr 1.3.0
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
> [...lots of work...]
> q()
Save workspace image? [y/n/c]: n

A rudimentary graphical interface is available if the sinteractive session was started from a session with X11 forwarding enabled:

[user@cn3144 ~]$ R --gui=Tk

However, RStudio is a much better interface with many advanced features.

Don't forget to exit the interactive session

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Installed packages

Packages installed in the current default R environment

PackageVersion
translations3.5.0
spatial7.3-11
viridisLite0.3.0
urltools1.7.1
trapezoid2.0-0
polynom1.3-9
plotly4.8.0
PheWAS0.12.1
org.Mm.eg.db3.6.0
oligoClasses1.42.0
numDeriv2016.8-1
methylumi2.26.0
mcmc0.9-5
matrixcalc1.0-3
latticeExtra0.6-28
ipred0.9-8
groHMM1.14.0
GOSemSim2.6.2
geosphere1.5-7
DESeq21.20.0
DEoptimR1.0-8
DBChIP1.24.0
cn.mops1.26.0
BSgenome.Hsapiens.UCSC.hg191.4.0
BPSC0.99.1
AnnotationForge1.22.2
00LOCK-RcppCore-Rcpp-a669a19unknown
TxDb.Celegans.UCSC.ce6.ensGene3.2.2
strucchange1.5-1
stabledist0.7-1
sensitivity1.15.2
rmarkdown1.10
ProtGenerics1.12.0
pbatR2.2-13
msm1.6.6
MLEcens0.1-4
magic1.5-9
JavaGD0.6-1
GlobalOptions0.1.0
geiger2.0.6
episensr0.9.2
ENmix1.16.0
distillery1.0-4
compositions1.40-2
colorspace1.3-2
caTools1.17.1.1
BSgenome.Scerevisiae.UCSC.sacCer11.4.0
BSgenome.Hsapiens.UCSC.hg181.3.1000
bold0.8.0
blob1.1.1
ALL1.22.0
ADGofTest0.3
splines3.5.0
00LOCK-MASSunknown
statmod1.4.30
spliceR1.22.0
rcmdcheck1.3.1
rae230aprobe2.18.0
pvca1.20.0
proto1.0.0
ks1.11.3
keras2.2.0
jsonlite1.5
httr1.3.1
Heatplus2.26.0
foreach1.4.4
forcats0.3.0
fgsea1.6.0
farver1.0
FactoMineR1.41
expm0.999-3
data.table1.11.8
Cubist0.2.2
CNEr1.16.1
clusterRepro0.9
BSgenome.Scerevisiae.UCSC.sacCer21.4.0
bibtex0.4.2
xfun0.4
subSeq1.10.0
spam2.2-0
snowFT1.6-0
NOISeq2.24.0
NMF0.21.0
Maaslin0.0.5
logicFS1.50.0
Lmoments1.2-3
LGEWIS1.1
IsoformSwitchAnalyzeR1.2.0
interval1.1-0.1
fit.models0.5-14
ExomeDepth1.1.10
DBI1.0.0
ChIPpeakAnno3.14.2
CGEN3.16.0
BiocGenerics0.26.0
bindata0.9-19
beadarray2.30.0
codetools0.2-15
sna2.4
sgeostat1.0-27
RRPP0.3.0
R.methodsS31.7.1
remoter0.4-0
PhysicalActivity0.2-2
pheatmap1.0.10
multicool0.1-10
MplusAutomation0.7-2
lmerTest3.0-1
lazyeval0.2.1
lars1.2
Haplin7.0.0
DaMiRseq1.4.2
commonmark1.6
ade41.7-13
parallel3.5.0
TCGA2STAT1.2
SVGAnnotation0.93-2
remotes2.0.2
quantreg5.36
plyr1.8.4
pbs1.1
org.Hs.eg.db3.6.0
OmicPath0.1
natserv0.1.4
naivebayes0.9.2
monocle2.8.0
maps3.3.0
lmtest0.9-36
interactiveDisplayBase1.18.0
flux0.3-0
flexmix2.3-14
DAAG1.22
CODEX21.3.0
bindr0.1.1
Bhat0.9-10
bayesplot1.6.0
accelerometry3.1.2
WikidataR1.4.0
vipor0.4.5
TxDb.Rnorvegicus.UCSC.rn4.ensGene3.2.2
tkWidgets1.58.0
spls2.2-2
shinyFiles0.7.1
RInside0.2.14
refund0.1-17
readr1.1.1
reactome.db1.64.0
rainbow3.5
QuantPsyc1.5
PMA1.0.11
packcircles0.3.3
libcoin1.0-1
IRkernel0.8.12.9000
DESeq1.32.0
cn.farms1.28.0
cli1.0.1
biclust2.0.1
stats43.5.0
base3.5.0
tclust1.4-1
synthpop1.5-0
SNPlocs.Hsapiens.dbSNP142.GRCh370.99.5
SNPchip2.26.0
sleuth0.29.0
shinydashboard0.7.1
sampling2.8
S4Vectors0.18.3
RMTstat0.3
raster2.8-4
pls2.7-0
oz1.0-21
ordinal2018.8-25
miniUI0.1.1.1
meta4.9-2
gsubfn0.7
GOstats2.46.0
ggpubr0.1.8
genoset1.36.0
gdata2.18.0
future.apply1.0.1
edgeR3.22.5
dfoptim2018.2-1
ChIPseeker1.16.1
bitops1.0-6
affycomp1.56.0
class7.3-14
WriteXLS4.0.0
waveslim1.7.5
TxDb.Hsapiens.UCSC.hg19.knownGene3.2.2
tsna0.2.0
thgenetics0.4-2
tensorA0.36.1
spdep0.7-9
sfit0.3.1
RNifti0.10.0
RMediation1.1.4
readxl1.1.0
RcppRoll0.3.0
RcppProgress0.4.1
RcmdrPlugin.IPSUR0.2-1
pROC1.13.0
PBSmodelling2.68.6
partykit1.2-2
GPArotation2014.11-1
ggeffects0.6.0
GenABEL1.8-0
emmeans1.3.0
DMRcatedata1.16.0
copynumber1.20.0
cmprsk2.2-7
bbmle1.0.20
BatchJobs1.7
batch1.1-5
additivityTests1.1-4
UpSetR1.3.3
timsac1.3.6
tergm3.5.2
svd0.4.1
subplex1.5-4
sROC0.1-2
sparsesvd0.1-4
snow0.4-3
rversions1.0.3
RcmdrPlugin.epack1.2.5
plotmo3.5.0
plgem1.52.0
pd.genomewidesnp.63.14.1
parathyroidSE1.18.0
manhattanly0.2.0
maanova1.50.0
leafcutter0.2.7
irr0.84
IRdisplay0.6
gmodels2.18.1
Exact1.7
DRIMSeq1.8.0
diffusionMap1.1-0.1
bit1.1-14
zlibbioc1.26.0
withr2.1.2
tidyr0.8.2
spatstat.data1.4-0
RWiener1.3-1
RSQLite2.1.1
RandomFields3.1.50
qvcalc0.9-1
perm1.0-0.0
pbdMPI0.3-6
oro.nifti0.9.1
MetaSKAT0.60
lava1.6.3
intervals0.15.1
haploR2.0.6
EBSeq1.20.0
e10711.7-0
zebrafishcdf2.18.0
TxDb.Mmusculus.UCSC.mm9.knownGene3.2.2
TeachingDemos2.10
spp1.15.5
SARTools1.6.2
R.matlab3.6.2
R2WinBUGS2.1-21
quantmod0.4-13
pastecs1.3.21
nplplot4.5
markdown0.8
hdf5r1.0.1
glmpath0.98
ggm2.3
fansi0.4.0
extRemes2.0-9
clue0.3-56
CHsharp0.4
chipseq1.30.0
tsne0.1-3
tripack1.3-8
timeSeries3042.102
sva3.28.0
SummarizedExperiment1.10.1
peer1.0
pec2018.07.26
orthogonalsplinebasis0.1.6
modeltools0.2-22
maxstat0.7-25
MatchIt3.0.2
ICsurv1.0
graph1.58.2
glmmML1.0.3
ggthemes4.0.1
ggdendro0.1-20
GenomicAlignments1.16.0
genetics1.3.8.1
fail1.3
enpls6.0
DNAcopy1.54.0
discretization1.0-1
colorRamps2.3
colorfulVennPlot2.4
brew1.0-6
Zelig5.1.6
vsn3.48.1
tiff0.1-5
survminer0.4.3
survJamda.data1.0.2
sQTLseekeR2.1
rlecuyer0.3-4
relimp1.0-5
RcppNumerical0.3-2
rafalib1.0.0
pd.hugene.2.0.st3.14.1
pan1.6
mygene1.16.2
mmap0.6-17
listenv0.7.0
gh1.0.1
geonames0.998
FSelector0.31
fracdiff1.4-2
EGSEA1.8.0
DynDoc1.58.0
dnet1.1.4
denstrip1.5.4
CpGassoc2.60
ConsensusClusterPlus1.44.0
CODEX1.12.0
beachmat1.2.1
tcltk3.5.0
KernSmooth2.23-15
wateRmelon1.24.0
ROTS1.8.0
questionr0.6.3
prabclus2.2-6
pkgload1.0.2
pkgbuild1.0.2
orthopolynom1.0-5
NADA1.6-1
logcondens2.1.5
KEGGgraph1.40.0
JADE2.0-1
IHW1.8.0
hpgltools2018.03
gsl1.9-10.3
GPA1.1-0
GenVisR1.12.1
gdtools0.1.7
gamm40.2-5
fdrtool1.2.15
epiR0.9-99
dygraphs1.1.1.6
dmrseq1.0.14
blockmodeling0.3.1
utils3.5.0
graphics3.5.0
zCompositions1.1.2
vioplot0.2
tidyverse1.2.1
systemPipeR1.14.0
ruv0.9.7
robust0.4-18
Rfit0.23.0
RefPlus1.50.0
rbenchmark1.0.0
moments0.14
minet3.38.0
locfit1.5-9.1
inline0.3.15
idr1.2
hgu95av22.2.0
hgu133plus2probe2.18.0
getopt1.20.2
geomorph3.0.7
falcon0.2
evd2.3-3
enrichR1.0
dropbead0.3.1
diptest0.75-7
dimRed0.1.0
dendextend1.9.0
ChromHeatMap1.34.0
chopsticks1.46.0
CCA1.2
MASS7.3-50
grDevices3.5.0
xopen1.0.0
truncnorm1.0-8
tidyselect0.2.5
systemfit1.1-22
seqLogo1.46.0
rvest0.3.2
robustbase0.93-3
rjson0.2.20
randomForestSRC2.7.0
praise1.0.0
plot3D1.1.1
optparse1.6.0
mclust5.4.1
manipulateWidget0.10.0
logistf1.23
KMsurv0.1-5
INPower1.16.0
HilbertVis1.38.0
hash2.2.6
gyriq1.0.2
ggrepel0.8.0
ggbio1.28.5
geometry0.3-6
EMCluster0.2-10
DMRcate1.16.0
coxphf1.13
corpcor1.6.9
ClusterR1.1.5
brms2.6.0
bridgesampling0.6-0
ASSET1.99.0
spData0.2.9.4
SKAT1.3.2.1
shinystan2.5.0
scran1.8.4
rotl3.0.5
rex1.1.2
RCircos1.2.0
psych1.8.10
progress1.2.0
PBSmapping2.70.5
mnormt1.5-5
miscTools0.6-22
MassSpecWavelet1.46.0
LSD4.0-0
kernlab0.9-27
iCluster2.1.0
HIBAG1.16.0
HI0.4
h2o3.20.0.8
gss2.1-9
grpreg3.2-0
gbRd0.4-11
epifit0.1.2
clisymbols1.2.0
cellrangerRkit2.0.0
baySeq2.14.0
airway0.114.0
TH.data1.0-9
survival2.43-1
supraHex1.18.0
SPAtest3.0.0
space0.1-1
Seurat2.3.4
rtracklayer1.40.6
Rmisc1.5
reprex0.2.1
ncbit2013.03.29
micEcon0.6-14
KEGG.db3.2.3
hexbin1.27.2
callr3.0.0
bsseq1.16.1
apeglm1.2.1
XML3.98-1.16
vows0.5
ReactomePA1.24.0
pctGCdata0.2.0
outliers0.14
nnls1.4
mlr2.13
mlmRev1.0-6
maptools0.9-4
inum1.0-0
illuminaio0.22.0
hopach2.40.0
exomeCopy1.26.0
esATAC1.2.3
densityClust0.3
crlmm1.38.0
cobs1.3-3
bit640.9-7
annaffy1.52.0
sequenza2.1.2
rpf0.59
RJSONIO1.3-0
rGADEM2.28.0
RcmdrPlugin.TeachingDemos1.1-0
RcmdrMisc2.5-1
Rcmdr2.5-1
plier1.50.0
pbdDMAT0.4-2
oligo1.44.0
mvbutils2.7.4.1
LDheatmap0.99-5
irlba2.3.2
gtools3.8.1
git2r0.23.0
gage2.30.0
Epi2.32
eha2.6.0
easyRNASeq2.16.0
clipr0.4.1
cghMCR1.38.0
Biostrings2.48.0
ada2.0-5
zip1.0.0
yaml2.2.0
xlsx0.6.1
threejs0.3.1
speedglm0.3-2
spacetime1.2-2
Rpairix0.3.6
Ringo1.44.0
Rdpack0.10-1
rae230acdf2.18.0
networkD30.4
MEGENA1.3.7
genefilter1.62.0
epitools0.5-10
ensemblVEP1.22.1
EDASeq2.14.1
dynamicTreeCut1.63-1
DropletUtils1.0.3
aroma.core3.1.3
cluster2.0.7-1
snpStats1.30.0
sfsmisc1.1-2
semTools0.5-1
riskRegression2018.10.03
Rhtslib1.12.1
R.cache0.13.0
qap0.1-1
pathview1.20.0
pander0.6.3
optmatch0.9-10
optextras2016-8.8
motifmatchr1.2.0
ipw1.0-11
ini0.3.1
frma1.32.0
ExperimentHub1.6.1
DDRTree0.1.5
cqn1.26.0
contrast0.21
clonevol0.99.11
VGAM1.0-6
vcd1.4-4
SuppDists1.1-9.4
supclust1.0-7
SNPlocs.Hsapiens.dbSNP.201206080.99.11
slam0.1-43
shinythemes1.1.2
R.rsp0.43.0
rprojroot1.3-2
Rook1.1-1
rbugs0.5-9
pwr1.2-2
PSCBS0.64.0
prediction0.3.6
pixmap0.4-11
nycflights131.0.0
MotifDb1.22.0
MAST1.6.1
mapproj1.2.6
lsmeans2.30-0
Kendall2.2
HardyWeinberg1.6.1
golubEsets1.22.0
glmnet2.0-16
futile.logger1.4.3
ecodist2.0.1
EBarrays2.44.0
devEMF3.6-1
argon20.2-0
boot1.3-20
XLConnectJars0.2-15
TFMPvalue0.0.8
ROC1.56.0
RCurl1.95-4.11
randtests1.0
mipfp3.2.1
HTqPCR1.34.0
hoardr0.5.0
haven1.1.2
gRbase1.8-3.4
GeneRegionScan1.36.0
fmsb0.6.3
fastcluster1.1.25
DT0.5
doBy4.6-2
ComplexHeatmap1.18.1
00LOCK-rstanunknown
verification1.42
tinytex0.9
spatstat.utils1.13-0
rootSolve1.7
RandomFieldsUtils0.3.25
qtl1.42-8
plm1.6-6
pd.hg.u133.plus.23.12.0
pbkrtest0.4-7
PADOG1.22.0
iterators1.0.10
ica1.0-2
hugene20sttranscriptcluster.db8.7.0
httpuv1.4.5
ggfortify0.4.5
GetoptLong0.1.7
fission0.114.0
fibroEset1.22.0
emdbook1.3.10
drc3.0-1
dichromat2.0-0
Biobase2.40.0
bigmemory.sri0.1.3
assertthat0.2.0
00LOCK-foreignunknown
udunits20.13
selectr0.4-1
ROCR1.0-7
rJava0.9-10
permute0.9-4
pcaMethods1.72.0
pbdZMQ0.3-3
pasilla1.8.0
openssl1.0.2
npsurv0.4-0
np0.60-9
mvmeta0.4.11
HTMLUtils0.1.7
hthgu133aprobe2.18.0
Gviz1.24.0
ggbeeswarm0.6.0
FDb.InfiniumMethylation.hg192.2.0
dvmisc1.1.2
directlabels2018.05.22
deconstructSigs1.8.0
csaw1.14.1
config0.3
colourpicker1.0
car3.0-2
brglm0.6.1
ballgown2.12.0
AnnotationDbi1.42.1
animation2.5
datasets3.5.0
wordcloud2.6
SDMTools1.1-221
sandwich2.5-0
pspline1.0-18
methylKit1.6.3
memoise1.1.0
isva1.9
isotone1.1-0
IlluminaHumanMethylation450kmanifest0.4.0
FDb.InfiniumMethylation.hg182.2.0
DOSE3.7.0
debugme1.1.0
corrgram1.13
ChIPsim1.34.0
c3net1.1.1
BiocInstaller1.30.0
bio3d2.3-4
base64enc0.1-3
awsMethods1.1-0
akima0.6-2
admixturegraph1.0.2
zoo1.8-4
triangle0.11
tis1.34
splancs2.01-40
specificity0.1.1
signal0.7-6
RWekajars3.9.2-1
rngtools1.3.1
RMySQL0.10.15
Rcgmin2013-2.21
pd.mogene.1.0.st.v13.14.1
mvQuad1.0-6
mouse4302.db3.2.3
mcmcplots0.4.3
marray1.58.0
graphite1.26.3
GGally1.4.0
deldir0.1-15
bmm0.3.1
ape5.2
actuar2.3-1
whisker0.3-2
variancePartition1.10.4
utf81.1.4
TxDb.Hsapiens.UCSC.hg18.knownGene3.2.2
tweenr1.0.0
rncl0.8.3
random0.2.6
prettyunits1.0.2
poweRlaw0.70.1
openxlsx4.1.0
NORMT31.0-3
MatrixEQTL2.2
hthgu133acdf2.18.0
GSVA1.29.2
ggraph1.0.2
gamlss.data5.1-0
flashClust1.01-2
ergm3.9.4
downloader0.4
bayesm3.1-0.1
AnnotationFilter1.4.0
TitanCNA1.18.0
Sushi1.18.0
sqldf0.4-11
seriation1.2-3
rstan2.18.1
rphast1.6.9
ResourceSelection0.3-2
mice3.3.0
lubridate1.7.4
loo2.0.0
lme41.1-18-1
GO.db3.6.0
genbankr1.8.0
geepack1.2-1
gaia2.24.0
Formula1.2-3
DSS2.28.0
dr3.0.10
DiceKriging1.5.6
cummeRbund2.22.0
catmap1.6.4
testthat2.0.1
splitstackshape1.4.6
safe3.20.0
RnBeads1.12.1
rglwidget0.2.1
readstata130.9.2
RcppEigen0.3.3.4.0
pryr0.1.4
phyloseq1.24.2
optimx2018-7.10
modelr0.1.2
MLInterfaces1.60.1
MiST1.0
ISwR2.0-7
ineq0.2-13
hu6800probe2.18.0
glue1.3.0
glmm1.2.3
GenomicFeatures1.32.3
FREGAT1.1.0
doMC1.3.5
diffloop1.8.0
diffHic1.12.1
dendsort0.3.3
dbplyr1.2.2
coin1.2-2
checkmate1.8.5
calibrate1.7.2
BSgenome.Mmusculus.UCSC.mm91.4.0
binom1.1-1
stringi1.2.4
RUnit0.4.32
rrcov1.4-4
RPMM1.25
RIPSeeker1.20.0
rARPACK0.11-0
R62.3.0
qvalue2.12.0
plogr0.2.0
multidplyr0.0.0.9000
laeken0.4.6
klaR0.6-14
infotheo1.2.0
hdrcde3.2
glmmTMB0.2.2.0
ggmap2.6.1
genalg0.2.0
expint0.1-5
dynlm0.3-5
DiffBind2.8.0
DEXSeq1.26.0
DescTools0.99.26
Deriv3.8.5
crul0.6.0
cowplot0.9.3
ars0.6
AlgDesign1.1-7.3
stats3.5.0
uuid0.1-2
uroot2.0-9
TxDb.Dmelanogaster.UCSC.dm3.ensGene3.2.2
tmvtnorm1.4-10
TMB1.7.14
rhdf52.24.0
regioneR1.12.0
PureCN1.10.0
pbivnorm0.6.0
nloptr1.2.1
networkDynamic0.9.0
leaps3.0
gpclib1.5-5
globaltest5.34.1
gam1.16
fs1.2.6
FGN2.0-12
ffpe1.24.0
fastseg1.26.0
DO.db2.9
DirichletMultinomial1.22.0
agricolae1.2-8
StanHeaders2.18.0
sjlabelled1.0.14
rio0.5.10
pscl1.5.2
numbers0.7-1
NLP0.2-0
network1.13.0.1
multtest2.36.0
minpack.lm1.2-1
impute1.54.0
IlluminaHumanMethylation27k.db1.4.8
GIGrvg0.5
genoPlotR0.8.7
effects4.0-3
doParallel1.0.14
distr2.7.0
devtools2.0.1
deSolve1.21
DelayedMatrixStats1.2.0
DCGL2.1.2
crayon1.3.4
ChIPQC1.16.1
bootstrap2017.2
biomaRt2.36.1
affyio1.50.0
stabs0.6-3
siggenes1.54.0
sets1.0-18
rredlist0.5.0
ReportingTools2.20.0
qcc2.7
ps1.2.1
preprocessCore1.42.0
pedigreemm0.3-3
pbdDEMO0.3-1
nonnest20.5-2
mixtools1.1.0
Matrix1.2-15
lumi2.32.0
logging0.7-103
lavaan0.6-3
igraph1.2.2
GSEABase1.42.0
GSA1.03
ggplot23.1.0
GenomicDistributions0.5
fda2.4.8
classInt0.2-3
ChIPseqR1.34.0
cellranger1.1.0
apcluster1.4.7
AIM1.01
00LOCK-survivalunknown
xtable1.8-3
VariantAnnotation1.26.1
TFBSTools1.18.0
taRifx1.0.6.1
survMisc0.5.5
SuperLearner2.0-24
snakecase0.9.2
sm2.2-5.6
session1.0.3
rstanarm2.18.1
RSNNS0.4-11
Rsamtools1.32.3
plink2R1.1
phyclust0.1-22
MBESS4.4.3
manipulate1.0.1
IlluminaHumanMethylation450kanno.ilmn12.hg190.6.0
goftest1.1-1
GenomicRanges1.32.7
DPpackage1.1-7.4
dismo1.1-4
CNTools1.36.0
BSgenome.Hsapiens.1000genomes.hs37d50.99.1
BRAIN1.26.0
BBmisc1.11
BB2014.10-1
AllelicImbalance1.18.0
xtermStyle3.0.5
widgetTools1.58.0
ritis0.7.2
RcppGSL0.3.6
pvclust2.0-0
processx3.2.0
parallelMap1.3
optimsimplex1.0-7
OceanView1.0.4
nleqslv3.3.2
mouse4302cdf2.18.0
merTools0.4.1
Matching4.9-3
maftools1.6.15
linprog0.9-2
jomo2.6-4
htmlTable1.12
HSMMSingleCell0.114.0
HSAUR21.1-17
gtable0.2.0
GenomeInfoDbData1.1.0
gee4.13-19
gamlss.dist5.1-0
fpc2.1-11.1
fnord2unknown
falconx0.2
EGSEAdata1.8.0
dbscan1.1-2
caret6.0-80
BSgenome.Dmelanogaster.UCSC.dm21.4.0
BiasedUrn1.07
BANOVA1.1.1
Vennerable3.1.0.9000
truncdist1.0-2
tmvnsim1.0-2
snowfall1.84-6.1
seqinr3.4-5
sciClone1.1.0
rematch1.0.1
Rbowtie21.2.0
randomForest4.6-14
PolynomF1.0-2
org.Rn.eg.db3.6.0
MLmetrics1.1.1
lmm1.2
labeling0.3
jackstraw1.2
GenABEL.data1.0.0
fastGHQuad1.0
DEoptim2.2-4
cnveR0.99.0
CircStats0.2-6
BubbleTree2.10.0
BitSeq1.24.0
bdsmatrix1.3-3
BAC1.40.0
afex0.22-1
xts0.11-2
taxize0.9.4
sn1.5-2
rstantools1.5.1
ranger0.10.1
OrganismDbi1.22.0
OpenMx2.11.5
OpenImageR1.1.1
nucleR2.12.1
NanoStringNorm1.2.1
mouse4302frmavecs1.5.0
minfi1.26.2
lpsymphony1.8.0
loomR0.2.0
lfa1.10.0
Iso0.0-17
IPPD1.28.0
hgu133a2cdf2.18.0
goseq1.32.0
futile.options1.0.1
forecast8.4
diagram1.6.4
dexus1.20.0
AssotesteR0.1-10
nlme3.1-137
wikitaxa0.3.0
statnet.common4.1.4
sjstats0.17.1
shinyBS0.61
segmented0.5-3.0
scater1.8.4
rvcheck0.1.1
reshape21.4.3
rentrez1.2.1
PKI0.1-5.1
pcaPP1.9-73
pbdSLAP0.2-4
paran1.5.2
NuPoP1.30.0
mda0.4-10
matrixStats0.54.0
magrittr1.5
KEGGdzPathwaysGEO1.18.0
hgu133plus2cdf2.18.0
gmp0.5-13.2
girafe1.32.0
ggvis0.4.4
ggsignif0.4.0
ggjoy0.4.1
gbm2.1.4
FME1.3.5
dummies1.5.6
DRR0.0.3
Category2.46.0
BSgenome.Celegans.UCSC.ce21.4.0
beeswarm0.2.3
ash1.0-15
annotate1.58.0
affycoretools1.52.2
00LOCK-rlangunknown
tools3.5.0
sem3.1-9
SAIGE0.29.4.2
RTCGA1.10.0
rmeta3.0
RLRsim3.1-3
repr0.17
pseval1.3.0
mitml0.3-6
JGR1.8-6
gamlss5.1-2
cubature1.4-1
biocViews1.48.3
arrayQualityMetrics3.36.0
argparse1.1.1
zipcode1.0
VennDiagram1.6.20
TTR0.23-4
sjPlot2.6.1
scDD1.4.0
RWeka0.4-38
runjags2.0.4-2
R.oo1.22.0
Rmpi0.6-7
R.basic0.53.0
PopSV1.1
pbmcapply1.3.0
kinship21.6.4
IlluminaHumanMethylationEPICmanifest0.3.0
gpls1.52.0
forestplot1.7.2
doRNG1.7.1
dglm1.8.3
CompQuadForm1.4.3
chron2.3-53
biomformat1.8.0
arrayQuality1.58.0
00LOCK-digestunknown
xlsxjars0.6.1
wasabi0.2
vrmlgen1.4.9
vegan2.5-3
tximportData1.8.0
scde2.8.0
Rvmmin2018-4.17
rjags4-6
PearsonDS1.1
mvnfast0.2.5
MCMCpack1.4-4
itertools0.1-3
IlluminaHumanMethylationEPICanno.ilm10b2.hg190.6.0
ICS1.3-1
hmmm1.0-4
haplo.stats1.7.9
desc1.2.0
corrplot0.84
circlize0.4.4
Brobdingnag1.2-6
ber4.0
visNetwork2.0.4
tilingArray1.58.0
spatstat1.57-1
RUVSeq1.14.0
RItools0.1-16
pracma2.1.8
party1.3-1
pacman0.5.0
mogene20sttranscriptcluster.db8.7.0
MixABEL0.1-2
MESS0.5.2
LogicReg1.5.10
grImport0.9-1
gageData2.18.0
FourCSeq1.14.0
elrm1.2.2
earth4.6.3
commonsMath1.2
Canopy1.3.0
blme1.0-4
urca1.3-0
txtplot1.0-3
TSP1.1-6
SparseM1.77
SC31.8.0
rnoaa0.7.0
RBGL1.56.0
R2OpenBUGS3.2-3.2
qlcMatrix0.9.7
purrr0.2.5
NBPSeq0.3.0
msir1.3.1
lpSolve5.6.13
lambda.r1.2.3
labelled1.1.0
influenceR0.1.0
iClusterPlus1.16.0
ggsci2.9
GenomeInfoDb1.16.0
fnordunknown
doMPI0.2.2
docopt0.6.1
dlm1.1-5
DelayedArray0.6.6
DatABEL0.9-6
clusterProfiler3.8.1
trust0.1-7
SRAdb1.42.2
schoolmath0.4
rlang0.3.0.1
Rhdf5lib1.2.1
RcppArmadillo0.9.100.5.0
RArcInfo0.4-12
R2admb0.7.16
qqman0.1.4
PopGenome2.6.1
polspline1.1.13
optimbase1.0-9
mlogit0.3-0
LearnBayes2.15.1
lawstat3.2
later0.7.5
kknn1.3.1
heatmap.plus1.3
GEOmetadb1.42.0
gclus1.3.1
gaussquad1.0-2
ensembldb2.4.1
DivE1.0
curl3.2
covr3.2.1
copula0.999-18
canine2.db3.2.3
BH1.66.0-1
aplpack1.3.2
acepack1.4.1
solrium1.0.0
rgl0.99.16
RANN2.6
PROcess1.56.0
mcbiopi1.1.6
logitnorm0.8.37
lintr1.0.2
GGIR1.6-7
geoR1.7-5.2.1
geneplotter1.58.0
fishplot0.4
BradleyTerry21.0-8
BeadDataPackR1.32.0
Amelia1.7.5
grid3.5.0
synchronicity1.3.5
simpleaffy2.56.0
shinyjs1.0
sessioninfo1.1.1
RcppParallel4.4.1
proxy0.4-22
pd.mouse430.23.12.0
mime0.6
metafor2.0-0
MALDIquant1.18
KEGGREST1.20.2
Hmisc4.1-1
hapsim0.31
gsalib2.1
cvAUC1.1.0
BiocStyle2.8.2
aod1.3
AER1.2-5
zeallot0.1.0
tm0.7-5
SomaticSignatures2.16.0
scales1.0.0
rsconnect0.8.8
R.devices2.16.0
patchwork0.0.1
org.Cf.eg.db3.6.0
nortest1.0-4
JunctionSeq1.10.0
isdparser0.3.0
inflection1.3
hgu133plus2.db3.2.3
gplots3.0.1
formatR1.5
evaluate0.12
dplyr0.7.7
cometExactTest0.1.5
combinat0.0-8
cmm0.12
amap0.8-16
methods3.5.0
WGCNA1.66
useful1.2.6
trimcluster0.1-2.1
survey3.34
superpc1.09
startupmsg0.9.5
SQUAREM2017.10-1
shinycssloaders0.2.0
seqbias1.28.0
rms5.1-2
reticulate1.10
plotrix3.7-4
phia0.2-1
ParamHelpers1.11
NeatMap0.3.6.2
MVA1.0-6
Homo.sapiens1.3.1
HKprocess0.0-2
hapmapsnp61.22.0
gridExtra2.3
geeM0.10.1
FField0.1.0
dndscv0.0.0.9
crosstalk1.0.0
affy1.58.0
affxparser1.52.0
weights1.0
tseries0.10-45
tensorflow1.9
tensor1.5
splus2R1.2-2
SIS0.8-6
Repitools1.26.0
R2HTML2.3.2
prodlim2018.04.18
pgf0.0.0.9000
neldermead1.0-11
MutationalPatterns1.6.1
matlab1.0.2
JM1.4-8
IRanges2.14.12
hgu95av2probe2.18.0
geneLenDataBase1.16.0
ffbase0.12.7
fastICA1.2-1
entropy1.2.1
dataview2.1.1
compute.es0.2-4
coda0.19-2
bigmemory4.5.33
Affymoe4302Expr1.18.0
tfruns1.4
sourcetools0.1.7
rstudioapi0.8
RiboProfiling1.10.0
rgeos0.3-28
pkgconfig2.0.2
PFAM.db3.6.0
pedgene2.9
nor1mix1.2-3
motifRG1.24.0
JASPAR20161.8.0
iCNV1.0.0
htmltools0.3.6
globals0.12.4
ggExtra0.8
ff2.2-14
EBImage4.22.1
ddalpha1.3.4
ctc1.54.0
clusterGeneration1.3.4
AnnotationHub2.12.1
abind1.4-5
rpart4.1-13
svUnit0.7-12
snp.plotter0.5.1
R.filesets2.12.1
pca3d0.10
Nozzle.R11.1-1
mvtnorm1.0-8
mixOmics6.3.2
misc3d0.8-4
metap1.0
hwriter1.3.2
googleVis0.6.2
gmm1.6-2
Glimma1.8.2
future1.10.0
DiagrammeR1.0.0
ddCt1.36.0
convert1.56.0
bc3net1.0.4
lattice0.20-38
XLConnect0.2-15
WikipediR1.5.0
tkrplot0.0-24
seqminer6.1
seqMeta1.6.7
Rtsne0.13
RSpectra0.13-1
RGalaxy1.24.0
RColorBrewer1.1-2
pso1.0.3
minqa1.2.4
maxLik1.3-4
logisticPCA0.2
htmlwidgets1.3
hgu133a2.db3.2.3
HDF5Array1.8.1
ggridges0.5.1
gcrma2.52.0
FD1.0-12
eqtl1.1-7
epiDisplay3.5.0.1
bumphunter1.22.0
BSgenome1.48.0
compiler3.5.0
TxDb.Hsapiens.UCSC.hg38.knownGene3.4.0
timeDate3043.102
statnet2018.10
stargazer5.2.2
sjmisc2.7.6
shape1.4.4
RVtests1.2
R.utils2.7.0
rda1.0.2-2.1
popgraph1.5.0
httpcode0.2.0
gridBase0.4-7
GOSim1.18.0
getPass0.2-2
genomeIntervals1.36.0
Ecfun0.1-7
Ecdat0.3-1
dotCall641.0-0
bookdown0.7
beanplot1.2
backports1.1.2
affyPLM1.56.0
foreign0.8-71
tree1.0-39
tcltk21.2-11
Rsubread1.30.9
roxygen26.1.1
robCompositions2.0.8
rgexf0.15.3
rgenoud5.8-2.0
polyclip1.9-1
NMOF1.4-3
ltsa1.4.6
limma3.36.5
InteractionSet1.8.0
Icens1.52.0
hu6800cdf2.18.0
hierfstat0.04-22
hgu133a.db3.2.3
hgu133a2probe2.18.0
gower0.1.2
GenomeGraphs1.40.0
GENEAread2.0.5
flexclust1.4-0
estimability1.3
digest0.6.18
BiocParallel1.14.2
affyQCReport1.58.0
WES.1KG.WUGSC1.12.0
usethis1.4.0
TxDb.Mmusculus.UCSC.mm10.knownGene3.4.0
sp1.3-1
SingleCellExperiment1.2.0
shiny1.2.0
Rgraphviz2.24.0
RefFreeEWAS2.1
R2jags0.5-7
quadprog1.5-5
phylobase0.8.4
pbdBASE0.4-5.1
ncvreg3.11-0
microbenchmark1.4-6
lsei1.2-0
GEOquery2.48.0
gdsfmt1.16.0
fitdistrplus1.0-11
clValid0.6-6
cgdsr1.2.10
ascii2.1
argparser0.4
adegenet2.1.1
00LOCK-psunknown
webshot0.5.1
threg1.0.3
survivalROC1.0.3
scatterplot3d0.3-41
RgoogleMaps1.4.2
rBiopaxParser2.20.0
promises1.0.1
plsVarSel0.9.4
pkgmaker0.27
pkgDepTools1.46.0
pd.mogene.2.0.st3.14.1
pbapply1.3-4
ModelMetrics1.2.2
llogistic1.0.0
knitr1.20
jpeg0.1-8
humanomni5quadv1bCrlmm1.0.0
gridSVG1.6-0
FNN1.1.2.1
exactRankTests0.8-29
ellipse0.4.1
doSNOW1.0.16
clusterCrit1.2.8
changepoint2.2.2
carData3.0-2
biovizBase1.28.2
xml21.2.0
tximport1.8.0
topGO2.32.0
timereg1.9.2
texreg1.36.23
stringr1.3.1
squash1.0.8
Runuran0.24
phangorn2.4.0
pbdPROF0.3-1
OmicCircos1.18.0
NbClust3.0
ICC2.3.0
fBasics3042.89
BSgenome.Ecoli.NCBI.200808051.3.1000
xgboost0.71.2
viridis0.5.1
VIM4.7.0
tuneR1.3.3
tibble1.4.2
SweaveListingUtils0.7.7
stepPlr0.93
RNeXML2.2.0
pillar1.3.0
penalized0.9-51
parmigene1.0.2
packrat0.4.9-3
munsell0.5.0
MKmisc1.1
hms0.4.2
HiClimR1.2.3
hgu95av2cdf2.18.0
gProfileR0.6.7
gnm1.1-0
gap1.1-22
fftwtools0.9-8
facets0.5.14
etm1.0.4
energy1.7-5
dgof1.2
cvTools0.3.2
coxme2.2-10
Cairo1.5-9
BSgenome.Cfamiliaris.UCSC.canFam21.4.0
bindrcpp0.2.2
ASCAT2.5.1
aroma.light3.10.0
ActiveDriver1.0.0
XVector0.20.0
units0.6-0
SeqGSEA1.20.0
samr3.0
R.huge0.9.0
registry0.5
recipes0.1.3
Rcpp0.12.19.3
png0.1-7
missMethyl1.14.0
MatrixModels0.4-1
intansv1.22.0
ICSNP1.1-1
gsmoothr0.1.7
fastmatch1.1-0
base642.0
ARTP20.9.44
aroma.apd0.6.0
aroma.affymetrix3.1.1
aCGH1.58.0
worrms0.3.0
venneuler1.1-0
VanillaICE1.42.4
ucminf1.1-4
triebeard0.3.0
svglite1.2.1
subselect0.14
SNPRelate1.14.0
SingleR0.2.0
ShortRead1.38.0
Rsolnp1.16
reshape0.8.8
profileModel0.5-9
phytools0.6-60
mzID1.18.0
mixdist0.5-5
mi1.0
mboost2.9-1
km.ci0.5-2
isobar1.26.0
inlinedocs2013.9.3
highr0.7
genomewidesnp6Crlmm1.0.7
filehash2.4-1
ergm.count3.3.0
EpiDynamics0.3.0
enrichplot1.0.2
Deducer0.7-9
CVST0.2-2
cleaver1.18.0
bnclassify0.4.1
arm1.10-1
nnet7.3-12
mgcv1.8-25
stringdist0.9.5.1
setRNG2013.9-1
sendmailR1.2-1
riboSeqR1.14.0
rappdirs0.3.1
plot3Drgl1.0.1
pepr0.0.3
multcomp1.4-8
mlbench2.1-1
ggforce0.1.3
FlowSorted.Blood.450k1.18.0
findpython1.0.3
fields9.6
dtw1.20-1
broom0.5.0
aws2.2-0
annotatr1.6.0
Manage your own packages

Packages installed in the user's home directory

You can install your own packages in your home directory. On our systems, the default path to the library is ~/R/<ver>/library, where <ver> is the major.minor version of the R in use (e.g. 3.5). Here is an example using the pacman package for easier package management:

[user@cn3144 ~]$ R

R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


> library(pacman)
> p_isinstalled(rapport)
[1] FALSE
> p_install(rapport)
Installing package into ‘/usr/local/apps/R/3.5/site-library’
(as ‘lib’ is unspecified)
Warning in utils::install.packages(package, ...) :
  'lib = "/usr/local/apps/R/3.5/site-library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/3.5/library’
to install packages into? (yes/No/cancel) yes
also installing the dependency ‘rapportools’

[...snip...]
rapport installed
>

Per project package management

An alternative approach to relying on packages installed centrally or in your home directory is to create isolated, per project, package sets. Snapshots of these package sets can be shared. This increases reproducibility at the cost of increased storage and potential package installation headaches. The packrat package implements this approach. See the packrat walkthrough for a basic introduction.

R batch job

R batch jobs are similar to any other batch job. A batch script ('rjob.sh') is created that sets up the environment and runs the R code:

#!/bin/bash

module load R/3.5
R --vanilla < /data/user/Rtests/Rtest.r > /data/user/Rtests/Rtest.out

or use Rscript instead

#!/bin/bash

module load R/3.5
Rscript /data/user/Rtests/Rtest.r > /data/user/Rtests/Rtest.out

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] rjob.sh
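
Resource requests can also be recorded in the job script itself as #SBATCH directives, which keeps the request next to the code that depends on it. The sketch below writes such a script to a temporary directory and syntax-checks it. The 4 CPUs and 8 GB of memory are placeholder values, and the paths are the ones used in the example above.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Quoted heredoc delimiter: nothing is expanded while writing the file.
cat > "$workdir/rjob.sh" <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g

module load R/3.5
Rscript /data/user/Rtests/Rtest.r > /data/user/Rtests/Rtest.out
EOF

# bash -n parses the script without executing it.
bash -n "$workdir/rjob.sh" && echo "rjob.sh parses cleanly"
```

With the directives in place the script can be submitted as plain `sbatch rjob.sh`; options given on the sbatch command line take precedence over #SBATCH lines.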

Command line arguments for R scripts

R scripts can be written to accept command line arguments. The simplest way of doing this is with the commandArgs() function. For example, the script 'simple.R'

args <- commandArgs(trailingOnly=TRUE)

i <- 0
for (arg in args) {
    i <- i + 1
    cat(sprintf("arg %02i: '%s'\n", i, arg))
}

can be called like this

[user@cn3144]$ module load R
[user@cn3144]$ Rscript simple.R this is a test
arg 01: 'this'
arg 02: 'is'
arg 03: 'a'
arg 04: 'test'
[user@cn3144]$ Rscript simple.R 'this is a test'
arg 01: 'this is a test'
[user@cn3144]$ R --vanilla --slave --args 'this is a test' < simple.R
arg 01: 'this is a test'
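
The difference between the unquoted and quoted calls comes from the shell, not from R: the shell splits unquoted words into separate arguments before Rscript starts. A small shell function that mirrors the loop in the R script makes this visible without R. The name show_args is hypothetical and used only for this illustration.

```shell
#!/bin/bash
# Print each argument on its own numbered line, like the R script above.
show_args() {
    local i=0
    local arg
    for arg in "$@"; do
        i=$((i + 1))
        printf "arg %02d: '%s'\n" "$i" "$arg"
    done
}

show_args this is a test      # four separate arguments
show_args 'this is a test'    # one argument containing spaces
```

The same rule applies to file names or sample labels containing spaces: quote them on the command line, or the R script receives them in pieces.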

Alternatively, command line arguments can be parsed using the getopt package. For example:

library(getopt)

###
### Describe the expected command line arguments
###
# mask: 0=no argument
#       1=required argument
#       2=optional argument
spec <- matrix(c(
# long name  short name  mask  type          description(optional)
# ---------  ----------  ----  ------------  ---------------------
  'file'   , 'f',          1,  'character',  'input file',
  'verbose', 'v',          0,  'logical',    'verbose output',
  'help'   , 'h',          0,  'logical',    'show this help message'
), byrow=TRUE, ncol=5)

# parse the command line
opt <- getopt(spec)

# show help if requested
if (!is.null(opt$help)) {
  cat(getopt(spec, usage=TRUE))
  q()
}

# set defaults for options that were not given
if ( is.null(opt$file) )    { opt$file    <- 'testfile' }
if ( is.null(opt$verbose) ) { opt$verbose <- FALSE }
print(opt)

This script can be used as follows

[user@cn3144]$ Rscript getopt_example.R --file some.txt --verbose
$ARGS
character(0)

$file
[1] "some.txt"

$verbose
[1] TRUE

[user@cn3144]$ Rscript getopt_example.R --file some.txt
$ARGS
character(0)

$file
[1] "some.txt"

$verbose
[1] FALSE

[user@cn3144]$ Rscript getopt_example.R --help
Usage: getopt_example.R [-[-file|f] <character>] [-[-verbose|v]] [-[-help|h]]
    -f|--file       input file
    -v|--verbose    verbose output
    -h|--help       show this help message

getopt does not have support for mixing flags and positional arguments. There are other packages with different features and approaches that can be used to design command line interfaces for R scripts.

Swarm of R jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. rjobs.swarm). For example:

Rscript /data/user/R/R1  > /data/user/R/R1.out
Rscript /data/user/R/R2  > /data/user/R/R2.out
Rscript /data/user/R/R3  > /data/user/R/R3.out

Submit this job using the swarm command.

swarm -f rjobs.swarm [-g #] [-t #] --module R/3.5
where

-g #            Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #            Number of threads/CPUs required for each process (1 line in the swarm command file)
--module R/3.5  Loads the R module, version 3.5, for each subjob in the swarm

Rswarm

Rswarm is a utility to create a series of R input files from a single R (master) template file with different output filenames and with unique random number generator seeds. It will simultaneously create a swarm command file that can be used to submit the swarm of R jobs. Rswarm was originally developed by Lori Dodd and Trevor Reeve with modifications by the Biowulf staff.

Say, for example, that the goal of a simulation study is to evaluate properties of the t-test. The function "sim.fun" in file "sim.R" below repeatedly generates random normal data with a given mean, performs a one sample t-test (i.e. testing if the mean is different from 0), and records the p-values.

#######################################
# n.samp:  size of samples generated for each simulation
# mu:      mean
# sd:      standard deviation
# n.sim:   the number of simulations
# output1: output table
# seed:    the seed for set.seed
#######################################
sim.fun <- function(n.samp=100, mu=0, sd=1, n.sim, output1, seed){

    set.seed(seed)

    p.values <- c()
    for (i in 1:n.sim){
        x <- rnorm(n.samp, mean=mu, sd=sd)
        p.values <- c(p.values, t.test(x)$p.value)
    }
    saveRDS(p.values, file=output1)
}

To use Rswarm, create a wrapper script similar to the following ("rfile.R")

source("sim.R")
sim.fun(n.sim=DUMX, output1="DUMY1",seed=DUMZ)

using the dummy variables which will be replaced by Rswarm.

Dummy variable   Replaced with
DUMX             Number of simulations to be specified in each replicate file
DUMY1            Output file 1
DUMY2            Output file 2 (optional)
DUMZ             Random seed

To swarm this code, we need replicates of the rfile.R file, each with a different seed and a different output file. The Rswarm utility will create the specified number of replicates, supply each with a different seed (from an external file containing seed numbers), and create unique output files for each replicate. Note that you can specify the number of simulations within each file in addition to the number of replicates.

For example, the following Rswarm command at the Biowulf prompt will create 2 replicate files, each specifying 50 simulations, each with a different seed taken from the file "seedfile.txt", and each with unique output files.

[user@biowulf]$ ls -lh
total 8.0K
-rw-r--r-- 1 user group  63 Apr 25 12:34 rfile.R
-rw-r--r-- 1 user group 564 Apr 25 12:15 seedfile.txt
-rw-r--r-- 1 user group 547 Apr 25 12:04 sim.R
[user@biowulf]$ head -n2 seedfile.txt
24963
27507
[user@biowulf]$ Rswarm --rfile=rfile.R --sfile=seedfile.txt --path=. \
    --reps=2 --sims=50 --start=0 --ext1=.rds
The template file is rfile.R
The seed file is seedfile.txt
The path is .
The number of replicates desired is 2
The number of sims per file is 50
The starting file number is 0+1
The extension for output files 1 is .rds
The extension for output files 2 is .std.txt
Is this correct (y or n)? : y
Creating file number 1: ./rfile1.R with output ./rfile1.rds ./rfile1.std.txt and seed 24963
Creating file number 2: ./rfile2.R with output ./rfile2.rds ./rfile2.std.txt and seed 27507
[user@biowulf]$ ls -lh
total 16K
-rw-r--r-- 1 user group  69 Apr 25 12:39 rfile1.R
-rw-r--r-- 1 user group  69 Apr 25 12:39 rfile2.R
-rw-r--r-- 1 user group  63 Apr 25 12:34 rfile.R
-rw-r--r-- 1 user group  50 Apr 25 12:39 rfile.sw
-rw-r--r-- 1 user group 564 Apr 25 12:15 seedfile.txt
-rw-r--r-- 1 user group 547 Apr 25 12:04 sim.R
[user@biowulf]$ cat rfile1.R
source("sim.R")
sim.fun(n.sim=50, output1="./rfile1.rds",seed=24963)
[user@biowulf]$ cat rfile2.R
source("sim.R")
sim.fun(n.sim=50, output1="./rfile2.rds",seed=27507)
[user@biowulf]$ cat rfile.sw
R --vanilla < ./rfile1.R
R --vanilla < ./rfile2.R
[user@biowulf]$ swarm -f rfile.sw --time=10 --partition=quick --module R
199110
[user@biowulf]$ ls -lh *.rds
-rw-r--r-- 1 user group 445 Apr 25 12:52 rfile1.rds
-rw-r--r-- 1 user group 445 Apr 25 12:52 rfile2.rds

Full Rswarm usage:

Usage: Rswarm [options]
   --rfile=[file]   (required) R program requiring replication
   --sfile=[file]   (required) file with generated seeds, one per line
   --path=[path]    (required) directory for output of all files
   --reps=[i]       (required) number of replicates desired
   --sims=[i]       (required) number of sims per file
   --start=[i]      (required) starting file number
   --ext1=[string]    (optional) file extension for output file 1
   --ext2=[string]    (optional) file extension for output file 2
   --help, -h         print this help text

Note that R scripts can be written to take a random seed as a command line argument or derive it from the environment variable SLURM_ARRAY_TASK_ID to achieve an equivalent result.
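As a minimal sketch of that alternative, the script below takes the seed from the first command line argument if one is given and otherwise from SLURM_ARRAY_TASK_ID. The helper name get_seed and the fallback seed of 1 are illustrative assumptions.

```r
# Hypothetical helper: take the seed from the first command line argument,
# otherwise from SLURM_ARRAY_TASK_ID, otherwise fall back to `default`.
get_seed <- function(args = commandArgs(trailingOnly = TRUE), default = 1L) {
    if (length(args) >= 1) {
        seed <- suppressWarnings(as.integer(args[1]))
    } else {
        seed <- suppressWarnings(as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID")))
    }
    # as.integer() yields NA for empty or non-numeric input
    if (is.na(seed)) default else seed
}

seed <- get_seed()
set.seed(seed)
```

Because each subjob of a job array sees a different SLURM_ARRAY_TASK_ID, every subjob ends up with a different seed without generating separate script files. Consecutive integers are used directly as seeds here, which is a simplification compared to the externally generated seed file that Rswarm uses.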

Using the parallel package

The R parallel package provides functions for parallel execution of R code on machines with multiple CPUs. With its fork-based functions (e.g. mclapply), all worker processes share the full state of the parent R session when spawned, so no data or code needs to be initialized if it was loaded before starting the worker processes. The actual spawning is very fast as well since no new R instance needs to be started.

The parallel package includes the detectCores function, which is often used to automatically detect the number of available CPUs. However, it always reports all CPUs available on a node irrespective of how many CPUs were allocated to the job. Therefore batch jobs should use the following function to automatically detect the number of allocated CPUs:

detectBatchCPUs <- function() { 
    ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK")) 
    if (is.na(ncores)) { 
        ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE")) 
    } 
    if (is.na(ncores)) { 
        return(4) # for helix
    } 
    return(ncores) 
}
Or, alternatively, use the availableCores() function from the future package:
future::availableCores()

Note that future::availableCores() only works properly if SLURM_CPUS_PER_TASK is set (i.e. when using swarm or passing the --cpus-per-task/-c options to sbatch or sinteractive).

The number of detected cores can then be used as usual

ncpus <- detectBatchCPUs() 
# or ncpus <- future::availableCores()
options(mc.cores = ncpus) 
mclapply(..., mc.cores = ncpus) 
makeCluster(ncpus)

Batch R jobs using detectBatchCPUs can then be submitted with sbatch --cpus-per-task=X or swarm -t X and will behave properly.

Alternatively, if your code can make use of all CPUs on a node efficiently, you could allocate nodes exclusively ( sbatch --exclusive or swarm -t auto) and use the built-in detectCores function.

The state of the random number generator in each worker process has to be carefully considered for any parallel workload. See the help for mcparallel and the parallel package documentation for more details.
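One approach described in the parallel package documentation is to select the "L'Ecuyer-CMRG" generator, which gives each forked worker its own random number stream. The following is a minimal sketch, assuming a Linux system where forking is available; the helper name run_sim, the toy workload, and the fixed worker count of 2 are illustrative assumptions.

```r
library(parallel)

# L'Ecuyer-CMRG provides separate random number streams for forked workers
RNGkind("L'Ecuyer-CMRG")

# Hypothetical helper: run four small simulations on two workers,
# starting from the given seed.
run_sim <- function(seed) {
    set.seed(seed)
    unlist(mclapply(1:4, function(i) mean(rnorm(10)), mc.cores = 2))
}

first  <- run_sim(42)
second <- run_sim(42)

# With the same seed and the same number of workers the two runs should agree
print(identical(first, second))
```

Reproducibility in this sketch depends on using the same number of workers and the default mc.preschedule = TRUE in both runs, so the worker count is worth recording along with the seed.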

Using the BiocParallel package

The R BiocParallel package provides modified versions and novel implementations of functions for parallel evaluation, tailored for use with Bioconductor objects. Like the parallel package, it is not aware of Slurm allocations and will therefore, by default, try to use parallel::detectCores() - 2 CPUs, i.e. all but 2 of the CPUs installed on a compute node, irrespective of how many CPUs have been allocated to the job. This leads to overloaded jobs and very inefficient code. You can verify this by checking the registered backends after allocating an interactive session with 2 CPUs:

> library(BiocParallel)
> registered()
$MulticoreParam
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 54; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE
  bpRNGseed: 
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK
[...snip...]

So the default backend (top of the registered stack) would use 54 workers on 2 CPUs. The default backend can be changed with

> options(MulticoreParam=quote(MulticoreParam(workers=future::availableCores())))
> registered()
$MulticoreParam
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 2; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
[...snip..]

or

> register(MulticoreParam(workers = future::availableCores()), default=TRUE)

Alternatively, a param object can be passed to BiocParallel functions.

Implicit multithreading

R can do implicit multithreading when using a subset of optimized functions in the library or functions that take advantage of parallelized routines in the lower level math libraries.

The function crossprod(m) which is equivalent to calculating t(m) %*% m, for example, makes use of implicit parallelism in the underlying math libraries and can benefit from using more than one thread. The number of threads used by such functions is regulated by the environment variable OMP_NUM_THREADS, which the R module sets automatically when loaded as part of a batch or interactive job. Here is the runtime of this function with different values for OMP_NUM_THREADS:

[Figure: crossprod benchmark]

The code used for this benchmark was

# this file is benchmark2.R
runs <- 3
o <- 2^13
b <- 0

for (i in 1:runs) {
  a <- matrix(rnorm(o*o), o, o)
  invisible(gc())
  timing <- system.time({
    b <- crossprod(a)		# equivalent to: b <- t(a) %*% a
  })[3]
  cat(sprintf("%f\n", timing))
}

It was called with

node$ module load R/3.5
node$ OMP_NUM_THREADS=1 Rscript benchmark2.R
node$ OMP_NUM_THREADS=2 Rscript benchmark2.R
...
node$ OMP_NUM_THREADS=32 Rscript benchmark2.R

These commands were run from within a job that had been allocated 32 CPUs.
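Before running the full benchmark it can be useful to confirm, on a small matrix, that crossprod(m) and t(m) %*% m give the same result and to see which thread limit the current R session has inherited. The following is a minimal sketch; the matrix size of 200 is an arbitrary choice made so that the check finishes quickly.

```r
# Small matrix so the check finishes quickly; the benchmark above uses 2^13
set.seed(1)
m <- matrix(rnorm(200 * 200), 200, 200)

# crossprod(m) and t(m) %*% m should agree up to floating point tolerance
same <- isTRUE(all.equal(crossprod(m), t(m) %*% m))
cat("results agree:", same, "\n")

# Thread limit seen by this R session (NA if the variable is unset)
cat("OMP_NUM_THREADS:", Sys.getenv("OMP_NUM_THREADS", unset = NA), "\n")
```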

Notes:

There appears to also be another level of parallelism within the R libraries. One function that takes advantage of this is the dist function. The level of parallelism allowed with this mechanism seems to be set with two internal R functions (setMaxNumMathThreads and setNumMathThreads). Note that this is a distinct mechanism - i.e. setting OMP_NUM_THREADS has no impact on dist and setMaxNumMathThreads has no impact on the performance of crossprod. Here is the performance of dist with different numbers of threads:

[Figure: dist benchmark]

The timings for this example were created with

# this file is benchmark1.R
rt <- data.frame()
o <- 2^12
m <- matrix(rnorm(o*o), o, o)
for (nt in c(1, 2, 4, 8, 16, 32)) {
    .Internal(setMaxNumMathThreads(nt)) 
    .Internal(setNumMathThreads(nt))
    res <- system.time(d <- dist(m))
    rt <- rbind(rt, c(nt, o, res[3]))
}
colnames(rt) <- c("threads", "order", "elapsed")
write.csv(rt, file="benchmark1.csv", row.names=FALSE)

This was run within an allocation with 32 CPUs with

node$ OMP_NUM_THREADS=1 Rscript benchmark1.R

The same notes about benchmarking as above apply. Also note that there is very little documentation about this to be found online.

MPI jobs with Rmpi or Snow

Rmpi provides an MPI interface for R [Rmpi documentation].
The package snow (Simple Network of Workstations) implements a simple mechanism for using a workstation cluster for "embarrassingly parallel" computations in R. [snow documentation]

Sample Rmpi batch script:

#!/bin/bash


module load R
R --vanilla > myrmpi.out_slurm <<EOF
library(Rmpi)
mpi.spawn.Rslaves(nslaves=$SLURM_NTASKS)
mpi.remote.exec(mpi.get.processor.name())
n <- 3
mpi.remote.exec(double, n)
mpi.close.Rslaves()
mpi.quit()
EOF

Submit the Rmpi example with:

sbatch --ntasks=4 --ntasks-per-core=1 filename.bat

If using more than 16 processors, you will need to request the multinode partition.

sbatch --ntasks=64 --partition=multinode filename.bat

Sample batch script using snow:

#!/bin/bash

module load R

R --vanilla > myrmpioutsnow_slurm <<EOF

library(snow)
cl <- makeCluster($SLURM_NTASKS, type = "MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
clusterCall(cl, runif, $SLURM_NTASKS)
stopCluster(cl)

EOF

Submit the snow example with:

sbatch --ntasks=4 --ntasks-per-core=1 filename.bat
Note that it is entirely up to the user to run the appropriate number of processes for the resources requested. In the example above, the $SLURM_NTASKS variable is set to 4 and exported via the sbatch command, and this variable is used in the script to start 4 snow processes. Note: myrmpioutsnow_slurm contains the results from the finished job.

Running an interactive job on Biowulf

Production runs should be submitted as batch jobs as described above, but for testing purposes an occasional interactive run may be useful.

Sample interactive session with Rmpi:

[user@biowulf ~]$ sinteractive -J myRmpitest --ntasks 4
salloc.exe: Granted job allocation 23208
[user@cn0004 ~]$ module load R
[+] Loading gcc 4.4.7 ...
[+] Loading OpenMPI 1.8.1 for GCC 4.4.7 (ethernet) ...
[+] Loading tcl_tk 8.6.1
[+] Loading ATLAS 3.8.4 libraries...
[+] Loading R 3.2.0 on biowulf.nih.gov

[user@cn0004 ~]$ R --vanilla
R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Rmpi)
> mpi.spawn.Rslaves(nslaves=4)
[cn0004:57808] [[42360,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 2776104960
        4 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 5 is running on: cn0004 
slave1 (rank 1, comm 1) of size 5 is running on: cn0004 
slave2 (rank 2, comm 1) of size 5 is running on: cn0004 
slave3 (rank 3, comm 1) of size 5 is running on: cn0004 
slave4 (rank 4, comm 1) of size 5 is running on: cn0004 

> demo("simplePI")
...
> simple.pi(10000)
[1] 3.141593
> mpi.close.Rslaves()
[1] 1
> mpi.quit()                      #very important
[user@cn0004 ~]$ exit
exit
salloc.exe: Relinquishing job allocation 23208
salloc.exe: Job allocation 23208 has been revoked.

[user@biowulf ~]$

Sample interactive session with snow:

[user@biowulf ~]$ sinteractive -J mysnowtest --ntasks=4
salloc.exe: Granted job allocation 23210
[user@cn0004 ~]$ module load R
[+] Loading gcc 4.4.7 ...
[+] Loading OpenMPI 1.8.1 for GCC 4.4.7 (ethernet) ...
[+] Loading tcl_tk 8.6.1
[+] Loading ATLAS 3.8.4 libraries...
[+] Loading R 3.2.0 on cn0004

[user@cn0004 ~]$ R --vanilla

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(snow)
> cl <- makeCluster(4, type = "MPI")
Loading required package: Rmpi
[cn0004:59204] [[41964,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 2750152704
        4 slaves are spawned successfully. 0 failed.

> sum(parApply(cl, matrix(1:100,10), 1, sum))
[1] 5050
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]]
nodename  machine 
"cn0004" "x86_64" 

[[2]]
nodename  machine 
"cn0004" "x86_64" 

[[3]]
nodename  machine 
"cn0004" "x86_64" 

[[4]]
nodename  machine 
"cn0004" "x86_64" 

> clusterCall(cl, runif, 3)
[[1]]
[1] 0.4372988 0.9606632 0.9975864

[[2]]
[1] 0.7348379 0.8070261 0.8572994

[[3]]
[1] 0.7965404 0.7158484 0.3440546

[[4]]
[1] 0.83963913 0.70896938 0.06466967

> stopCluster(cl)
[1] 1
> mpi.quit()
[user@cn0004 ~]$ 
[user@cn0004 ~]$ exit
exit
salloc.exe: Relinquishing job allocation 23210
salloc.exe: Job allocation 23210 has been revoked.
[user@biowulf ~]$

h2o

h2o is a machine learning package written in Java. The R interface starts a Java h2o instance with a given number of threads and then connects to it through HTTP. This fails on compute nodes if the HTTP proxy variables are set, so it is necessary to unset http_proxy before using h2o:

[user@biowulf]$ sinteractive --cpus-per-task=4 --mem=20g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load R/3.5
[user@cn3144 ~]$ unset http_proxy
[user@cn3144 ~]$ R
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
[...snip...]
> library(h2o)
> h2o.init(ip='localhost', nthreads=future::availableCores(), max_mem_size='12g')
H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpVdW92Y/h2o_wresch_started_from_r.out
    /tmp/RtmpVdW92Y/h2o_wresch_started_from_r.err

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

Starting H2O JVM and connecting: . Connection successful!

R is connected to the H2O cluster:
    H2O cluster uptime:         2 seconds 145 milliseconds
    H2O cluster timezone:       America/New_York
    H2O data parsing timezone:  UTC
    H2O cluster version:        3.20.0.2
    H2O cluster version age:    2 months and 12 days
    H2O cluster name:           H2O_started_from_R_wresch_ser222
    H2O cluster total nodes:    1
    H2O cluster total memory:   10.67 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4
    R Version:                  R version 3.5.0 (2018-04-23)

>