GRNBoost: Scalable inference of gene regulatory networks using Apache Spark and XGBoost.
GRNBoost is a library built on top of Apache Spark that implements a scalable strategy for gene regulatory network (GRN) inference. GRNBoost was inspired by GENIE3, a popular algorithm for GRN inference. GRNBoost adopts GENIE3's algorithmic blueprint and aims at improving its runtime performance and data size capability.
References:
- S. Aibar et al.
SCENIC: single-cell regulatory network inference and clustering
Nature Methods, 2017, Vol. 14, p.1083–1086; doi:10.1038/nmeth.4463
- Van Anh Huynh-Thu et al.
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods
PLoS ONE, 5(9): e12776. doi:10.1371/journal.pone.0012776
Documentation
Important Notes
- Module Name: GRNBoost (see the modules page for more information)
- Unusual environment variables set:
  - GRNBOOST_HOME: installation directory
  - GRNBOOST_BIN: executable directory
  - GRNBOOST_SRC: source code directory
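As a quick sanity check after loading the module, the three variables listed above can be printed before they are used in a spark-submit command. The following is a minimal sketch; the helper name check_grnboost_env is hypothetical and is not provided by the GRNBoost module.

```shell
# Hypothetical helper (not part of the GRNBoost module): prints each of the
# module's environment variables, or a warning on stderr if one is missing.
check_grnboost_env() {
    missing=0
    for name in GRNBOOST_HOME GRNBOOST_BIN GRNBOOST_SRC; do
        # Indirect lookup; the names are fixed literals, so eval is safe here.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set (has the module been loaded?)" >&2
            missing=1
        fi
    done
    # Return non-zero if at least one variable was empty or unset.
    return "$missing"
}
```

Calling check_grnboost_env in the same shell where the module was loaded should list all three directories; a non-zero exit status indicates that at least one of them is unset.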
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=8g
[user@cn3199 ~]$ module load grnboost
[+] Loading java 1.8.0_11 ...
[+] Loading scala 2.12.2
[+] Loading Apache Spark 2.1.1 (Hadoop 2.7) ...
[+] Loading grnboost 20191216
Start the spark cluster:
[user@cn3199 ~]$ spark start -t 120 2
INFO: Submitted job for cluster fwPsex
Run a sample GRNBoost command:
[user@cn3199 ~]$ $SPARK_HOME/bin/spark-submit \
    --class org.aertslab.grnboost.GRNBoost \
    --master spark://cn3199:7077 \
    --deploy-mode client \
    --jars $GRNBOOST_SRC/lib_amazon_linux/xgboost4j-0.7.jar \
    $GRNBOOST_SRC/target/scala-2.11/GRNBoost.jar -h
GRNBoost 0.1
Usage: GRNBoost [infer] [options]

  -h | --help
        Prints this usage text.
  -v | --version
        Prints the version number.
Command: infer [options]

  -i <file> | --input <file>
        REQUIRED. Input file or directory.
  -o <file> | --output <file>
        REQUIRED. Output directory.
  -tf <file> | --regulators <file>
        REQUIRED. Text file containing the regulators (transcription factors), one regulator per line.
  -skip <nr> | --skip-headers <nr>
        The number of input file header lines to skip. Default: 0.
  --delimiter <del>
        The delimiter to use in input and output files. Default: TAB.
  -s <nr> | --sample <nr>
        Use a sample of size <nr> of the observations to infer the GRN.
  --targets <gene1,gene2,gene3...>
        List of genes for which to infer the putative regulators.
  -p:<key>=<value> | --xgb-param:<key>=<value>
        Add or overwrite an XGBoost booster parameter. Default parameters are:
          * eta -> 0.01
          * max_depth -> 1
          * nthread -> 1
          * silent -> 1
  -r <nr> | --nr-boosting-rounds <nr>
        Set the number of boosting rounds. Default: heuristically determined nr of boosting rounds.
  --estimation-genes <gene1,gene2,gene3...>
        List of genes to use for estimating the nr of boosting rounds.
  --nr-estimation-genes <nr>
        Nr of randomly selected genes to use for estimating the nr of boosting rounds. Default: 20.
  --regularized
        Enable regularization (using the triangle method). Default: disabled
        When enabled, only regulations approved by the triangle method will be emitted.
        When disabled, all regulations will be emitted.
        Use the 'include-flags' option to specify whether to output the include flags in the result list.
  --normalized
        Enable normalization by dividing the gain scores of the regulations per target over the sum of gain scores. Default = disabled.
  --include-flags <true/false>
        Flag whether to output the regularization include flags in the output. Default: false.
  --truncate <nr>
        Only keep the specified number regulations with highest importance score. Default: unlimited.
        (Motivated by the 100.000 regulations limit for the DREAM challenges.)
  -par <nr> | --nr-partitions <nr>
        The number of Spark partitions used to infer the GRN. Default: nr of available processors.
  --dry-run
        Inference nor auto-config will launch if this flag is set. Use for parameters inspection.
  --cfg-run
        Auto-config will launch, inference will not if this flag is set. Use for config testing.
  --report <true/false>
        Set whether to write a report about the inference run to file. Default: true.

GRNBoost
--------
https://github.com/aertslab/GRNBoost/
Stop the spark cluster, using the cluster name that was reported by the 'spark start' command (here, fwPsex):
[user@cn3199 ~]$ spark stop fwPsex
End the interactive session:
[user@cn3199 ~]$ exit
[user@biowulf ~]$
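The sample session only prints the usage text. While a Spark cluster is still running, an actual inference setup could first be checked with the --dry-run flag, which according to the usage text launches neither inference nor auto-config. The sketch below assembles such a command using only options shown in the usage text. The wrapper name grnboost_dry_run and every file name passed to it are hypothetical, and it assumes that the infer command word precedes its options, as the usage line suggests.

```shell
# Hypothetical wrapper (not part of GRNBoost): runs the "infer" command with
# --dry-run so the parameters can be inspected without launching inference.
# Assumes SPARK_HOME and GRNBOOST_SRC are already set by the loaded modules.
# Usage: grnboost_dry_run <input> <regulators file> <output dir> <Spark master URL>
grnboost_dry_run() {
    "$SPARK_HOME/bin/spark-submit" \
        --class org.aertslab.grnboost.GRNBoost \
        --master "$4" \
        --deploy-mode client \
        --jars "$GRNBOOST_SRC/lib_amazon_linux/xgboost4j-0.7.jar" \
        "$GRNBOOST_SRC/target/scala-2.11/GRNBoost.jar" \
        infer \
        -i "$1" \
        -tf "$2" \
        -o "$3" \
        --dry-run
}
```

For example, grnboost_dry_run expression.tsv regulators.txt grn_out spark://cn3199:7077 would submit the dry run to the master used in the sample session. The three file and directory names are placeholders, and the master URL depends on the node on which the cluster was started.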