DROP is a pipeline for detecting aberrant events in RNA-Seq data, useful for the diagnosis of rare disorders.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task=10 --mem=10G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load drop
[user@cn3144 ~]$ mkdir /data/$USER/drop_test/
[user@cn3144 ~]$ cd /data/$USER/drop_test/
[user@cn3144 ~]$ drop demo
[user@cn3144 ~]$ snakemake -n   # dry run
[user@cn3144 ~]$ snakemake --cores 10
WARNING: Using the mae defined genome instead of the globally defined one.
This will be deprecated in the future to allow for reference genomes to be defined in the sample annotation table. Please update your config and sample annotation table
WARNING: GENE_ANNOTATION must be a column in the sample annotation table, ANNOTATION is the old column name and will be deprecated in the future
WARNING: Less than 30 IDs in DROP_GROUP outrider
WARNING: Less than 30 IDs in DROP_GROUP import_exp
WARNING: Less than 30 IDs in DROP_GROUP fraser
check for missing R packages
Structuring dependencies...
Dependencies file generated at: /tmp/tmp1a00_tgp
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 AberrantExpression_Overview_R
1 AberrantExpression_pipeline_Counting_Datasets_R
2 AberrantExpression_pipeline_Counting_Summary_R
10 AberrantExpression_pipeline_Counting_countReads_R
2 AberrantExpression_pipeline_Counting_filterCounts_R
2 AberrantExpression_pipeline_Counting_mergeCounts_R
1 AberrantExpression_pipeline_Counting_preprocessGeneAnnotation_R
1 AberrantExpression_pipeline_OUTRIDER_Datasets_R
2 AberrantExpression_pipeline_OUTRIDER_Summary_R
2 AberrantExpression_pipeline_OUTRIDER_results_R
2 AberrantExpression_pipeline_OUTRIDER_runOutrider_R
1 AberrantSplicing_Overview_R
1 AberrantSplicing_pipeline_Counting_00_define_datasets_from_anno_R
1 AberrantSplicing_pipeline_Counting_01_0_countRNA_init_R
10 AberrantSplicing_pipeline_Counting_01_1_countRNA_splitReads_samplewise_R
1 AberrantSplicing_pipeline_Counting_01_2_countRNA_splitReads_merge_R
10 AberrantSplicing_pipeline_Counting_01_3_countRNA_nonSplitReads_samplewise_R
1 AberrantSplicing_pipeline_Counting_01_4_countRNA_nonSplitReads_merge_R
1 AberrantSplicing_pipeline_Counting_01_5_countRNA_collect_R
1 AberrantSplicing_pipeline_Counting_02_psi_value_calculation_FraseR_R
1 AberrantSplicing_pipeline_Counting_03_filter_expression_FraseR_R
1 AberrantSplicing_pipeline_Counting_DatasetsF_R
1 AberrantSplicing_pipeline_Counting_Summary_R
1 AberrantSplicing_pipeline_FRASER_04_fit_hyperparameters_FraseR_R
1 AberrantSplicing_pipeline_FRASER_05_fit_autoencoder_FraseR_R
1 AberrantSplicing_pipeline_FRASER_06_calculation_stats_AE_FraseR_R
1 AberrantSplicing_pipeline_FRASER_07_extract_results_FraseR_R
1 AberrantSplicing_pipeline_FRASER_Datasets_R
1 AberrantSplicing_pipeline_FRASER_Summary_R
1 Index
1 MonoallelicExpression_Overview_R
1 MonoallelicExpression_pipeline_MAE_Datasets_R
1 MonoallelicExpression_pipeline_MAE_Results_R
2 MonoallelicExpression_pipeline_MAE_deseq_mae_R
1 MonoallelicExpression_pipeline_QC_DNA_RNA_matrix_plot_R
1 MonoallelicExpression_pipeline_QC_Datasets_R
1 MonoallelicExpression_pipeline_QC_create_matrix_dna_rna_cor_R
2 MonoallelicExpression_pipeline_QC_deseq_qc_R
1 aberrantExpression
4 aberrantExpression_bamStats
2 aberrantExpression_mergeBamStats
1 aberrantSplicing
1 aberrantSplicing_dependency
1 all
1 dependencyGraph
1 mae
4 mae_allelicCounts
4 mae_createSNVs
1 mae_dependency
94
Select jobs to execute...
[Thu May 13 17:33:29 2021]
rule mae_dependency:
output: /gpfs/gsfs10/users/apptest1/Output/html/mae-pipeline_dep.svg
jobid: 100
...
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
To submit large jobs to the cluster, first create a cluster.yaml file. For example, the following file (thanks to Sarah Silverstein for the contribution) is tuned for 80 samples with roughly 130M reads per sample; if you have fewer or smaller samples, run snakemake in an interactive session instead, or scale the resources down accordingly:
__default__:
  partition: norm
  threads: 2
  memory: 2G
  time: "2:00:00"
AberrantExpression_pipeline_OUTRIDER_runOutrider_R:
  threads: 4
  memory: 10G
  time: "20:00:00"
AberrantExpression_pipeline_Counting_countReads_R:
  memory: 20G
  time: "6:00:00"
AberrantExpression_pipeline_OUTRIDER_Summary_R:
  threads: 50
  memory: 20G
AberrantExpression_Overview_R:
  memory: 5G
AberrantSplicing_pipeline_Counting_01_1_countRNA_splitReads_samplewise_R:
  memory: 10G
  threads: 4
AberrantSplicing_pipeline_Counting_01_2_countRNA_splitReads_merge_R:
  memory: 10G
AberrantSplicing_pipeline_Counting_01_4_countRNA_nonSplitReads_merge_R:
  memory: 10G
AberrantSplicing_pipeline_Counting_02_psi_value_calculation_FraseR_R:
  memory: 35G
  threads: 20
AberrantSplicing_pipeline_Counting_03_filter_expression_FraseR_R:
  memory: 15G
AberrantSplicing_pipeline_FRASER_04_fit_hyperparameters_FraseR_R:
  memory: 25G
  threads: 10
AberrantSplicing_pipeline_FRASER_05_fit_autoencoder_FraseR_R:
  memory: 50G
  threads: 22
  time: "4:00:00"
AberrantSplicing_pipeline_FRASER_06_calculation_stats_AE_FraseR_R:
  memory: 110G
  threads: 24
  time: "4:00:00"
AberrantSplicing_pipeline_FRASER_07_extract_results_FraseR_R:
  memory: 25G
AberrantSplicing_pipeline_FRASER_Summary_R:
  memory: 15G
  threads: 4
AberrantSplicing_Overview_R:
  memory: 5G
MonoallelicExpression_pipeline_MAE_Results_R:
  memory: 5G
MonoallelicExpression_pipeline_QC_create_matrix_dna_rna_cor_R:
  memory: 200G
  threads: 2
  time: "14:00:00"
mae_allelicCounts:
  threads: 4
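The file above is plain YAML, so one way to create it is with a heredoc. The sketch below is abbreviated to the default section and a single rule; fill in the remaining rules from the listing above:

```shell
# Write an abbreviated cluster.yaml (extend with the remaining rules above).
cat > cluster.yaml <<'EOF'
__default__:
  partition: norm
  threads: 2
  memory: 2G
  time: "2:00:00"
AberrantExpression_pipeline_OUTRIDER_runOutrider_R:
  threads: 4
  memory: 10G
  time: "20:00:00"
EOF

# Quick sanity check: the default section must be present.
grep -q '__default__' cluster.yaml && echo "cluster.yaml written"
```

Each top-level key names a snakemake rule (as shown in the dry-run job list); `__default__` applies to any rule without its own entry.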
Create a batch input file (e.g. drop.sh). For example:
#!/bin/bash
set -e
module load drop
cd /data/$USER/drop_test/
snakemake -pr --jobs 10 \
--cluster "sbatch --cpus-per-task={cluster.threads} --mem={cluster.memory} --time={cluster.time}" \
--cluster-config cluster.yaml --latency-wait 120 --max-jobs-per-second 1 \
--max-status-checks-per-second 0.01 all
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=2 --mem=2g --time=4-00:00:00 drop.sh
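Before submitting, the batch script can be written out and syntax-checked without running it (a sketch, using the same module, paths, and snakemake options as above):

```shell
# Create drop.sh (same content as above) and parse it with bash -n,
# which checks syntax without executing any command.
cat > drop.sh <<'EOF'
#!/bin/bash
set -e
module load drop
cd /data/$USER/drop_test/
snakemake -pr --jobs 10 \
    --cluster "sbatch --cpus-per-task={cluster.threads} --mem={cluster.memory} --time={cluster.time}" \
    --cluster-config cluster.yaml --latency-wait 120 --max-jobs-per-second 1 \
    --max-status-checks-per-second 0.01 all
EOF

bash -n drop.sh && echo "drop.sh parses cleanly"
```

The `{cluster.threads}`, `{cluster.memory}`, and `{cluster.time}` placeholders are filled in by snakemake from the matching rule's entry in cluster.yaml when each job is submitted.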