High-Performance Computing at the NIH
mirdeep on Biowulf & Helix

Description

miRDeep2 uses the distribution of next generation sequencing reads in the genome along with RNA structure prediction to discover and quantitate the expression of known and novel miRNAs. miRDeep2 represents a complete overhaul of the original miRDeep tool.

miRDeep2 is a collection of Perl scripts tied together by 3 main scripts: mapper.pl, quantifier.pl, and miRDeep2.pl.

Of these, mapper.pl and quantifier.pl may run multithreaded bowtie subprocesses. The thread count is set with -o for mapper.pl and with -T for quantifier.pl. Because the pipeline mixes single-threaded and multithreaded processes, it's best to run the individual steps separately rather than combining them into a single batch script.
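
Since both thread options take a plain number, one value can feed both steps. The following is a minimal sketch, not part of miRDeep2: the helper name mirdeep_threads and the fallback of 2 threads are assumptions made for illustration. It uses the CPU count Slurm exports when -c/--cpus-per-task is given and falls back to the default elsewhere.

```shell
#!/bin/bash
# Hypothetical helper: choose a thread count for the bowtie-backed steps.
# SLURM_CPUS_PER_TASK is only set by Slurm when -c/--cpus-per-task was
# requested; the fallback of 2 is an arbitrary assumption for small runs.
mirdeep_threads() {
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

# Intended use (commented out so the sketch runs without miRDeep2 loaded):
#   mapper.pl ... -o "$(mirdeep_threads)" ...
#   quantifier.pl ... -T "$(mirdeep_threads)" ...
echo "threads: $(mirdeep_threads)"
```

With this, the same command line can be used unchanged in a batch job, an interactive session, or outside of Slurm.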

Running mirdeep on Helix

Since miRDeep2 involves mapping steps, processing on Helix should be limited to very small data sets. As an example, we'll work through the included tutorial, limiting the threaded steps to two threads.


helix$ cd /data/$USER/test_data/mirdeep
helix$ cp -r /usr/local/apps/mirdeep/2.0.0.7/tutorial .
helix$ cd tutorial
helix$ module load mirdeep
# the following step is not necessary if a prebuilt
# genome index is available
helix$ bowtie-build cel_cluster.fa cel_cluster

helix$ mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT  \
  -l 18 -m -p cel_cluster -o 2 -v \
  -s reads_collapsed.fa \
  -t reads_collapsed_vs_genome.arf
.....
helix$ quantifier.pl -p precursors_ref_this_species.fa \
  -m mature_ref_this_species.fa \
  -r reads_collapsed.fa -t cel -T 2 -y 16_19
.....
helix$ miRDeep2.pl reads_collapsed.fa cel_cluster.fa \
  reads_collapsed_vs_genome.arf \
  mature_ref_this_species.fa \
  mature_ref_other_species.fa \
  precursors_ref_this_species.fa \
  -t C.elegans 2> report.log

Results with links to individual miRNAs can be found in results.html. Note that multiple sample files can be processed together by passing mapper.pl a config file listing all input files and their abbreviated names (see miRDeep2 documentation for details).
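
As a rough sketch of the multi-sample case, the snippet below writes such a config file. It assumes the format described in the miRDeep2 documentation as we understand it: one line per sample, holding the reads file and a three-letter sample code separated by whitespace. The helper write_mapper_config, the file names sample1.fa and sample2.fa, and the codes s01 and s02 are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical helper: write a mapper.pl config file with one line per
# sample in the form "<reads file> <three-letter code>".
# Usage: write_mapper_config OUTFILE file:code [file:code ...]
write_mapper_config() {
    local out=$1
    shift
    : > "$out"                       # create or truncate the config file
    local pair
    for pair in "$@"; do
        # split each file:code argument at the last colon
        printf '%s %s\n' "${pair%:*}" "${pair##*:}" >> "$out"
    done
}

# Placeholder sample names and codes, for illustration only
write_mapper_config config.txt sample1.fa:s01 sample2.fa:s02
cat config.txt

# With miRDeep2 loaded, the config file would take the place of reads.fa.
# This assumes -d marks the input as a config file; check the mapper.pl
# help output before relying on it:
#   mapper.pl config.txt -d -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m \
#     -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf
```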

Running a single mirdeep batch job on Biowulf

The single-threaded and multithreaded steps of the miRDeep2 pipeline could be tied together with snakemake or a similar workflow tool capable of submitting batch jobs. For the example here, we will simply write a script that uses job dependencies to chain the three steps of the pipeline:


#! /bin/bash
cd /data/$USER/test_data/mirdeep
cp -r /usr/local/apps/mirdeep/2.0.0.7/tutorial .
cd tutorial
module load mirdeep
bowtie-build cel_cluster.fa cel_cluster

# create files for each job in the pipeline
cat > step1.sh <<'EOF'
#! /bin/bash
#SBATCH --job-name=mirdeep_s1
mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT  \
  -l 18 -m -p cel_cluster -v \
  -o ${SLURM_CPUS_PER_TASK} \
  -s reads_collapsed.fa \
  -t reads_collapsed_vs_genome.arf
EOF

cat > step2.sh <<'EOF'
#! /bin/bash
#SBATCH --job-name=mirdeep_s2
quantifier.pl -p precursors_ref_this_species.fa \
  -m mature_ref_this_species.fa \
  -T ${SLURM_CPUS_PER_TASK} \
  -r reads_collapsed.fa -t cel -y 16_19
EOF

cat > step3.sh <<'EOF'
#! /bin/bash
#SBATCH --job-name=mirdeep_s3
miRDeep2.pl reads_collapsed.fa cel_cluster.fa \
  reads_collapsed_vs_genome.arf \
  mature_ref_this_species.fa \
  mature_ref_other_species.fa \
  precursors_ref_this_species.fa \
  -t C.elegans 2> report.log
EOF

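# set up the pipeline run; this assumes sbatch prints only the job id,
# which is captured and passed to the next step as a dependency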
jid1=$(sbatch -c4 step1.sh)
jid2=$(sbatch -c4 --dependency=afterok:${jid1} step2.sh)
jid3=$(sbatch --dependency=afterok:${jid2} step3.sh)
squeue -u $USER

The script above will submit all three steps as separate jobs. Each job will only execute if the previous job finished successfully. Of course, the same can be achieved by manually creating batch scripts for each job and submitting them individually as batch jobs.
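
The dependency chain relies on each sbatch call printing nothing but the job id. Where sbatch prints a longer message instead, the captured value would not be a valid dependency. The following is a minimal sketch of a defensive helper; the name parse_jobid is our own and not part of Slurm or miRDeep2.

```shell
#!/bin/bash
# Hypothetical helper: extract a numeric job id from sbatch output.
# Accepts "12345", "12345;cluster" (the form printed with --parsable when
# a cluster name is included) and "Submitted batch job 12345".
# Returns non-zero if no numeric id is found.
parse_jobid() {
    local out=$1
    out=${out%%;*}        # drop a ";cluster" suffix if present
    out=${out##* }        # keep only the last space-separated word
    if [[ $out =~ ^[0-9]+$ ]]; then
        echo "$out"
    else
        return 1
    fi
}

# Intended use (commented out so the sketch runs without Slurm):
#   jid1=$(parse_jobid "$(sbatch -c4 step1.sh)") || exit 1
#   jid2=$(parse_jobid "$(sbatch -c4 --dependency=afterok:${jid1} step2.sh)") || exit 1
parse_jobid "Submitted batch job 12345"
```

Stopping as soon as no job id can be parsed keeps later steps from being submitted with a malformed dependency.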

Running an interactive job on Biowulf

After starting an interactive session on a compute node with sinteractive, mirdeep is used as described above. For example:


biowulf$ sinteractive -c4
salloc.exe: Granted job allocation nnnnnn
srun: error: x11: no local DISPLAY defined, skipping
cn0147$ cd /data/$USER/test_data/mirdeep
cn0147$ cp -r /usr/local/apps/mirdeep/2.0.0.7/tutorial .
cn0147$ cd tutorial
cn0147$ module load mirdeep
cn0147$ bowtie-build cel_cluster.fa cel_cluster
# use 4 threads since we have 4 cpus in the interactive allocation
cn0147$ mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT  \
  -l 18 -m -p cel_cluster -o 4 -v \
  -s reads_collapsed.fa \
  -t reads_collapsed_vs_genome.arf
......
cn0147$ exit
biowulf$

Documentation

Full documentation of all tools that are part of miRDeep.