Pychopper is used to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.
Using the edlib detection method (-m edlib) has been associated with stalled runs and should probably be avoided.
Example data is available in $PYCHOPPER_TEST_DATA.
cdna_classifier.py was renamed to pychopper between versions 2.4.0 and 2.7.1.

Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=10g --cpus-per-task=6 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load pychopper
[user@cn3144]$ zcat $PYCHOPPER_TEST_DATA/SIRV_E0_pcs109_25k.fq.gz > input.fastq
[user@cn3144]$ ## pychopper was called cdna_classifier.py in versions ≤ 2.4.0
[user@cn3144]$ pychopper -r report.pdf -u unclassified.fastq -t $SLURM_CPUS_PER_TASK \
    -w rescued.fastq input.fastq - | gzip -c > /data/$USER/temp/full_length.fastq.gz
Using kit: PCS109
Configurations to consider: "+:SSP,-VNP|-:VNP,-SSP"
Total fastq records in input file: 25000
Tuning the cutoff parameter (q) on 9465 sampled reads (40.0%) passing quality filters (Q ≥ 7.0).
Optimizing over 30 cutoff values.
100%|████████████████████████████████████████████████████████| 30/30
Best cutoff (q) value is 0.3448 with 88% of the reads classified.
Processing the whole dataset using a batch size of 4166:
 94%|██████████████████████████████████████████████████      | 23614/25000
Finished processing file: input.fastq
Input reads failing mean quality filter (Q < 7.0): 1386 (5.54%)
Output fragments failing length filter (length < 50): 0
-----------------------------------
Reads with two primers:  86.93%
Rescued reads:           3.16%
Unusable reads:          9.91%
-----------------------------------
If you need the rescued reads, the unclassified reads, or the report, copy them out of /lscratch before ending the session; local scratch is cleared when the job ends.
[user@cn3144]$ gzip -c rescued.fastq > /data/$USER/temp/rescued.fastq.gz
[user@cn3144]$ gzip -c unclassified.fastq > /data/$USER/temp/unclassified.fastq.gz
[user@cn3144]$ mv report.pdf /data/$USER/temp
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
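The run summary reports how many input records were processed (25000 for the test data). To see how many records ended up in each output file, you can count them. This is a minimal sketch, assuming standard four-line FASTQ records without wrapped sequence lines; the function name count_fastq_records is hypothetical and not part of pychopper.

```shell
#!/bin/bash
# count_fastq_records FILE
# Hypothetical helper: prints the number of FASTQ records in FILE.
# Assumes four lines per record (no wrapped sequence lines).
# gzip -cdf decompresses .gz input and passes uncompressed input through unchanged.
count_fastq_records() {
    local lines
    lines=$(gzip -cdf -- "$1" | wc -l)
    echo $(( lines / 4 ))
}
```

For example, count_fastq_records /data/$USER/temp/full_length.fastq.gz prints the number of full-length records written. Because rescued fused reads may be split into several fragments, the output counts need not add up to the input total.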
Create a batch script (e.g. pychopper.sh). For example:
#!/bin/bash
module load pychopper/2.7.1 || exit 1
cd /lscratch/$SLURM_JOB_ID || exit 1
zcat $PYCHOPPER_TEST_DATA/SIRV_E0_pcs109_25k.fq.gz > input.fastq
## use cdna_classifier.py instead of pychopper for versions ≤ 2.4.0
pychopper -r report.pdf -u unclassified.fastq -t $SLURM_CPUS_PER_TASK \
    -w rescued.fastq input.fastq - | gzip -c > /data/$USER/temp/full_length.fastq.gz
gzip -c rescued.fastq > /data/$USER/temp/rescued.fastq.gz
gzip -c unclassified.fastq > /data/$USER/temp/unclassified.fastq.gz
mv report.pdf /data/$USER/temp
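The batch script hardcodes the pychopper command name. If the same script has to work with module versions on either side of the cdna_classifier.py to pychopper rename, one option is to look up whichever entry point is on PATH. This is a sketch; the helper name pick_pychopper_cmd is hypothetical.

```shell
#!/bin/bash
# pick_pychopper_cmd
# Hypothetical helper: prints the name of whichever pychopper entry point is on PATH.
# Newer releases provide "pychopper"; releases <= 2.4.0 provide "cdna_classifier.py".
pick_pychopper_cmd() {
    local cmd
    for cmd in pychopper cdna_classifier.py; do
        if command -v "$cmd" > /dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    echo "error: neither pychopper nor cdna_classifier.py found on PATH" >&2
    return 1
}
```

After module load, the script could then set PYCHOPPER=$(pick_pychopper_cmd) || exit 1 and call "$PYCHOPPER" in place of pychopper. Checking the new name first means the newer entry point wins if both happen to be installed.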
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=10g --gres=lscratch:10 pychopper.sh
Create a swarmfile (e.g. pychopper.swarm). For example:
zcat input1.fastq.gz > /lscratch/$SLURM_JOB_ID/input1.fastq && \
    pychopper -t $SLURM_CPUS_PER_TASK /lscratch/$SLURM_JOB_ID/input1.fastq - \
    | gzip -c > /data/$USER/temp/full_length1.fastq.gz
zcat input2.fastq.gz > /lscratch/$SLURM_JOB_ID/input2.fastq && \
    pychopper -t $SLURM_CPUS_PER_TASK /lscratch/$SLURM_JOB_ID/input2.fastq - \
    | gzip -c > /data/$USER/temp/full_length2.fastq.gz
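For more than a handful of inputs, writing the swarmfile by hand becomes tedious and error-prone. A small loop can generate one single-line command per input instead. This is a sketch; the function name make_pychopper_swarm and the full_length_NAME.fastq.gz output naming are hypothetical, and it assumes input file names that end in .fastq.gz and contain no spaces.

```shell
#!/bin/bash
# make_pychopper_swarm OUTDIR INPUT.fastq.gz...
# Hypothetical helper: prints one swarm command line per gzipped FASTQ input.
# Assumes input names end in .fastq.gz and contain no spaces.
make_pychopper_swarm() {
    local outdir=$1
    shift
    local f base
    for f in "$@"; do
        base=$(basename "$f" .fastq.gz)
        # The single-quoted format strings keep $SLURM_JOB_ID and
        # $SLURM_CPUS_PER_TASK literal, so each subjob expands its own values.
        printf 'zcat %s > /lscratch/$SLURM_JOB_ID/%s.fastq && ' "$f" "$base"
        printf 'pychopper -t $SLURM_CPUS_PER_TASK /lscratch/$SLURM_JOB_ID/%s.fastq - ' "$base"
        printf '| gzip -c > %s/full_length_%s.fastq.gz\n' "$outdir" "$base"
    done
}
```

For example, make_pychopper_swarm /data/$USER/temp input*.fastq.gz > pychopper.swarm writes one line per matching input file.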
Submit this job using the swarm command.
swarm -f pychopper.swarm -g 10 -t 6 --gres=lscratch:10 --module pychopper/2.7.1
where
-g #               | Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #               | Number of threads/CPUs required for each process (1 line in the swarm command file)
--gres=lscratch:#  | Local scratch space (GB) allocated to each process; required because the commands write to /lscratch/$SLURM_JOB_ID
--module pychopper | Loads the pychopper module for each subjob in the swarm