Pychopper is used to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.
Note: -m edlib is associated with stalled runs and should probably be avoided.
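If you prefer to pin the detection method explicitly rather than rely on the default, a minimal sketch (assuming the default backend in this release is the phmm method selected via -m, and that input.fastq.gz is a placeholder for your own reads):

# sketch: explicitly select the phmm backend (the assumed default) instead of edlib
pychopper -m phmm -t $SLURM_CPUS_PER_TASK input.fastq.gz - | gzip -c > full_length.fastq.gz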
Example data is available under $PYCHOPPER_TEST_DATA. Note that cdna_classifier.py was renamed to pychopper between versions 2.4.0 and 2.7.1.

Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=10g --cpus-per-task=6 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load pychopper/2.7.10
[user@cn3144]$ cp $PYCHOPPER_TEST_DATA/SIRV_E0_pcs109_25k.fq.gz .
[user@cn3144]$ pychopper -r report.pdf -u unclassified.fastq -t $SLURM_CPUS_PER_TASK \
                  -w rescued.fastq SIRV_E0_pcs109_25k.fq.gz - | gzip -c > /data/$USER/temp/full_length.fastq.gz
Using kit: /opt/conda/lib/python3.12/site-packages/pychopper/primer_data/cDNA_SSP_VNP.fas
Configurations to consider: "+:SSP,-VNP|-:VNP,-SSP"
Total fastq records in input file: 25000
Tuning the cutoff parameter (q) on 9465 sampled reads (40.0%) passing quality filters (Q ≥ 7.0).
Optimizing over 30 cutoff values.
100%|████████████████████████████████████████████████████████| 30/30
Best cutoff (q) value is 0.3448 with 88% of the reads classified.
Processing the whole dataset using a batch size of 4166:
 94%|██████████████████████████████████████████████████     | 23614/25000
Finished processing file: input.fastq
Input reads failing mean quality filter (Q < 7.0): 1386 (5.54%)
Output fragments failing length filter (length < 50): 0
-----------------------------------
Reads with two primers: 86.93%
Rescued reads:          3.16%
Unusable reads:         9.91%
-----------------------------------
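As an optional sanity check (not part of the tool's output above), you can count the reads written to the gzipped full-length output:

[user@cn3144]$ zcat /data/$USER/temp/full_length.fastq.gz | awk 'END{print NR/4, "reads"}'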
Move the rescued and unclassified reads, as well as the report, out of lscratch if you need them before ending the session.
[user@cn3144]$ gzip -c rescued.fastq > /data/$USER/temp/rescued.fastq.gz
[user@cn3144]$ gzip -c unclassified.fastq > /data/$USER/temp/unclassified.fastq.gz
[user@cn3144]$ mv report.pdf /data/$USER/temp
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
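The gzip step above is single-threaded. Since the session allocates several CPUs, a parallel compressor can speed it up; a sketch assuming pigz is available on the compute node (an assumption; check with 'which pigz'):

# sketch: compress with as many threads as the job was allocated
pigz -p $SLURM_CPUS_PER_TASK -c rescued.fastq > /data/$USER/temp/rescued.fastq.gz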
Create a batch input file (e.g. pychopper.sh). For example:
#!/bin/bash
module load pychopper/2.7.10 || exit 1
cd /lscratch/$SLURM_JOB_ID
cp $PYCHOPPER_TEST_DATA/SIRV_E0_pcs109_25k.fq.gz .
pychopper -r report.pdf -u unclassified.fastq -t $SLURM_CPUS_PER_TASK \
    -w rescued.fastq SIRV_E0_pcs109_25k.fq.gz - | gzip -c > /data/$USER/temp/full_length.fastq.gz
gzip -c rescued.fastq > /data/$USER/temp/rescued.fastq.gz
gzip -c unclassified.fastq > /data/$USER/temp/unclassified.fastq.gz
mv report.pdf /data/$USER/temp
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=10g --gres=lscratch:10 pychopper.sh
Create a swarmfile (e.g. pychopper.swarm). For example:
pychopper -t $SLURM_CPUS_PER_TASK input1.fastq.gz - \
    | gzip -c > /data/$USER/temp/full_length1.fastq.gz
pychopper -t $SLURM_CPUS_PER_TASK input2.fastq.gz - \
    | gzip -c > /data/$USER/temp/full_length2.fastq.gz
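For many inputs, the swarmfile can be generated with a shell loop; a minimal sketch assuming gzipped fastq files under /data/$USER/fastq (a hypothetical location; adjust to your data):

# $SLURM_CPUS_PER_TASK is escaped so it expands inside each swarm subjob, not here
for f in /data/$USER/fastq/*.fastq.gz; do
    n=$(basename "$f" .fastq.gz)
    echo "pychopper -t \$SLURM_CPUS_PER_TASK $f - | gzip -c > /data/$USER/temp/full_length_${n}.fastq.gz"
done > pychopper.swarm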
Submit this job using the swarm command.
swarm -f pychopper.swarm -g 10 -t 6 --module pychopper/2.7.10

where
-g #               | Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #               | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module pychopper | Loads the pychopper module for each subjob in the swarm