Pychopper is used to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.
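At its simplest, pychopper takes a fastq in and writes the classified full-length reads out. A minimal sketch (the file names here are placeholders, not part of the installed example):

# -r: pdf report with summary plots
# -u: destination for reads that could not be classified
# -w: destination for rescued fused reads
pychopper -r report.pdf -u unclassified.fastq -w rescued.fastq \
    input.fastq full_length.fastq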
Notes: -m edlib is associated with stalled runs and should probably be avoided. Example data is available under $PYCHOPPER_TEST_DATA. cdna_classifier.py was renamed to pychopper between versions 2.4.0 and 2.7.1.

Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=10g --cpus-per-task=6 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load pychopper/2.7.10
[user@cn3144]$ cp $PYCHOPPER_TEST_DATA/SIRV_E0_pcs109_25k.fq.gz .
[user@cn3144]$ pychopper -r report.pdf -u unclassified.fastq -t $SLURM_CPUS_PER_TASK \
-w rescued.fastq SIRV_E0_pcs109_25k.fq.gz - | gzip -c > /data/$USER/temp/full_length.fastq.gz
Using kit: /opt/conda/lib/python3.12/site-packages/pychopper/primer_data/cDNA_SSP_VNP.fas
Configurations to consider: "+:SSP,-VNP|-:VNP,-SSP"
Total fastq records in input file: 25000
Tuning the cutoff parameter (q) on 9465 sampled reads (40.0%) passing quality filters (Q ≥ 7.0).
Optimizing over 30 cutoff values.
100%|████████████████████████████████████████████████████████| 30/30
Best cutoff (q) value is 0.3448 with 88% of the reads classified.
Processing the whole dataset using a batch size of 4166:
94%|██████████████████████████████████████████████████ | 23614/25000
Finished processing file: input.fastq
Input reads failing mean quality filter (Q < 7.0): 1386 (5.54%)
Output fragments failing length filter (length < 50): 0
-----------------------------------
Reads with two primers: 86.93%
Rescued reads: 3.16%
Unusable reads: 9.91%
-----------------------------------
Move the rescued and unclassified reads and the report if you need them before ending the session.
[user@cn3144]$ gzip -c rescued.fastq > /data/$USER/temp/rescued.fastq.gz
[user@cn3144]$ gzip -c unclassified.fastq > /data/$USER/temp/unclassified.fastq.gz
[user@cn3144]$ mv report.pdf /data/$USER/temp
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
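As an optional sanity check after the session ends, the record counts of the copied outputs should roughly match the percentages reported above. Fastq records are four lines each, so for example:

[user@biowulf]$ echo $(( $(zcat /data/$USER/temp/full_length.fastq.gz | wc -l) / 4 ))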
Create a batch input file (e.g. pychopper.sh). For example:
#!/bin/bash
module load pychopper/2.7.10 || exit 1
cd /lscratch/$SLURM_JOB_ID
cp $PYCHOPPER_TEST_DATA/SIRV_E0_pcs109_25k.fq.gz .
pychopper -r report.pdf -u unclassified.fastq -t $SLURM_CPUS_PER_TASK \
-w rescued.fastq SIRV_E0_pcs109_25k.fq.gz - | gzip -c > /data/$USER/temp/full_length.fastq.gz
gzip -c rescued.fastq > /data/$USER/temp/rescued.fastq.gz
gzip -c unclassified.fastq > /data/$USER/temp/unclassified.fastq.gz
mv report.pdf /data/$USER/temp
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=10g --gres=lscratch:10 pychopper.sh
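The script above hard-codes the test file. A common variation (a sketch, not part of the installed example) reads the input path from the first argument, so one script serves many datasets:

#!/bin/bash
# sketch: pass the input fastq(.gz) as the first argument at submission
module load pychopper/2.7.10 || exit 1
cd /lscratch/$SLURM_JOB_ID
cp "$1" .
in=$(basename "$1")
pychopper -r report.pdf -u unclassified.fastq -t $SLURM_CPUS_PER_TASK \
    -w rescued.fastq "$in" - | gzip -c > /data/$USER/temp/"${in%%.*}"_full_length.fastq.gz

This would be submitted with, e.g. (the input path is hypothetical):

sbatch --cpus-per-task=6 --mem=10g --gres=lscratch:10 pychopper.sh /data/$USER/myreads.fastq.gz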
Create a swarmfile (e.g. pychopper.swarm). For example:
pychopper -t $SLURM_CPUS_PER_TASK input1.fastq.gz - \
    | gzip -c > /data/$USER/temp/full_length1.fastq.gz
pychopper -t $SLURM_CPUS_PER_TASK input2.fastq.gz - \
    | gzip -c > /data/$USER/temp/full_length2.fastq.gz
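For many inputs the swarmfile can be generated with a loop. A sketch assuming the fastq files sit in the current directory (the output naming scheme is illustrative):

# escape $SLURM_CPUS_PER_TASK so it is expanded by the subjob, not while generating the file
for f in *.fastq.gz; do
    echo "pychopper -t \$SLURM_CPUS_PER_TASK $f - | gzip -c > /data/$USER/temp/${f%.fastq.gz}_full_length.fastq.gz"
done > pychopper.swarm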
Submit this job using the swarm command.
swarm -f pychopper.swarm -g 10 -t 6 --module pychopper/2.7.10

where
| -g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
| --module pychopper | Loads the pychopper module for each subjob in the swarm |