A tool designed to provide fast all-in-one preprocessing for FastQ files. It is developed in C++ with multithreading support for high performance.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load fastp
[user@cn3144 ~]$ cp $FASTP_TEST_DATA/* .
[user@cn3144 ~]$ fastp --in1 R1.fq --in2 R2.fq
Read1 before filtering:
total reads: 9
total bases: 1208
Q20 bases: 1078(89.2384%)
Q30 bases: 1005(83.1954%)

Read2 before filtering:
total reads: 9
total bases: 1359
Q20 bases: 1100(80.9419%)
Q30 bases: 959(70.5666%)

Read1 after filtering:
total reads: 8
total bases: 1208
Q20 bases: 1078(89.2384%)
Q30 bases: 1005(83.1954%)

Read2 aftering filtering:
total reads: 8
total bases: 1208
Q20 bases: 991(82.0364%)
Q30 bases: 874(72.351%)

Filtering result:
reads passed filter: 16
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 2
reads with adapter trimmed: 0
bases trimmed due to adapters: 0

Duplication rate: 62.5%

Insert size peak (evaluated by paired-end reads): 187

JSON report: fastp.json
HTML report: fastp.html

fastp --in1 R1.fq --in2 R2.fq
fastp v0.20.1, time used: 0 seconds

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
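The run above also writes a machine-readable report (fastp.json). Below is a sketch of pulling the filtering counts back out of it with python3; the "filtering_result" key layout matches fastp around v0.20, and a tiny stand-in report with the session's numbers is created first so the snippet is self-contained.

```shell
# Create a stand-in fastp.json carrying the same filtering counts as the
# session above (a real fastp run writes a much larger report).
cat > fastp.json <<'EOF'
{"filtering_result": {"passed_filter_reads": 16, "low_quality_reads": 0,
 "too_many_N_reads": 0, "too_short_reads": 2}}
EOF

# Pull the counts out of the report.
python3 - <<'PY'
import json
r = json.load(open("fastp.json"))["filtering_result"]
print("reads passed filter:", r["passed_filter_reads"])
print("reads too short:", r["too_short_reads"])
PY
```

This prints the same "reads passed filter: 16" and "too short: 2" figures shown in the session, which is convenient for summarizing many samples without re-reading HTML reports.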
Create a batch input file (e.g. fastp.sh). For example:
#!/bin/bash
set -e
module load fastp
fastp --in1 R1.fq --in2 R2.fq
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=12 --mem=2g fastp.sh
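fastp is multithreaded (2 worker threads by default), so a batch script can tie its thread count to the Slurm allocation. A minimal sketch, assuming fastp's --thread option and the same R1.fq/R2.fq inputs; here the script is only written out and syntax-checked, since submission happens on the cluster:

```shell
# Sketch of a batch script that matches fastp's worker-thread count to the
# Slurm allocation via $SLURM_CPUS_PER_TASK (set by sbatch at run time).
cat > fastp_mt.sh <<'EOF'
#!/bin/bash
set -e
module load fastp
fastp --in1 R1.fq --in2 R2.fq --thread "$SLURM_CPUS_PER_TASK"
EOF

bash -n fastp_mt.sh   # syntax check only; on the cluster, submit with:
                      # sbatch --cpus-per-task=12 --mem=2g fastp_mt.sh
```

Keeping --thread equal to --cpus-per-task avoids both idle allocated CPUs and oversubscription.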
Create a swarmfile (e.g. fastp.swarm). For example:
cd dir1; fastp --in1 R1.fq --html R1.html
cd dir2; fastp --in1 R2.fq --html R2.html
cd dir3; fastp --in1 R3.fq --html R3.html
cd dir4; fastp --in1 R4.fq --html R4.html
Submit this job using the swarm command.
swarm -f fastp.swarm [-g #] [-t #] --module fastp

where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module fastp | Loads the fastp module for each subjob in the swarm
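The swarmfile lines above follow one pattern per sample directory, so the file can be generated with a short loop instead of being typed by hand. A sketch assuming the same dir1..dir4 / R1.fq..R4.fq layout:

```shell
# Write one fastp command per sample directory into fastp.swarm;
# each line becomes one subjob when submitted with swarm -f.
for i in 1 2 3 4; do
    echo "cd dir$i; fastp --in1 R$i.fq --html R$i.html"
done > fastp.swarm

cat fastp.swarm
```

For real data the loop would typically iterate over `dir*/` or a sample list rather than a fixed 1..4 range.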