The DeconSeq tool can be used to automatically detect and efficiently remove sequence contamination from genomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=8g
[user@cn3316 ~]$ module load DeconSeq
1) Using a modified version of BWA, create index databases to be used for contaminant screening. In this example, human and mouse genome databases will be created and stored in the folder db_dir:
[user@cn3316 ~]$ mkdir db_dir
[user@cn3316 ~]$ ln -s $DECONSEQ_DATA/hg19_genome.fa
[user@cn3316 ~]$ bwa64 index -p hg19_db -a bwtsw hg19_genome.fa >out.txt 2>&1 &
[user@cn3316 ~]$ mv hg19_db.* db_dir #after the background indexing job has finished
[user@cn3316 ~]$ ln -s $DECONSEQ_DATA/mm10_genome.fa
[user@cn3316 ~]$ bwa64 index -p mm10_db -a bwtsw mm10_genome.fa >out.txt 2>&1 &
[user@cn3316 ~]$ mv mm10_db.* db_dir #after the background indexing job has finished
[user@cn3316 ~]$ ls db_dir
hg19_db.amb  hg19_db.rbwt  mm10_db.amb  mm10_db.rbwt
hg19_db.ann  hg19_db.rpac  mm10_db.ann  mm10_db.rpac
hg19_db.bwt  hg19_db.rsa   mm10_db.bwt  mm10_db.rsa
hg19_db.pac  hg19_db.sa    mm10_db.pac  mm10_db.sa
2) Copy the configuration file DeconSeqConfig.pm from $DECONSEQ_SRC and modify it as needed, or simply copy from $DECONSEQ_DATA the version already configured for the human and mouse databases:
[user@cn3316 ~]$ cp $DECONSEQ_DATA/DeconSeqConfig.pm .
3) Get/prepare sample data file(s). In this small example, data from the first 1000 lines of the file SRR4254643_mouse.fasta will be used:
[user@cn3316 ~]$ ln -s $DECONSEQ_DATA/SRR4254643_mouse.fasta
[user@cn3316 ~]$ head -n 1000 SRR4254643_mouse.fasta > sample.fasta
4) Finally, run deconseq on the data file and get the results:
[user@cn3316 ~]$ deconseq -f sample.fasta -dbs hsref -dbs_retain mouse
To view the other available deconseq options, type:
[user@cn3316 ~]$ deconseq -h
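Note that head -n 1000 in step 3 selects lines, not reads, so it can cut off fewer than 1000 records or split a sequence that wraps over several lines. A minimal sketch of a record-aware alternative, assuming a standard FASTA file in which every record starts with a ">" header line; the function name first_n_reads is hypothetical and not part of DeconSeq:

```shell
#!/bin/bash
# first_n_reads N FILE
# Print the first N FASTA records of FILE (header plus all of its
# sequence lines). Hypothetical helper, not shipped with DeconSeq.
first_n_reads() {
    local n=$1 file=$2
    # Count header lines; stop as soon as record N+1 begins.
    awk -v n="$n" '/^>/ { if (++count > n) exit } { print }' "$file"
}

# Example use on the sample file from step 3, only if it is present.
if [ -f SRR4254643_mouse.fasta ]; then
    first_n_reads 1000 SRR4254643_mouse.fasta > sample.fasta
fi
```

Unlike head, this keeps every selected record whole regardless of how many lines each sequence occupies.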
To process a large data file on the cluster, first split it into smaller chunks, e.g.
[user@cn3316 ~]$ splitFasta -verbose -i SRR4254643_mouse.fasta -s 2 #chunks of 2MB
[user@cn3316 ~]$ splitFasta -verbose -i SRR4254643_mouse.fasta -n 10 #10 chunks
and then run the deconseq command separately on each chunk.
To this end, create a batch input file, e.g. deconseq.sh:
#!/bin/bash
module load DeconSeq
ln -s $DECONSEQ_DATA/db_dir
ln -s $DECONSEQ_DATA/SRR4254643_mouse.fasta
splitFasta -verbose -i SRR4254643_mouse.fasta -n 10
deconseq -f SRR4254643_mouse.fasta_c1.fasta -dbs hsref -dbs_retain mouse
deconseq -f SRR4254643_mouse.fasta_c2.fasta -dbs hsref -dbs_retain mouse
...
deconseq -f SRR4254643_mouse.fasta_c10.fasta -dbs hsref -dbs_retain mouse
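The per-chunk deconseq lines in the script above follow a regular pattern, so they can be generated rather than typed by hand. A minimal sketch, assuming the chunks are named INPUT_c1.fasta through INPUT_cN.fasta as in that script; the function name deconseq_cmds is hypothetical and not part of DeconSeq:

```shell
#!/bin/bash
# deconseq_cmds INPUT NCHUNKS
# Print one deconseq command line per chunk of INPUT. The commands are
# only printed, not executed, so they can be reviewed first.
# Hypothetical helper; the chunk naming is taken from the batch script.
deconseq_cmds() {
    local input=$1 nchunks=$2 i=1
    while [ "$i" -le "$nchunks" ]; do
        echo "deconseq -f ${input}_c${i}.fasta -dbs hsref -dbs_retain mouse"
        i=$((i + 1))
    done
}

# Print the ten commands used in the batch script.
deconseq_cmds SRR4254643_mouse.fasta 10
```

The printed lines can be appended to deconseq.sh after the splitFasta command. Changing the second argument keeps the command list in step with the -n value passed to splitFasta.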
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] deconseq.sh
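Before submitting a job that relies on your own db_dir, it may be worth confirming that each database index is complete, since an indexing job that is still running or was interrupted can leave files missing. A minimal sketch, assuming the eight file extensions shown in the ls db_dir listing of step 1 make up a complete index; the function name index_complete is hypothetical and not part of DeconSeq:

```shell
#!/bin/bash
# index_complete DIR PREFIX
# Succeed only if DIR holds a non-empty file for every index extension
# listed in step 1. Hypothetical helper; the extension list is an
# assumption based on that listing.
index_complete() {
    local dir=$1 prefix=$2 ext
    for ext in amb ann bwt pac sa rbwt rpac rsa; do
        if [ ! -s "$dir/$prefix.$ext" ]; then
            echo "missing or empty: $dir/$prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Report the state of the two databases built in step 1.
for db in hg19_db mm10_db; do
    if index_complete db_dir "$db"; then
        echo "$db: index complete"
    else
        echo "$db: index incomplete"
    fi
done
```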