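FCS (Foreign Contamination Screen) is a toolset to eliminate contaminant sequences from a genome assembly. It currently contains two tools: fcs-adaptor, which screens for adaptor and vector contamination, and fcs-gx, which screens for contamination from foreign organisms.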
Example input files for the examples below are in ${FCS_TEST_DATA}.
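The example inputs are gzip-compressed FASTA files. Before screening, it can help to know how many sequences and bases an input holds, for instance to compare with the totals that FCS-GX reports. The shell function below is a minimal sketch for that purpose. `fasta_stats` is a hypothetical helper, not part of FCS, and it assumes a standard FASTA layout in which header lines start with `>`.

```shell
# Hypothetical helper (not part of FCS): print "<file> <records> <bases>"
# (tab-separated) for a FASTA file, whether gzip-compressed or plain.
fasta_stats() {
    local fa="$1"
    # gzip -cdf decompresses .gz input and copies plain text through unchanged
    gzip -cdf -- "$fa" | awk -v name="$fa" '
        /^>/ { n++; next }                           # header line: one more record
        { gsub(/[ \t\r]/, ""); bases += length($0) } # sequence line: count residues
        END { printf "%s\t%.0f\t%.0f\n", name, n, bases }
    '
}
```

After `module load fcs`, the example inputs can be summarized with `for f in "${FCS_TEST_DATA}"/*.fa.gz; do fasta_stats "$f"; done`.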
Allocate an interactive session and run the program. The example below follows the FCS tutorials on the FCS GitHub site:
[user@biowulf]$ sinteractive --mem=3g --cpus-per-task=4 --gres=lscratch:100
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn4224 are ready for job

[user@cn4224 ~]$ module load fcs
[+] Loading fcs 0.4.0 on cn4285
[+] Loading singularity 3.10.5 on cn4285

[user@cn4224 ~]$ cd /lscratch/${SLURM_JOB_ID}
[user@cn4224 ~]$ mkdir inputdir outputdir
[user@cn4224 ~]$ cp ${FCS_TEST_DATA}/fcsadaptor_prok_test.fa.gz inputdir/.
[user@cn4224 ~]$ run_fcsadaptor.sh --fasta-input ./inputdir/fcsadaptor_prok_test.fa.gz --output-dir ./outputdir --prok --container-engine singularity --image ${FCS_HOME}/fcs-adaptor.sif
[WARN tini (2647065)] Tini is not running as PID 1 and isn't registered as a child subreaper. Zombie processes will not be re-parented to Tini, so zombie reaping won't work. To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
Output will be placed in: /output-volume
Executing the workflow
Resolved '/app/fcs/progs/ForeignContaminationScreening.cwl' to 'file:///app/fcs/progs/ForeignContaminationScreening.cwl'
[workflow ] start
[workflow ] starting step ValidateInputSequences
[step ValidateInputSequences] start
[..]
[job all_skipped_trims] completed success
[step all_skipped_trims] completed success
[workflow ] starting step all_cleaned_fasta
[step all_cleaned_fasta] start
[step all_cleaned_fasta] completed success
[workflow ] completed success

[user@cn4224 ~]$ cp ${FCS_TEST_DATA}/fcsgx_test.fa.gz .
[user@cn4224 ~]$ SOURCE_DB_MANIFEST="https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/test-only/test-only.manifest"
[user@cn4224 ~]$ LOCAL_DB=/lscratch/${SLURM_JOB_ID}/gxdb
[user@cn4224 ~]$ fcs.py db get --mft "$SOURCE_DB_MANIFEST" --dir "$LOCAL_DB/test-only"
[user@cn4224 ~]$ fcs.py db check --mft "$SOURCE_DB_MANIFEST" --dir "$LOCAL_DB/test-only"
===============================================================================
Source: https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/test-only
Destination: /app/db/gxdb
Resuming failed transfer in /app/db/gxdb...
Space check: Available:15.71GiB; Existing:0B; Incoming:4.29GiB; Delta:4.29GiB
Requires transfer: 56B test-only.meta.jsonl
Requires transfer: 5.92KiB test-only.taxa.tsv
Requires transfer: 21.33KiB test-only.seq_info.tsv.gz
Requires transfer: 7.85MiB test-only.blast_div.tsv.gz
Requires transfer: 67.56MiB test-only.gxs
Requires transfer: 4.21GiB test-only.gxi

[user@cn4224 ~]$ GXDB_LOC=/lscratch/${SLURM_JOB_ID}/gxdb
[user@cn4224 ~]$ fcs.py screen genome --fasta ./fcsgx_test.fa.gz --out-dir ./gx_out/ --gx-db "$GXDB_LOC/test-only" --tax-id 6973
--------------------------------------------------------------------
tax-id    : 6973
fasta     : /sample-volume/fcsgx_test.fa.gz
size      : 8.55 MiB
split-fa  : True
BLAST-div : roaches
gx-div    : anml:insects
w/same-tax: True
bin-dir   : /app/bin
gx-db     : /app/db/gxdb/test-only/test-only.gxi
gx-ver    : Mar 10 2023 15:34:33; git:v0.4.0-3-g8096f62
output    : /output-volume//fcsgx_test.fa.6973.taxonomy.rpt
--------------------------------------------------------------------
[...]
fcs_gx_report.txt contamination summary:
----------------------------------------
                                 seqs      bases
                                ----- ----------
TOTAL                             243   27170378
                                ----- ----------
prok:CFB group bacteria           243   27170378
--------------------------------------------------------------------

fcs_gx_report.txt action summary:
---------------------------------
                                 seqs      bases
                                ----- ----------
TOTAL                             243   27170378
                                ----- ----------
EXCLUDE                           214   25795430
REVIEW                             29    1374948
--------------------------------------------------------------------

[user@cn4224 ~]$ zcat fcsgx_test.fa.gz | fcs.py clean genome --action-report ./gx_out/fcsgx_test.fa.6973.fcs_gx_report.txt --output clean.fasta --contam-fasta-out contam.fasta
Applied 214 actions; 25795430 bps dropped; 0 bps hardmasked.
[user@cn4224 ~]$ ls gx_out
fcsgx_test.fa.6973.fcs_gx_report.txt  fcsgx_test.fa.6973.taxonomy.rpt
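The action summary printed by `fcs.py screen genome` can also be recomputed from the report file itself, for example after filtering the report or when only the file is at hand. The function below is a minimal sketch: `summarize_gx_report` is a hypothetical helper, not part of FCS, that counts distinct sequences and the total length of the flagged ranges for each action. It assumes the report is tab-separated with a `#`-prefixed column-header line naming `seq_id`, `start_pos`, `end_pos` and `action`; check the header of your own `fcs_gx_report.txt` before relying on it.

```shell
# Hypothetical helper (not part of FCS): print "<action> <seqs> <bases>"
# (tab-separated, sorted by action) from an FCS-GX action report.
# Assumption: a "#"-prefixed header line names the columns seq_id,
# start_pos, end_pos and action; column positions are read from it.
summarize_gx_report() {
    awk -F '\t' '
        NF == 0 { next }                      # skip blank lines
        /^#/ {
            sub(/^#+/, "")                    # changing $0 re-splits the fields
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("action" in col) {                   # data rows, once the header is known
            a = $(col["action"])
            k = a SUBSEP $(col["seq_id"])
            if (!(k in seen)) { seen[k] = 1; seqs[a]++ }   # count each sequence once
            bases[a] += $(col["end_pos"]) - $(col["start_pos"]) + 1
        }
        END { for (a in seqs) printf "%s\t%d\t%.0f\n", a, seqs[a], bases[a] }
    ' "$1" | sort
}
```

For the example above, the call would be `summarize_gx_report ./gx_out/fcsgx_test.fa.6973.fcs_gx_report.txt`. Reading column positions from the header, rather than hard-coding them, keeps the sketch from silently summing the wrong column if the layout differs from what is assumed here.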
Create a batch input file (e.g. fcsadaptor.sh) similar to the following.
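#!/bin/bash
# Stop at the first failing command (e.g. if /lscratch was not allocated)
set -e
module load fcs
# Work in node-local scratch; requires --gres=lscratch:N at submission
cd /lscratch/${SLURM_JOB_ID}
mkdir inputdir outputdir
cp ${FCS_TEST_DATA}/fcsadaptor_prok_test.fa.gz inputdir/.
run_fcsadaptor.sh --fasta-input ./inputdir/fcsadaptor_prok_test.fa.gz --output-dir ./outputdir --prok --container-engine singularity --image ${FCS_HOME}/fcs-adaptor.sif
# Copy results back; /lscratch is cleared when the job ends
cp -r outputdir /data/${USER}/.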
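Submit this job using the Slurm sbatch command. Because the script works in /lscratch, request local scratch space at submission, for example:
sbatch --cpus-per-task=4 --mem=3g --gres=lscratch:100 fcsadaptor.sh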
Create a swarmfile to run fcs-adaptor (e.g. fcsadaptor.swarm). For example:
run_fcsadaptor.sh --fasta-input ./inputdir/fcsadaptor_prok_test1.fa.gz --output-dir ./outputdir1 --prok --container-engine singularity --image ${FCS_HOME}/fcs-adaptor.sif; cp -r outputdir1 /data/${USER}/.
run_fcsadaptor.sh --fasta-input ./inputdir/fcsadaptor_prok_test2.fa.gz --output-dir ./outputdir2 --prok --container-engine singularity --image ${FCS_HOME}/fcs-adaptor.sif; cp -r outputdir2 /data/${USER}/.
run_fcsadaptor.sh --fasta-input ./inputdir/fcsadaptor_prok_test3.fa.gz --output-dir ./outputdir3 --prok --container-engine singularity --image ${FCS_HOME}/fcs-adaptor.sif; cp -r outputdir3 /data/${USER}/.
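Writing one swarm line per input by hand becomes tedious with many assemblies. The function below is a minimal sketch that generates such a swarmfile from every `*.fa.gz` file in a directory, following the same command pattern. `make_adaptor_swarm` and the `outputdir_<name>` naming are hypothetical, and the sketch assumes file names contain no spaces, since each swarm line is split into words when it runs.

```shell
# Hypothetical helper: write one fcs-adaptor swarm line per *.fa.gz file.
# Usage: make_adaptor_swarm <input-dir> <swarmfile>
make_adaptor_swarm() {
    local indir="$1" swarmfile="$2" fa base line
    : > "$swarmfile"                          # start with an empty swarmfile
    for fa in "$indir"/*.fa.gz; do
        [ -e "$fa" ] || continue              # no match: glob stays literal, skip it
        base=$(basename "$fa" .fa.gz)
        # \${FCS_HOME} and \${USER} are written literally so that each subjob
        # expands them at run time, after swarm has loaded the fcs module.
        line="run_fcsadaptor.sh --fasta-input $fa --output-dir ./outputdir_$base"
        line+=" --prok --container-engine singularity --image \${FCS_HOME}/fcs-adaptor.sif"
        line+="; cp -r outputdir_$base /data/\${USER}/."
        printf '%s\n' "$line" >> "$swarmfile"
    done
}
```

For example, `make_adaptor_swarm ./inputdir fcsadaptor.swarm` writes one line per input file, and the result is submitted with swarm as usual.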
Submit this job using the swarm command.
swarm -f fcsadaptor.swarm [-g #] --module fcs
where
-g #            Number of gigabytes of memory required for each process (one line in the swarm command file)
--module fcs    Loads the fcs module for each subjob in the swarm