bioBakery is a meta’omic analysis environment and collection of individual software tools with the capacity to process raw shotgun sequencing data into actionable microbial community feature profiles, summary reports, and publication-ready figures. It includes a collection of preconfigured analysis modules also joined into workflows for reproducibility. Each individual module has been developed to perform a particular task, e.g. quantitative taxonomic profiling or statistical analysis.
[user@biowulf]$ sinteractive [user@cn0861 ~]$ module load biobakery_workflows [+] Loading samtools 1.15.1 ... [+] Loading bowtie 2-2.4.5 [+] Loading trimmomatic 0.39 on cn4276 [+] Loading singularity 3.10.5 on cn4276 [+] Loading metaphlan 4.0.3 [+] Loading biobakery_workflows 3.1 ...All workflows follow the general command format:
biobakery_workflows $WORKFLOW --input $INPUT --output $OUTPUTwhere $WORKFLOW is one of:
16s, 16s_vis, isolate_assembly, wmgx, wmgx_vis, wmgx_wmtx, wmgx_wmtx_visThe basic usage of the biobakery_workflows executable is as follows:
[user@cn0861 ~]$ biobakery_workflows -h
usage: biobakery_workflows [-h] [--version]
{16s,16s_vis,isolate_assembly,wmgx,wmgx_vis,wmgx_wmtx,wmgx_wmtx_vis}
bioBakery workflows: A collection of AnADAMA2 workflows
positional arguments:
{16s,16s_vis,isolate_assembly,wmgx,wmgx_vis,wmgx_wmtx,wmgx_wmtx_vis}
workflow to run
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
To see the usage of biobakery_workflows for a particular workflow, say, "wmgx", enter the command:
[user@cn0861 ~]$ biobakery_workflows wmgx -h
usage: wmgx.py [-h] [--version]
[--input-extension {fastq.gz,fastq,fq.gz,fq,fasta,fasta.gz,fastq.bz2,fq.bz2}]
[--barcode-file BARCODE_FILE]
[--dual-barcode-file DUAL_BARCODE_FILE]
[--index-identifier INDEX_IDENTIFIER]
[--min-pred-qc-score MIN_PRED_QC_SCORE] [--threads THREADS]
[--pair-identifier PAIR_IDENTIFIER] [--interleaved]
[--bypass-quality-control]
[--contaminate-databases CONTAMINATE_DATABASES]
[--qc-options QC_OPTIONS]
[--functional-profiling-options FUNCTIONAL_PROFILING_OPTIONS]
[--remove-intermediate-output] [--bypass-functional-profiling]
[--bypass-strain-profiling] [--run-strain-gene-profiling]
[--bypass-taxonomic-profiling] [--run-assembly]
[--strain-profiling-options STRAIN_PROFILING_OPTIONS]
[--taxonomic-profiling-options TAXONOMIC_PROFILING_OPTIONS]
[--max-strains MAX_STRAINS] [--strain-list STRAIN_LIST]
[--assembly-options ASSEMBLY_OPTIONS] -o OUTPUT [-i INPUT]
[--config CONFIG] [--local-jobs JOBS] [--grid-jobs GRID_JOBS]
[--grid GRID] [--grid-partition GRID_PARTITION]
[--grid-benchmark {on,off}] [--grid-options GRID_OPTIONS]
[--grid-environment GRID_ENVIRONMENT]
[--grid-scratch GRID_SCRATCH] [--dry-run] [--skip-nothing]
[--quit-early] [--until-task UNTIL_TASK]
[--exclude-task EXCLUDE_TASK] [--target TARGET]
[--exclude-target EXCLUDE_TARGET]
[--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
A workflow for whole metagenome shotgun sequences
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--input-extension {fastq.gz,fastq,fq.gz,fq,fasta,fasta.gz,fastq.bz2,fq.bz2}
the input file extension
[default: fastq.gz]
--barcode-file BARCODE_FILE
the barcode file
[default: ]
--dual-barcode-file DUAL_BARCODE_FILE
the string to identify the dual barcode file
[default: ]
--index-identifier INDEX_IDENTIFIER
the string to identify the index files
[default: _I1_001]
--min-pred-qc-score MIN_PRED_QC_SCORE
the min phred quality score to use for demultiplexing
[default: 2]
--threads THREADS number of threads/cores for each task to use
[default: 1]
--pair-identifier PAIR_IDENTIFIER
the string to identify the first file in a pair
[default: .R1]
--interleaved indicates whether or not sequence files are interleaved
[default: False]
--bypass-quality-control
do not run the quality control tasks
--contaminate-databases CONTAMINATE_DATABASES
the path (or comma-delimited paths) to the contaminate
reference databases for QC
[default: /fdb/biobakery_workflows_databases/kneaddata_db_human_genome]
--qc-options QC_OPTIONS
additional options when running the QC step
[default: ]
--functional-profiling-options FUNCTIONAL_PROFILING_OPTIONS
additional options when running the functional profiling step
[default: ]
--remove-intermediate-output
remove intermediate output files
--bypass-functional-profiling
do not run the functional profiling tasks
--bypass-strain-profiling
do not run the strain profiling tasks (StrainPhlAn)
--run-strain-gene-profiling
run the gene-based strain profiling tasks (PanPhlAn)
--bypass-taxonomic-profiling
do not run the taxonomic profiling tasks (a tsv profile for each sequence file must be included in the input folder using the same sample name)
--run-assembly run the assembly and annotation tasks
--strain-profiling-options STRAIN_PROFILING_OPTIONS
additional options when running the strain profiling step
[default: ]
--taxonomic-profiling-options TAXONOMIC_PROFILING_OPTIONS
additional options when running the taxonomic profiling step
[default: ]
--max-strains MAX_STRAINS
the max number of strains to profile
[default: 20]
--strain-list STRAIN_LIST
input file with list of strains to profile
[default: ]
--assembly-options ASSEMBLY_OPTIONS
additional options when running the assembly step
[default: ]
-o OUTPUT, --output OUTPUT
Write output to this directory
-i INPUT, --input INPUT
Find inputs in this directory
[default: /gpfs/gsfs7/users/$USER/biobakery_workflows]
--config CONFIG Find workflow configuration in this folder
[default: only use command line options]
--local-jobs JOBS Number of tasks to execute in parallel locally
[default: 1]
--grid-jobs GRID_JOBS
Number of tasks to execute in parallel on the grid
[default: 0]
--grid GRID Run gridable tasks on this grid type
[default: slurm]
--grid-partition GRID_PARTITION
Partition/queue used for gridable tasks.
Provide a single partition or a comma-delimited list
of short/long partitions with a cutoff.
[default: serial_requeue,shared,240]
--grid-benchmark {on,off}
Benchmark gridable tasks
[default: on]
--grid-options GRID_OPTIONS
Grid specific options that will be applied to each grid task
--grid-environment GRID_ENVIRONMENT
Commands that will be run before each grid task to set up environment
--grid-scratch GRID_SCRATCH
The folder to write intermediate scratch files for grid jobs
--dry-run Print tasks to be run but don't execute their actions
--skip-nothing Run all tasks. Rerun tasks that have already been run.
--quit-early Stop if a task fails. By default,
all tasks (except sub-tasks of failed tasks) will run.
--until-task UNTIL_TASK
Stop after running this task. Use task name or number.
--exclude-task EXCLUDE_TASK
Don't run these tasks. Add multiple times to append.
--target TARGET Only run tasks that generate these targets.
Add multiple times to append.
Patterns with ? and * are allowed.
--exclude-target EXCLUDE_TARGET
Don't run tasks that generate these targets.
Add multiple times to append.
Patterns with ? and * are allowed.
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Set the level of output for the log
[default: INFO]
This output indicates, in particular, that the biobakery executable can be run with multiple threads (option --threads THREADS) if an appropriate number of cpus is allocated for the interactive session, which should speed up your processing. [user@cn0861 ~]$ ls $BW_BIN 16s.py merge_and_rename_fastq.py 16s_vis.py merge_fastq.py anadama2_add_files_to_database.py merge_paired_ends.R annotate_genome.py phylogeny.R assign_taxonomy.R pull_out_reads_by_species_metaphlan2_results.py biobakery_workflows python burst_workflow.py R check_fastq_format.py remove_if_exists.py const_seq_table.R rename_data_products.py count_features.py rename_fastq_files.py create_fasta_per_taxonomy_from_alignments.py rename_files_to_sample_ids.py create_otu_tables_from_alignments.py reverse_compliment_barcodes.py create_subsampled_demos.py rna_dna_norm.py dada2_version.R shell demultiplex_split_index.py strainphlan_ggtree_vis.R extract_orphan_reads.py strainphlan_ordination_vis.R filter_and_trim.R trim_taxonomy.py generate_dual_barcode.py wmgx.py get_counts_from_humann2_logs.py wmgx_vis.py identify_primers.R wmgx_wmtx.py isolate_assembly.py.py wmgx_wmtx_vis.py learn_error_rates.RIn order to run biobakery_workflows on sample fastq or fastq.gz data (single-end or paired-end), enter the following commands:
[user@cn0861 ~]$ biobakery_workflows wmgx --input $BW_EXAMPLES/wmgx/single/ --output /data/$USER/workflow_output
(Apr 28 09:03:50) [ 0/52 - 0.00%] **Ready ** Task 3: kneaddata____demo2
(Apr 28 09:03:50) [ 0/52 - 0.00%] **Started ** Task 3: kneaddata____demo2
(Apr 28 09:03:58) [ 1/52 - 1.92%] **Completed** Task 3: kneaddata____demo2
(Apr 28 09:03:58) [ 1/52 - 1.92%] **Ready ** Task 8: metaphlan____demo2
(Apr 28 09:03:58) [ 1/52 - 1.92%] **Started ** Task 8: metaphlan____demo2
(Apr 28 09:05:41) [ 2/52 - 3.85%] **Completed** Task 8: metaphlan____demo2
(Apr 28 09:05:41) [ 2/52 - 3.85%] **Ready ** Task 34: strainphlan_sample2markers____demo2
(Apr 28 09:05:41) [ 2/52 - 3.85%] **Started ** Task 34: strainphlan_sample2markers____demo2
(Apr 28 09:06:18) [ 3/52 - 5.77%] **Completed** Task 34: strainphlan_sample2markers____demo2
(Apr 28 09:06:18) [ 3/52 - 5.77%] **Ready ** Task 13: humann____demo2
(Apr 28 09:06:18) [ 3/52 - 5.77%] **Started ** Task 13: humann____demo2
...
(Apr 28 09:12:25) [49/52 - 94.23%] **Completed** Task 24: humann_renorm_pathways_relab____demo1
(Apr 28 09:12:25) [49/52 - 94.23%] **Ready ** Task 28: humann_join_tables_pathways_relab
(Apr 28 09:12:25) [49/52 - 94.23%] **Started ** Task 28: humann_join_tables_pathways_relab
(Apr 28 09:12:25) [50/52 - 96.15%] **Completed** Task 28: humann_join_tables_pathways_relab
(Apr 28 09:12:25) [50/52 - 96.15%] **Ready ** Task 31: humann_count_features_pathways
(Apr 28 09:12:25) [50/52 - 96.15%] **Started ** Task 31: humann_count_features_pathways
(Apr 28 09:12:25) [51/52 - 98.08%] **Completed** Task 31: humann_count_features_pathways
(Apr 28 09:12:25) [51/52 - 98.08%] **Ready ** Task 32: humann_merge_feature_counts
(Apr 28 09:12:25) [51/52 - 98.08%] **Started ** Task 32: humann_merge_feature_counts
(Apr 28 09:12:25) [52/52 - 100.00%] **Completed** Task 32: humann_merge_feature_counts
Run Finished
[user@cn0861 ~]$ tree /data/user/workflow_output
/data/user/workflow_output
├── anadama.log
├── humann
│ ├── counts
│ │ ├── humann_ecs_relab_counts.tsv
│ │ ├── humann_feature_counts.tsv
│ │ ├── humann_genefamilies_relab_counts.tsv
│ │ ├── humann_pathabundance_relab_counts.tsv
│ │ └── humann_read_and_species_count_table.tsv
│ ├── main
│ │ ├── demo1_genefamilies.tsv
│ │ ├── demo1_humann_temp
│ │ │ ├── demo1_bowtie2_aligned.sam
│ │ │ ├── demo1_bowtie2_aligned.tsv
│ │ │ ├── demo1_bowtie2_index.1.bt2
│ │ │ ├── demo1_bowtie2_index.2.bt2
│ │ │ ├── demo1_bowtie2_index.3.bt2
│ │ │ ├── demo1_bowtie2_index.4.bt2
│ │ │ ├── demo1_bowtie2_index.rev.1.bt2
│ │ │ ├── demo1_bowtie2_index.rev.2.bt2
│ │ │ ├── demo1_bowtie2_unaligned.fa
│ │ │ ├── demo1_custom_chocophlan_database.ffn
│ │ │ ├── demo1_diamond_aligned.tsv
│ │ │ └── demo1_diamond_unaligned.fa
│ │ ├── demo1.log
│ │ ├── demo1_pathabundance.tsv
│ │ ├── demo1_pathcoverage.tsv
│ │ ├── demo2_genefamilies.tsv
│ │ ├── demo2_humann_temp
│ │ │ ├── demo2_bowtie2_aligned.sam
│ │ │ ├── demo2_bowtie2_aligned.tsv
│ │ │ ├── demo2_bowtie2_index.1.bt2
│ │ │ ├── demo2_bowtie2_index.2.bt2
│ │ │ ├── demo2_bowtie2_index.3.bt2
│ │ │ ├── demo2_bowtie2_index.4.bt2
│ │ │ ├── demo2_bowtie2_index.rev.1.bt2
│ │ │ ├── demo2_bowtie2_index.rev.2.bt2
│ │ │ ├── demo2_bowtie2_unaligned.fa
│ │ │ ├── demo2_custom_chocophlan_database.ffn
│ │ │ ├── demo2_diamond_aligned.tsv
│ │ │ └── demo2_diamond_unaligned.fa
│ │ ├── demo2.log
│ │ ├── demo2_pathabundance.tsv
│ │ └── demo2_pathcoverage.tsv
│ ├── merged
│ │ ├── ecs_relab.tsv
│ │ ├── ecs.tsv
│ │ ├── genefamilies_relab.tsv
│ │ ├── genefamilies.tsv
│ │ ├── pathabundance_relab.tsv
│ │ └── pathabundance.tsv
│ ├── regrouped
│ │ ├── demo1_ecs.tsv
│ │ └── demo2_ecs.tsv
│ └── relab
│ ├── ecs
│ │ ├── demo1_ecs_relab.tsv
│ │ └── demo2_ecs_relab.tsv
│ ├── genes
│ │ ├── demo1_genefamilies_relab.tsv
│ │ └── demo2_genefamilies_relab.tsv
│ └── pathways
│ ├── demo1_pathabundance_relab.tsv
│ └── demo2_pathabundance_relab.tsv
├── kneaddata
│ ├── main
│ │ ├── demo1.fastq
│ │ ├── demo1_Homo_sapiens_bowtie2_contam.fastq
│ │ ├── demo1.log
│ │ ├── demo1.trimmed.fastq
│ │ ├── demo2.fastq
│ │ ├── demo2_Homo_sapiens_bowtie2_contam.fastq
│ │ ├── demo2.log
│ │ └── demo2.trimmed.fastq
│ └── merged
│ └── kneaddata_read_count_table.tsv
├── metaphlan
│ ├── main
│ │ ├── demo1_bowtie2.sam
│ │ ├── demo1_taxonomic_profile.tsv
│ │ ├── demo2_bowtie2.sam
│ │ └── demo2_taxonomic_profile.tsv
│ └── merged
│ ├── metaphlan_species_counts_table.tsv
│ └── metaphlan_taxonomic_profiles.tsv
└── strainphlan
├── 0_clade.log
├── 0_clade.tree
├── 10_clade.log
├── 10_clade.tree
├── 11_clade.log
...
├── 9_clade.log
├── 9_clade.tree
├── clades_list_order_by_average_abundance.txt
├── clades_list.txt
├── demo1_bowtie2
│ └── demo1_bowtie2.pkl
└── demo2_bowtie2
└── demo2_bowtie2.pkl
[user@cn0861 ~]$ biobakery_workflows wmgx --input $BW_EXAMPLES/wmgx/paired/ --output /data/$USER/workflow_output
(Apr 28 09:14:52) [ 0/52 - 0.00%] **Ready ** Task 4: kneaddata____demo2
(Apr 28 09:14:52) [ 0/52 - 0.00%] **Started ** Task 4: kneaddata____demo2
(Apr 28 09:15:05) [ 1/52 - 1.92%] **Completed** Task 4: kneaddata____demo2
(Apr 28 09:15:05) [ 1/52 - 1.92%] **Ready ** Task 10: metaphlan____demo2
(Apr 28 09:15:05) [ 1/52 - 1.92%] **Started ** Task 10: metaphlan____demo2
(Apr 28 09:16:49) [ 2/52 - 3.85%] **Completed** Task 10: metaphlan____demo2
(Apr 28 09:16:49) [ 2/52 - 3.85%] **Ready ** Task 36: strainphlan_sample2markers____demo2
(Apr 28 09:16:49) [ 2/52 - 3.85%] **Started ** Task 36: strainphlan_sample2markers____demo2
...
(Apr 28 09:27:03) [46/52 - 88.46%] **Completed** Task 22: humann_renorm_genes_relab____demo1
(Apr 28 09:27:03) [46/52 - 88.46%] **Ready ** Task 28: humann_join_tables_genes_relab
(Apr 28 09:27:03) [46/52 - 88.46%] **Started ** Task 28: humann_join_tables_genes_relab
(Apr 28 09:27:03) [47/52 - 90.38%] **Completed** Task 28: humann_join_tables_genes_relab
(Apr 28 09:27:03) [47/52 - 90.38%] **Ready ** Task 31: humann_count_features_genes
(Apr 28 09:27:03) [47/52 - 90.38%] **Started ** Task 31: humann_count_features_genes
(Apr 28 09:27:03) [48/52 - 92.31%] **Completed** Task 31: humann_count_features_genes
(Apr 28 09:27:03) [48/52 - 92.31%] **Ready ** Task 26: humann_renorm_pathways_relab____demo1
(Apr 28 09:27:03) [48/52 - 92.31%] **Started ** Task 26: humann_renorm_pathways_relab____demo1
(Apr 28 09:27:03) [49/52 - 94.23%] **Completed** Task 26: humann_renorm_pathways_relab____demo1
(Apr 28 09:27:03) [49/52 - 94.23%] **Ready ** Task 30: humann_join_tables_pathways_relab
(Apr 28 09:27:03) [49/52 - 94.23%] **Started ** Task 30: humann_join_tables_pathways_relab
(Apr 28 09:27:03) [50/52 - 96.15%] **Completed** Task 30: humann_join_tables_pathways_relab
(Apr 28 09:27:03) [50/52 - 96.15%] **Ready ** Task 33: humann_count_features_pathways
(Apr 28 09:27:03) [50/52 - 96.15%] **Started ** Task 33: humann_count_features_pathways
(Apr 28 09:27:03) [51/52 - 98.08%] **Completed** Task 33: humann_count_features_pathways
(Apr 28 09:27:03) [51/52 - 98.08%] **Ready ** Task 34: humann_merge_feature_counts
(Apr 28 09:27:03) [51/52 - 98.08%] **Started ** Task 34: humann_merge_feature_counts
(Apr 28 09:27:03) [52/52 - 100.00%] **Completed** Task 34: humann_merge_feature_counts
Run Finished