DNAnexus is a cloud-based commercial solution for next-generation sequence analysis and visualization. It has a command-line interface (CLI) which can be used to log in to the DNAnexus platform, upload and navigate data, and launch analyses. For users who wish to upload/download data directly from the NIH HPC systems, the CLI has been installed on Biowulf
DNAnexus cannot be run on the Biowulf login node. Since DNAnexus CLI commands are either doing data transfer or remote jobs, they can be safely run on Helix (the interactive data transfer system where scientific applications are not available), or you can allocate an interactive session on Biowulf by typing 'sinteractive'. The example below shows an interactive DNAnexus session on Helix, but an interactive session on a Biowulf compute node would be identical.
helix% module load DNAnexus helix% dx login Acquiring credentials from https://auth.dnanexus.com Username [user]: Password: Using env variable HTTPS_PROXY=http://dtn04-e0:3128 as proxy Note: Use dx select --level VIEW or dx select --public to select from projects for which you only have VIEW permissions. Available projects (CONTRIBUTE or higher): 0) Test (ADMINISTER) Pick a numbered choice [0]: 0 Setting current project to: Test You are now logged in. Your credentials are stored in /home/user/.dnanexus_config and will expire in 30 days, 0:00:00. Use dx login --timeout to control the expiration date, or dx logout to end this session. helix% dx select Available projects (CONTRIBUTE or higher): 0) Test (ADMINISTER) Pick a numbered choice [0]: 0 Setting current project to: Test helix% dx upload --wait /data/$USER/fastq_files/8_2_61PKFAAXX.fq.gz [===========================================================>] Uploaded 257,945,911 of 257,945,911 bytes (100%) /data/CloudData/sean/fastq_files/8_2_61PKFAAXX.fq.gz ID file-FJzQ1b003PyyyjVbBVq84xKK Class file Project project-FJzPQV003PyqqVx409PJBgZ0 Folder / Name 8_2_61PKFAAXX.fq.gz State closed Visibility visible Types - Properties - Tags - Outgoing links - Created Thu Aug 23 14:45:40 2018 Created by user Last modified Thu Aug 23 14:47:04 2018 archivalState "live" Media type application/gzip Size 246.00 MB helix% dx find apps --installed x Bowtie2 FASTA Indexer (bowtie2_fasta_indexer), v1.3.0 x BWA FASTA Indexer (bwa_fasta_indexer), v2.0.1 x FastQC Reads Quality Control (fastqc), v2.3.0 helix% dx run fastqc Entering interactive mode for input selection. Input: Reads (reads) Class: file Enter file ID or path (twice for compatible files in current directory, '?' for more options) reads: 8_2_61PKFAAXX.fq.gz Select an optional parameter to set by its # (^D or to finish): [0] Custom contaminants (contaminants_txt) [1] Custom adapters (adapters_txt) [2] Custom limits (limits_txt) [3] Input format (format) [default="auto"] [4] Kmer size (kmer_size) [default=7] [5] Disable grouping of bases past 50bp? (nogroup) [default=true] [6] Extra command-line options (extra_options) Optional param #: Using input JSON: { "reads": { "$dnanexus_link": { "project": "project-FJzPQV003PyqqVx409PJBgZ0", "id": "file-FJzQ1b003PyyyjVbBVq84xKK" } } } Confirm running the executable with this input [Y/n]: y Calling app-FBFjGpQ987vZKJXvF1zfPx8X with output destination project-FJzPQV003PyqqVx409PJBgZ0:/ Job ID: job-FJzQ6K003PyY8BbgBV4qxXYF Watch launched job now? [Y/n] y Job Log ------- Watching job job-FJzQ6K003PyY8BbgBV4qxXYF. Press Ctrl+C to stop. * FastQC Reads Quality Control (fastqc:main) (running) job-FJzQ6K003PyY8BbgBV4qxXYF user 2018-08-23 14:55:52 (running for 0:00:39) 2018-08-23 14:56:36 FastQC Reads Quality Control INFO Logging initialized (priority) 2018-08-23 14:56:37 FastQC Reads Quality Control INFO CPU: 12% (4 cores) * Memory: 661/7478MB * Storage: 79GB free * Net: 0?/0?MBps 2018-08-23 14:56:37 FastQC Reads Quality Control STDOUT dxpy/0.259.0 (Linux-4.4.0-98-generic-x86_64-with-Ubuntu-14.04-trusty) 2018-08-23 14:56:38 FastQC Reads Quality Control INFO Downloading bundled file resources.tar.gz 2018-08-23 14:56:39 FastQC Reads Quality Control STDOUT >>> Unpacking resources.tar.gz to / 2018-08-23 14:56:39 FastQC Reads Quality Control INFO Installing apt packages openjdk-7-jre-headless 2018-08-23 14:56:59 FastQC Reads Quality Control STDOUT dxpy/0.259.0 (Linux-4.4.0-98-generic-x86_64-with-Ubuntu-14.04-trusty) 2018-08-23 14:56:59 FastQC Reads Quality Control STDOUT /usr/sbin/sshd already running. 2018-08-23 14:56:59 FastQC Reads Quality Control STDOUT bash running (job ID job-FJzQ6K003PyY8BbgBV4qxXYF) 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ mark-section 'streaming input data' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ mkfifo ./8_2_61PKFAAXX.fq.gz 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ cat_pid=3235 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ dx cat '{"$dnanexus_link": "file-FJzQ1b003PyyyjVbBVq84xKK"}' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ mark-section 'running FastQC' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ opts= 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ '[' '' '!=' '' ']' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ '[' '' '!=' '' ']' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ '[' '' '!=' '' ']' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ '[' auto '!=' auto ']' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ '[' true == true ']' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ opts=' --nogroup' 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ mkdir results 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR +++ nproc 2018-08-23 14:57:00 FastQC Reads Quality Control STDERR ++ /FastQC/fastqc -t 4 --extract -k 7 -o results --nogroup ./8_2_61PKFAAXX.fq.gz 2018-08-23 14:57:01 FastQC Reads Quality Control STDERR Started analysis of 8_2_61PKFAAXX.fq.gz 2018-08-23 14:57:30 FastQC Reads Quality Control STDOUT Analysis complete for 8_2_61PKFAAXX.fq.gz 2018-08-23 14:57:34 FastQC Reads Quality Control STDERR ++ wait 3235 2018-08-23 14:57:34 FastQC Reads Quality Control STDERR ++ mark-section 'uploading results' 2018-08-23 14:57:34 FastQC Reads Quality Control STDERR ++ mkdir -p /home/dnanexus/out/report_html/ /home/dnanexus/out/stats_txt/ 2018-08-23 14:57:34 FastQC Reads Quality Control STDERR ++ mv results/8_2_61PKFAAXX.fq_fastqc/fastqc_report.html /home/dnanexus/out/report_html/8_2_61PKFAAXX.stats-fastqc.html 2018-08-23 14:57:34 FastQC Reads Quality Control STDERR ++ mv results/8_2_61PKFAAXX.fq_fastqc/fastqc_data.txt /home/dnanexus/out/stats_txt/8_2_61PKFAAXX.stats-fastqc.txt 2018-08-23 14:57:34 FastQC Reads Quality Control STDERR ++ dx-upload-all-outputs 2018-08-23 14:57:34 FastQC Reads Quality Control STDOUT uploading file: /home/dnanexus/out/report_html/8_2_61PKFAAXX.stats-fastqc.html -> /8_2_61PKFAAXX.stats-fastqc.html 2018-08-23 14:57:35 FastQC Reads Quality Control STDOUT uploading file: /home/dnanexus/out/stats_txt/8_2_61PKFAAXX.stats-fastqc.txt -> /8_2_61PKFAAXX.stats-fastqc.txt 2018-08-23 14:57:35 FastQC Reads Quality Control STDERR ++ mark-success 2018-08-23 14:57:35 FastQC Reads Quality Control STDERR ++ type -t main 2018-08-23 14:57:35 FastQC Reads Quality Control STDERR + [[ '' == \f\u\n\c\t\i\o\n ]] 2018-08-23 14:57:35 FastQC Reads Quality Control STDERR + echo '/home/dnanexus/job-FJzQ6K003PyY8BbgBV4qxXYF: Global scope execution complete. Not invoking entry point function main because it was not found' 2018-08-23 14:57:35 FastQC Reads Quality Control STDERR /home/dnanexus/job-FJzQ6K003PyY8BbgBV4qxXYF: Global scope execution complete. Not invoking entry point function main because it was not found * FastQC Reads Quality Control (fastqc:main) (done) job-FJzQ6K003PyY8BbgBV4qxXYF user 2018-08-23 14:55:52 (runtime 0:01:07) Output: report_html = file-FJzQ77Q0Gjvj15Vz10y6JGXz stats_txt = file-FJzQ77j0Gjvq6kg010ZzZ6Z0 helix% dx ls 8_2_61PKFAAXX.fq.gz 8_2_61PKFAAXX.stats-fastqc.html 8_2_61PKFAAXX.stats-fastqc.txt helix%
The initial DNAnexus login cannot be performed as part of a Biowulf batch job. You will need to run 'dx login' on interactively on Helix or a Biowulf compute node. Once logged in, the login will be valid for 30 days. During that period it is possible to submit a batch job on Biowulf that runs DNAnexus commands.
Sample batch script
#!/bin/bash module load DNAnexus indir=/data/$USER/fastq_files infile=8_2_61PKFAAXX dx upload --wait ${indir}/${infile}.fq.gz dx ls dx run --wait fastqc -ireads=$infile dx ls dx download ${infile}.stats-fastqc.txt
Submit with:
sbatch myjob.shThe status of the job and any errors can be seen in the slurm-*.out file. DNAnexus will also send an email as below: