High-Performance Computing at the NIH
BaseSpace CLI on Biowulf & Helix

Description

The Illumina BaseSpace Sequence Hub is a cloud-based platform for analyzing data from Illumina sequencers. It integrates directly with sequencing machines to monitor runs and stream data to BaseSpace. Predefined pipelines can be used to analyze data streamed from the sequencers or uploaded by other means.

Storage and compute are provided by AWS.

BaseSpace Sequence Hub can be accessed through its web interface as well as through the command line interface (CLI) described here.

There may be multiple versions of BaseSpace CLI available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail basespace_cli 

To select a module use

module load basespace_cli/[version]

where [version] is the version of choice.

How to

The BaseSpace CLI provides an interactive interface to Illumina's BaseSpace. The actual computing and storage happen in the cloud, so the CLI is generally used interactively; this example uses an interactive session. Before using the CLI for the first time, you must authenticate. Authentication stores the credentials needed to access a BaseSpace account in $HOME/.basespace. Run bs authenticate and visit the URL it prints to create the required access token:

biowulf$ sinteractive
salloc.exe: Pending job allocation 21758857
[...snip...]
salloc.exe: Nodes cn2623 are ready for job
cn2623$ module load basespace_cli
[+] Loading basespace_cli 0.8.1
cn2623$ bs authenticate
please authenticate here:
https://basespace.illumina.com/oauth/device?code=XXXXX
...
Success!
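
Because the credentials are stored in $HOME/.basespace, a script can check for that path before deciding whether authentication is still needed. The following is a minimal sketch, not part of the BaseSpace CLI: the function name has_basespace_credentials is hypothetical, and the check only tests that the path exists. It does not verify that the stored token is still valid.

```shell
# Hypothetical helper: succeed if BaseSpace credentials appear to exist.
# Assumption: the presence of <home>/.basespace means "bs authenticate"
# has been run before. The token itself is not validated.
# Usage: has_basespace_credentials [home_dir]   (defaults to $HOME)
has_basespace_credentials() {
    [ -e "${1:-$HOME}/.basespace" ]
}

if has_basespace_credentials; then
    echo "credentials found in ${HOME}/.basespace"
else
    echo "no credentials found; run 'bs authenticate' first"
fi
```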

Create a project:

cn2623$ bs list projects
# no projects yet
cn2623$ bs create project "TestProject"
cn2623$ bs list projects
+------------+--------------+
| project id | project name |
+------------+--------------+
| 31671652   | TestProject  |
+------------+--------------+
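
For scripting it can be handy to pull a project id out of the table printed by bs list projects. The sketch below assumes the two-column table layout shown above (an id column followed by a name column, separated by | characters); the function name id_for_name is hypothetical, and the parsing would need adjusting if the CLI prints a different layout.

```shell
# Hypothetical helper: read a two-column "| id | name |" table on stdin
# and print the id of the first row whose name equals $1 exactly.
# Assumption: the table layout matches the "bs list projects" output above.
id_for_name() {
    awk -F'|' -v want="$1" '
        NF >= 4 {
            id = $2; name = $3
            # Strip the padding spaces around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", id)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == want) { print id; exit }
        }'
}

# Demonstration on the table shown above.
id_for_name "TestProject" <<'EOF'
+------------+--------------+
| project id | project name |
+------------+--------------+
| 31671652   | TestProject  |
+------------+--------------+
EOF
# prints 31671652
```

In an authenticated session the same function could be fed directly from the CLI, for example bs list projects | id_for_name "TestProject".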

Upload an Illumina-generated sample. Note that samples have to pass validation rules:

cn2623$ bs upload sample --show-validation-rules
Fastq files are validated based on the following criteria:
    - The uploader will only support gzipped FASTQ files generated on Illumina instruments
    - The name of the FASTQ files must conform the following convention:
        SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz
        (i.e. SampleName_S1_L001_R1_001.fastq.gz / SampleName_S1_L001_R2_001.fastq.gz)
    - The read descriptor in the FASTQ files must conform to the following convention:
        @Instrument:RunID:FlowCellID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
        - Read 1 descriptor would look like this:
        @M00900:62:000000000-A2CYG:1:1101:18016:2491 1:N:0:13
        - Read 2 would have a 2 in the ReadNum field, like this:
        @M00900:62:000000000-A2CYG:1:1101:18016:2491 2:N:0:13
Quality considerations
    - The number of base calls for each read must equal the number of quality scores
    - The number of entries for Read 1 must equal the number of entries for Read 2
    - The uploader will determine if files are paired-end based on the matching file 
      names in which the only difference is the ReadNum
    - For paired-end reads, the descriptor must match for every entry for
      both reads 1 and 2
    - Each read has passed filter
cn2623$ bs upload sample -p "TestProject" TestSample_S1_L001_R1_001.fastq.gz
Gathering metadata and validating fastq files...
Uploading ...
        TestSample_S1_L001_R1_001.fastq.gz ..... complete 
Uploaded sample with ID: 38322428
Uploaded by #### ####, using BaseSpaceCLI.SampleUpload/0.8.1 v0.8.1 on biowulf.nih.gov
cn2623$ bs list samples --project-name TestProject
+-----------+-------------+
| sample id | sample name |
+-----------+-------------+
| 38322428  | TestSample  |
+-----------+-------------+
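
Some of the validation rules listed above can be checked locally before starting an upload. The sketch below covers two of them: the file naming convention and the requirement that each read has as many quality scores as base calls. Both function names are hypothetical, and the file name pattern is inferred from the example names in the rules (S followed by digits, L and three digits, R1 or R2, three digits), so the uploader's own validation remains authoritative.

```shell
# Hypothetical helper: succeed if the file name looks like
# SampleName_S1_L001_R1_001.fastq.gz (pattern inferred from the examples).
is_valid_fastq_name() {
    # ${1##*/} drops any leading directories.
    printf '%s\n' "${1##*/}" |
        grep -Eq '^.+_S[0-9]+_L[0-9]{3}_R[12]_[0-9]{3}\.fastq\.gz$'
}

# Hypothetical helper: succeed if every 4-line record in a gzipped FASTQ
# file has a quality line as long as its sequence line.
# Assumption: plain 4-line records; empty or truncated files are rejected.
fastq_lengths_match() {
    gzip -dc -- "$1" | awk '
        NR % 4 == 2 { seq_len = length($0) }
        NR % 4 == 0 && length($0) != seq_len { bad = 1; exit }
        END { if (bad || NR == 0 || NR % 4 != 0) exit 1 }'
}

# Check every file given on the command line.
for f in "$@"; do
    if is_valid_fastq_name "$f" && fastq_lengths_match "$f"; then
        echo "$f: looks OK"
    else
        echo "$f: would likely fail validation"
    fi
done
```

The remaining rules, such as matching read descriptors between read 1 and read 2, are not covered by this sketch.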

Some applications are set up by default. Others have to be imported. The easiest way to import an application is from a previous run (app session) of that application. Here we import the SRA Import application:

cn2623$ bs list apps -C appid -C appname
+---------+----------------------------------+
| appid   | appname                          |
+---------+----------------------------------+
| 279279  | BWA Whole Genome Sequencing v1.0 |
| 544544  | TopHat Alignment                 |
| 1825824 | Isaac Whole Genome Sequencing    |
| 408408  | Cufflinks Assembly & DE          |
+---------+----------------------------------+
cn2623$ bs list appsessions -u Complete
+---------------+--------------------------------+-------------------+
| appsession id | appsession name                | appsession status |
+---------------+--------------------------------+-------------------+
| 36554570      | SRA Import                     | Complete          |
+---------------+--------------------------------+-------------------+
cn2623$ bs import app -a 36554570
cn2623$ bs list apps -C appid -C appname
+---------+----------------------------------+
| appid   | appname                          |
+---------+----------------------------------+
| 279279  | BWA Whole Genome Sequencing v1.0 |
| 544544  | TopHat Alignment                 |
| 625625  | SRA Import                       |
| 1825824 | Isaac Whole Genome Sequencing    |
| 408408  | Cufflinks Assembly & DE          |
+---------+----------------------------------+

Now launch the SRA Import app to import an (Illumina) run from SRA directly into BaseSpace. Use --dry-run first to inspect the payload without launching anything:

cn2623$ bs launch app -i 625625 -o "sra-id:SRR292678" TestProject --dry-run
would launch app SRA Import (625625)
with launch name: SRA Import : 
payload:
{"Properties": [{"items": ["Accepted"], "Type": "string[]", 
 "Name": "Input.basespace-labs-disclaimer"}, 
{"Content": "v1pre3/projects/31671652", "Type": "project", 
 "Name": "Input.project-id"}, 
{"Content": "SRR292678", "Type": "string", 
 "Name": "Input.sra-id"}], "Name": "SRA Import", "AutoStart": true, 
 "Status Summary": "AutoLaunch"}
cn2623$ bs launch app -i 625625 -o "sra-id:SRR292678" TestProject
SRA Import :  (36554570)
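
When several runs need to be imported, it can help to generate the dry-run commands first and review them before launching anything. The sketch below only prints commands and does not call bs itself. The function name print_sra_dry_runs is hypothetical; the command shape, app id 625625, and project name are taken from the example above.

```shell
# Hypothetical helper: print (not run) one dry-run launch command per
# SRA accession, using only the options shown in the example above.
# Usage: print_sra_dry_runs APP_ID PROJECT ACCESSION...
print_sra_dry_runs() {
    app_id=$1
    project=$2
    shift 2
    for acc in "$@"; do
        printf 'bs launch app -i %s -o "sra-id:%s" %s --dry-run\n' \
            "$app_id" "$acc" "$project"
    done
}

print_sra_dry_runs 625625 TestProject SRR292678
# prints: bs launch app -i 625625 -o "sra-id:SRR292678" TestProject --dry-run
```

After reviewing the printed commands, they can be run by hand, and then repeated without --dry-run to perform the actual launch.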

Please see the Manual for more detail.