portal_client on Biowulf

portal_client is a Python-based client for downloading data made available through portals powered by the GDC-based portal system. Several such portals run on the internet to support various research efforts; notably, the Neuroscience Multi-omic Archive (NeMO, https://nemoarchive.org/) and the Human Microbiome Project Data Analysis and Coordination Center (hmpdacc.org) use the portal to enable data exploration and download. The client takes a manifest file as input; this file lists the URLs of the files to be downloaded. Manifest files can be generated with the shopping cart functionality of a portal's query interface.
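
A manifest is a tab-separated file with one row per downloadable file; the exact columns are produced by the portal's cart export, so the layout below is purely illustrative (the column names and URLs here are hypothetical, not the portal's actual schema):

```
id      urls
f001    https://example.org/data/f001.fastq.gz
f002    https://example.org/data/f002.fastq.gz
```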


Downloading data
[user@helix]$ module load portal_client
[user@helix]$ portal_client --manifest /path/to/my/manifest.tsv --destination /data/$USER/xxx

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive --gres=lscratch:10 -c 2 --mem=8g 
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load portal_client

[user@cn3144 ~]$ cd /data/$USER/

[user@cn3144 ~]$ portal_client --manifest /path/to/my/manifest.tsv


Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. portal_client.sh). For example:

#!/bin/bash
set -e
module load portal_client
portal_client --manifest /path/to/my/manifest.tsv --destination /data/$USER/xxx

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] portal_client.sh
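
For very large carts, one common pattern is to split the manifest into chunks and submit one batch job per chunk so downloads run in parallel. The sketch below is an assumption-laden illustration, not a supported workflow: it assumes the manifest has a single header row followed by one line per file (adjust the header handling to match your portal's export), and it generates a dummy manifest so the example is self-contained.

```shell
#!/bin/bash
set -euo pipefail

# Sketch: split a manifest into per-job chunks for parallel downloading.
# Assumption: one header row, then one tab-separated row per file.
manifest=manifest.tsv
chunk_lines=500   # files per job; tune to your quota and file sizes

# Demo manifest so this sketch runs as-is (replace with your real export).
printf 'id\turl\n' > "$manifest"
for i in $(seq 1 1200); do
    printf 'file%04d\thttps://example.org/file%04d\n' "$i" "$i"
done >> "$manifest"

# Re-attach the header to each chunk so portal_client sees a valid manifest.
header=$(head -n 1 "$manifest")
tail -n +2 "$manifest" | split -l "$chunk_lines" -d - chunk_
for c in chunk_*; do
    { printf '%s\n' "$header"; cat "$c"; } > "$c.tsv" && rm "$c"
    # One job per chunk (submission line commented out in this sketch):
    # sbatch --wrap="module load portal_client && portal_client --manifest $PWD/$c.tsv --destination /data/$USER/downloads"
done
echo "created $(ls chunk_*.tsv | wc -l) chunk manifests"
```

Each chunk manifest is a complete, header-bearing file, so any single job can be resubmitted on its own if a download fails.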