Portal_client is a Python-based client for downloading data made available through portals powered by the GDC-based portal system. Several such portals run on the internet to support various research efforts. Notably, the Neuroscience Multi-omic Archive (NeMO, https://nemoarchive.org/) and the Human Microbiome Project Data Analysis and Coordination Center (hmpdacc.org) use the portal to enable data exploration and download. The client accepts a manifest file as input. This file contains URLs to the files to be downloaded. Manifest files can be generated using the shopping cart functionality of the portal's query interface.
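Before starting a large download it can help to check how many entries a manifest lists. The following is a minimal sketch, not part of portal_client: the function name count_manifest_entries is hypothetical, and it assumes the manifest is a tab-separated file whose first line is a header row. Adjust it if your portal's manifests are laid out differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of portal_client): count the data rows in a
# manifest, ASSUMING a tab-separated file whose first line is a header.
count_manifest_entries() {
    local manifest="$1"
    # Fail early if the manifest is missing or unreadable.
    [ -r "$manifest" ] || { echo "cannot read manifest: $manifest" >&2; return 1; }
    # Skip the assumed header line, then count the non-blank lines.
    # grep -c exits non-zero when the count is 0, so ignore its status.
    tail -n +2 "$manifest" | grep -c -v '^[[:space:]]*$' || true
}
```

Calling count_manifest_entries /path/to/my/manifest.tsv prints the number of non-blank lines after the header, which under the stated assumption is the number of files to be downloaded.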
[user@helix]$ module load portal_client
[user@helix]$ portal_client --manifest /path/to/my/manifest.tsv --destination /data/$USER/xxx
Allocate an interactive session and run the program.
Sample session:
[user@biowulf]$ sinteractive --gres=lscratch:10 -c 2 --mem=8g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load portal_client
[user@cn3144 ~]$ cd /data/$USER/
[user@cn3144 ~]$ portal_client --manifest /path/to/my/manifest.tsv
Create a batch input file (e.g. portal_client.sh). For example:
#!/bin/bash
set -e
module load portal_client
portal_client --manifest /path/to/my/manifest.tsv --destination /data/$USER/xxx
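A batch job that fails on a mistyped path only reports the problem after it has waited in the queue, so a small pre-flight check may be worth adding to the script. The sketch below is an illustration only: preflight is a hypothetical function name, and whether portal_client creates a missing destination directory on its own is not stated here, so the directory is created explicitly.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of portal_client): confirm the
# manifest is readable and the destination directory exists before downloading.
preflight() {
    local manifest="$1" destination="$2"
    if [ ! -r "$manifest" ]; then
        echo "manifest not readable: $manifest" >&2
        return 1
    fi
    # Create the destination if needed; harmless if it already exists.
    mkdir -p "$destination"
}

# Example use in the batch script (paths are placeholders):
# preflight /path/to/my/manifest.tsv "/data/$USER/xxx" && \
#     portal_client --manifest /path/to/my/manifest.tsv --destination "/data/$USER/xxx"
```

With set -e in effect, a failed check stops the script before any download is attempted.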
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] portal_client.sh