Biowulf High Performance Computing at the NIH
Data transfer and sharing using Globus

Globus is a service that makes it easy to move, sync, and share large amounts of data. Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer. Globus uses GridFTP for more reliable and high-performance file transfer, and will queue file transfers to be performed asynchronously in the background.

Globus was developed and is maintained at the University of Chicago and is used extensively at supercomputer centers and major research facilities. [Globus website]

NIH scientists who wish to use this service to transfer data to or from their Helix/Biowulf disk space can log in with their NIH Login username and password.

No matter how you transfer data in and out of our systems, be aware that PII and PHI data cannot be stored or transferred into the NIH HPC systems.

NIH HPC (Helix/Biowulf) Endpoints

The NIH HPC (Biowulf) endpoint is called 'NIH HPC Data Transfer'.

The Globus endpoint for transferring data to or from your Helix/Biowulf /home, /data or /scratch areas is NIH HPC Data Transfer. This endpoint is implemented using eight "Data Transfer Nodes" which can operate in parallel to provide 80 Gb/s of aggregate bandwidth.

You do not need to be logged on to Helix or Biowulf to start or monitor a transfer.

Note for Summer Interns Using the VDI

NIH summer interns using the Virtual Desktop Interface can utilize Globus for some, but not all data transfers.

Logging into Globus with your NIH login

NIH researchers can use their NIH Login username and password to access Globus. Go to https://www.globus.org/ and click on Log In in the upper right corner of the page.

Type or scroll down to National Institutes of Health in the Organization box, and then click Continue.

You will be taken to a familiar-looking page for NIH login.

Use your PIV card as usual, or enter your NIH login username and password.

Installing the Globus client on your desktop

The Globus Connect client is available for Windows, Mac or Linux desktop systems. There are detailed instructions on the Globus website. See links below.

It is best to be off the VPN when installing the Globus client. How to install and configure Globus Connect Personal on

Troubleshooting: During the login procedure, if your browser says 'Login successful', while the Globus client itself says 'Browser login did not complete. Error: ConnectionError on request', please log off the VPN and try the install again.

Windows Notes:
  • By default, you will need administrative access to install Globus Connect Personal on Windows. This is because it will attempt to install into C:\Program Files (x86)\Globus Connect Personal. To install as a regular, non-administrative account, change the installation directory to one to which you have write access, for example Desktop --> Globus Connect Personal.
  • A Windows Defender window may pop up during installation (Windows protected your PC). Selecting "More info" should make a "Run anyway" button appear that will allow you to proceed.

Note: During installation, you will be offered the option of a 'High Assurance' endpoint. Do not select this option. NIH does not have a High Assurance Globus subscription.

Transferring data between your desktop and Biowulf

On your desktop system, you will need to have Globus Connect Personal running. Point your web browser to www.globus.org. Click on 'Log In', and enter your NIH username and password on the NIH login page that follows. After authenticating, you will be taken to the Globus File Manager page; if not, click 'File Manager' in the left bar.

  1. In the 'Collection' box, type 'NIH HPC Data Transfer'. You may need to authenticate: if so, you will be taken to the Globus authentication page as described above, and can authenticate with your NIH login username and password. By default, you should see the files in your /home area on Helix/Biowulf appear. You can also point to your /data area or another shared area by entering, for example, '/data/myusername' in the Path box.
  2. Click on 'Sync or Transfer files'.
  3. Enter the other endpoint, in this case the endpoint name that you gave to your desktop system when you installed Globus. You should now see both endpoints listed in two panes of the Globus window.
  4. To transfer files, select a file or directory on one endpoint, and click the blue 'Start' button.
  5. The page will now say that the transfer request submitted successfully.
  6. Click on 'View details' to display task detail information. Statistics are displayed at this page.
  7. You will also receive an email when the transfer is complete.

Transferring data between two Globus Personal endpoints

If you need to transfer data between two Globus Connect Personal endpoints (e.g. your desktop system and your laptop), you will need a Globus Plus license. Email staff@hpc.nih.gov to request one. Your Globus Plus license will be terminated when you leave NIH.

Once you have the license, you can transfer data between your own two Globus Personal endpoints, just as between any other Globus endpoints. Note that this only applies to a single Globus user, with a Globus Plus license, running Globus Connect Personal on two different systems.

Transferring data using the command line

Note: in the example below, the command-line globus transfer is started on Helix, the interactive data transfer node. However, the transfer to the NIH HPC endpoint ('NIH HPC Data Transfer') will go via the 8 HPC data transfer nodes.

Sample session:

[user@helix ~]$ globus login
Please login to Globus here:
---------------------------
https://auth.globus.org/long/globus/URL/
---------------------------

When you point a web browser to the URL that is provided, you will need to authenticate against the NIH domain, with your usual NIH login username and password. You will then see a page asking you to allow the Globus CLI to access your account.

When you click Allow, you will get a page with a long authorization code. Copy and paste that code back into your terminal window:

Enter the resulting Authorization Code here: aabbccddeeffgg1122334455

You have successfully logged in to the Globus CLI as username@globusid.org
You can always check your current identity with
  globus whoami
Logout of the Globus CLI with
  globus logout

[user@biowulf ~]$ globus whoami
user@globusid.org
or
user@nih.gov

Depending on how recently you signed up for Globus, the command 'globus whoami' may show you your Globus userid (user@globusid.org) or your NIH userid (user@nih.gov). Once you have logged in to Globus, you can set up transfers. You will need the ID of the endpoint, which you can find using the 'globus endpoint search' command. In the example below, the user obtains a list of NIH endpoints, and then gets a listing of the user's files on one of those endpoints.

[user@biowulf ~]$ globus endpoint search NIH

Owner                       | ID                                   | Display Name
--------------------------- | ------------------------------------ | ----------------------------------------
nihcitoir@globusid.org      | 4bc66d32-5057-11e6-8238-22000b97daec | NIH CIT OIR HPN
nihnhlbidir@globusid.org    | e1c214bc-63d5-11e6-833f-22000b97daec | NIH NHLBI DIR
nihhpc@globusid.org         | e2620047-6d04-11e5-ba46-22000b92c6ec | NIH HPC Data Transfer
nihnci@globusid.org         | e1c6b3bd-6d04-11e5-ba46-22000b92c6ec | nihnci#NIH-NCI-TRANSFER1

[...]
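
The ID column of the search output can be picked out by display name rather than copied by hand. The following is a minimal sketch that parses only the text table shown above; the function name endpoint_id is hypothetical, and the CLI's machine-readable output formats are likely a more robust choice for production scripts.

```shell
# endpoint_id NAME
# Hypothetical helper: reads 'globus endpoint search' table output on stdin
# and prints the ID of the first row whose Display Name equals NAME exactly.
endpoint_id() {
    awk -F'|' -v name="$1" '
        NF >= 3 {
            id = $2; disp = $3
            gsub(/^[ \t]+|[ \t]+$/, "", id)     # trim padding around the ID
            gsub(/^[ \t]+|[ \t]+$/, "", disp)   # trim padding around the name
            if (disp == name) { print id; exit }
        }
    '
}

# Usage (requires a prior 'globus login'):
#   globus endpoint search NIH | endpoint_id 'NIH HPC Data Transfer'
```

The header and separator rows are skipped automatically because their third column never matches the requested name.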
[user@biowulf ~]$ globus ls 'e2620047-6d04-11e5-ba46-22000b92c6ec'

Globus CLI Error: A Transfer API Error Occurred.
HTTP status: 400
request_id: CXOFvSFsb
code: ClientError.ActivationRequired
message: The endpoint 'e2620047-6d04-11e5-ba46-22000b92c6ec' is not activated
Go to https://www.globus.org/app/endpoints and activate endpoint. (10 day limit)

You may need to 'activate' the Globus endpoints you plan to use. Once an endpoint is activated, it will stay active for 10 days. If your transfer is not completed within 10 days, you'll get an email message requesting you to re-activate the endpoint, and once you've done that the transfer will resume. You can check the status of endpoints at https://www.globus.org/app/endpoints . Once your endpoints are activated, you can set up transfers.

[user@biowulf ~]$ globus transfer --recursive --no-verify-checksum \
	d8eb36b6-6d04-11e5-ba46-22000b92c6ec:/data1/5GB-in-small-files/ \
	e2620047-6d04-11e5-ba46-22000b92c6ec:/scratch/$USER/5GB-in-small-files/ 
Message: The transfer has been accepted and a task has been created and queued for execution 
Task ID: 6924b21e-f54b-11e6-ba69-22000b9a448b 

You can use the method above in your Biowulf batch scripts, for example, to transfer data at the end of a job. [Example]
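
As a rough illustration of that idea, the sketch below wraps the transfer command in a function that a batch script could call as its last step. It assumes that 'globus login' was completed interactively beforehand, that the scheduler sets SLURM_JOB_ID (an assumption, not stated on this page), and that the destination endpoint ID is supplied by you. The stage_out name and the GLOBUS override variable are hypothetical conveniences.

```shell
#!/bin/bash
# Sketch only: submit a recursive transfer of a results directory at job end.
# Assumes 'globus login' has already been done interactively.

# GLOBUS may be overridden (e.g. GLOBUS=echo) to print the command
# instead of running it, which is handy for a dry run.
GLOBUS="${GLOBUS:-globus}"

# stage_out SRC_ENDPOINT SRC_PATH DST_ENDPOINT DST_PATH
stage_out() {
    local src_ep="$1" src_path="$2" dst_ep="$3" dst_path="$4"
    # The label makes the task easy to find later; SLURM_JOB_ID is assumed
    # to be set by the scheduler and falls back to 'manual' otherwise.
    "$GLOBUS" transfer --recursive \
        --label "stage-out ${SLURM_JOB_ID:-manual}" \
        "${src_ep}:${src_path}" "${dst_ep}:${dst_path}"
}

# Example call at the end of a batch script (destination ID is a placeholder):
#   stage_out e2620047-6d04-11e5-ba46-22000b92c6ec "/scratch/$USER/results/" \
#             DESTINATION-ENDPOINT-ID "/incoming/results/"
```

Because Globus queues the transfer and runs it asynchronously, the job can exit as soon as the task has been accepted.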

Documentation about the Globus CLI

Globus Plus

For some kinds of data transfer or sharing, you need Globus Plus. The NIH Globus subscription includes Globus Plus for all users, but you need to email staff@hpc.nih.gov to request a Globus Plus invite.

When do you not need Globus Plus?

When do you need Globus Plus?

Note 1: If your collaborator needs Globus Plus to download data, and is not at NIH, we cannot provide Globus Plus to that person.

Note 2: By default, files on a Globus Connect Personal endpoint (e.g. your laptop or desktop) may not be shareable. You will need to configure that via the instructions at these links: Linux, Mac, Windows.

Encryption and Security

Data can be encrypted during Globus file transfers. In some cases encryption cannot be supported by an endpoint, and Globus Online will signal an error.

For more information, see How does Globus Online ensure my data is secure?

To encrypt a transfer, click on 'More options' at the bottom of the two panes in the Transfer Files window, then check the 'encrypt transfer' checkbox.
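
For command-line transfers, encryption can be requested with a flag on 'globus transfer'. The sketch below assumes the flag is spelled --encrypt-data in your installed CLI version (check 'globus transfer --help' to confirm); the encrypted_transfer name and the GLOBUS override variable are hypothetical.

```shell
# GLOBUS may be overridden (e.g. GLOBUS=echo) to print the command
# instead of running it.
GLOBUS="${GLOBUS:-globus}"

# encrypted_transfer SRC_ENDPOINT:PATH DST_ENDPOINT:PATH
# Submits a recursive transfer and asks Globus to encrypt the data in transit.
# Flag name is an assumption to verify against 'globus transfer --help'.
encrypted_transfer() {
    "$GLOBUS" transfer --recursive --encrypt-data "$1" "$2"
}

# Example (destination endpoint ID is a placeholder):
#   encrypted_transfer "e2620047-6d04-11e5-ba46-22000b92c6ec:/data/$USER/project/" \
#                      "DESTINATION-ENDPOINT-ID:/project/"
```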

Alternatively, you can encrypt the files before transfer using any method on your local system, transfer them using Globus, and then decrypt them on the other end.

Note that encryption and verification will slow down the data transfer.

50+ transfers were performed with each set of parameters over the course of several days. Each transfer was of a single file, approx 1 GB.

In our tests, compared to setting no parameters:

Setting the 'Verify file integrity after transfer' (the default setting) added ~25% to the transfer time.

Setting the 'Encrypt transfer' parameter added ~13% to the transfer time.

Setting both Verify + Encrypt added 40% to the transfer time.
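
To get a feel for what these percentages mean in practice, they can be applied to a baseline transfer time. This is only a back-of-the-envelope sketch: the 60-second baseline is hypothetical, the estimate_seconds helper is not part of any Globus tool, and the measurements above were taken on single files of about 1 GB, so other workloads may behave differently.

```shell
# estimate_seconds BASE_SECONDS OVERHEAD_PERCENT
# Scales a baseline transfer time by a percentage overhead.
estimate_seconds() {
    awk -v base="$1" -v pct="$2" \
        'BEGIN { printf "%.1f\n", base * (1 + pct / 100) }'
}

# Hypothetical 60-second baseline with the overheads measured above:
estimate_seconds 60 25   # verify only      -> 75.0
estimate_seconds 60 13   # encrypt only     -> 67.8
estimate_seconds 60 40   # verify + encrypt -> 84.0
```

The combined figure is close to what the two individual overheads would give if they multiplied (1.25 x 1.13 is about 1.41).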

Recurring or Scheduled Transfers

See the Globus Recurring transfers page for information on setting up recurring transfers.

Sharing Data

See the Globus Sharing page for information on how to share data with collaborators.

Cloud Connectors

See the Cloud connectors page for information on how to use Globus with AWS S3 and Google Cloud Storage.

Troubleshooting
Can't see your network drive in Globus?
By default, Globus Connect Personal on your desktop system can only see your home directory. Any other area, such as a network drive, must be enabled in Globus. To configure your Globus Connect Personal instance to access other paths, see the instructions for
Windows
Mac
Linux
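
On Linux, the accessible paths are, to the best of our knowledge, listed one per line in ~/.globusonline/lta/config-paths in the form path,sharing flag,read-write flag. The sketch below appends an entry for an extra directory; treat the file location and format as assumptions to verify against the Linux instructions linked above. The add_gcp_path name and the /mnt/netdrive/ path are hypothetical.

```shell
# add_gcp_path DIRECTORY [CONFIG_FILE]
# Appends a read-write, non-shared entry for DIRECTORY unless the exact
# entry is already present. CONFIG_FILE defaults to the assumed location
# of the Globus Connect Personal path list on Linux.
add_gcp_path() {
    local dir="$1"
    local cfg="${2:-$HOME/.globusonline/lta/config-paths}"
    local entry="${dir},0,1"    # fields: path, sharing flag, read-write flag
    grep -qxF -- "$entry" "$cfg" 2>/dev/null || printf '%s\n' "$entry" >> "$cfg"
}

# Example (hypothetical mount point):
#   add_gcp_path /mnt/netdrive/
```

Globus Connect Personal typically needs to be restarted before a change to this file takes effect.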

See also

How does Globus Online ensure my data is secure?

Globus Online Security Review by Von Welch, Feb 2012.