Biowulf High Performance Computing at the NIH
Data transfer and sharing using Globus

Globus is a service that makes it easy to move, sync, and share large amounts of data. Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer. Globus uses GridFTP for reliable, high-performance file transfer, and queues file transfers to be performed asynchronously in the background.

Globus was developed and is maintained at the University of Chicago and is used extensively at supercomputer centers and major research facilities. [Globus website]

NIH scientists who wish to use this service to transfer data to or from their Helix/Biowulf disk space can log in with their NIH Login username and password.

No matter how you transfer data into or out of our systems, be aware that PII and PHI data must not be stored on or transferred to the NIH HPC systems.

NIH HPC (Helix/Biowulf) Endpoints

The endpoint nihhelix#helix has been shut down as of 30 Apr 2017. All users must use the nihhpc#globus endpoint. Any endpoints that were previously shared from nihhelix#helix must be re-shared from nihhpc#globus.

The Globus endpoint for transferring data to or from your Helix/Biowulf /home, /data or /scratch areas is nihhpc#globus. This endpoint is implemented using eight "Data Transfer Nodes" which can operate in parallel to provide 80 Gb/s of aggregate bandwidth.
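To put the 80 Gb/s figure in perspective, here is a small calculation of the theoretical minimum time needed to move a given amount of data. It assumes decimal units, a fully saturated network path, and, for the per-node figure, that the aggregate bandwidth is split evenly across the eight nodes; none of these is guaranteed in practice, and the function name is our own.

```python
def ideal_transfer_seconds(nbytes: float, gbits_per_sec: float) -> float:
    """Theoretical lower bound on transfer time, in seconds.

    Uses decimal units (1 Gb/s = 1e9 bits/s) and assumes the link is
    fully saturated; real transfers are slower because of disk I/O,
    per-file overhead and sharing with other users.
    """
    return nbytes * 8 / (gbits_per_sec * 1e9)


AGGREGATE_GBPS = 80.0               # all eight data transfer nodes combined
PER_NODE_GBPS = AGGREGATE_GBPS / 8  # assumption: bandwidth split evenly

one_tb = 1e12  # bytes (decimal terabyte)
print(f"1 TB at {AGGREGATE_GBPS:.0f} Gb/s: {ideal_transfer_seconds(one_tb, AGGREGATE_GBPS):.0f} s")
print(f"1 TB at {PER_NODE_GBPS:.0f} Gb/s: {ideal_transfer_seconds(one_tb, PER_NODE_GBPS):.0f} s")
```

Under these assumptions, 1 TB needs at least 100 seconds at the full 80 Gb/s and at least 800 seconds through a single 10 Gb/s node; actual transfers, especially of many small files, will take longer.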

You do not need to be logged on to Helix or Biowulf to start or monitor a transfer.

Logging into Globus with your NIH login

NIH researchers can use their NIH Login username and password to access Globus. Go to https://www.globus.org/ and click on Log In in the upper right corner of the page.

Type or scroll down to National Institutes of Health in the Organization box, and then click Continue.

You will be taken to the familiar NIH Login page.

Enter your NIH login username and password.

Installing the Globus client on your desktop

The Globus Connect client is available for Windows, Mac or Linux desktop systems. There are detailed instructions on the Globus website. See links below.

How to install and configure Globus Connect Personal on: Windows, Mac, Linux

Transferring data between your desktop and Biowulf

On your desktop system, you will need to have Globus Connect Personal running. Point your web browser to www.globus.org, click 'Log In', and enter your NIH username and password on the NIH Login page that follows. After authenticating, you will be taken to the Globus File Manager page.

In the 'Collection' box, type 'NIH HPC Data Transfer'. You may need to authenticate: if so, you will be taken to the Globus authentication page as described above, and can authenticate with your NIH login username and password.

By default, you should see the files in your /home area on Helix/Biowulf appear. You can also point to your /data area or another shared area by entering the appropriate path in the Path box.

Click on 'Sync or Transfer files'.

Enter the other endpoint, in this case the endpoint name that you gave to your desktop system when you installed Globus. You should now see both endpoints listed in two panes of the Globus window.

To transfer files, select a file or directory on one endpoint, and click the blue 'Start' button.

The page will now say that the transfer request has been submitted, and give you a Task ID.
Each transfer will be monitored and statistics can be displayed at the bottom of the page. You will also receive an email when the transfer is complete.

From: Globus Online Notification <notify@globusonline.org>
To: XYZ <username@mail.nih.gov>
Subject: Task 995fa6e2-c885-11e2-983d-123139404f2e: SUCCEEDED
Date: Wed, 29 May 2013 17:32:02 +0000

=== Task Details ===
Task ID                 : 995fa6e2-c885-11e2-983d-123139404f2e
Task Type               : TRANSFER
Parent Task ID          : n/a
Status                  : SUCCEEDED
Request Time            : 2013-05-29 17:31:30Z
Deadline                : 2013-05-30 17:31:30Z
Completion Time         : 2013-05-29 17:31:36Z
Total Tasks             : 1
Tasks Successful        : 1
Tasks Expired           : 0
Tasks Canceled          : 0
Tasks Failed            : 0
Tasks Pending           : 0
Tasks Retrying          : 0
Command                 : scp -r nihhpc#globus:/data/user/blast/bench/1000_est  my_desktop#test:.
Label                   : n/a
Sync Level              : n/a
Data Encryption         : No
Checksum Verification   : Yes
Delete                  : No
Files                   : 1
Files Skipped           : 0
Directories             : 0
Bytes Transferred       : 158628444
Bytes Checksummed (Sync): 0
MBits/sec               : 211.505
Faults                  : 0
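The fields in the notification are plain 'Key : value' lines, so they are easy to process in a script. The sketch below parses an excerpt of the email above and recomputes the throughput from the byte count and timestamps. It assumes that 'MBits/sec' means decimal megabits per second measured from request to completion; we have not confirmed that definition, but for this example the recomputed value matches the reported 211.505. The function names are our own.

```python
from datetime import datetime

# Excerpt of the "Task Details" section of the notification email above.
SAMPLE_EMAIL = """\
=== Task Details ===
Task ID                 : 995fa6e2-c885-11e2-983d-123139404f2e
Status                  : SUCCEEDED
Request Time            : 2013-05-29 17:31:30Z
Completion Time         : 2013-05-29 17:31:36Z
Bytes Transferred       : 158628444
MBits/sec               : 211.505
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%SZ"  # e.g. 2013-05-29 17:31:30Z (UTC)


def parse_task_details(text: str) -> dict:
    """Turn 'Key : value' lines into a dict; other lines are ignored."""
    details = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            details[key.strip()] = value.strip()
    return details


def throughput_mbits(details: dict) -> float:
    """Megabits (1e6 bits) per second between request and completion."""
    start = datetime.strptime(details["Request Time"], TIME_FORMAT)
    end = datetime.strptime(details["Completion Time"], TIME_FORMAT)
    elapsed = (end - start).total_seconds()
    return int(details["Bytes Transferred"]) * 8 / 1e6 / elapsed


details = parse_task_details(SAMPLE_EMAIL)
print(details["Status"])
print(f"computed {throughput_mbits(details):.3f} Mbit/s, "
      f"reported {details['MBits/sec']}")
```

Here the transfer took 6 seconds, and 158628444 bytes × 8 / 10^6 / 6 ≈ 211.505 Mbit/s.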

If you need to transfer data between two Globus Connect Personal endpoints (e.g. your desktop system and your laptop), you will need a Globus Plus license. Email staff@hpc.nih.gov to request one. Your Globus Plus license will be terminated when you leave NIH.

Once you have the license, you can transfer data between your own two Globus Connect Personal endpoints, just as between any other Globus endpoints. Note that this applies only to a single Globus user, with a Globus Plus license, running Globus Connect Personal on two different systems.

Transferring data using the command line

Note: in the example below, the command-line globus transfer is started on Helix, the interactive data transfer node. However, the transfer to the NIH HPC endpoint ('NIH HPC Data Transfer') will go via the 8 HPC data transfer nodes.

Sample session:

[user@helix ~]$ globus login
Please login to Globus here:
---------------------------
https://auth.globus.org/long/globus/URL/
---------------------------
When you point a web browser to the URL that is provided, you will need to authenticate against the NIH domain with your usual NIH login username and password. You will then be asked to allow the Globus CLI to access your account.

When you click Allow, you will get a page with a long authorization code. Cut-and-paste that code back into your terminal window:

Enter the resulting Authorization Code here: aabbccddeeffgg1122334455

You have successfully logged in to the Globus CLI as username@globusid.org
You can always check your current identity with
  globus whoami
Logout of the Globus CLI with
  globus logout

[user@biowulf ~]$ globus whoami
user@globusid.org
or
user@nih.gov

Depending on how recently you signed up for Globus, the command 'globus whoami' may show your Globus userid (user@globusid.org) or your NIH userid (user@nih.gov). Once you have logged in to Globus, you can set up transfers. You will need the ID of each endpoint, which you can find with the 'globus endpoint search' command. In the example below, the user obtains a list of NIH endpoints and then lists their files on one of those endpoints.

[user@biowulf ~]$ globus endpoint search NIH

Owner                       | ID                                   | Display Name
--------------------------- | ------------------------------------ | ----------------------------------------
nihcitoir@globusid.org      | 4bc66d32-5057-11e6-8238-22000b97daec | NIH CIT OIR HPN
nihnhlbidir@globusid.org    | e1c214bc-63d5-11e6-833f-22000b97daec | NIH NHLBI DIR
nihhpc@globusid.org         | e2620047-6d04-11e5-ba46-22000b92c6ec | NIH HPC Data Transfer
nihnci@globusid.org         | e1c6b3bd-6d04-11e5-ba46-22000b92c6ec | nihnci#NIH-NCI-TRANSFER1

[...]
[user@biowulf ~]$ globus ls 'e2620047-6d04-11e5-ba46-22000b92c6ec'

Globus CLI Error: A Transfer API Error Occurred.
HTTP status: 400
request_id: CXOFvSFsb
code: ClientError.ActivationRequired
message: The endpoint 'e2620047-6d04-11e5-ba46-22000b92c6ec' is not activated
Go to https://www.globus.org/app/endpoints and activate endpoint. (10 day limit)
You may need to 'activate' the Globus endpoints you plan to use. Once an endpoint is activated, it will stay active for 10 days. If your transfer is not completed within 10 days, you'll get an email message requesting you to re-activate the endpoint, and once you've done that the transfer will resume. You can check the status of endpoints at https://www.globus.org/app/endpoints . Once your endpoints are activated, you can set up transfers.
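If you want to look up an endpoint ID from a script, one option is to parse the text table printed by 'globus endpoint search'. This is a minimal sketch that works on the table shown above; the column layout of the text output may differ between CLI versions, so for anything long-lived the CLI's JSON output ('--format json') is a sturdier input. The function name is hypothetical.

```python
# Text output of `globus endpoint search NIH`, copied from the session above.
SEARCH_OUTPUT = """\
Owner                       | ID                                   | Display Name
--------------------------- | ------------------------------------ | ----------------------------------------
nihcitoir@globusid.org      | 4bc66d32-5057-11e6-8238-22000b97daec | NIH CIT OIR HPN
nihnhlbidir@globusid.org    | e1c214bc-63d5-11e6-833f-22000b97daec | NIH NHLBI DIR
nihhpc@globusid.org         | e2620047-6d04-11e5-ba46-22000b92c6ec | NIH HPC Data Transfer
nihnci@globusid.org         | e1c6b3bd-6d04-11e5-ba46-22000b92c6ec | nihnci#NIH-NCI-TRANSFER1
"""


def endpoint_ids(table: str) -> dict:
    """Map each endpoint's display name to its ID."""
    ids = {}
    # Skip the header row and the dashed separator row.
    for line in table.splitlines()[2:]:
        cols = [col.strip() for col in line.split("|")]
        if len(cols) == 3:  # owner | id | display name
            _owner, endpoint_id, name = cols
            ids[name] = endpoint_id
    return ids


ids = endpoint_ids(SEARCH_OUTPUT)
print(ids["NIH HPC Data Transfer"])
```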

[user@biowulf ~]$ globus transfer --recursive --no-verify-checksum \
	d8eb36b6-6d04-11e5-ba46-22000b92c6ec:/data1/5GB-in-small-files/ \
	e2620047-6d04-11e5-ba46-22000b92c6ec:/scratch/$USER/5GB-in-small-files/ 
Message: The transfer has been accepted and a task has been created and queued for execution 
Task ID: 6924b21e-f54b-11e6-ba69-22000b9a448b 
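When driving the CLI from a program, it helps to build the argument list explicitly and to capture the Task ID that 'globus transfer' prints. The sketch below rebuilds the command from the session above and parses its output. It uses only the '--recursive' and '--no-verify-checksum' options shown there plus '--label', which sets a task label (check 'globus transfer --help' for your CLI version). The helper names are our own, and the code only prints the command rather than running it.

```python
import re
import shlex


def build_transfer_cmd(src_ep, src_path, dst_ep, dst_path,
                       recursive=True, verify_checksum=True, label=None):
    """Return the argument list for a `globus transfer` command."""
    cmd = ["globus", "transfer"]
    if recursive:
        cmd.append("--recursive")
    if not verify_checksum:
        cmd.append("--no-verify-checksum")
    if label is not None:
        cmd += ["--label", label]
    # Each side is written as ENDPOINT_ID:PATH.
    cmd += [f"{src_ep}:{src_path}", f"{dst_ep}:{dst_path}"]
    return cmd


def parse_task_id(output):
    """Extract the task ID from the text printed by `globus transfer`."""
    match = re.search(r"^Task ID:\s*(\S+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Task ID:' line found in output")
    return match.group(1)


# Replace 'user' with your username: no shell expands $USER here.
cmd = build_transfer_cmd(
    "d8eb36b6-6d04-11e5-ba46-22000b92c6ec", "/data1/5GB-in-small-files/",
    "e2620047-6d04-11e5-ba46-22000b92c6ec", "/scratch/user/5GB-in-small-files/",
    verify_checksum=False,
)
print(shlex.join(cmd))

# Output of the command in the session above.
sample_output = (
    "Message: The transfer has been accepted and a task has been created "
    "and queued for execution\n"
    "Task ID: 6924b21e-f54b-11e6-ba69-22000b9a448b\n"
)
print(parse_task_id(sample_output))
```

To submit for real, pass the list to subprocess.run(cmd, capture_output=True, text=True, check=True) and hand its stdout to parse_task_id.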

You can use the method above in your Biowulf batch scripts, for example, to transfer data at the end of a job. [Example]
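As a sketch of the batch-script use case, the hypothetical helper below could be called from a Python step at the end of a job. It submits a recursive transfer, extracts the Task ID from the CLI's text output, and then runs 'globus task wait' so the job does not exit before the data has moved. It assumes the globus CLI is on your PATH in the batch environment and that you have already run 'globus login'; the endpoint paths and timeout are placeholders.

```python
import re
import subprocess


def transfer_and_wait(src, dst, timeout_s=3600, run=subprocess.run):
    """Submit a recursive Globus transfer and block until it finishes.

    src and dst are "ENDPOINT_ID:/path" strings. `run` defaults to
    subprocess.run; it is a parameter only so the logic can be tested
    without the globus CLI installed.
    """
    submitted = run(["globus", "transfer", "--recursive", src, dst],
                    capture_output=True, text=True, check=True)
    match = re.search(r"^Task ID:\s*(\S+)", submitted.stdout, re.MULTILINE)
    if match is None:
        raise RuntimeError("could not find a Task ID in globus output")
    task_id = match.group(1)
    # Block until the task ends; with check=True, a non-zero exit status
    # from `globus task wait` (e.g. on timeout) raises CalledProcessError.
    run(["globus", "task", "wait", "--timeout", str(timeout_s), task_id],
        check=True)
    return task_id
```

Waiting keeps the job, and its allocated resources, alive until the transfer finishes. If you only need the transfer to be queued, drop the wait: Globus completes the task in the background, and you do not need to be logged on to monitor it.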

Documentation about the beta version of the new Globus CLI

Globus Plus

For some kinds of data transfer or sharing, you need Globus Plus. The NIH Globus subscription includes Globus Plus for all users, but you need to email staff@hpc.nih.gov to request a Globus Plus invite.

When do you not need Globus Plus?

When do you need Globus Plus?

Note 1: If your collaborator is not at NIH and needs Globus Plus to download data, we cannot provide Globus Plus to that person.

Note 2: By default, files on a Globus Connect Personal endpoint (e.g. your laptop or desktop) may not be shareable. To make them shareable, follow the instructions at these links: Linux, Mac, Windows.

Sharing data with collaborators

If your data is on the NIH HPC systems, you can easily share it with collaborators who are at NIH or elsewhere. All they will need is a (free) Globus account. The advantage of sharing data via Globus is that you do not need to transfer your data anywhere: this avoids duplicating data, saves storage space, and saves time. You have full control over which files your collaborator can access, and whether they have read-only or read-write permissions.

Sharing data from NIH HPC systems (e.g. Helix or Biowulf):

Sharing data from a Globus Connect Personal endpoint (e.g. your desktop system)

Important Notes about Sharing:

Below are details on how to share an endpoint, or a subset of an endpoint, with another Globus user.

Navigate to the Globus 'File Manager' on globus.org and select the subdirectory you want to share.

Click on the 'Share' button in the right side pane.

Give your new shared endpoint a description, then click 'Create'. Next, to share it with other Globus users, click 'Add Permissions - Share With'.

This page shows some information about the shared endpoint. You can share with individual Globus users (searching by email address, Globus username, or name), with a group, or with all users. By default, you grant only read access to the directory. If you want your colleague to be able to transfer data into the directory, you can also grant write access by checking the 'write' checkbox. You can also select 'Send Email' to notify your colleague by email. Once you have selected the options, click 'Add Permission'.

You should then see the share and the people you have shared it with.

You can repeat this process for any number of collaborators. At any time, you can revoke access to the directory by clicking the 'X' next to the invitee in the list of people you have shared it with.
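The sharing steps above use the web interface, but the same step can also be scripted against the Globus Transfer API, where each permission is stored as an access rule. The sketch below only builds such a rule document, using the field names of the Transfer API's access-rule format as we understand it (the path is interpreted relative to the root of the shared endpoint). Submitting it, for example with the globus_sdk package's TransferClient.add_endpoint_acl_rule, is omitted so the example runs offline. The identity ID and directory are placeholders, and the function name is our own.

```python
VALID_PERMISSIONS = {"r", "rw"}  # read-only or read-write


def make_access_rule(path, principal, principal_type="identity",
                     permissions="r", notify_email=None):
    """Build an access-rule document for sharing `path` with `principal`.

    principal is a Globus identity or group ID; permissions defaults to
    read-only, matching the web interface default described above.
    """
    if permissions not in VALID_PERMISSIONS:
        raise ValueError("permissions must be 'r' or 'rw'")
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /results/run1/")
    if not path.endswith("/"):
        path += "/"  # rules apply to directories, written with a trailing slash
    rule = {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }
    if notify_email is not None:
        rule["notify_email"] = notify_email  # optional invitation email
    return rule


# Hypothetical collaborator identity ID and directory.
rule = make_access_rule("/results/run1", "00000000-0000-0000-0000-000000000000")
print(rule)
```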

To see all endpoints you have shared, go to 'Endpoints' in the left bar, then 'Administered by You'.

It is highly recommended that you delete the endpoint share when your collaborator has completed downloading the data. You can do so by going to 'Endpoints' in the left bar, then 'Administered by You', select the endpoint, and click on 'Delete endpoint'.

What to tell your collaborators

If you set up a shared endpoint and want your collaborator to download the data, this is what you need to tell them.

First, the collaborator needs a Globus account, which is free. The instructions for setting up a Globus account are described earlier on this page. They may already have Globus access through their institution.

If the collaborator is downloading the data to their personal workstation, they need to install the Globus Connect Personal client, which is free and available for Mac, Windows and Linux systems.

If you chose to notify users via email when you added access for this user, they should have received a notification message.

You can, of course, also send email to your collaborators yourself, telling them you've shared a folder with them. The collaborator should click on the link, which will require logging in with their institutional or Globus login username and password. They should then be able to see the files you shared with them.

They should select the files they want to transfer, click 'Transfer or Sync to', enter their own endpoint name and desired path, and click the 'Start' button near the bottom to start the transfer.

If the collaborator wants to write to a network share, they must map this location to a local drive. See the Configuration section of the Globus Connect Personal installation documentation.

Encryption and Security

Data can be encrypted during Globus file transfers. In some cases encryption cannot be supported by an endpoint, and Globus Online will signal an error.

For more information, see How does Globus Online ensure my data is secure?

In the Transfer Files window, click 'More options' at the bottom of the two panes, then check the 'encrypt transfer' checkbox.

Alternatively, you can encrypt the files on your local system using any method you like, transfer them with Globus, and then decrypt them on the other end.

Note that encryption and verification will slow down the data transfer.

To quantify this, we performed more than 50 transfers with each set of parameters over the course of several days. Each transfer was of a single file of approximately 1 GB.

In our tests, compared to setting no parameters:

Setting 'Verify file integrity after transfer' (the default setting) added ~25% to the transfer time.

Setting 'Encrypt transfer' added ~13% to the transfer time.

Setting both Verify and Encrypt added ~40% to the transfer time.
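To make these percentages concrete, the sketch below scales a hypothetical no-options transfer time by the measured overheads. Treating the figures as fixed multipliers is an assumption: they were measured on single files of about 1 GB and may differ for other file sizes or networks. It also compares the combined figure with what simply compounding the two individual overheads would predict.

```python
# Measured overheads from the tests above, as fractions of the
# transfer time with neither option set.
MEASURED_OVERHEAD = {
    "none": 0.00,
    "verify": 0.25,
    "encrypt": 0.13,
    "verify+encrypt": 0.40,
}


def estimated_seconds(baseline_s, options="none"):
    """Scale a no-options transfer time by the measured overhead."""
    return baseline_s * (1 + MEASURED_OVERHEAD[options])


baseline = 60.0  # hypothetical: a transfer that takes 60 s with no options
for options in MEASURED_OVERHEAD:
    print(f"{options:<15} {estimated_seconds(baseline, options):6.1f} s")

# If the two overheads simply compounded, we would expect about 41%,
# which is close to the ~40% measured for Verify and Encrypt together.
compounded = (1 + MEASURED_OVERHEAD["verify"]) * (1 + MEASURED_OVERHEAD["encrypt"]) - 1
print(f"compounded estimate: {compounded:.1%}")
```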

See also:

How does Globus Online ensure my data is secure?

Globus Online Security Review by Von Welch, Feb 2012.