Globus on NIH HPC (Biowulf)
Globus is a service that makes it easy to move, sync, and share large amounts of data. Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer. Globus uses GridFTP for more reliable and high-performance file transfer, and will queue file transfers to be performed asynchronously in the background.

Globus was developed and is maintained at the University of Chicago and is used extensively at supercomputer centers and major research facilities. [Globus website]

No matter how you transfer data in and out of our systems, be aware that PII and PHI data cannot be stored on or transferred to the NIH HPC systems.


Setting up a Managed Globus Endpoint at NIH
ICs at NIH that want to set up a managed endpoint will need to purchase their own Globus Provider Plan. NIH sites dealing with PII/PHI data may want to consider setting up a High Assurance endpoint.

Important Notes

Globus v5 Endpoint

Here are links to the Globus Quickstart Guide and detailed installation guide.

  1. Select a name for your endpoint. The name should start with 'NIH' so NIH users can easily find NIH endpoints. (e.g. 'NIH NIA XyzLab').

  2. Install the software following the instructions in the two links above.

  3. Create credentials and register the server as described in the instructions.

  4. Run 'globus-connect-server endpoint setup' to configure the server. Sample command:
    globus-connect-server endpoint setup "NIHYourICorLab Globus v5" --organization "NIHYourICorLab" --client-id "#######" --owner "yourGlobusAdminID@globusid.org"
    
    and then 'globus-connect-server node setup' to start it up. Sample command:
    globus-connect-server node setup --client-id "#######" --ip-address 123.231.xx.yy
    

  5. Log in to the endpoint. The command below will give you a login URL and then, once you log in, an authorization code to complete the login.
    globus-connect-server login localhost
    
    Important: if you used a Globus username like 'yourGlobusAdminID' (as in the commands in step 4 above), do not authenticate with NIH Login. Instead, on the Globus login page, click 'Use Globus ID to sign in' at the bottom of the page.

  6. Create a JSON file (aaa.json in the command below) listing the paths that should be accessible via Globus. Restrict the accessible paths to the filesystems and directories that users actually need for transferring and sharing data. Sample JSON file:
    {
      "DATA_TYPE": "path_restrictions#1.0.0",
      "read_write": [
        "/path1",
        "/path2"
      ]
    }
    

    Create the storage gateway. Sample command:

    globus-connect-server storage-gateway create posix "NIH YourICorLab Posix" --domain nih.gov --restrict-paths file:aaa.json
    
    This command will give you a 'Storage Gateway ID' which will be needed in the next step.

  7. If you are enabling data sharing on this endpoint, create a second JSON file (bbb.json in the command below) listing the paths from which users can share data. This file can be identical to the aaa.json file above, or different. Then create the collection, replacing Storage_Gateway_ID with the ID returned in step 6. Sample command:
    globus-connect-server collection create --allow-guest-collections --sharing-restrict-paths file:bbb.json Storage_Gateway_ID / "Collection Name"
    
    (The last parameter is the name that you came up with in Step 1.)

  8. To configure your server to use NIH Login for authentication, edit the /etc/globus-connect-server.conf file and set
    IdentityMethod = CILogon
    CILogonIdentityProvider = National Institutes of Health
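
The path-restriction file from step 6 can also be generated from the shell, which helps when the list of paths is long or maintained elsewhere. The following is a minimal sketch, not part of the Globus tooling: the function name write_path_restrictions and the scratch directory are hypothetical, /path1 and /path2 are the placeholders from the sample file above, and the final check only confirms that the expected DATA_TYPE line was written, not that Globus will accept the file.

```shell
#!/bin/sh
# Hypothetical helper (not a Globus command): write a path_restrictions
# document that lists every argument after the first as a read_write path.
# Assumes the paths contain no double quotes or backslashes.
# Usage: write_path_restrictions OUTPUT_FILE PATH [PATH...]
write_path_restrictions() {
    out=$1
    shift
    {
        printf '{\n'
        printf '  "DATA_TYPE": "path_restrictions#1.0.0",\n'
        printf '  "read_write": [\n'
        sep=''
        for p in "$@"; do
            # Emit a comma and newline before every entry except the first.
            printf '%s    "%s"' "$sep" "$p"
            sep=',
'
        done
        printf '\n  ]\n}\n'
    } > "$out"
}

# Write into a scratch directory so nothing existing is overwritten.
out_dir=$(mktemp -d)
write_path_restrictions "$out_dir/aaa.json" /path1 /path2
grep -q '"DATA_TYPE": "path_restrictions#1.0.0"' "$out_dir/aaa.json" &&
    cat "$out_dir/aaa.json"
```

The resulting file has the same layout as the sample in step 6 and can be passed to the storage-gateway command through --restrict-paths file:aaa.json once it has been reviewed.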
    

Globus v4 Endpoint

To set up a Globus v4 managed endpoint, you need to follow these steps.

  1. Select a name for your endpoint. The name should start with 'nih' so NIH users can easily find NIH endpoints. (e.g. 'nihdctg').
  2. Create a Unix group and user on your local system with this name.
  3. su to this newly created account and create ssh keys with 'ssh-keygen -t rsa'.
  4. Create a Globus account at globusid.org for this username ('nihdctg', in this example) and upload the public key generated in step 3. (Instructions here)
  5. Install Globus Connect Server. (Instructions here). Note: make sure the TCP ports required by the installation instructions are open in your firewall. Start up Globus and confirm that you can make some test transfers to your endpoint.
  6. Make your Globus endpoint publicly visible by editing the /etc/globus-connect-server.conf file and setting
    [Endpoint]
    Name = myendpointname
    Public = True
    
  7. Make the endpoint managed. You can use the Web GUI and follow the instructions here: https://docs.globus.org/faq/subscriptions/#how_do_i_convert_an_existing_endpoint_to_a_managed_endpoint
    Or follow the CLI instructions here: https://docs.globus.org/faq/subscriptions/#how_do_i_convert_an_existing_endpoint_into_a_managed_endpoint

To configure your server to use NIH Login for authentication, edit the /etc/globus-connect-server.conf file and set

IdentityMethod = CILogon
CILogonIdentityProvider = National Institutes of Health
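
After editing the configuration file, it can be worth confirming that both settings are really present before restarting anything. The sketch below is one way to do that with grep. The function name has_nih_login is hypothetical, the check is stricter than the configuration parser is likely to be (it expects the exact spelling shown above), and the demonstration runs against a scratch file rather than /etc/globus-connect-server.conf.

```shell
#!/bin/sh
# Hypothetical helper (not a Globus command): succeed only if FILE sets
# both CILogon options as shown in the text above. Whitespace around "="
# may vary; spelling and capitalisation must match exactly.
has_nih_login() {
    conf=$1
    grep -Eq '^[[:space:]]*IdentityMethod[[:space:]]*=[[:space:]]*CILogon[[:space:]]*$' "$conf" &&
    grep -Eq '^[[:space:]]*CILogonIdentityProvider[[:space:]]*=[[:space:]]*National Institutes of Health[[:space:]]*$' "$conf"
}

# Demonstrate on a scratch file instead of the real configuration.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
IdentityMethod = CILogon
CILogonIdentityProvider = National Institutes of Health
EOF

if has_nih_login "$demo_conf"; then
    echo "NIH Login settings present"
else
    echo "NIH Login settings missing"
fi
rm -f "$demo_conf"
```

To check a real server, call has_nih_login /etc/globus-connect-server.conf from an account that can read the file.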

Troubleshooting/Documentation

Troubleshooting guide for Globus v5.4

Add an ssh key to your globus account

Installing Globus Connect Server.

Configuring an endpoint to use the CILogon identity provider