Globus on NIH HPC (Biowulf)
Globus is a service that makes it easy to move, sync, and share large amounts of data. Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer. Globus uses GridFTP for more reliable and high-performance file transfer, and will queue file transfers to be performed asynchronously in the background.

Globus was developed and is maintained at the University of Chicago and is used extensively at supercomputer centers and major research facilities. [Globus website]

No matter how you transfer data into or out of our systems, be aware that PII and PHI cannot be stored on or transferred to the NIH HPC systems.

See the links in the Quick Links menu at left for details.

Recurring and Scheduled Transfers

Using either the web browser or the command line, Globus will allow you to schedule a transfer to start at a later time or to repeat at a regular interval.

Status of such scheduled jobs can be monitored and managed as well, and, unlike a cron job, your recurring transfers don't depend on the availability of your system.

To set up a recurring or scheduled transfer using the web browser:

Go to the Globus file manager and set up your transfer in the usual way. Under 'Transfer and Timer options', you can set up a time for the transfer to run, and also the repeat schedule.

Globus Timer CLI documentation

To set up a scheduled or recurring transfer via the command line, you will need the Globus UUIDs of the source and destination endpoints, which can be found via the Globus web interface or the Globus CLI.

Finding UUIDs using the Globus CLI:

# Find the UUIDs of the source and destination endpoints
biowulf% globus login
Please paste the following URL in a browser:
https://auth.globus.org/etc....

# Once you go to the webpage and authenticate with your NIH login, you will see a page requesting that you allow Globus CLI to manage transfers etc. Click 'Allow'.
# You will then be provided with an authorization code, which should be pasted into your terminal session.
Please Paste your Auth Code Below: ...pasted code...

# search for the UUID for the 'NIH HPC Data Transfer' endpoint
biowulf% globus endpoint search 'NIH HPC Data Transfer'
ID                                   | Owner                                           | Display Name
------------------------------------ | ------------------------------------------------| --------------------------
e2620047-6d04-11e5-ba46-22000b92c6ec | nihhpc@globusid.org                             | NIH HPC Data Transfer
[....]
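
When the UUID is needed in a script, it can be pulled out of the search table rather than copied by hand. The following is a minimal sketch, assuming the output keeps the three-column, pipe-separated layout shown above; `endpoint_uuid` is a hypothetical helper name, not part of the Globus CLI.

```shell
# Hypothetical helper: print the UUID of the endpoint whose display name
# matches exactly. Reads the text table printed by `globus endpoint search`
# on standard input (assumed layout: ID | Owner | Display Name).
endpoint_uuid() {
    awk -F'|' -v name="$1" '
        NR > 2 {                                  # skip header and separator rows
            id = $1; display = $3
            gsub(/^[ \t]+|[ \t]+$/, "", id)       # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", display)
            if (display == name) { print id; exit }
        }'
}

# Possible usage on Biowulf, after `globus login`:
#   globus endpoint search 'NIH HPC Data Transfer' | endpoint_uuid 'NIH HPC Data Transfer'
```

The exact-match comparison is deliberate: a search can return several endpoints with similar names, and only the first row whose display name equals the argument is printed.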

Finding UUIDs using the Globus web interface:
Go to https://app.globus.org/endpoints and search for the endpoint name.

Click on the 'right arrow' at the end of your desired endpoint, which will pull up the endpoint details. The UUID is listed on that page.
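
A UUID copied from the web page is easy to truncate by accident. A quick format check before using it on the command line might look like the sketch below; `is_uuid` is a hypothetical helper, and it only verifies the 8-4-4-4-12 hexadecimal shape, not that the endpoint exists.

```shell
# Hypothetical helper: succeed only if the argument has the standard
# 8-4-4-4-12 hexadecimal UUID layout (case-insensitive).
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eqi '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Example: check the endpoint UUID shown in the search output above.
if is_uuid 'e2620047-6d04-11e5-ba46-22000b92c6ec'; then
    echo "UUID looks well-formed"
fi
```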

Scheduling a Transfer

First you need to authenticate your globus-timer session. This is exactly like the authentication for the Globus CLI.

biowulf% globus-timer session login
Please paste the following URL in a browser:
https://auth.globus.org/etc....

# Once you go to the webpage and authenticate with your NIH login, you will see a page requesting that you allow Globus CLI to manage transfers etc. Click 'Allow'.
# You will then be provided with an authorization code, which should be pasted into your terminal session.
Please Paste your Auth Code Below: ...pasted code...

Sample session to set up the transfer of a directory once a day:

biowulf% globus-timer job transfer --name "globus-timer-test" \
> --start '2021-02-09T14:00:00' \
> --interval '1d' \
> --source-endpoint e2620047-6d04-11e5-ba46-22000b92c6ec \
> --dest-endpoint fb1b8048-f84f-11ea-892a-0a5521ff3f4b \
> --item /data/$USER/dir1 /Users/$USER/Desktop/dir1 true
Name:            globus-timer-test
Job ID:          bef49456-8678-4326-b1d9-c9f2509a9988
Status:          new
Start:           2021-02-09T19:00:00+00:00
Interval:        1 day, 0:00:00
Next Run At:     2021-02-10T19:00:00+00:00


Parameters in the command above:
--name: name of the job, to help identify it.
--start 'YYYY-MM-DDTHH:MM:SS': start time for the job. Alternate syntax is available; see the Globus Timer CLI docs.
--interval 'xxx': how often the job should run. See the Globus Timer CLI docs for syntax.
--source-endpoint xxx: UUID of the source endpoint.
--dest-endpoint xxx: UUID of the destination endpoint.
--item sourcepath destpath recursive: source and destination paths for the file or directory. The last parameter defines whether the transfer should be recursive (e.g. for a directory). In the example above, a directory tree is being transferred, so the last parameter is set to true.
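
In the sample session, the job was requested for 14:00:00 but the output reports the start as 19:00:00+00:00. That is consistent with the start time being read as US Eastern Standard Time (UTC-5) and reported in UTC, although the source does not state this explicitly. The sketch below reproduces the conversion with GNU `date`, as found on Linux; `to_utc` is a hypothetical helper, and the `-0500` offset is an assumption that does not hold during daylight saving time.

```shell
# Hypothetical helper: convert a local timestamp to the UTC form shown in
# the globus-timer output. Requires GNU date (the -d option).
#   $1: local timestamp, 'YYYY-MM-DDTHH:MM:SS'
#   $2: numeric UTC offset of the local time zone, e.g. -0500 for EST
to_utc() {
    stamp=$(printf '%s' "$1" | tr 'T' ' ')        # GNU date accepts 'date time offset'
    date -u -d "$stamp $2" '+%Y-%m-%dT%H:%M:%S+00:00'
}

to_utc '2021-02-09T14:00:00' '-0500'   # prints 2021-02-09T19:00:00+00:00
```

Checking the converted time against the "Start" field is a simple way to confirm that a job will run when intended.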

Monitoring and stopping scheduled transfers

Sample session to check scheduled jobs:

biowulf% globus-timer job list
Name              | Job ID                               | Status | Last Result
------------------|--------------------------------------|--------|-------------
globus-timer-test | bef49456-8678-4326-b1d9-c9f2509a9988 | loaded | RUN COMPLETE

biowulf% globus-timer job status 61cdd6d9-abd3-45a0-8ea2-7e961a741ca2
Name:            globus-timer-test
Job ID:          61cdd6d9-abd3-45a0-8ea2-7e961a741ca2
Status:          loaded
Start:           2021-02-09T20:30:00+00:00
Interval:        1 day, 0:00:00
Next Run At:     2021-02-10T20:30:00+00:00
Last Run Result: RUN COMPLETE
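
For routine monitoring, the job list can be filtered so that only jobs needing attention are shown. This is a rough sketch, assuming the four-column table layout shown above and treating any "Last Result" other than RUN COMPLETE as worth a look; `jobs_needing_attention` is a hypothetical helper, and the other values that column can take are not documented here.

```shell
# Hypothetical helper: read the table printed by `globus-timer job list`
# on standard input and print "name<TAB>job-id<TAB>last-result" for every
# job whose Last Result column is anything other than RUN COMPLETE.
jobs_needing_attention() {
    awk -F'|' '
        NR > 2 && NF >= 4 {                       # skip header and separator rows
            for (i = 1; i <= 4; i++)
                gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim each column
            if ($4 != "RUN COMPLETE")
                printf "%s\t%s\t%s\n", $1, $2, $4
        }'
}

# Possible usage on Biowulf:
#   globus-timer job list | jobs_needing_attention
```

Each job ID printed this way can then be passed to the job status command shown above for details.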

Terminate a scheduled transfer

biowulf% globus-timer job delete f150615c-26d7-450c-975e-57a5817d817b
Name:     globus-timer-test
Job ID:   f150615c-26d7-450c-975e-57a5817d817b
Status:   deleted
Start:    2021-02-09T17:30:00
Interval: 1 day, 0:00:00