Sharing Data with Collaborators

Many users wish to share their raw or processed data on Helix/Biowulf with collaborators inside and outside NIH. In some cases a group may want to work in a shared disk area rather than in individual directories. The available options for sharing data are listed below.

Important Note: It may be tempting to share data by changing the permissions on your /home or /data directory so that other users can access the files within. Giving world access to your directory is prohibited. Limited access can instead be extended to a few specific users with Access Control Lists.

NIH collaborators with Helix or Biowulf accounts

World-accessible /scratch

For a one-time or very occasional transfer of non-private files to a collaborator who has a Helix/Biowulf account, the simplest way is to copy the files to a new directory /scratch/MyUniqDir, change the permissions so that your collaborator can access them, and then tell your collaborator where they are. For example:

helix% mkdir /scratch/MyUniqDir
helix% chmod a+rx /scratch/MyUniqDir
helix% cp myfile.txt /scratch/MyUniqDir
helix% chmod a+r /scratch/MyUniqDir/myfile.txt

Note that this allows everyone on Helix/Biowulf to potentially access the file, so it is only suitable for files which are non-private, such as a sequence database composed of publicly available sequences. All files in /scratch get deleted 10 days after last access.

Your collaborator should do the copy on Helix (the designated interactive data transfer node), rather than on the Biowulf login node. /scratch is not available on the Biowulf compute nodes. After your collaborator has copied the file to their own area, you should delete the directory:

helix% rm -rf /scratch/MyUniqDir

Don't use your personal /scratch/$USER for this purpose, as this will allow any user on the system to read and delete files in /scratch/$USER.
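
The collaborator's side of this exchange is not shown above. As a minimal sketch, assuming a POSIX shell on Helix, the copy and a quick check could be wrapped in a small function. The function name fetch_shared and the destination path in the example call are hypothetical and are not part of the NIH HPC tooling.

```shell
# Hypothetical helper: copy a shared directory into your own area
# and confirm that the copy matches the original.
#   $1 = directory announced by the owner (e.g. /scratch/MyUniqDir)
#   $2 = destination you can write to (assumed path, adjust as needed)
fetch_shared() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -R copies the whole tree, -p keeps timestamps and modes
    cp -pR "$src"/. "$dest"/ || return 1
    # diff -r exits non-zero if any file differs or is missing
    diff -r "$src" "$dest" >/dev/null && echo "copy verified"
}

# Example call (run on Helix, not on the Biowulf login node):
#   fetch_shared /scratch/MyUniqDir /data/$USER/from_colleague
```

Once the function prints "copy verified", the owner can be told that the directory in /scratch is safe to remove.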

Allowing access to personal /data with ACLs

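Access Control Lists offer a way of allowing a single user access to a portion of your personal /data directory. In this example, the subdirectory /data/user/for_my_colleague is opened for browsing to the user my_colleague. The first command gives traverse-only (--x) access to the parent directory, and the second gives read and browse (r-x) access to the subdirectory itself:

helix% setfacl -m u:my_colleague:--x /data/user
helix% setfacl -m u:my_colleague:r-x /data/user/for_my_colleague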
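
To check what was granted, or to withdraw it later, the standard getfacl and setfacl -x commands can be used. The sketch below wraps them in two helper functions. The names show_acl and revoke_user are hypothetical, and the paths in the example calls are the ones from the paragraph above.

```shell
# Hypothetical helper: print the ACL entries currently set on a path.
# -p keeps the leading slash in the "# file:" header line.
show_acl() {
    getfacl -p "$1"
}

# Hypothetical helper: remove one user's ACL entry from each path given.
#   $1 = user name, remaining arguments = files or directories
revoke_user() {
    user=$1
    shift
    for path in "$@"; do
        setfacl -x "u:$user" "$path" || return 1
    done
}

# Example calls:
#   show_acl /data/user/for_my_colleague
#   revoke_user my_colleague /data/user/for_my_colleague /data/user
```

Revoking the entry on the subdirectory first and the parent second removes access in the reverse of the order in which it was granted.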

More about groups and permissions

Shared /data directory

If you wish to share data on a regular basis with other users on Helix/Biowulf, or have many people access the same data without copying it back and forth, the best way is to set up a group with a shared disk area. The 'group owner' and 'group members' have specific responsibilities; see the groups & shared directories page for more information. You can apply for the shared area and the group by filling out the form at https://hpcnihapps.cit.nih.gov/auth/dashboard/shared_data_request.php.

Globus

Globus is a service that makes it easy to move, sync, and share large amounts of data. Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer. Globus uses GridFTP for more reliable and high-performance file transfer, and will queue file transfers to be performed asynchronously in the background. Globus can also be used for sharing data with collaborators inside and outside the NIH. NIH researchers can use their NIH Login username and password to access Globus; collaborators outside the NIH will need a free Globus account. For details, see the pages on logging into Globus, data transfer, and sharing.

NIH or outside collaborators without Helix/Biowulf accounts

Globus

Globus can be used for sharing data with collaborators inside and outside the NIH. NIH researchers can use their NIH Login username and password to access Globus; collaborators outside the NIH will need a free Globus account, and will need to install Globus Connect Personal (free, available for Windows, Mac, and Linux). For details, see the pages on logging into Globus, data transfer, and sharing.

NIH Box and OneDrive

NIH/CIT provides Box and OneDrive collaboration tools for sharing data with collaborators. Which one you can use depends on the size of data to be shared, the size of individual files, and whether the collaborators are at NIH or outside. See https://hpc.nih.gov/docs/box_onedrive.html for more information.

Datashare

Helix/Biowulf users can set up special directories which are readable via the web, but are not browseable. This space is intended for data sharing only, and personal web pages are not allowed. See the datashare directories page for more information.

Any collaborators

Globus

Globus can be used for sharing data with collaborators inside and outside the NIH. NIH researchers can use their NIH Login username and password to access Globus; collaborators outside the NIH will need a free Globus account. Sharing via Globus allows you to specify precisely who can access the files, and you can turn this access on and off at any time. For details, see the pages on logging into Globus, data transfer, and sharing.

Access Control Lists (ACLs)

Access Control Lists (ACLs) are an extension of the traditional UNIX permission concept and allow finer-grained access to files under Linux. Specifically, ACLs make it possible to grant individual users or groups access to single files or directories, with selective control over read, write, and execute permissions. This gives much more precise control over sharing data between users on our systems.
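
As an illustration of granting a user or a group selective permissions on single files, here is a minimal sketch built on the standard setfacl -m command. The wrapper name grant_acl, the group name mylab, and the file names in the example calls are hypothetical. The argument checks are only there to catch typing mistakes before setfacl is called.

```shell
# Hypothetical wrapper around "setfacl -m".
#   $1 = kind  : u (user) or g (group)
#   $2 = name  : login name or group name
#   $3 = perms : three characters such as r--, r-x or rw-
#   $4... = one or more files or directories
grant_acl() {
    if [ "$#" -lt 4 ]; then
        echo "usage: grant_acl u|g name perms path..." >&2
        return 2
    fi
    kind=$1
    name=$2
    perms=$3
    shift 3
    case $kind in
        u|g) ;;
        *) echo "kind must be u or g" >&2; return 2 ;;
    esac
    case $perms in
        [r-][w-][x-]) ;;
        *) echo "perms must look like r-x" >&2; return 2 ;;
    esac
    setfacl -m "$kind:$name:$perms" "$@"
}

# Example calls:
#   grant_acl g mylab r-- /data/user/for_my_colleague/results.txt
#   grant_acl u my_colleague rw- /data/user/for_my_colleague/notes.txt
```

Whoever is named in the entry still needs at least --x on every parent directory of the file, as in the personal /data example earlier on this page.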

More information about ACLs on NIH HPC Systems

Custom Situation

Contact staff@hpc.nih.gov if your situation is not addressed by any of the above sections, and we'll try to come up with some options for you.