Biowulf High Performance Computing at the NIH
Storage on Biowulf & Helix

There are several options for disk storage on the NIH HPC; please review this section carefully to decide where to place your data. Contact the NIH HPC systems staff if you have any questions.

Except where noted, there are no quotas, time limits or other restrictions placed on the use of space on the NIH HPC, but please use the space responsibly; even hundreds of terabytes won't last forever if files are never deleted. Disk space on the NIH HPC should never be used as archival storage.

Users who require more than the default disk storage quota should fill out the online storage request form.

NOTE: Historical traces of disk usage and file counts are available through the User Dashboard.

Summary of file storage options
  Location            Type                 Creation               Backups   Space                  Available from
  /home               network (NFS)        with Helix account     yes       16 GB default quota    B,C,H
  /lscratch (nodes)   local                created by user job    no        ~850 GB shared         C
  /scratch            network (NFS)        created by user        no        100 TB shared          B,H,C
  /data               network (GPFS/NFS)   with Biowulf account   no        100 GB default quota   B,C,H

  H = Helix, B = Biowulf login node, C = Biowulf compute nodes

/home

Each user has a home directory, /home/username, which is accessible from every HPC system. The /home area has a quota of 16 GB which cannot be increased.
/lscratch (nodes)

Each Biowulf node has a directly attached disk containing a /lscratch filesystem. This space is not backed up, so use it only as temporary space while a job is running; once the job exits, you no longer have access to /lscratch on that node.

To use /lscratch, see Using Local Disk in the Biowulf User Guide.

Please use /lscratch or /scratch instead of /tmp for storage of temporary files.
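As a sketch of that pattern, the script below points $TMPDIR at local scratch when running inside a job. On Biowulf, local scratch is requested at submit time with `sbatch --gres=lscratch:N` (N in GB) and appears as /lscratch/$SLURM_JOB_ID on the allocated node; the mktemp fallback is only so the sketch also runs outside a job.

```shell
# Sketch: use local scratch instead of /tmp for temporary files.
# Submit with, e.g.:  sbatch --gres=lscratch:10 myscript.sh
if [ -n "$SLURM_JOB_ID" ] && [ -d "/lscratch/$SLURM_JOB_ID" ]; then
    TMPDIR="/lscratch/$SLURM_JOB_ID"   # job-specific local scratch directory
else
    TMPDIR=$(mktemp -d)                # fallback for testing outside a job
fi
export TMPDIR
echo "temporary files will be written under: $TMPDIR"
# ... run your application here; most tools honor $TMPDIR ...
```

Most applications respect $TMPDIR, so exporting it is usually enough to keep temporary files off /tmp.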

/scratch

There is a shared /scratch area accessible from Helix, the Biowulf login node, and the Biowulf computational nodes. /scratch is a large, low-performance area meant for the storage of temporary files.

  • Files in /scratch are automatically deleted 10 days after last access.

  • Each user can store up to a maximum of 10 TB in /scratch. However, 10 TB of space is not guaranteed to be available at any particular time.

  • If the /scratch area is more than 80% full, the HPC staff will delete files as needed, even if they are less than 10 days old.

  • Use /data or /lscratch (not /scratch) when data is to be accessed from large numbers of compute nodes or large swarms.

  • The central /scratch area should NEVER be used as a temporary directory for applications -- use /lscratch instead.
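To keep ahead of the 10-day purge, you can list which of your files have not been accessed recently. A minimal sketch (the SCRATCH_DIR variable is an illustration; on Biowulf you would point it at your own /scratch directory):

```shell
# List files not accessed in the last 10 days, i.e. files that would be
# eligible for the automatic /scratch purge. SCRATCH_DIR defaults to a
# fresh temporary directory here so the snippet runs anywhere; on
# Biowulf you would set it to your /scratch directory instead.
SCRATCH_DIR=${SCRATCH_DIR:-$(mktemp -d)}
stale=$(find "$SCRATCH_DIR" -type f -atime +10)
echo "Files not accessed in 10+ days:"
echo "$stale"
```

find's -atime +10 test matches files whose last access was more than 10 days ago, which mirrors the purge criterion described above.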

/data

The /data filesystems are RAID-6 storage mounted over NFS or GPFS from one of the following systems, all configured for high availability: eight NetApp FAS8040 controllers, a DataDirect Networks SFA10K storage system with eight fileservers, and two DataDirect Networks SFA12K storage systems with eight fileservers each. This storage offers high-performance NFS/GPFS access and is exported to Biowulf over a dedicated high-speed network. /data is accessible from all computational nodes as well as Biowulf and Helix, and is the filesystem of choice for most users to store their large datasets. Biowulf users are assigned an initial quota of 100 GB on /data; please fill out the online storage request form if you need to increase your quota.

Note: your /data directory is actually physically located on filesystems named /spin1, /gs2, /gs3, /gs4, /gs5, /gs6, /gs7, /gs8, /gs9, /gs10, or /gs11. The /data directory consists of links to one of those filesystems. ALWAYS refer to your data directory through the /data link rather than the physical location, because the physical location is subject to change based on administrator needs. In other words, use /data/username rather than (for example) /gs4/users/username in your scripts.
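To illustrate why the stable link matters, here is a self-contained sketch that mimics the layout with a throwaway symlink (the real /data links are managed by the HPC staff; all paths below are stand-ins):

```shell
# Mimic the /data -> /gsN layout with a throwaway symlink. On Biowulf
# the equivalent would be /data/username -> /gs4/users/username
# (example only; the physical target can change at any time).
demo=$(mktemp -d)
mkdir -p "$demo/gs4/users/alice"                   # stand-in physical location
ln -s "$demo/gs4/users/alice" "$demo/data_alice"   # stand-in /data link
# Scripts should always use the link; resolving it merely shows where
# it happens to point today:
readlink -f "$demo/data_alice"
```

If an administrator later remaps the link to a different filesystem, scripts that used the link keep working, while scripts that hard-coded the resolved path break.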

Snapshots are available that allow users to recover files that have been inadvertently deleted. For more information on backups and snapshots, please refer to the File Backups webpage.

If your data directory is located on the /spin1 storage area, the space consumed by snapshots may leave you with less available storage than you expect.

Each data directory on /spin1 has its own snapshot space, separate from your regular file space. However, if you delete a large amount of data within a short period of time (a week or less), the snapshot space for that directory can be exceeded. When this occurs, regular file space (your allocated quota) is used as additional snapshot space. There is no way for you to determine that this has happened, nor to delete snapshots yourself. If you delete a large amount of data and do not see the expected increase in available space, contact the HPC staff for assistance.

Sharing data

Information on methods for sharing data with collaborators both inside and outside NIH can be found on our sharing data webpage.

Checking your disk storage usage

Use the checkquota command to determine how much disk space you are using:

$ checkquota
Mount                   Used      Quota  Percent    Files    Limit
/data:               70.4 GB   100.0 GB   70.41%   307411  6225917
/data(SharedDir):    11.4 GB    20.0 GB   56.95%     6273  1245180
/home:                2.0 GB    16.0 GB   12.55%    11125      n/a
mailbox:             74.7 MB     1.0 GB    7.29%
Best practices

  • Submitting a swarm without knowing how much data it will generate: run a single job first, sum up the output and temporary files, and verify that you have enough space before submitting the full swarm.

  • A directory with 1 million files: keep directories to fewer than 5,000 files each.

  • 100 jobs all reading the same 50 GB file over and over from /data/$USER: use /lscratch instead; copy the file there and have each job read it from local disk.

  • 100 jobs all writing and deleting large numbers of small temporary files: use /lscratch and write all temporary files to local disk.

  • Each collaborator keeping a copy of the same data on Biowulf: request a shared area and keep shared files there to minimize duplication.

  • Using Biowulf storage for archiving: move unused or old data back to your local system.
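The /lscratch staging pattern recommended above can be sketched as follows (all file names are hypothetical, and a small stand-in file replaces the 50 GB reference so the snippet is self-contained):

```shell
# Stage a shared input file onto the node's local disk once per job,
# then read it locally instead of hammering /data over the network.
ref=$(mktemp)                  # stand-in for /data/$USER/big_reference.fa
echo "reference data" > "$ref"
# Inside a job the destination would be /lscratch/$SLURM_JOB_ID; fall
# back to a temporary directory so the sketch runs anywhere:
if [ -n "$SLURM_JOB_ID" ] && [ -d "/lscratch/$SLURM_JOB_ID" ]; then
    dest=/lscratch/$SLURM_JOB_ID
else
    dest=$(mktemp -d)
fi
cp "$ref" "$dest/reference.dat"   # one local copy per job
# All subsequent reads hit the fast local disk:
wc -c < "$dest/reference.dat"
```

With 100 jobs, each node pays the network cost once at staging time instead of on every read, which is exactly the trade the table above recommends.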