Biowulf High Performance Computing at the NIH

Additional information for NCI-CCR users of the NIH Biowulf Cluster

NCI-CCR has funded 216 nodes (6,048 physical cores, 12,096 CPUs with hyperthreading, 64 GPUs) in the Biowulf cluster, and CCR users have priority access to these nodes. This priority status lasts until October 31, 2019 for the FY2015-funded nodes and February 20, 2021 for the FY2016-funded nodes.

How to get access to the CCR nodes

If you do not already have an HPC account, request one by filling out the account request form at
https://hpc.nih.gov/nih/accounts/account_request.php.
This form is only accessible from the NIH network or VPN, and requires logging in with your NIH username and password.

Once the PI has approved the request and the account has been set up, all users under an NCI-CCR PI automatically receive priority access to the CCR buy-in nodes.

Hardware
The hardware characteristics of each compute node are as follows:
  • 2 x Intel Xeon E5-2695v3, 14 cores @ 2.30 GHz, hyperthreading enabled, or
  • 2 x Intel Xeon E5-2680v4, 14 cores @ 2.40 GHz, hyperthreading enabled
  • 2 x Nvidia K80 GPUs (24 nodes)
  • 256 GB memory
  • 400 GB or 800 GB SSD (solid-state) disk
  • 56 Gb/s Infiniband
Hyperthreading

Hyperthreading is a hardware feature of the Xeon processor that allows each physical core to run two simultaneous threads of execution, making the node appear to have twice as many cores. Thus the 28-core CCR nodes appear to have 56 cores. In many cases this increases the performance of applications that can multi-thread or otherwise take advantage of multiple cores. However, before running 56 threads of execution on a single node, the Biowulf staff recommends that you benchmark your application to determine whether it benefits from hyperthreading (or even whether it scales to 28 cores).
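
One simple way to benchmark is to time the same run at several thread counts on an exclusively allocated node. The script below is only a sketch; "my_app" and its "--threads" option are placeholders for your own application:

    #!/bin/bash
    # benchmark.sh -- hypothetical script; my_app and --threads are placeholders
    for t in 7 14 28 56; do
        echo "== $t threads =="
        /usr/bin/time -v my_app --threads $t > /dev/null
    done

Submitted with, for example, "sbatch --partition=ccr --exclusive benchmark.sh", the 28- vs 56-thread timings show whether hyperthreading helps for that particular code.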

Submitting jobs to the batch system

Jobs are submitted to the CCR nodes by specifying the "ccr" or "ccrgpu" partitions. In the simplest case,

    sbatch --partition=ccr your_batch_script
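
Here your_batch_script is an ordinary Slurm batch script. A minimal sketch, where the module and the command line are placeholders for your own application:

    #!/bin/bash
    # your_batch_script -- minimal sketch; module and command are placeholders
    module load samtools                    # load the application you need
    samtools sort input.bam -o sorted.bam   # input.bam is a placeholder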
    

Submitting a job requiring 128 GB of local scratch,

    sbatch --partition=ccr --gres=lscratch:128 your_batch_script
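
Inside such a job, the allocated space is available in the node-local scratch directory, /lscratch/$SLURM_JOB_ID on Biowulf. A sketch, with placeholder file names:

    #!/bin/bash
    # use the node-local SSD scratch requested with --gres=lscratch:128
    cd /lscratch/$SLURM_JOB_ID
    cp /data/$USER/input.bam .              # input.bam is a placeholder
    # ... run the I/O-intensive steps here ...
    cp results.out /data/$USER/             # copy results back before the job ends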
    

Submitting a job allocating 4 GPUs,

    sbatch --partition=ccrgpu --gres=gpu:k80:4 your_batch_script
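
Within the job, it is worth verifying the allocated devices before starting the real work; for example (the CUDA module name is an assumption; check 'module avail'):

    #!/bin/bash
    # quick sanity check of the allocated GPUs
    module load CUDA                        # module name is an assumption; adjust as needed
    nvidia-smi                              # should list the 4 allocated K80 devices
    echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"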
    

To ensure allocation of a node with a 56 Gb/s Infiniband connection,

    sbatch --partition=ccr --constraint=ibfdr your_batch_script
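
The Infiniband constraint is mainly relevant for jobs that span more than one node, e.g. MPI applications. A hedged sketch, where the task count and my_mpi_script are placeholders (112 tasks covers two 56-CPU nodes):

    sbatch --partition=ccr --constraint=ibfdr --ntasks=112 my_mpi_script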
    

Allocating an interactive node:

    sinteractive --constraint=ccr
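
Additional resources can be requested on the same command line; the values here are only illustrative:

    sinteractive --constraint=ccr --cpus-per-task=8 --mem=16g --gres=lscratch:50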
    

To submit a swarm of jobs,

    swarm -f command_file --partition ccr
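
The command file contains one command per line, and each line becomes a separate subjob. A small hypothetical example, also requesting 4 threads and 8 GB of memory per subjob with swarm's -t and -g options:

    # command_file -- one command per line; program and file names are placeholders
    myprog sample1.fastq > sample1.out
    myprog sample2.fastq > sample2.out
    myprog sample3.fastq > sample3.out

    swarm -f command_file -t 4 -g 8 --partition ccr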
    

Note that jobs submitted to the CCR partition will not run on non-CCR nodes. If no CCR nodes are available, the job will remain queued until CCR nodes become free. (You may also specify "--partition=ccr,norm" to allow the job to run in either partition.)
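
For example, to let the scheduler start the job on whichever of the two partitions has free resources first:

    sbatch --partition=ccr,norm your_batch_script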

Core Limits

The current per-user core limit on the CCR queue can be seen via the 'batchlim' command.

    biowulf% batchlim
    Partition        MaxCPUsPerUser     DefWalltime     MaxWalltime
    ---------------------------------------------------------------
    ccr                      3072         04:00:00     10-00:00:00 
    ccrgpu                    448         04:00:00     10-00:00:00 (32 GPUs per user)
    
    

Node Availability

While approved CCR users have priority access to the CCR nodes, other Biowulf users can also run on them via the "short" queue. Nodes not in use by CCR users may be allocated to short-queue jobs for up to 4 hours; consequently, no CCR job will wait more than 4 hours for nodes occupied by short-queue jobs.

To see how many nodes of each type are available, use the freen command; a separate section reports the number of available CCR nodes:

    $ freen
                                               ........Per-Node Resources........  
    Partition    FreeNds       FreeCPUs        Cores CPUs   Mem   Disk    Features
    --------------------------------------------------------------------------------
    ccr         4/71         1678/3976         28    56    248g   800g   cpu56,core28,g256,ssd800,x2680,ibfdr,ccr
    ccr         8/122        838/6832          28    56    246g   400g   cpu56,core28,g256,ssd400,x2695,ibfdr,ccr
    ccrgpu      8/16         448/896           28    56    246g   400g   cpu56,core28,g256,ssd400,x2695,ibfdr,gpuk80,ccr
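
The strings in the Features column can be used with --constraint to target a particular node type; for example, a sketch that requests one of the nodes with the larger 800 GB SSD:

    sbatch --partition=ccr --constraint=ssd800 --gres=lscratch:400 your_batch_script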
    
    


[Usage graphs: CCR partition utilization over the last 24 hours, the last month, and the last year.]

Please send questions and comments to staff@hpc.nih.gov