Additional information for LCP users of the NIH Biowulf Cluster

NIDDK has funded a number of nodes in the Biowulf cluster, and the Lab of Chemical Physics (LCP) has priority access to these nodes. This priority status will last until January 7, 2016. There are 206 nodes in total (3296 cores).

Update (Jan 8, 2016): These nodes have been transitioned to the general pool and are now part of the ibqdr partition.

The hardware characteristics of each compute node are as follows:

  • 2 x Intel Xeon E5-2670, 8 cores @ 2.60 GHz, hyperthreading enabled
  • either 32 or 64 GB memory (103 nodes of each size)
  • 420 GB disk
  • 1 Gb/s Ethernet
  • QDR Infiniband (32 Gb/s)

    Hyperthreading

    Hyperthreading is a hardware feature of the Xeon processor that allows each physical core to run two simultaneous threads of execution, making the node appear to have twice as many cores. Thus the 16-core LCP nodes appear to have 32 cores. In many cases this will increase the performance of applications that are multi-threaded or can otherwise take advantage of multiple cores. However, before running 32 threads of execution on a single node, the Biowulf staff recommend that you benchmark your application to determine whether it benefits from hyperthreading (or even whether it scales to 16 cores).
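
    For example, one simple way to benchmark is to run the same job at several thread counts on one node and compare the timings. The script below is only a sketch: the program name (my_md_app) and its input are placeholders, and the use of OMP_NUM_THREADS assumes an OpenMP application.

    #!/bin/bash
    # Sketch: time one multi-threaded program at 8, 16, and 32 threads
    # on a single LCP node. "my_md_app" and "input.dat" are placeholders.
    #SBATCH --partition=lcp
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=32

    for nthreads in 8 16 32; do
        export OMP_NUM_THREADS=$nthreads
        echo "== $nthreads threads =="
        time ./my_md_app input.dat
    done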

    Infiniband

    All LCP nodes are connected to a QDR Mellanox Infiniband switch (SX6512) at full-bisection bandwidth. No special properties need to be specified to sbatch to allocate an Infiniband node.

    Submitting jobs to the batch system

    Jobs are submitted to the LCP nodes by specifying the "lcp" partition. In the simplest case,

    sbatch --partition=lcp your_batch_script
    

    See the Slurm or application-specific documentation for suggestions on how to submit multi-node jobs.
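
    As an illustration only, a multi-node MPI job could be submitted with a script along these lines; the module name (openmpi) and the executable (my_mpi_app) are placeholders, and whether to launch with srun or mpirun depends on how your MPI library was built:

    #!/bin/bash
    # Sketch: 4-node MPI job on the lcp partition (names below are placeholders)
    #SBATCH --partition=lcp
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=16     # one MPI rank per physical core

    module load openmpi              # assumed module name
    srun ./my_mpi_app input.dat      # or mpirun, depending on the MPI build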

    To request nodes with a specific memory size, use the g64 (64 GB) or g32 (32 GB) node feature as a constraint:

    sbatch --constraint=g64 --partition=lcp your_batch_script
    
    or
    sbatch --constraint=g32 --partition=lcp your_batch_script
    

    To allocate an interactive node, specify lcp as a constraint rather than a partition:

    sinteractive --constraint=lcp
    

    To submit a swarm of commands to the LCP nodes,

    swarm -f command_file --partition=lcp
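
    Each line of the command file is an independent command that swarm runs as a separate subjob. A hypothetical command_file might contain, for example (program and file names are placeholders):

    ./analyze sample01.dat > sample01.out
    ./analyze sample02.dat > sample02.out
    ./analyze sample03.dat > sample03.out

    If each command is itself multi-threaded, see the swarm documentation for options that reserve multiple CPUs per command.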
    

    Core Limits

    There are effectively no per-user core limits at this time; however, there are walltime limits:

    $ batchlim
    
    Max Jobs Per User: 10000
    Max Array Size: 1001
     
    Partition        MaxCPUsPerUser     DefWalltime     MaxWalltime
    ---------------------------------------------------------------
    lcp                   9999          10-00:00:00     10-00:00:00
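
    The default and maximum walltime on the lcp partition are both 10 days. If your job needs less, you can request a shorter limit with sbatch's --time option, which may help the scheduler start the job sooner; the value below is just an example:

    sbatch --partition=lcp --time=2-00:00:00 your_batch_script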
    

    Node Availability

    While LCP users have priority access to the LCP nodes, other Biowulf users will be able to run on them through a "short" queue (not yet implemented). Nodes not in use by LCP users may be allocated to short-queue jobs for up to 4 hours; that is, no LCP job will wait more than 4 hours for nodes held by short-queue jobs.

    To see how many nodes of each type are available, use the freen command; a separate section now reports the number of available LCP nodes:

    $ freen
                                               ........Per-Node Resources........  
    Partition    FreeNds       FreeCPUs       Cores CPUs   Mem   Disk    Features
    --------------------------------------------------------------------------------
    lcp         100/103      3200/3296         16    32     31g   400g   cpu32,core16,g32,sata400,x2600,ibddr,lcp
    lcp         92/103       2944/3296         16    32     62g   400g   cpu32,core16,g64,sata400,x2600,ibddr,lcp
    

    Performance

    The figure below shows performance on a GROMACS benchmark. The LCP nodes (x2600 QDR) deliver nearly twice the performance of the previous generation of Infiniband nodes on Biowulf (e2800 DDR). The DDR benchmark was run with 8 processes per node; the QDR benchmark with 16 processes per node.
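
    For reference, a run like the QDR benchmark (16 MPI processes per node) could be set up roughly as follows. This is only a sketch: the module name and the mdrun invocation differ between GROMACS versions, so consult the documentation for the installed version.

    #!/bin/bash
    # Sketch: 2-node GROMACS run, 16 MPI ranks per node (details are assumptions)
    #SBATCH --partition=lcp
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16

    module load gromacs                                   # assumed module name
    mpirun -np $SLURM_NTASKS mdrun_mpi -s benchmark.tpr   # command is version-dependent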

    Please send questions and comments to staff@hpc.nih.gov.