NIH HPC News & Announcements
IMPORTANT: Changes to Biowulf Batch system coming Oct 31
Date: 24 October 2016 11:10:18
From: Steven Fellini
This is an IMPORTANT message for all Biowulf Users.
Over the first week of November 2016, an additional 1080 compute nodes
(30,000 cores or 60,000 CPUs) will be added to the Biowulf cluster. In
preparation for this expansion, the Biowulf batch queuing system will
be reconfigured on the morning of Monday October 31. The addition of
these compute resources will allow us to significantly raise the
per-user cpu limit for all users over the course of the week.
The batch system changes will allow the expanded system to more
efficiently schedule jobs and are based on our experience to date
with SLURM as well as users' input.
The reconfiguration will require that all running and queued jobs be
deleted. The Biowulf login node and all filesystems will remain
available during the reconfiguration.
The changes include one new partition and the elimination of three, as
well as revised default and maximum time limits (DefWalltime,
MaxWalltime). Users currently specifying a "--partition" option to
their 'sbatch' or 'swarm' commands may need to change the way they
submit their jobs.
norm partition (default). CHANGED
---------------------------------
DefWalltime reduced from 4 hours to 2 hours, MaxWalltime remains 10 days.
NEW: the norm partition is restricted to running single-node jobs
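Since the norm partition will accept only single-node jobs and its default walltime drops to 2 hours, it is worth setting the walltime explicitly. A minimal sketch of a single-node job script follows; the resource values and program name are illustrative assumptions, not values from this announcement:

```shell
# Write a hypothetical single-node job script for the default norm partition.
cat > norm_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=norm          # default partition; single-node jobs only
#SBATCH --cpus-per-task=8         # illustrative CPU count
#SBATCH --mem=16g                 # illustrative memory request
#SBATCH --time=06:00:00           # set explicitly; the new default is only 2 hrs
./my_program                      # placeholder executable
EOF
# Submit with: sbatch norm_job.sh
```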
multinode partition. NEW
------------------------
This partition is intended for parallel jobs that require 2 or more nodes;
single-node jobs will not be allowed to run on this partition.
DefWalltime is 8 hours; MaxWalltime 10 days. All nodes in the multinode
partition will be connected to an FDR Infiniband network.
Users with short (< 8 hrs walltime) multinode jobs can also take advantage of
the new 'turbo' QoS (Quality of Service). This QoS will have a substantially
increased MaxCPUsPerUser and Priority. Add '--qos=turbo' to your sbatch
command to use this QoS.
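Putting the two pieces together, a multi-node job under 8 hours can request both the multinode partition and the turbo QoS. A minimal sketch, assuming an MPI workload; the node/task counts, walltime, and program name are illustrative, not announcement values:

```shell
# Write a hypothetical multi-node job script for the new multinode partition.
cat > multinode_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=multinode     # new partition for jobs needing 2+ nodes
#SBATCH --qos=turbo               # short-job QoS, for jobs under 8 hrs walltime
#SBATCH --nodes=4                 # illustrative node count (must be >= 2)
#SBATCH --ntasks-per-node=16      # illustrative task count
#SBATCH --time=04:00:00           # within the 8-hour turbo window
srun ./my_mpi_program             # placeholder MPI executable
EOF
# Submit with: sbatch multinode_job.sh
```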
b1 partition. ELIMINATED
------------------------
The b1 nodes will be merged into the quick partition.
ibfdr, ibqdr partitions. ELIMINATED
-----------------------------------
ibfdr nodes will be merged into the multinode partition. ibqdr nodes will be
merged into the quick partition (but will retain IB connectivity, use
--constraint=ibqdr).
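For users resubmitting former ibqdr jobs, the constraint flag is what preserves IB placement after the merge. A hedged sketch, with an illustrative walltime and a placeholder script name:

```shell
# Write a hypothetical job script that keeps a former ibqdr job on
# IB-connected nodes now that those nodes live in the quick partition.
cat > quick_ib_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=quick         # ibqdr nodes are merged into quick
#SBATCH --constraint=ibqdr        # request the IB-connected former ibqdr nodes
#SBATCH --time=01:30:00           # illustrative; quick MaxWalltime is 2 hrs
srun ./my_mpi_program             # placeholder executable
EOF
# Submit with: sbatch quick_ib_job.sh
```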
quick partition. CHANGED
------------------------
DefWalltime reduced from 2 hours to 1 hour, MaxWalltime remains 2 hours.
interactive, largemem, unlimited, gpu, ccr, ccrclin, niddk, nimh partitions.
----------------------------------------------------------------------------
UNCHANGED.
You can use the 'batchlim' command to determine the current values for
the maximum number of cpus per user for each partition.
Details on these new nodes are available at https://hpc.nih.gov/systems
For a summary of the batch system changes please visit
https://hpc.nih.gov/docs/batch_changes_oct2016.html