NIH HPC News & Announcements
UPDATE: Biowulf/HPC system status
Date: 21 January 2023 00:01:38
From: Tim Miller
HPC Users,
At about 2:30 PM on Friday, there was a major network problem on
the internal HPC network that serves the Biowulf cluster, Helix, Globus,
and HPCdrive. This caused problems with the HPC storage systems. While
the network problems were being worked on, HPC administrators brought
many HPC storage systems down in order to ensure the integrity of user data.
Once the network problems abated, HPC administrators carefully
checked the file systems and networks to ensure that there were no
further immediate issues. That process is now complete and we have
re-enabled user access to Helix, Biowulf, Globus, and HPCdrive.
However, HPC administrators need to monitor the network for a period of
time to confirm that it is stable and that the problem that occurred this
afternoon will not recur. Therefore, while interactive jobs have been
re-enabled, batch job submission remains disabled. We will be monitoring
the network and storage systems carefully, and we expect to be able to
resume batch scheduling by Saturday evening. We will inform users when
batch scheduling is restarted or if there is any change to this schedule.
We are very aware of how disruptive this problem was. We will thoroughly
investigate the root cause of the problem and take whatever steps are
required to minimize the chance of a recurrence.
As always, we appreciate your patience and understanding.
########################################################################
Please contact staff@hpc.nih.gov with any questions about the NIH HPC Systems.