NIH HPC News & Announcements
UPDATE: Biowulf/HPC system status
Date: 21 January 2023 00:01:38
From: Tim Miller
HPC Users,

At about 2:30 PM Friday afternoon, there was a major network problem on the internal HPC network that serves the Biowulf cluster, Helix, Globus, and HPCdrive. This caused problems with the HPC storage systems. While the network problems were being worked on, HPC administrators brought many HPC storage systems down in order to ensure the integrity of user data.

Once the network problems abated, HPC administrators carefully checked the file systems and networks to ensure that there were no further immediate issues. That process is now complete, and we have re-enabled user access to Helix, Biowulf, Globus, and HPCdrive.

However, HPC administrators need to monitor the network for a period of time to ensure stability and to ensure that the problem that occurred this afternoon will not recur. Therefore, while interactive jobs have been re-enabled, batch job submission remains disabled. We will be monitoring the network and storage systems carefully, and we expect to be able to resume batch scheduling by Saturday evening. We will inform users when batch scheduling is restarted or if there is any change to this schedule.

We are very aware of how disruptive this problem was. We will thoroughly investigate the root cause of the problem and take whatever steps are required to minimize the chance of a recurrence. As always, we appreciate your patience and understanding.

Please contact staff@hpc.nih.gov with any questions about the NIH HPC Systems.