Incident with cooling at Computing Center Tuesday 25th June 2019 17:15:00


There is an unidentified problem with the cooling in the data center room. As a result, a large part of the worker nodes were forcibly stopped, and some of your jobs will therefore have failed.

If the cooling is restored, the nodes will be restarted tomorrow morning. If it keeps failing, the temperature will force us to stop the batch system and eventually the storage. We will update this incident regularly.

Last Friday evening, an intervention on the electrical board of the cooling system fixed the issue. As the temperature remained stable over the weekend, it was agreed with the data center operators that we could bring all our machines back online.

The batch system is now again at full capacity.

The cooling issue was diagnosed today as an electrical problem. As an attempt to fix it can only be made tomorrow, and in view of Saturday's heat wave, we will keep the batch system at half capacity, as it is now, over the weekend.

The situation will be re-evaluated on Monday morning.

Because of the high temperature, the /pnfs head node went down. It is now back up, so /pnfs should be accessible again.

Cooling has only been partially restored, so only a minimal number of job slots are available. Unless the situation worsens again, we will keep the mX-machines and the storage up. Whether the batch system is fully available tomorrow will depend on whether the cooling machine can be repaired.