All systems are operational

About This Site

Welcome to the T2B Cluster status page.

Please find status information about critical T2B cluster components, incidents and planned maintenance.

You can subscribe by email to be notified when a component's status changes.

Past Incidents

Wednesday 26th June 2019

No incidents reported

Tuesday 25th June 2019

Batch System Incident with cooling at Computing Center

There is an unidentified issue with the cooling in the datacenter room. As a result, a large share of the worker nodes had to be forcibly stopped, so some of your jobs will have failed.

If the cooling comes back, nodes will be restarted tomorrow morning. If it continues to fail, the temperature will force us to stop the batch system and eventually the storage. We will update this incident regularly.

  • An intervention on the electrical board of the cooling system last Friday evening fixed the issue. As the temperature remained stable over the weekend, we agreed with the datacenter operators that we could bring all our machines back.

    The batch system is now back at full capacity.

  • The cooling problem was diagnosed today as an electrical issue. As a fix will only be attempted tomorrow, and in view of Saturday's heat wave, we will keep the batch system at half capacity, as it is right now, for the weekend.

    The situation will be re-evaluated on Monday morning.

  • Because of the temperature, the /pnfs headnode went down. It is now back, so /pnfs should be accessible again.

    Cooling was only partially restored, so only a minimal number of job slots are available. Unless the situation starts to worsen again, we will keep the mX machines and storage up. Full availability of the batch system tomorrow will depend on whether the cooling machine can be fixed.

Monday 24th June 2019

No incidents reported

Sunday 23rd June 2019

No incidents reported

Saturday 22nd June 2019

No incidents reported

Friday 21st June 2019

No incidents reported

Thursday 20th June 2019

No incidents reported

Wednesday 19th June 2019

User Interfaces - mX machines M9 is down

M9 did not come back correctly after a reboot. An intervention on site is needed.

  • M9 is back online after a successful reboot.