All systems are operational

About This Site

Welcome to the T2B Cluster status page.

Here you will find status information about critical T2B cluster components, incidents, and planned maintenance.

Mail subscription is available to get a notification when a component's status changes.

Past Incidents

Friday 24th January 2025

Mass Storage (/pnfs) issues with slow pnfs affecting all machines

Hello,

Unfortunately, standard access to /pnfs has been slow lately. To fix this, we were forced to restart the NFS service for pnfs.

As a consequence, many of our machines have lost access to /pnfs entirely. We are busy fixing that, and might have to restart mX machines.

More info will be given asap.

Sorry for the trouble, The T2B Admin Team

  • Unfortunately pnfs is still very unstable.

    Restarting the services or even rebooting the whole machine does not help.

    We're still investigating what could cause the instability.

  • We had to restart the whole pnfs system again, and that fixed everything. /pnfs is accessible again from all mX machines without needing to reboot them. Unfortunately, a lot of worker nodes went down because of the issue; we have remotely restarted as many as possible.

    Current cluster capacity:
    EL7 - free: 420 + run: 3499 [1368+2131] + drain: 0 = 3919
    EL9 - free: 28 + run: 5220 [5026+194] + drain: 0 = 5248

Thursday 23rd January 2025

No incidents reported

Wednesday 22nd January 2025

No incidents reported

Tuesday 21st January 2025

No incidents reported

Monday 20th January 2025

No incidents reported

Sunday 19th January 2025

No incidents reported

Saturday 18th January 2025

No incidents reported