Some systems are experiencing issues

About This Site

Welcome to the T2B Cluster status page.

Please find status information about critical T2B cluster components, incidents and planned maintenance.

Email subscription is available to receive a notification when a component's status changes.

Past Incidents

Tuesday 23rd July 2024

No incidents reported

Monday 22nd July 2024

No incidents reported

Sunday 21st July 2024

No incidents reported

Saturday 20th July 2024

No incidents reported

Friday 19th July 2024

Batch System downtime 22/07

Hello,

Because of the heat wave and the ongoing cooling works in the datacenter, some of the worker nodes have been shut down. You still have access to ~5700 job slots.

DOWNTIME REMINDER: the batch system, /pnfs and the mX machines will be stopped after midnight on the morning of Monday 22/07.

Cheers, Romain

  • Hello,

    The cooling in the datacenter finally seems to be under control, so we were allowed to restart the /pnfs mass storage servers and the batch system.

    Note that only ~3500 job slots have been started today. More compute capacity will be added in steps to make sure the cooling can handle the load.

    As we took the opportunity to perform several software upgrades, please do not hesitate to let us know if anything does not work as expected!

    Cheers, The T2B IT Team

  • Hello,

    Unfortunately, the maintenance on the cooling is still ongoing. Apparently, since it is holiday time, they are having a harder time getting experts on site in a timely fashion.

    We have asked them to at least let us start the /pnfs mass storage system. Hopefully we will get a positive response by tomorrow.

    Cheers, The T2B IT Team

  • Hello,

    The datacenter is still working on the cooling units. They hope to have them fixed today but will need the weekend to make sure the cooling is stable. Unfortunately, this means we are not allowed to start any machine today.

    In light of this, we have decided to open up the mX machines, but you will NOT have access to:

    • /pnfs
    • the batch system (so no condor_* commands)

    We are very sorry for the inconvenience, The T2B IT Team

  • Hello,

    Unfortunately, the datacenter is still experiencing cooling issues and is not stable, so we cannot put any machine online. We are in contact with the people managing the datacenter and will inform you as soon as the situation changes.

    Cheers, Romain

  • Hello,

    As expected, the site is now under maintenance. Nothing will be accessible until further notice.

    We'll try to finish things as soon as possible.

    Cheers, Romain

Thursday 18th July 2024

No incidents reported

Wednesday 17th July 2024

No incidents reported