Some systems are experiencing issues

About This Site

Welcome to the T2B Cluster status page.

Here you will find status information about critical T2B cluster components, incidents, and planned maintenance.

Mail subscription is available to receive a notification when a component's status changes.

Stickied Incidents

Monday 12th August 2024

Mass Storage (/pnfs) Some issues with mass storage /pnfs [Rucio & Crab]

Hello,

Several users have reported that:

1/ Rucio does not allow copies to our RSE and fails with the error: Details: RSE excluded; not available for writing.

2/ CRAB also reports that tasks cannot be started because you are not allowed to write to your home directory on our site: Checkwrite Result: Unable to check write permission in /store/user/rougny on site T2_BE_IIHE

We are investigating both issues.

On the other hand, standard grid commands on your files (e.g. gfal-copy) seem to work without any issues.

Cheers, Romain

  • Dear all,

    After consulting with central CMS IT services, it seems that they have resolved the problem on their end. We also received confirmation from users that Rucio and CRAB work as expected again.

    Kind regards,

    Olivier, for the T2B Admin team

Past Incidents

    Saturday 18th November 2023

    No incidents reported

    Friday 17th November 2023

    No incidents reported

    Thursday 16th November 2023

    No incidents reported

    Wednesday 15th November 2023

    No incidents reported

    Tuesday 14th November 2023

    No incidents reported

    Monday 13th November 2023

    Batch System Downtime Friday 17/11 4PM to Monday 20/11 8PM

    Dear Users,

    The timeline for the downtime has been finalized by the team working on the building.

    The downtime will officially be from Friday 17/11 4PM to Monday 20/11 8PM. The compute nodes will all be stopped Friday around 4PM. Jobs will stop running, and we expect them to be rescheduled and queued.

    During the downtime, all storage (/pnfs, /user) will stay accessible, as well as the mX machines and services. The batch system will still accept jobs, but none will be able to start until the compute nodes are started again.

    While the downtime extends up to Monday evening, it is possible, if everything goes well, that the work will be finished on Sunday. In that case the compute nodes might be started early; please check status.iihe.ac.be for updates.

    Please note that the work involves the electrical lines delivering power to the datacenter. Although failover equipment and procedures are in place, there is a small risk that all power to the cluster will be cut. In that case, all storage and services would go down.

    Cheers, The T2B IT Team

  • All Worker Nodes are back online. Downtime is officially finished. You can start sending jobs again.

  • All Worker Nodes have been stopped.

    Because of self-cleaning, /user might also be slow during the weekend.

    According to the Technical Team, it seems work will still take place on Monday, after which the cluster will hopefully be restarted.

    Sunday 12th November 2023

    No incidents reported