Some systems are experiencing issues

About This Site

Welcome to the T2B Cluster status page.

Please find status information about critical T2B cluster components, incidents and planned maintenance.

Mail subscription is available to get a notification when a component's status changes.

Past Incidents

Saturday 10th December 2022

No incidents reported

Friday 9th December 2022

Mass Storage (/pnfs) One disk server is down

Hi,

One of our storage nodes for /pnfs went down at around 10 AM today (09/12). Our usual attempts to bring it back online remotely have failed. It requires manual intervention on site to power it on or diagnose the problem. We will try to have this done today and will update this ticket when it is fixed; if not, it might have to wait until Monday.

What this means for you as users is that some of your files (~8%) will not be accessible. In case it is relevant for you, the machine in question is behar194.

Sorry for the trouble, Romain

  • The disks from the failing server have been incorporated into another enclosure, and all the files are available again.

    However, due to the many pending requests to this server, access is still slow. We are keeping an eye on this.

    Regards,

    Olivier, for the T2B team.

  • Unfortunately, the new power system did not fix the issue on the server. To make sure you have access to your data as soon as possible, we have changed plans and will not wait for the vendor to find the issue.

    We are going to move all the disks from the failing server to an identical one we have just emptied. The vendor is going to bring the disks back to us tomorrow, and hopefully your data will be available again by Friday.

  • The vendor finally has an estimate of when we will get the server back in working order. Hopefully we will have the server back in production by next Wednesday evening.

    Sorry for the very long delay. Despite having an express-delivery maintenance contract, we are not happy at all with how long it took the company to find a replacement component. We will address this issue with them and keep you updated.

  • The vendor has identified the component that made the server unbootable. The component has been requested so that they can install it and check that it works before sending the server back to us.

    Unfortunately, this means the server will not be back before the Christmas break. If it arrives between Christmas and New Year, we will do our best to put it back into production at that time.

    Sorry for all the trouble and the delay!

  • Short update: we are in contact with the vendor to diagnose what the issue could be and, if needed, to have them send a replacement part.

    In the meantime, send us a list of files/directories and we will tell you which ones are inaccessible.

    Sorry for the trouble.

  • Unfortunately, the server does not boot, even when started manually. The company has been contacted to provide on-site expertise.

Thursday 8th December 2022

No incidents reported

Wednesday 7th December 2022

No incidents reported

Tuesday 6th December 2022

No incidents reported

Monday 5th December 2022

No incidents reported

Sunday 4th December 2022

No incidents reported