For the last few days, /pnfs has been having performance issues, and some storage nodes cannot serve files.
We have not identified the cause yet.
Mainly, cd and ls become slow, and only a restart of the service makes them fast again.
Also, a couple of storage nodes have a hard time serving files (which is why some of your files have issues); this usually resolves itself after a while.
Cheers,
The T2B IT Team
Past Incidents
Friday 26th July 2024
No incidents reported
Thursday 25th July 2024
No incidents reported
Wednesday 24th July 2024
No incidents reported
Tuesday 23rd July 2024
No incidents reported
Monday 22nd July 2024
No incidents reported
Sunday 21st July 2024
No incidents reported
Saturday 20th July 2024
No incidents reported
Friday 19th July 2024
Batch System downtime 22/07
Hello,
Because of the heat wave and the ongoing cooling works in the datacenter, some of the worker nodes have been shut down.
You still have access to ~5700 job slots.
REMINDER DOWNTIME: the batch system, /pnfs and the mX machines will be stopped after midnight, on the morning of Monday 22/07.
Cheers,
Romain
Hello,
The cooling in the datacenter finally seems to be under control, so we were allowed to restart the /pnfs mass storage servers and the batch system.
Note that today only ~3500 job slots have been started. More compute capacity will be added in steps to make sure the cooling can handle the load.
As we took the opportunity to perform several software upgrades, do not hesitate to inform us if anything does not work as expected!
Cheers,
The T2B IT Team
Hello,
Unfortunately, the maintenance on the cooling is still ongoing. Apparently, since it is holiday time, they are having a harder time getting experts on site in a timely fashion.
We have asked them to at least let us start the /pnfs mass storage system. Hopefully we will get a positive response by tomorrow.
Cheers,
The T2B IT Team
Hello,
Work on the cooling units in the datacenter is still ongoing. They hope to have it fixed today but will need the weekend to make sure it is stable.
Unfortunately, this means we are not allowed to start any machine today.
In light of this, we have decided to open up the mX machines, but you will NOT have access to:
/pnfs
the batch system (so no condor_* commands)
We are very sorry for the inconvenience,
The T2B IT Team
Hello,
Unfortunately, the datacenter is still experiencing cooling issues and is not stable, so we cannot put any machine online.
We are exchanging information with the people managing the datacenter, and will inform you as soon as the situation changes.
Cheers,
Romain
Hello,
As expected, the site is now under maintenance. Nothing will be accessible until further notice.