1.12.21, 1:20 PM
Systems are back up and should be working as usual. Please open a ticket at https://support.massopen.cloud/ if you have any issues.
There was a power sag at our data center, the MGHPCC, overnight, and some hosts/nodes/servers lost power. We are working to bring things back up as soon as possible.
We will send an update as the situation becomes clearer.
The MOC is running without any service interruptions. Please note, however, that the MOC will not be staffed as usual from December 24, 2020 – January 3, 2021. Please submit a ticket for support, but anticipate a delay in our response.
The primary storage for the MOC clusters is on a Ceph-based cluster of computers and storage. While Ceph is fault-tolerant, you should always make regular backups of your information.
Next week, on Tuesday, December 8th, we plan to upgrade the Ceph software to a more recent version.
While this should be invisible to you, we urge you to back up your information (data, program files, anything you cannot afford to lose) and store the backups somewhere other than on the MOC.
We recently identified a misconfiguration in the Mass Open Cloud support center software that prevented some emails from creating tickets for our team to respond to.
If you are receiving this email, it is because a request for assistance you intended to create did not come to our attention. I hope you will accept our apologies.
While the misconfiguration has been resolved (we had enabled a feature to block unknown mail accounts), it is best to get in the habit of opening tickets at support.massopen.cloud in case we are ever forced to enable that feature again due to spam.
Again, our apologies for any inconvenience. If you are still running into the issue, please open a ticket at support.massopen.cloud.
The MOC will not be staffed from the evening of Wednesday, November 25 through the morning of Monday, November 30. Please text Jen at 781-308-1730 for emergencies only. Otherwise, please anticipate a delay in responses.
The MOC is fully operational.
TLDR: The Kaizen OpenStack and Kaizen OpenShift clusters are operational. The Power 9 cluster is not reachable; the target for fixing that is early next week.
TSWM (Too Short Want More): We are happy to report that the Kaizen OpenStack and Kaizen OpenShift clusters are operational.
Issues with networking have been resolved. If you run into any unusual behaviors, please open a ticket, either at support.massopen.cloud or by sending email to email@example.com. These tickets are reviewed by 3 PM Monday through Friday unless there is a business holiday.
The switch supporting the Power 9 cluster did not restart properly, and we will be working to recover it over the weekend and early next week.
Also, we will be adding NVMe drives to our Ceph cluster over the next several weeks. MOC users should not notice any impact, but we wanted to share the information in case you notice any odd behaviors.
We were making good progress towards our goal of having the MOC fully back online today (October 22). Unfortunately, we experienced network problems late in the day, which we are working to debug.
While many services are up, we cannot guarantee your projects will not be affected. The network problems may affect floating IPs, VMs, and Kaizen OpenShift.
The Power 9 cluster is also not yet available.
We apologize for the inconvenience and will provide further updates in the morning.
We have two MOC service windows coming up.
The first one, which we do not expect to affect users, will be on October 15th, 2020.
The second one, which will affect all MOC users, is the scheduled annual maintenance of the MGHPCC, which houses the MOC, beginning on October 20th.
Please shut down your VMs, containers, and any bare metal systems by 9 AM on Monday the 19th, so that the Mass Open Cloud team may begin preparing for the shutdown process. If you do not shut them down yourself, you run the risk of your VMs or containers losing data.
- MOC Downtime: Monday, October 19th at 9 AM through Thursday, October 22nd at 5 PM.
- The MOC has dependencies on several services which also run at the data center. Based on previous experience, we recommend not scheduling critical events during the week of the 19th.
When the MOC is returned to service, it will be your responsibility to restart your VMs and containers. We will update the website, massopen.cloud, and send email to this distribution list.
Please make sure to get any data and backups you may need off of the MOC in advance. The data center will be completely without power, making access to the cluster mid-maintenance impossible.
As always, feel free to send any questions or concerns to firstname.lastname@example.org. During the outage this email address will not connect to our ticketing system, so we have set up email@example.com for use during the outage.
To get updates like this (e.g., if this email was forwarded to you), you may sign up here: https://mail.massopen.cloud/mailman/listinfo/kaizen-users