Upcoming System Maintenance 4/19/2021
While we have resolved the service outage caused by the recent maintenance on our OpenStack environment, we are working to identify the root cause of the problem. To diagnose the failure, we will repeat the maintenance work on Monday, April 19th, at 9am (which is a BU holiday).
The Kaizen cluster and k-openshift have been restored to service.
Kaizen OpenStack is currently down; we are working to resolve the issue.
We are performing some routine maintenance on the Kaizen cluster in order to update the expired SSL certificates. We do not expect any service interruption, but similar maintenance work in the past has triggered unexpected problems.
A network slowdown has impacted several projects. Repairing the problem will require physical maintenance this Friday, April 9th, between 1 and 3 pm.
K-OpenShift and Kaizen OpenStack will not be available during that time. We may need a few hours to bring the MOC back online after the repair. If a lack of network access may impact your project, you should consider pausing it before the maintenance work starts.
We do not expect the Power 9 or Zero clusters to be affected. Nevertheless, it is probably a good idea not to plan critical work on the afternoon of Friday, April 9th.
As always, please make sure you have local backups – we apologize; they will be slow due to this problem.
Here is a quick update on the status of OpenStack Swift/S3 in the Mass Open Cloud. We have repaired the problem. We are still repairing the Ceph cluster from which OpenStack Swift gets its storage, but you should be able to use it.
As always, please remember that the MOC is a research cluster, and you should always make sure you have backups.
The power outage on 3.29 led to an S3 outage on Kaizen OpenStack. We do not yet have an ETA for resolution.
If you have any problems, please open a ticket at https://support.massopen.cloud/.
3.29.21 – 12 PM
There was a power outage at the MOC data center today which affects all of the MOC systems.
We will post an ETA for restoring service here and at https://massopen.cloud as soon as one is available.
We are experiencing a problem with mail.massopen.cloud which affects account signups, adding members to existing OpenStack projects, and receiving mail from our support system.
We will check the account signups and requests to add members manually each afternoon and send email notifying you that projects and people have been added. We will update this page when the issue is resolved.
The MOC is fully operational.
The Kaizen OpenShift cluster – k-openshift – is experiencing an outage.
1.12.21, 1:20 PM
Systems are back up and should be working as usual. Please open a ticket at https://support.massopen.cloud/ if you have any issues.
There was a power sag at our data center, the MGHPCC, overnight, and some hosts/nodes/servers lost power. We are working to bring things back up as soon as possible.
We will send an update as the situation becomes clearer.
The MOC is running without any service interruptions. Please note, however, that the MOC will not be staffed as usual from December 24, 2020 – January 3, 2021. Please submit a ticket for support, but anticipate a delay in our response.
The primary storage for the MOC clusters is on a Ceph-based cluster of computers and storage. While Ceph is fault-tolerant, you should always make regular backups of your information.
Next week, on Tuesday, December 8th, we plan to upgrade the Ceph software to a more recent version.
While this should be invisible to you, we urge you to make sure you perform backups of your information (data, program files, anything you cannot afford to lose) and store them someplace other than on the MOC.
We recently identified a misconfiguration in the Mass Open Cloud support center software which meant that some email did not create tickets for our team to respond to.
If you are receiving this email it is because a request for assistance you intended to create did not come to our attention. I hope you will accept our apologies.
While the misconfiguration has been resolved (we had enabled a feature to block unknown mail accounts), it is best if you get in the habit of opening tickets at support.massopen.cloud in case we are ever forced to enable that feature again due to spam.
Again, our apologies for any inconvenience. If you are still running into the issue, please open a ticket at support.massopen.cloud.
The MOC will not be staffed from the evening of Wednesday, November 25 through the morning of Monday, November 30. Please text Jen @ 781-308-1730 for emergencies only. Otherwise, please anticipate a delay in responses.
The MOC is fully operational.
TLDR: Kaizen OpenStack and OpenShift are all operational. The Power 9 clusters are not reachable; the target for fixing that is early next week.
TSWM (Too Short Want More): We are happy to report that the Kaizen OpenStack and Kaizen OpenShift clusters are operational.
Issues with networking have been resolved. If you run into any unusual behaviors, please open a ticket at support.massopen.cloud or send email to firstname.lastname@example.org. These tickets are reviewed by 3pm Monday through Friday unless there is a business holiday.
The switch supporting the Power 9 cluster did not restart properly, and we will be working to recover it over the weekend and early next week.
Also, we will be adding NVMe drives to our Ceph cluster over the next several weeks. MOC users should not notice any impact, but we wanted to share the information in case you notice any odd behaviors.
We were making good progress towards our goal of having the MOC fully back online today (October 22). Unfortunately, we experienced network problems late in the day which we are working to debug.
While many services are up, we cannot guarantee your projects will not be affected. This may affect floating IPs, VMs and Kaizen OpenShift.
The Power 9 cluster is also not yet available.
We apologize for the inconvenience and will provide further updates in the morning.
We have two MOC service windows coming up.
The first one, which we do not expect to affect users, will be on October 15th, 2020.
The second one, which will affect all MOC users, is the scheduled annual maintenance of the MGHPCC, which houses the MOC, beginning on October 20th.
Please shut down your VMs, containers, and any bare metal systems by 9 AM on Monday the 19th, so that the Mass Open Cloud team may begin preparing for the shutdown process. If you do not shut them down yourself, you run the risk of your VMs or containers losing data.
- MOC Downtime: Monday, October 19th at 9 am through Thursday, October 22nd at 5 pm.
- The MOC has dependencies on several services which also run at the data center. Based on previous experience, we recommend not scheduling critical events the week of the 19th.
When the MOC is returned to service, it will be your responsibility to restart your VMs and containers. We will update the website, massopen.cloud, and send email to this distribution list.
Please make sure to get any data and any backups that you may need off of the MOC in advance. The data center will be completely without power, making access to the cluster mid-maintenance impossible.
As always, feel free to send any questions or concerns to email@example.com. During the outage, this email address will not connect to our ticketing system; we have set up firstname.lastname@example.org for use during the outage.
To get updates like this (e.g., if this email was forwarded to you), you may sign up here: https://mail.massopen.cloud/mailman/listinfo/kaizen-users