MOC Research Production (Kaizen)

 

Overview

Kaizen, the MOC’s production OpenStack cluster, provides a cloud platform for the research and development projects of the MOC and its outside partners.

In Spring 2016, our cloud hosted projects for approximately 90 students from our Cloud Computing courses at Boston University and Northeastern University. The class was offered again in Spring 2017 to 40 students at Boston University. Kaizen is also home to a small number of longer-term projects owned by researchers at our partner institutions. Fun fact – one of those projects is this very website, which is served by a small OpenStack instance.

Kaizen means ‘continuous improvement’ in Japanese, and we chose this name to reflect the MOC’s emphasis on innovation. Two core MOC projects, HIL and MOCMon, have been integrated into Kaizen. Many other projects currently being developed on top of the existing cloud will be incorporated into the production cloud once they are ready for release. In this way Kaizen perpetuates a cycle of improvement – each newly incorporated project adds features to the cloud, which in turn opens up more opportunities for future development and research.

To see Kaizen in action, check out our Video Tutorials. For those who prefer text to video, we also maintain a text-based tutorial on our public wiki.


Architecture

Kaizen is deployed on 32 compute servers and two redundant controller nodes, with an additional services node to host various tools for monitoring and deployment. It can be expanded to include more nodes from the cluster, which comprises 48 Cisco UCS C220 M3 servers.  All of these servers are located in one of Northeastern University’s pods at the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke, MA.
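
As a rough illustration of the expansion headroom described above, the node counts can be tallied in a few lines of Python. This is only a sketch: it assumes that every Kaizen node (compute, controller, and services) is one of the 48 cluster servers and that no server plays more than one role. The constant and function names are hypothetical.

```python
# Node counts as stated in the Architecture section.
COMPUTE_NODES = 32
CONTROLLER_NODES = 2
SERVICES_NODES = 1
CLUSTER_SERVERS = 48  # Cisco UCS C220 M3 servers in the cluster


def deployed_nodes() -> int:
    """Servers currently used by Kaizen (assumes one role per server)."""
    return COMPUTE_NODES + CONTROLLER_NODES + SERVICES_NODES


def spare_servers() -> int:
    """Cluster servers still available for expansion under the same assumption."""
    return CLUSTER_SERVERS - deployed_nodes()


if __name__ == "__main__":
    print(f"Deployed: {deployed_nodes()} of {CLUSTER_SERVERS}")
    print(f"Available for expansion: {spare_servers()}")
```

Under these assumptions, 35 of the 48 servers are in use and 13 remain available.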

The cluster is currently running OpenStack Mitaka on Red Hat Enterprise Linux OpenStack Platform. Server configuration and OpenStack deployment on the cluster are heavily automated using Puppet. Our production Puppet scripts are hosted publicly on GitHub.

Kaizen OpenStack uses Ceph as its storage backend, with a Fujitsu CD10000 appliance providing 136 TB of storage. We also expose a Swift endpoint via the RADOS Gateway.


Diagrams

[Diagram: NUTopologyForWebsite]

[Diagram: NUCD10kForWebsite]


Project Team

Core Project Team

  • Radoslav Milanov, Senior Infrastructure Engineer (Boston University) 
  • Laura Kamfonik, Junior Infrastructure Engineer (Boston University)  
  • Duaa Tashkandi, Intern (Boston University)
  • Lily Sturmann, Intern (Boston University)
  • Piyanai Saowarattitada, MOC Director of Engineering and Infrastructure

Contributors

  • MOC Core Team
  • Rajiv Shridhar, Director – System & Production Services (Northeastern University)  
  • Anand Dhingra, System & Production Services (Northeastern University)  
  • Nilay Roy, System & Production Services (Northeastern University)
  • Brent Holden, Field Chief Technologist (Red Hat Enterprise Linux)
  • Jonathan Proulx, Senior Technical Architect (MIT CSAIL)

Timeline

  • April 2015 – Infrastructure work began, with crucial NEU IT networking (Anand Dhingra) and storage (Nilay Roy) staff in place to collaborate with the MOC on cluster design and implementation. To start, NEU provided the 129.10.3.0/25 and 129.10.3.248/29 IP address ranges for the deployment.
  • June 2015 – Work begins on installing Red Hat and OpenStack on the current infrastructure.  MOC welcomes a big crew of summer interns.
  • July through October 2015 – Collaborated with our MIT CSAIL partner to explore the possibility of creating a two-region community cloud on a Galera cluster, with Kaizen as one of the two regions. The collaboration was put on hold due to resource constraints at MIT CSAIL.
  • November 2015 – The Kaizen production cluster is officially deployed, just in time for an MOC workshop on November 19, hosted at BU.
  • December 2015 – Kaizen gets its first real users, a handful of researchers working for our partner universities and MGHPCC.
  • January 2016 – Automating and integrating Sensu for monitoring. NEU provided an additional 129*/26 IP address range for use by the Cloud Computing classes.
  • February 2016 – About 90 student users from BU and NEU join the cluster. Provided how-to training sessions to student users.
  • March through April 2016 – Improving Kaizen
    • Sensu/Grafana/InfluxDB for monitoring.
    • Security, including but not limited to regular updates and Suricata to monitor VM traffic.
    • Scheduled incremental backups.
    • Exploring professional security auditing services.
  • Summer/Fall 2016
    • Deploy BMI to elastically allocate OpenStack compute nodes at will
    • Deploy additional services: Heat (orchestration) and Sahara (Hadoop cluster provisioning)
    • Deploy second controller node for redundancy
    • Utilize the larger /22 subnet made available by the MIT CSAIL team
    • OpenStack Liberty upgrade
  • Fall/Winter 2016
    • Finish inter-pod communication 
    • Scoping possible solutions for instance backups
    • Exploring tools, such as Logwatch, to automate operating-system-level syslog and crontab tasks
    • Scoping the possible need to address hardware/firmware vulnerabilities, and known solutions, e.g. TPM
  • Spring 2017
    • OpenStack Mitaka upgrade
    • Development of automation tools to handle requests for new OpenStack accounts, account changes, etc.
    • Partnership with the University of Massachusetts to provide helpdesk support to Kaizen users
    • OpenStack training for IT Partners at BU
    • Staging tests for upgrades to Newton and Ocata
    • May 23: Equipment Upgrades during MGHPCC Planned Outage
  • Summer 2017 (Planned)
    • OpenStack Newton upgrade
    • Improvements to new user/project automation tools
    • Exploring HIPAA in Kaizen
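
The address ranges mentioned in the timeline can be sized with Python's standard `ipaddress` module. The sketch below is illustrative only: the two CIDR blocks are the ones listed in the April 2015 entry, while the helper names (`summarize`, `addresses_in_prefix`) are hypothetical. Because the /26 and /22 ranges are not given in full, only their sizes are computed from the prefix length.

```python
import ipaddress

# CIDR blocks listed in the April 2015 timeline entry.
INITIAL_RANGES = ["129.10.3.0/25", "129.10.3.248/29"]


def summarize(cidr: str) -> dict:
    """Return the first address, last address, and size of an IPv4 CIDR block."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net),
        "first": str(net[0]),
        "last": str(net[-1]),
        "addresses": net.num_addresses,
    }


def addresses_in_prefix(prefix_len: int) -> int:
    """Number of IPv4 addresses in a block with the given prefix length."""
    return 2 ** (32 - prefix_len)


if __name__ == "__main__":
    for cidr in INITIAL_RANGES:
        print(summarize(cidr))
    a, b = (ipaddress.ip_network(c) for c in INITIAL_RANGES)
    print("Ranges overlap:", a.overlaps(b))
    print("A /26 holds", addresses_in_prefix(26), "addresses")
    print("A /22 holds", addresses_in_prefix(22), "addresses")
```

These counts are raw block sizes. The number of addresses usable for instances is typically smaller once network, broadcast, and gateway addresses are set aside.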

Planning and Getting Involved

To get involved in this project, please send an email to (MOC team-list) and/or join the #moc IRC channel on freenode.
