Marketplace Concept at the MOC – GIJI

Overview

The Mass Open Cloud (MOC) needs a simple, intuitive GUI for its public cloud marketplace. Our goal is to provide a tool that enhances the user experience by automating the mundane but complicated steps required to use cloud resources, and by presenting to our users all available cloud services, including but not limited to network routing and the ability to choose between virtual machines (OpenStack instances) and container technologies.

Motivation

OpenStack Horizon is inadequate for novice users who lack basic cloud knowledge and for whom the cloud is simply a tool to get computing results, for example for their scientific research. The MOC marketplace GUI is a simpler tool that streamlines the cloud experience. It can be used by itself or alongside Horizon. Developing a new GUI from scratch can be both time-consuming and challenging; a collaboration with CyVerse’s Atmosphere/Troposphere project has helped accelerate the development of the MOC GUI.

As part of the Open Cloud Exchange, Cloud Dataverse, OpenShift, and NetEx are being integrated as services. From this effort, a standard REST API will be developed that third-party developers can use to build future services.


Architecture

Marketplace Architecture

[Figure: GIJI marketplace architecture flowchart (giji-001)]

Current GUI Concept

In the marketplace, we will show content provided by the service developer that indicates each service’s general classification.

 

[Screenshot: marketplace service listing]

Upon selection of a particular service, the specific information needed to use that service will be displayed in a modal.

[Screenshot: service detail modal (giji-005)]


Project Team

Core Project Team

  • Robert B. Baron  
  • Lucas H. Xu (Boston University) 

Timeline

  • Fall 2017: Support for NetEx
  • July 2017: Proof of concept of using the Slurm workload manager on GIJI
  • June 2017: Proof of concept of HIL (Hardware Isolation Layer)
  • May 2017: Proof of concept of Cloud Dataverse in the MOC Simple GUI (GIJI)
  • March 2017: Presentation of the MOC Simple GUI (GIJI) accepted for the OpenStack Summit
  • January 2017: Created the MOC Simple GUI code base, based on Atmosphere, on GitHub
  • July 2016: Contacted the CyVerse Atmosphere team with Steve Gregory

Planning and Getting Involved

To get involved in this project, please send an email to the MOC team list (team@lists.massopen.cloud), join the #moc IRC channel on Freenode, or join our team on Slack.

Code is available at:

Site at:

 

Overview

We are proposing a new set of use cases for OpenStack and a set of changes to enable a multi-landlord cloud model, where multiple service providers can cooperate (and compete) to stand up services in a single cloud, and users can federate resources between various services in various ways—“mix and match.” This page describes the model and our plan for the Newton and Ocata cycles.

MixMatch NENS Poster

MOC Workshop ’16 Slides: Mix & Match: Resource Federation


Motivation

Currently, all clouds are stood up by a single company or organization, creating a vertically integrated monopoly. Any competition is between entire clouds and is limited by the customer’s ability to move their data over the connectivity between clouds. We think an alternative model is possible, where different entities can stand up individual services within the same data center and the customer (or intermediaries acting on their behalf) can pick and choose between them. We call this model of having multiple landlords in a cloud an Open Cloud eXchange (OCX).

The OCX model would enable more innovation by technology companies, enable cloud research and provide more choice to cloud consumers. We are developing this model in a new public cloud, the Mass Open Cloud (MOC), at the Massachusetts Green High Performance Computing Center (MGHPCC) data center, which is shared between Boston University, Harvard University, the Massachusetts Institute of Technology, Northeastern University and the University of Massachusetts. Some use cases being explored in the context of the MOC illustrate the potential of this model:

  • Harvard and MIT both stand up their own OpenStack cloud for students, but provide resources on-demand to the MOC that can be used by researchers that collaborate between the universities and by external users.
  • A storage company stands up a new innovative block storage service on a few racks in the MGHPCC, operates it themselves and makes it available to users of the MOC and/or the individual university clouds. The storage company is in total control of price, automation and SLA for the service; users can choose if they want to use the service.
  • A company stands up a new compute service with innovative hardware (e.g., FPGAs, crypto accelerators) or a new pricing model. A customer with a two-petabyte disk volume can switch to that compute service without having to move the data.
  • A research group from Boston University and Northeastern University develops a highly elastic compute service that can checkpoint thousands of servers to SSD in seconds and broadcast-provision a distributed application, enabling an interactive medical imaging application that needs thousands of servers for a few seconds.
  • A security sensitive life sciences company exploits software from the MACS project to distribute their data across two storage services from non-colluding providers. The data is accessed from a Nova compute service running on a trusted compute platform developed collaboratively with Intel. Since all services are deployed in the same data center, the computation is efficient.
  • Students in a cloud computing course offered by Northeastern University and Boston University faculty develop and stand up an experimental proxy service for OpenStack services that provides users of the MOC with a Swift service combining the inventory of multiple underlying Swift services.

We believe solutions to the problems of the OCX model will improve OpenStack generally from a security and reliability perspective. Solving the problems of multiple providers/landlords that don’t trust each other also brings defense in depth for a single provider cloud; potentially preventing an exploit of one service from compromising an entire cloud.


Architecture

We are building on a feature added in the Kilo release of OpenStack called Keystone-to-Keystone federation.  With this feature, Keystone A can act as an identity provider for Keystone B, providing the user in the former with a SAML assertion that describes the user’s identity and permissions. Keystone B consumes this assertion and produces a token with which the user can query the services to access the resources.
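
For illustration, this exchange can be driven from Python with the keystoneauth1 library’s Keystone2Keystone plugin. The sketch below is minimal; the auth URL, credentials, and the service provider ID ('moc-sp') are placeholders, not actual MOC values.

```python
# Minimal sketch of Keystone-to-Keystone federation with keystoneauth1.
# All URLs, credentials, and the service provider ID are placeholders.
from keystoneauth1 import session
from keystoneauth1.identity import v3

# Authenticate against the local (identity provider) Keystone.
local_auth = v3.Password(
    auth_url='https://keystone-a.example.org:5000/v3',
    username='demo', password='secret',
    project_name='demo', user_domain_id='default',
    project_domain_id='default')

# Exchange the local credentials for a SAML assertion and then a token
# scoped to the remote (service provider) Keystone, registered as 'moc-sp'.
k2k_auth = v3.Keystone2Keystone(local_auth, 'moc-sp',
                                project_name='demo',
                                project_domain_id='default')

sess = session.Session(auth=k2k_auth)
print(sess.get_token())                             # token for the remote cloud
print(sess.get_endpoint(service_type='volumev3'))   # remote Cinder endpoint
```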

Storage

The use case we are currently focusing on is allowing Nova to access volumes, images, and snapshots in another service provider via Keystone-to-Keystone federation. For this, we have built a new OpenStack service that acts as a proxy to Cinder and Glance. Our new service transparently forwards API calls to the correct service provider after resolving the location of the requested resources and performing the SAML exchange to get a token for the remote service.

[Figure: federated volume attach through the proxy]
(1) User authenticates to Keystone and receives a token. (2) User asks nova to attach a volume. (3) Nova asks the Cinder endpoint (proxy) for the volume. (4) Proxy asks Keystone for a SAML assertion, which it uses (5) to receive a token for the remote cloud. (6) Finally the proxy forwards the request to the remote service.
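
The proxy’s role in steps (3) through (6) can be summarized with the simplified sketch below; this is illustrative pseudocode rather than the actual mixmatch implementation, and the service provider map and helper functions are hypothetical.

```python
# Illustrative sketch of the proxy's forwarding logic (not the real mixmatch code).
# locate_volume() and get_k2k_token() are hypothetical helpers.
import requests

SERVICE_PROVIDERS = {
    'cloud-a': 'https://cinder.cloud-a.example.org:8776',
    'cloud-b': 'https://cinder.cloud-b.example.org:8776',
}

def locate_volume(volume_id):
    """Resolve which service provider owns this volume (hypothetical lookup)."""
    # In practice this would consult a local mapping or query each provider.
    return 'cloud-b'

def get_k2k_token(local_token, service_provider):
    """Perform the Keystone-to-Keystone exchange (see the previous example).

    Returns a placeholder string here to keep the sketch self-contained.
    """
    return 'remote-token-for-' + service_provider

def forward_volume_request(method, path, volume_id, local_token, body=None):
    """Resolve the volume's home cloud, get a remote token, forward the call."""
    sp = locate_volume(volume_id)
    remote_token = get_k2k_token(local_token, sp)
    url = SERVICE_PROVIDERS[sp] + path
    return requests.request(method, url,
                            headers={'X-Auth-Token': remote_token},
                            json=body)
```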

Network

Work is planned toward allowing VMs in different clouds to connect to each other as if they had been launched on the same private network (without the need to assign public IP addresses to the instances). This is conducive to the mixing and matching of compute resources across providers. The solution will make use of VXLAN tunneling and be orchestrated through the mixmatch proxy service. (In the past, we looked at vendor SDN solutions that plug into OpenStack Neutron, such as MidoNet and Contrail.)
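
As a rough illustration of what that orchestration involves, the sketch below creates a point-to-point VXLAN tunnel between two hypervisors with standard iproute2 commands; the interface names, VNI, addresses, and bridge are made-up examples, and in practice the proxy, not an operator, would drive these steps.

```python
# Rough illustration: a point-to-point VXLAN tunnel between two hypervisors,
# created with standard iproute2 commands. Interface names, VNI, addresses,
# and the bridge name are made-up examples.
import subprocess

def run(cmd):
    print('+', ' '.join(cmd))
    subprocess.run(cmd, check=True)

LOCAL_IP, REMOTE_IP, VNI = '10.1.0.11', '10.2.0.22', 4242
dev = 'vxlan%d' % VNI

# Create the VXLAN interface and attach it to the local tenant bridge.
run(['ip', 'link', 'add', dev, 'type', 'vxlan',
     'id', str(VNI), 'local', LOCAL_IP, 'remote', REMOTE_IP,
     'dstport', '4789'])
run(['ip', 'link', 'set', dev, 'master', 'br-tenant'])
run(['ip', 'link', 'set', dev, 'up'])
```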


Project Team

Core Project Team

  • Kristi Nikolla (Boston University)  
  • Eric Juma
  • Jeremy Freudberg 

Contributors

  • Sean Perry (HP)
  • Jon Bell (Boston University)
  • Davanum Srinivas (Mirantis)
  • Adam Young (Red Hat)
  • Minying Lu (Boston University)
  • George Silvis III (Boston University)
  • Wjdan Alharthi (Boston University)

Timeline

2017

  • January through June: External testing in production deployments for storage
  • May through August: Work on network federation implemented as a PoC

2016

  • July: Discussion with upstream Nova team about solution architecture
  • July through October 2016: Implementing proxy service
  • October: Presenting our solution in a vBrownBag lightning talk at the Ocata OpenStack Summit in Barcelona
  • December: First release of the proxy as an upstream OpenStack project

Planning and Getting Involved

To get involved, email team@lists.massopen.cloud or join the #moc IRC channel on Freenode. You can also view our current status on Trello at https://trello.com/b/BQQFdyLx

Code is available at https://github.com/openstack/mixmatch

Documentation is available at http://mixmatch.readthedocs.io

 

Overview

The Big Data Testbed is a large, collaborative cluster which serves as a test site for a wide variety of innovative high performance computing technologies. It is located at the Massachusetts Green High Performance Computing Center (MGHPCC).

MOC’s industry partners in the Big Data project include Brocade, Intel, Lenovo and Red Hat. MIT Research Computing also contributes compute servers to the cluster.


Architecture

The Big Data Testbed consists of 18 compute racks, one storage rack, and an experimental OpenStack rack. MIT provides compute servers for the cluster, which are housed in MIT racks. At scheduled times, MIT nodes are made available for provisioning via the MOC network, and then are moved back to the MIT compute pool when the time is up. We are currently exploring rapid node deployment using Bare Metal Imaging (BMI). Future plans include moving nodes between MIT and the MOC more dynamically via requests to MIT’s provisioning system.

Each compute rack will eventually feature an Intel cache server with two high-performance SSDs, configured as a cache tier of the cluster’s Ceph storage backend. Currently, five such servers are actively deployed.

The cluster is backed by Red Hat Ceph Storage. The base Ceph tier, containing the majority of the storage capacity, is installed on ten Lenovo x3650 servers. Each server is equipped with nine 4TB SATA drives, plus three faster SSDs for journaling. Because Ceph stores data with three-way replication, the total working capacity of the base storage tier is about 103TB. Three small Quanta servers serve as monitors for the cluster.

The cluster network is a special ‘bifurcated ring’ architecture designed for this project by engineers at Brocade. The ring is designed to create short paths between any two points, yet still be expandable with minimum disruption to the existing infrastructure. Currently, we have 22 Brocade VDX6740 switches in the cluster – one for each of the 18 compute racks, one in the OpenStack rack, and three in the storage rack. The ten storage servers are split among the three storage-rack switches, so that the storage backend is accessible from three different points in the wider ‘ring’ architecture.

The last piece of the cluster is an MOC compute rack with 7 Quanta servers and 6 Lenovo System x3550 servers. This rack features a small OpenStack Liberty installation, an experimental OpenShift environment, and a multinode setup used for SDN research and an OpenFlow research project. This rack also houses the HIL master which controls networking for the Engage1 deployment.


Diagrams

Big Data Testbed Infrastructure Diagram

Engage1

RGW-Proxy Overview

RGW SSD Cache in MOC


Project Team

Core Project Team

  • Chris Hill, EAPS Principal Research Scientist; Earth, Atmospheric, and Planetary Sciences (MIT)
  • Radoslav Milanov, Senior Infrastructure Engineer (Boston University)
  • Laura Kamfonik, Junior Infrastructure Engineer (Boston University) 
  • Paul Hsi (MIT)
  • Rahul Sharma, Co-op Intern (Boston University) 
  • Piyanai Saowarattitada, MOC Director of Engineering (Boston University) 

Contributors

  • Jon Bell (Boston University)
  • Dave Cohen (Intel)
  • Rob Montgomery (Brocade)
  • Mark Presti (Brocade)
  • Bob Newton (Lenovo)
  • Jon Proulx (MIT – CSAIL)
  • Garrett Wollman (MIT – CSAIL)
  • Tyler Brekke (Red Hat Enterprise Linux)
  • Huitae Kim (Red Hat Enterprise Linux)
  • Joe Fontecchio (University of Massachusetts) 

Timeline

  • July 2015
    • Engineers from Lenovo install the 10 storage servers at MGHPCC.
    • Extensive planning of the Brocade fabric is begun.
  • September 2015 – The 22 VDX6740 Switches are configured by Rob M. (Brocade), and later installed in the racks.
  • October 2015 – The Brocade Fabric is installed, with over 100 cables connecting switches across two datacenter pods.
  • November 2015 – 10 small Quanta servers are installed, which later become admin/service nodes, Ceph monitors, and OpenStack nodes.
  • December 2015 – Brocade conducts two training sessions at BU.
  • January 2016 – Red Hat conducts a weeklong Ceph training at BU.  Tyler B. and  H. Kim (Red Hat) are on-site to help with the Ceph installation.
  • February 2016 – An initial set of Intel Cache servers are configured, and an experimental radosgw-proxy service is set up.  This service will be tested on the cluster with Hadoop in the near future.
  • March 2016 – Deploying a handful of physical systems for a Hadoop bare metal deployment PoC. Exploring the possibility of joining the Red Hat Ceph High Touch Beta program for early access/support for the upstream Infernalis release.
  • April 2016 – Hadoop bare metal deployment against additional hardware loaners from Lenovo as well as against 200+ MIT MRI systems. Evaluate Infernalis ahead of a possible deployment.
  • May 2016 – Depending on the road map of the Big Data use cases, deploy Infernalis.
  • Summer 2016 – Big Data  and HPC use case support. Testing of BMI deployment.
  • Fall 2016 – Deploy Anycast setup for cache tiering experiments in the Brocade environment
  • Winter 2017 – Upgrade to Mitaka
  • Spring 2017 – Small staging area with 1 cache server and 3 compute nodes added to the MOC rack.
  • Summer 2017 (Planned)
    • Deploy User Automation
    • Upgrade to Newton
    • Upgrade to Ocata (pending RHEL release)

 

Overview

Bare Metal Imaging (BMI) is a core component of the Mass Open Cloud that (i) provisions numerous nodes as quickly as possible while preserving support for multitenancy using the Hardware Isolation Layer (HIL) and (ii) introduces image management techniques that are already available for virtual machines, with little to no impact on application performance.


Motivation

Imagine thousands of nodes in a data center that supports a multitenant bare metal cloud. We need a central image management service that can quickly image the nodes to meet customers’ needs. Upon completion of a customer’s project, the data center administrator should ideally be able to reallocate the resources within a few minutes and use them for another customer. Today, these techniques are available for virtual machines (VMs), but not for bare metal systems. This project aims to bridge this gap by creating a service that addresses the above issues.

Bare metal systems that support Infrastructure as a Service (IaaS) solutions are gaining traction in the cloud. Some of the advantages include:

  • Better isolation than containers or VMs.
  • Predictable/stable performance when compared to VMs or containers, especially on input/output (I/O) intensive workloads such as Hadoop jobs, which need predictable storage and network I/O.
  • Leveraging the benefits of cloud services, such as economies of scale. VMs are already scalable and elastic, with customers paying based on their resource consumption.
  • The bare metal nodes could be used for other IaaS services like OpenStack or applications like HPC that can consume idle CPU cycles.

The main concerns of a bare metal system are the inherent slowness in provisioning the nodes and the lack of features, such as cloning, snapshotting, etc.

This project proposes a system that includes all of the above advantages and also addresses the fast provisioning issue for a bare metal system. Using this system, we propose to provision and release hundreds of nodes as quickly as possible with little impact on application performance.


Current BMI (IMS) Architecture

The current design consists of a pluggable architecture that exposes an API for such features as:

  • provision – Provisions a physical node with a given image
  • snapshot – Snapshots the given image, so that it can be used as a golden image
  • rm – Removes an image from a project
  • list – Lists the images available in a project
  • upload – Uploads an image to the library
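
To give a feel for how these operations compose, here is a hypothetical walk-through of a provision/snapshot cycle; the endpoint URL, request paths, and parameter names below are illustrative only and are not the actual BMI API.

```python
# Hypothetical walk-through of the BMI operations listed above.
# The endpoint URL, paths, credentials, and parameter names are illustrative only.
import requests

BMI = 'http://bmi.example.org:7000'          # hypothetical BMI endpoint
AUTH = ('admin', 'secret')                   # hypothetical credentials

def call(op, **params):
    resp = requests.post('%s/%s' % (BMI, op), json=params, auth=AUTH)
    resp.raise_for_status()
    return resp.json()

# Provision a node from a golden image, snapshot the tuned result,
# then list and clean up images in the project.
call('provision', node='cisco-27', img='hadoop-golden', project='bigdata')
call('snapshot', node='cisco-27', snap_name='hadoop-tuned', project='bigdata')
print(call('list', project='bigdata'))
call('rm', img='hadoop-tuned', project='bigdata')
```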

BMIS Architecture

We use Ceph as a storage back-end to save OS images. For every application we support, we have a “golden image,” which acts as a source of truth. When a user logs in and requests a big data environment, we clone from this golden image and provision nodes using the cloned image and a PXE bootloader. The Hardware Isolation Layer (HIL) serves as a network isolation tool through which we achieve multitenancy; HIL provides a service for node allocation and deallocation. For more details about HIL, please visit https://github.com/CCI-MOC/haas.
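
For a concrete picture of the golden-image workflow, the sketch below performs the snapshot/protect/clone sequence with Ceph’s Python RBD bindings; the pool, image, and snapshot names are placeholders, and the actual BMI service wraps this logic in its own provisioning code.

```python
# Minimal sketch of cloning a golden image with Ceph's Python RBD bindings.
# Pool, image, and snapshot names are placeholders.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('bmi-images')        # hypothetical pool name

with rbd.Image(ioctx, 'hadoop-golden') as golden:
    # A protected snapshot is required before copy-on-write cloning.
    if 'base' not in [s['name'] for s in golden.list_snaps()]:
        golden.create_snap('base')
        golden.protect_snap('base')

# Clone the golden image for one node; the clone shares data with the
# parent until blocks are written, so it is created almost instantly.
rbd.RBD().clone(ioctx, 'hadoop-golden', 'base', ioctx, 'node-27-boot')

ioctx.close()
cluster.shutdown()
```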


Project Team

Professor(s)

  • Prof. Orran Krieger, Boston University 
  • Prof. Gene Cooperman, Northeastern University 
  • Prof. Peter Desnoyers, Northeastern University 

Research Scientist(s), Postdoc(s) and Engineer(s)

  • Dr. Ata Turk, Boston University
  • Naved Ansari, MOC

Ph.D. Student(s)

  • Apoorve Mohan, Northeastern University

Undergraduate Student(s)

  • Ian Ballou, Boston University

Past Contributors

  • Dr. Jason Hennesey (Postdoc – now at NetApp Inc.)
  • Nasibeh Teimouri (Ph.D. Student)
  • Ugur Kaynar (Ph.D. Student) 
  • Sahil Tikale (Ph.D. Student) 
  • Ravi Santosh Gudimetla (Masters Student – now at Red Hat Inc.)
  • Sourabh Bollapragada (Masters Student – now at Arista Networks)
  • Daniel Finn (Masters Student – now at Wayfair)
  • Pranay Surana (Masters Student – now at Lighthouse AI) 
  • Sirushti Murugesan (Masters Student)

Progress June’16-March’17

  • Explore sensitivities of applications like OpenStack and HPC when using a network-mounted system.
  • Explore moving operating systems from physical to virtual systems seamlessly.
  • Integrate support for attestation infrastructure for the secure cloud.
  • A functional BMI on the Engage1 cluster without modifications to existing infrastructure.
  • Ongoing support for the Secure Cloud project.
  • Automated install setup for Red Hat and other OSes.
  • Performance evaluation of iSCSI servers: IET vs. TGT.
  • Automated BMI install setup for Ubuntu.
  • Multi-tenant iSCSI using TGT as the backend.
  • Built a simple scheduler for dynamically moving nodes across various clusters.
  • Performed experiments for improving the overall utilization of the datacenter.
  • Built a proof-of-concept CI integration using Jenkins in OpenShift.
  • Built OpenStack and HPC custom scripts for dynamically adding or removing nodes from a cluster.
  • Enhancements to the code base, e.g., rewrote the exception and database classes.
  • Poster accepted to SC’16 with initial findings on datacenter utilization improvements using BMI and HIL (close to 20% improvement with simple scheduler policies).

Timeline

  • Performance analysis of BMI vs Ironic and other provisioning systems. – In progress.
  • Continuous Integration using Redhat Openshift. – In progress
  • Publish BMI paper – In Progress
  • Making BMI pluggable – to work with different network isolators, iSCSI servers, and storage backends.
  • Testing/deploying BMI in the production MRI cluster.
  • Exploring iSCSI multipathing for load balancing and fault tolerance.
  • Exploring security issues in providing a publicly available provisioning service using BMI.
  • Improving the overall UX (user experience) by having a single interface that can talk to HIL and BMI.
  • Enhancements to the code base, e.g., complete unit tests for iSCSI, HIL, etc. – PRs on the way.
  • Exploring other features of true virtualization on physical nodes, like suspend and resume operations.
  • Exploring other scheduler framework policies to further improve datacenter utilization.

Planning and Getting Involved

To get involved in this project, please send email to the MOC team list (team@lists.massopen.cloud) and/or join the #moc IRC channel on Freenode.

 

 

Overview

The purpose of the Elastic High Performance Computing (HPC) project is to enable fast, flexible bare metal high performance computing resources on the Mass Open Cloud (MOC). This includes traditional HPC cluster environments, such as SLURM, as well as individual, customized deployments. As we will discuss below, it is ideal to stand up a high performance computing environment on any unutilized resources within the MOC.


Motivation

What is High Performance Computing?

It turns out that defining “HPC” is kind of like defining the word “car” — you probably know what a car is, but I bet you’d be hard pressed to write a concise, simple definition of one that means anything. — http://insidehpc.com/hpc-basic-training/what-is-hpc/

There are multiple use cases for HPC:

• Applications needing single, big nodes
• Applications needing a dedicated, strong network to connect nodes
• Applications needing hardware accelerators (e.g., GPUs, FPGAs)
• Applications needing specialized deployments (e.g., ATLAS)

What unifies these use cases is that they demand optimal performance: running directly on the hardware and having direct access to the network.

There has been a convergence of late between the requirements of an HPC environment and a cloud computing environment.

• HPC environments may have full bi-sectional bandwidth networks and may take advantage of accelerators: two features that Cloud environments are starting to have.
• Cloud environments traditionally have fault tolerance and higher level programming models built-in. While HPC hardware was once built explicitly to avoid faults, this is no longer feasible as we approach the exascale. HPC environments now have to build-in fault tolerance, and, as a consequence, have started to have higher level programming models.

There are still differences between an HPC environment and a Cloud environment. HPC users want every cycle and they fit their problems to available resources. Job schedulers are used to moderate resource usage, which traditionally leads to high utilization. On the other hand, the Cloud never wants to turn away users and always wants to meet peak demand. There should always be enough resources to meet the user’s need. This leads to low utilization of a traditional Cloud environment. When we consider these two use cases, it makes sense to have HPC be a part of the MOC because it utilizes otherwise idle resources.

As a first project, we are deploying a custom, flexible cluster based on OpenMPI, which we can stand up alongside the OpenStack production environment and Hadoop BigData jobs. This is sketched in the below image.

[Figure: shared HPC environment sketch]

This original application has branched into two coupled directions, both integrating traditional queueing mechanisms on HPC systems. These two directions correspond to expanding an HPC system onto both virtualized and physical elastic nodes.

Physical elastic nodes support the traditional picture of high performance computing, which depends on direct access to the hardware. Given appropriate hardware, the elastic nodes can automatically take advantage of resources that enable and optimize tightly coupled applications, such as MPI jobs or multi-GPU jobs. Virtualized nodes are rapidly catching up to the efficiency of bare metal nodes, and for some applications they are essentially equivalent. In this case, virtualization gives us additional benefits: the ability to checkpoint, suspend, migrate, and restore jobs.

To accommodate these two use cases, we have developed a proof-of-concept generic framework to provision elastic HPC nodes that can be integrated with a Slurm HPC cluster. To motivate a generic framework, we must first describe a traditional HPC cluster. These clusters are traditional setups: in the case of Slurm, there is a single controller node which manages a job queue, and a static set of compute nodes which run executables per the Slurm controller’s management. The controller manages the static set of compute nodes with a single hard-coded file. A generic framework needs to accommodate this picture of a “static” set of nodes.

The solution we have pursued is pre-loading the Slurm cluster with a set of nodes that may or may not be provisioned at any given time. This pre-loaded set of nodes can be uniquely tied to a machine by a hostname. As we elastically provision and deprovision nodes, we can attach them to a Slurm cluster simply by correctly bringing up a new Slurm compute node, assigning the correct hostname to the resource, and notifying the Slurm controller that the new node is available. This requires additional services for synchronization. All of these resources can live on a single additional VM. This configuration is sketched below:

[Figure: elastic Slurm cluster configuration]
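
A minimal sketch of the synchronization step follows. It assumes host names such as elastic01–elastic16 are already listed in slurm.conf; the scontrol invocations are standard Slurm commands, while the surrounding wrapper and names are illustrative.

```python
# Illustrative sketch: attach/detach an elastically provisioned machine to a
# Slurm cluster whose slurm.conf already lists node names elastic01..elastic16.
import subprocess

def scontrol(*args):
    subprocess.run(['scontrol', 'update'] + list(args), check=True)

def attach(slurm_name, ip_addr):
    """Point a pre-registered Slurm node name at a new machine and resume it."""
    # The new machine must already be running slurmd and be reachable at ip_addr.
    scontrol('NodeName=%s' % slurm_name,
             'NodeAddr=%s' % ip_addr,
             'NodeHostname=%s' % slurm_name,
             'State=RESUME')

def detach(slurm_name):
    """Drain the node before the underlying machine is deprovisioned."""
    scontrol('NodeName=%s' % slurm_name, 'State=DRAIN',
             'Reason=elastic node deprovisioned')

attach('elastic03', '10.0.3.17')
# ... jobs run ...
detach('elastic03')
```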

The power of this general framework is that it applies to both elastic physical and virtual nodes.

For physical nodes, we can couple the ability to provision and deprovision physical nodes elastically with the partner Bare Metal Imaging (BMI) project. Taking advantage of BMI, we can set up the environment for a Slurm cluster once. As needed, we can bring up bare metal images on any physical resource. Using our general framework, we can then attach these new nodes to an existing Slurm cluster. A further power of BMI is that it takes advantage of the Hardware Isolation Layer (HIL) to avoid any other MOC project, such as the OpenStack deployment, inadvertently affecting the bare metal performance of other resources.

For virtual nodes, we can take advantage of the fact that virtualization enables checkpointing, suspending, migrating, and restoring nodes. As a first application, we will use our general framework to expand the bare-metal cluster on Engage1 with virtual compute nodes hosted on the Kaizen production environment. This enables the two-fold benefit of increasing utilization for our Kaizen environment, as well as increasing the resources available to HPC users on the Engage1 cluster.

As a first use case, we will support backfill-style HTC jobs from the Open Science Grid running on the Engage1 cluster. These jobs are typically small, single core jobs, which fit into low priority free cycles on Engage1. If higher priority jobs come along, the Open Science Grid jobs are terminated and requeued. With virtualized nodes, we can instead suspend OSG jobs and resume them when new resources become available. This is the first of many benefits of virtualized elastic HPC nodes.
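
As an illustration of the mechanism, suspending and resuming an instance is a single call with python-novaclient; the credentials and server name below are placeholders.

```python
# Illustration: suspending and resuming a virtual compute node with
# python-novaclient. Credentials and the server name are placeholders.
from keystoneauth1 import session
from keystoneauth1.identity import v3
from novaclient import client as nova_client

auth = v3.Password(auth_url='https://keystone.kaizen.example.org:5000/v3',
                   username='osg', password='secret', project_name='osg-elastic',
                   user_domain_id='default', project_domain_id='default')
nova = nova_client.Client('2', session=session.Session(auth=auth))

server = nova.servers.find(name='osg-worker-07')   # placeholder instance name
nova.servers.suspend(server)     # higher-priority work needs the capacity
# ... later, when resources free up ...
nova.servers.resume(server)
```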

Beyond these projects, we are also applying this expertise to help other groups utilize the MOC. One such group is the ATLAS collaboration. We can stand up their production environment alongside a traditional HPC cluster, a Big Data deployment, or our own Kaizen production environment.

Last, by expanding HPC resources on the MOC, we support MOC Monitoring. By expanding monitoring onto real production HPC systems, we provide a broader set of information that is useful both for performance and energy optimization, as well as anomaly detection.


Project Team

Core Project Team

  • Dr. Christopher N. Hill, Massachusetts Institute of Technology
  • Rajul Kumar, Northeastern University 
  • Sirine Benbrahim, Boston University 

Past Contributors

  • Dr. Evan Weinberg, Boston University 
  • Prof. Richard Brower, Boston University
  • Dr. Ata Turk, Boston University 
  • Laura Kamfonik, Boston University 
  • Ravisantosh Gudimetla, Northeastern University
  • Apoorve Mohan, Northeastern University
  • Sourabh Bollapragada, Northeastern University
  • Saul Youssef, Boston University/ATLAS
  • Mike Dugan, Boston University
  • Augustine Abaris, Boston University/ATLAS

Progress

• May 2017: Presented “HPC/HTC and Cloud: Making Them Work Together Efficiently” at the Boston OpenStack Summit

• March 2017: Ran experiments on the staging environment to test the feasibility and outcome

• February 2017: Modified the Slurm scheduler to hold the state of the jobs on Slurm when a node/VM is suspended

• October 2016: Presented a poster on the Open Science Grid Slurm work at NENS 2016

• September 2016: Contributed an elastic Slurm environment to the BMI project

• May 2016: Deployed a static ATLAS Tier-2 node on the MOC

• March 2016: Demoed a SLURM+OpenMPI environment on Kaizen

 


Timeline

• January through May 2017: Setting up the modified Slurm scheduler on Engage1 to use virtual nodes on Kaizen
• January through May 2017: Publish general framework for Elastic Slurm and the value it brings to the system


Planning and Getting Involved

To get involved in this project, please send email to team@lists.massopen.cloud and/or join the #moc IRC channel on Freenode.

Monitoring Mass Open Cloud

Overview

The MOC Monitoring project aims to (i) present a scalable monitoring platform to collect and retain rich information about the Mass Open Cloud (MOC) and (ii) make that information available for analysis. On this page, we present our design for a scalable cloud monitoring system that can collect, mediate, and expose data coming from multiple layers of the cloud, provide our roadmap for the coming months, and share the immediate team of researchers/engineers working on the project.


Motivation

Cloud users today have little visibility into the performance characteristics, power consumption and utilization of cloud software and hardware components. The cloud has little visibility into user application performance requirements and critical metrics, such as response time and throughput. Through this project, we attempt to reduce this information gap.

Our fundamental goal is to provide a monitoring infrastructure that can expose rich information about all layers of the cloud (facility, network, hardware, OS, middleware, application and user layers) to all the other layers. In this way, we aim to reduce the need for costly reverse engineering or coarse-grained assumptions. We believe that such detailed, multi-layered information is key to developing intelligent, realistic performance and energy optimization techniques. For example, if the goal is to rapidly reduce data center power used while maintaining service level agreements (SLAs), we believe that it will be essential to identify candidate application components and their power/performance characteristics, and then determine the power saving technique to be applied to those components under SLA constraints.


Current MOC Monitoring Architecture

The current design of MOC’s monitoring architecture assumes a standard Infrastructure as a Service (IaaS) cloud setup that is composed of switches, storage and servers on the physical layer, and managed by OpenStack on the virtual layer.

[Figure: MOC monitoring architecture]

The figure above shows the main components of the current MOC monitoring architecture. Our architecture is composed of four layers:

  1. Data collection layer: Monitoring information from different layers is collected in a scalable and low-overhead manner
  2. Data retention & consolidation layer: Collected monitoring information is persisted and then consolidated in the time-series database InfluxDB
  3. Services layer: Houses IaaS services, such as alerting and metering, as well as privacy preserving Application Programming Interface (API) services to expose monitoring data to unprivileged users
  4. Advanced monitoring applications layer: Where a wide variety of value-added services that utilize the monitoring data operate

Virtual layer utilization information is collected using Ceilometer; the syslogs of the individual compute servers and the logs of the OpenStack services are collected using LogStash; and the physical server resource utilization (e.g., CPU, memory, disk utilization), power metrics (e.g., consumed power, fan speeds, temperature sensors) and switch network utilization metrics (e.g., incoming/outgoing traffic on switch ports) are collected using Sensu.

Ceilometer and LogStash are coupled with the scalable storage systems MongoDB and ElasticSearch, respectively. Since Sensu is an alerting system, data collected by Sensu is not normally stored in any database; we use the time-series database InfluxDB to persist the information collected by Sensu. We also correlate and consolidate various types of data collected by Ceilometer and LogStash in InfluxDB. This consolidated database enables formulating complex queries, such as queries that expose to cloud users the cross-layer state of the datacenter components they are utilizing, or queries that enable cloud administrators to understand the impact of changes made in the physical layer on virtual- and application-layer performance.
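
For example, such a cross-layer query can be issued against the consolidated InfluxDB with the influxdb Python client, as sketched below; the database, measurement, and tag names are hypothetical placeholders rather than the actual MOC schema.

```python
# Example of querying consolidated monitoring data in InfluxDB.
# Database, measurement, and tag names are hypothetical placeholders.
from influxdb import InfluxDBClient

client = InfluxDBClient(host='influx.moc.example.org', port=8086,
                        username='reader', password='secret',
                        database='moc_monitoring')

# Correlate physical host power with the utilization of the VMs it hosts
# over the last hour (hypothetical schema).
power = client.query(
    "SELECT mean(watts) FROM node_power "
    "WHERE host = 'compute-12' AND time > now() - 1h GROUP BY time(1m)")
vm_cpu = client.query(
    "SELECT mean(cpu_util) FROM vm_utilization "
    "WHERE hypervisor = 'compute-12' AND time > now() - 1h "
    "GROUP BY time(1m), vm_id")

print(list(power.get_points()))
```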

Data collected from different layers can be used for performing various services of MOC. A metering service queries the MongoDB database populated by Ceilometer to create user cloud usage reports for given periods of time. The logging data collected by LogStash and indexed/served by ElasticSearch provides a keyword-based search interface with various filters to the MOC admins to be used for debugging. Sensu goes through a set of checks and creates alerts if any anomalies in systems are observed.

In addition to the above mentioned fundamental services, we are planning to have the MOC monitoring system provide two APIs that enable querying of the consolidated data in InfluxDB. The Security API will provide detailed networking information as well as interaction and packet meta-data information of users who opt-in to supply this data. The Monitoring API will simply expose all correlated performance, resource utilization and power data.

A unique feature of our monitoring system design is transparency to cloud users and administrators for a number of applications. For example, we are working on providing end-users with privacy preserving APIs that expose the state of the physical resources over which their VMs are hosted. This will enable users to achieve and automate stable and repeatable performance. We are also working on correlating and exposing virtual layer performance and physical layer power utilization data to cloud administrators. This can enable more intelligent resource scheduling in data centers as well as participation of data centers in energy market demand-response programs.


MOC Monitoring Sub Projects

Monitoring for power management in datacenters:

In this project, we try to use MOC monitoring data for: (1) participation in emerging smart grid demand response programs in order to reduce datacenter energy costs and stabilize power grid demands, and (2) budgeting available power to applications via peak shaving.

Today’s datacenters are capable of providing significant flexibility in power consumption owing to the advancements in dynamic power management techniques in servers as well as to the increasing diversity of cloud jobs ranging from latency-critical transactional jobs to delay-tolerant jobs. We believe that if we couple this flexibility with a systematic and accurate cloud power and performance monitoring infrastructure, cloud datacenters would be promising candidates for participating in emerging energy market demand response programs such as regulation service reserves (RSR) and peak power shaving. This participation not only can help stabilize the power grid and enable substantial electricity generation from renewables, but also could decrease rapidly growing energy costs of cloud datacenters due to the incentive credits offered by energy market operators for reserve provisioning.

A rich set of control knobs is available in a datacenter, including job scheduling and power capping of servers via voltage-frequency scaling and via changing the resources allocated to each VM. In our initial analysis, we used the number of vCPUs allocated to a VM as an example, and also as a proxy to mimic a combination of various control knobs. Results of the RSR provisioning and peak shaving studies we performed on the MOC using this control knob are presented in:

  • A. Turk, H. Chen, O. Tuncer, H. Li, Q. Li, O. Krieger, A. K. Coskun, “Seeing into a Public Cloud: Monitoring the Massachusetts Open Cloud”, in USENIX CoolDC’2016.
    https://www.usenix.org/system/files/conference/cooldc16/cooldc16-paper-turk.pdf

Security analytics to detect cloud attacks

Public IaaS clouds are subject to many security threats including data breaches, account compromises, denial-of-service attacks and abuse of cloud services. Traditional security controls such as encryption, data integrity, replication, and two-factor authentication are necessary to raise the cloud security posture, but not sufficient to mitigate against all these threats.

We propose an analytics-based security service (called MOSAIC) implemented on top of the monitoring platform that will use a variety of machine learning algorithms to profile the legitimate activity of cloud users and applications, and detect anomalous activities related to a wide-range of attacks. Our system aims to provide early warning to both the cloud provider and users of the cloud about possible threats experienced by the cloud infrastructure, and will complement existing security defenses.

We also plan to investigate possible mitigation techniques once suspicious activities are detected, for example migrating suspicious VMs in an isolated part of the cloud and performing more detailed monitoring to determine the root cause of the observed behavior.


Project Team

Core Project Team

• Prof. Ayse Kivilcim Coskun (Boston University) 
• Prof. Alina Oprea (Northeastern University)
• Ata Turk, PhD (Boston University) 
• Raja Sambasivan (Boston University) 
• Ruoyu Chen
• Jeremy Freudberg
• Nagasai Vinaykumar Kapalavai
• Rohit Kumar 
• Lily Sturmann  
• Trishita Tiwari

Contributors

• Susanne Balle (Intel)
• Julia A. Santos (Intel)
• Hua Li (Wayfair.com)
• Qingqing Li (Boston University)
• Hao Chen (Boston University)
• Dimitri Makrigiorgos 
• Gen Ohta 
• Ugur Kaynar 
• Ozan Tuncer 


Progress

• March 2016: Test of monitoring stack in staging; production deployment of Sensu+InfluxDB+Grafana
• April 2016: Visualization of MOC hardware utilization in Horizon; evaluation of Monasca as an alternative monitoring solution, and decision not to proceed with Monasca for now
• May–August 2016: Incremental backups of InfluxDB, formatted emails, calibrated thresholds for the alerting system, complete testing of Ceilometer & ELK stack in staging
• June–August 2016: Integration of datacenter-layer large-component (e.g., IRCs, water pumps, chilling towers) power usage into the monitoring system
• September–November 2016: Exposing monitoring data to trusted researchers
• September–December 2016: MOC monitoring show-back system (virtualized resources per project)
• November–December 2016: MOC monitoring show-back system (bare-metal resources)
• December 2016–March 2017: Collect network flow data from MOC switches to get detailed information on both internal and external network activity of cloud VMs
• January 2017: Create monitoring working group
• February 2017–Present: Re-evaluate Monasca due to performance issues with Ceilometer
• March–April 2017: Evaluate different end-to-end tracing systems for integration into the monitoring infrastructure


Planning and Getting Involved

To get involved in this project, please send email to team@lists.massopen.cloud and/or join the #moc IRC channel on Freenode.

 

Overview

The MOC is part of the Mass Big Data Initiative and at its core is a BigData project. Leveling the field for BigData Platform (BDP) innovation by enabling research and innovation in a production environment is among the MOC’s main goals. More specifically, its aims are (i) to enable BigData Platform (BDP) research in the Commonwealth region, (ii) to enable users and government to easily upload and share public datasets, and (iii) to exploit innovative infrastructure such as SSDs, low-latency networking, RDMA, and accelerators in BDP research.

Currently we are focused on multiple studies that will establish the foundation for the above goals, such as building a public dataset repository, building an on-demand bare-metal BigData provisioning system, building an efficient storage and datacenter-scale caching mechanism that utilize SSDs and enable fast processing of public datasets in BigData platforms provisioned over MOC, providing container-based cloud services using OpenShift, and exposing innovative infrastructures such as GPUs and FPGAs to cloud users as a service.


Sub-Projects

On-Demand Bare-Metal BigData @ MOC:
Increasingly, there is a demand for bare-metal bigdata solutions for applications that cannot tolerate the unpredictability and performance degradation of virtualized systems. Existing bare-metal solutions can introduce delays of tens of minutes to provision a cluster by installing operating systems and applications on the local disks of servers. This has motivated recent research developing sophisticated mechanisms to optimize this installation. These approaches assume that using network-mounted boot disks incurs unacceptable run-time overhead. Our analysis suggests that while this assumption is true for application data, it is incorrect for operating systems and applications; network mounting the boot disk and applications results in negligible run-time impact while leading to faster provisioning time. We developed a network-mounted bigdata provisioning system prototype and are working on turning our solution into an on-demand bare-metal bigdata provisioning system.

For more details check:

Ata Turk, Ravi S. Gudimetla, Emine Ugur Kaynar, Jason Hennessey, Sahil Tikale, Peter Desnoyers, and Orran Krieger. An experiment on bare-metal bigdata provisioning. In 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16), Denver, CO, 2016. USENIX Association.

MOC Dataset Repository (Cloud Dataverse):
In collaboration with the Harvard Dataverse team, we are building a dataset repository similar to AWS Public Datasets that will initially host public datasets of the Commonwealth of Massachusetts and then expand to host public and community datasets of researchers in the Massachusetts region. The core project we are working on to provide a dataset repository with added computational abilities is the Cloud Dataverse project. The MOC Dataset Repository (MOC-DR) is the first implementation of Cloud Dataverse.

MOC-DR will host a variety of data sets. Large data sets from the Commonwealth of Massachusetts, academia, and industry (e.g., daily Massachusetts traffic congestion and public transportation information, CERN physics experiment datasets, human genome datasets, and stock market exchange transaction archives) that normally require hours or days to locate, download, customize, and analyze will be available for access. Dataset metadata and associated provenance information will be hosted on MOC-DR, a Dataverse portal [1].

Usage scenarios of the dataset repository include:

(1) Users will be able to upload data sets into MOC-DR: (a) a user will log in to MOC-DR, (b) will be authenticated against the MOC authentication system (e.g., Keystone), (c) will be presented with an interface for uploading datasets and providing the metadata information (the current Dataverse data upload interface), and (d) datasets will be stored in the MOC object storage backend (Ceph), with which Dataverse communicates via Swift calls.

(2) Users will be able to download datasets from MOC-DR: (a) a user will log into MOC-DR (using Keystone credentials), (b) browse through public datasets, (c) select a dataset to work on, (d) obtain the Swift object store endpoints for the selected dataset, and (e) from there the user can either use the given endpoints to download the datasets or use the MOC Big Data Processing System to analyze the data on the MOC without needing to worry about the cost of storing the data or the time required to download it.
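
Step (e) then reduces to standard Swift client calls, for example with python-swiftclient as sketched below; the endpoint, credentials, and container name are placeholders.

```python
# Sketch of downloading a published dataset directly from the MOC Swift
# endpoint with python-swiftclient. The endpoint, credentials, and container
# name are placeholders.
import swiftclient

conn = swiftclient.Connection(
    authurl='https://keystone.example.massopen.cloud:5000/v3',  # placeholder URL
    user='researcher', key='secret',
    auth_version='3',
    os_options={'user_domain_name': 'Default',
                'project_domain_name': 'Default',
                'project_name': 'dataverse-public'})

# List the objects that make up the dataset, then fetch one of them.
_, objects = conn.get_container('mbta-ridership-2016')   # placeholder container
for obj in objects:
    print(obj['name'], obj['bytes'])

_, body = conn.get_object('mbta-ridership-2016', objects[0]['name'])
with open('part-000.csv', 'wb') as f:
    f.write(body)
```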

MOC Big Data as a Service (BDaaS) on Engage1:

In collaboration with MIT, Northeastern University, Intel, Two Sigma, and the Red Hat Ceph team, we are building an on-demand Big Data processing system that is closely tied to the MOC public dataset repository. The service will run on the servers of the Engage1 cluster.

One envisioned standard use case for this system is as follows:

(1) The user selects a dataset to work on in the MOC Big Data processing environment: after selecting a dataset, (a) the user will be directed to the MOC On-Demand BigData processing UI, (b) will be able to provision a BigData environment to work on the selected dataset (by setting environment configuration parameters, e.g., number of nodes, storage, processing and memory capacity, and the set of Big Data processing applications (e.g., Hadoop, Pig, Spark, …) to include), (c) using the user’s credentials and desired configuration, the “MOC Imaging System” will create a Big Data processing cluster for the user, and (d) the user will be able to use Hadoop-ecosystem applications to process and analyze the selected dataset using the provided resources.

We envision providing a number of options for the Big Data processing environment: (A) The Big Data processing system can be:

(I) a bare metal environment provisioned via the MOC Bare Metal Imaging (BMI) system; (II) a virtualized environment provisioned via OpenStack Sahara; or (III) a containerized environment (possibly provisioned via BlueData’s container-based solutions).

(B) The dataset (or the subset of the dataset, e.g. imagery data from the last week) selected by the user can be prefetched into the caching servers in the processing cluster. This prefetching will prevent multiple small object requests to the storage backend and will guard against variance that may be caused by input fetching during computation.

Example dataset types: We plan to host a number of datasets. A representative dataset type we plan to host is large read only datasets that require an extraction and transformation phase to enable further analysis such as the CMS Primary Datasets [2], MBTA Ridership, Alerts & Service Quality Datasets [3], or The 3,000 Rice Genomes Project [4]. These types of datasets are by nature well divided into separately operable chunks, they get incremental updates (e.g. daily, weekly, monthly,…), the latest datasets are considered to be more ‘hot’, and most processing tasks can be done in a single pass over a number of incremental units of data (e.g. compute/compare daily aggregates for the last 10 days).

[1] http://dataverse.org 

[2] http://opendata.cern.ch/collection/CMS-Primary-Datasets

[3] http://www.mbta.com/about_the_mbta/document_library/?search=blue+book&submit_document_search=Search+Library

[4] http://gigascience.biomedcentral.com/articles/10.1186/2047-217X-3-7

Existing studies indicate that in HDFS, a few files account for a very high number of accesses and there is a skew in access frequency across HDFS data. Thus, we believe that a more sophisticated caching architecture and policies that target frequently accessed files will bring considerable benefit to big data analytics.

We are working on improving the performance of Big Data processing frameworks (e.g., Hadoop, Spark, etc.) in MOC BDaaS by implementing a multi-level cache architecture called “D3N”. More details can be found here.

 


Project Team

Core Project Team

  • Prof. Orran Krieger (Boston University) 
  • Prof. Peter Desnoyers (Northeastern University) 
  • Ata Turk, PhD (Boston University) 
  • E. Ugur Kaynar (Boston University) 
  • Ravisantosh Gudimetla (Red Hat)
  • Sahil Tikale Nikhil (Boston University) 
  • Mania Abdi (Northeastern University)
  • Mohammad Hossein Hajkazemi (Northeastern University)

Collaborators

  • David Cohen (Intel)
  • Gary Berger (Brocade)
  • William Nelson (Lenovo)
  • Dataverse Team (Harvard)
  • Mercè Crosas (Harvard)
  • Leonid Andreev (Harvard)
  • Zhidong Yu (Intel)
  • Jingyi Zhang (Intel)
  • Piyanai Saowarattitada (MOC)
  • Tom Nadeau (Brocade)
  • Mark Presti (Brocade)
  • Robert Montgomery (Brocade)
  • Pete Moyer (Brocade)

Timeline

  • April–August 2017: 2-tiered caching: upstreaming developed code (CEPH).
  • August–November 2017: Caching write-back support.
  • August–November 2017: Containers as an MOC service: production OpenShift setup.
  • November– 2018: Production MOC BigData as a Service solution.

Planning and Getting Involved

To get involved in this project, please send email to team@lists.massopen.cloud and/or join the #moc IRC channel on Freenode.

 

Motivation

The Hardware Isolation Layer (HIL, formerly known as HaaS) is an open-source bare metal isolation service that automates the allocation and management of isolated pools of non-virtualized compute resources for users. By following a minimal approach, performing only network isolation, HIL allows mutually untrusting and incompatible bare metal services to be deployed in a data center. Resources can then be shifted between pools as supply and demand dictate, increasing utilization and efficiency. For example, separate pools allocated out of HIL are used for the production MOC OpenStack cloud, for staging areas, for various HPC clusters, and for research experiments such as EbbRT.

HIL is being developed as part of the MOC. HIL allows us to securely develop the MOC in the MGHPCC alongside the many production services deployed in this data center, and incrementally move resources into the MOC as demand warrants. It is also critical to the marketplace model that is central to the MOC; by decoupling the operators of hardware and services, HIL will enable a consumption-based marketplace for hardware.


Architecture/Development

The Architecture of HIL, while still being realized, is shown below.

[Figure: HIL architecture]

The HIL architecture is implemented by components linked via REST APIs to form a micro-kernel for the data center. These components (some of which are still under development) can be categorized as:

  • core HIL components: those parts strictly necessary to realize the HIL architecture
  • system drivers: pluggable implementations of a standard interface to a specific external system (e.g. network switch control)
  • optional services: HIL-related services which can be overridden on a per-allocation basis
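
To make the idea of system drivers concrete, the sketch below shows the general shape such a pluggable interface could take; it is an illustrative example only, and the class and method names are not HIL’s actual driver API.

```python
# Illustrative sketch of a pluggable switch-driver interface of the kind the
# HIL architecture describes. Class and method names are hypothetical and do
# not correspond to HIL's actual driver API.
from abc import ABC, abstractmethod

class SwitchDriver(ABC):
    """Standard interface that each switch-specific driver implements."""

    @abstractmethod
    def connect_port(self, port: str, network_id: str) -> None:
        """Attach a physical switch port to an isolated network (e.g. a VLAN)."""

    @abstractmethod
    def detach_port(self, port: str) -> None:
        """Return a port to the unallocated state."""

class MockSwitch(SwitchDriver):
    """In-memory driver, useful for testing the core service without hardware."""
    def __init__(self):
        self.ports = {}

    def connect_port(self, port, network_id):
        self.ports[port] = network_id

    def detach_port(self, port):
        self.ports.pop(port, None)

driver = MockSwitch()
driver.connect_port('gi1/0/7', 'vlan-2101')
```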

Project Team

Core Project Team

  • Naved Ansari, Mass Open Cloud at Boston University
  • Ian Denhardt, Mass Open Cloud
  • Peter Desnoyers, Northeastern University 
  • Jason Hennessey, Boston University
  • Kyle Hogan, Boston University
  • Shuwen (Jethro) Sun
  • Sahil Tikale, Boston University

Contributors

  • Ryan Abouzahra, former USAF
  • Jonathan Bell, Boston University
  • Logan Bernard, Boston University
  • Rohan Garg, Northeastern University
  • Orran Krieger, Boston University
  • Zhaoliang Liu, former Northeastern University
  • Nick Matsuura, USAF
  • Apoorve Mohan, Northeastern University 
  • Andrew Mohn, former Boston University 
  • Kristi Nikolla, Boston University
  • George Silvis, Boston University 
  • Ron Unrau
  • Valerie Young, former Boston University
  • Lucas (Hang) Xu, Boston University
  • George Silvis III
  • Ravisantosh Gudimetla
  • Jonathan Bernard
  • Abhishek Raju
  • Zespre Schmidt (National Chiao Tung University, Taiwan)
  • Ritesh Singh

Team Lead

  • Naved Ansari, Boston University [Email]

Progress through June 2016

  • Brocade support for Engage1 cluster
  • Implementing a driver mechanism for Out of Band Management systems, like IPMI. This is prep work for supporting others (like Dell iDRAC) as well as recursive HIL
  • ATLAS team ran a successful prototype on top of HIL
  • basic auth
  • basic logging
  • Additional query functions to support easier use of HIL
  • Prototyped Ironic and MaaS managing nodes under HIL
  • Created Continuous Integration tests to improve code quality by testing every Pull Request on github before it is reviewed/integrated

Progress July – September 2016

  • Network(HIL abstraction) access control supported
  • Keystone Authentication integrated
  • Additional query parameters/functions to support easier use of HIL
  • Prototyped Ironic and MaaS managing nodes under HIL
  • Documentation upgrade – developer guidelines to help new developers come on board
  • Incorporated changes to support production-level usage
  • Brocade SDN switches supported for Engage1 cluster
  • Implementing driver mechanism for Out of Band Management systems, like IPMI. This is prep work for supporting others (like Dell iDRAC) as well as recursive HIL
  • ATLAS team ran a successful prototype on top of HIL
  • Logging mechanism
  • Created Continuous Integration tests to improve code quality by testing every Pull Request on github before it is reviewed/integrated

Progress September 2016 – April 2017

  • A paper, “HIL: Designing an Exokernel for the Data Center”, presented at ACM SoCC!
  • Merged 47 pull requests
  • Began weekly hacking sessions with the Secure Cloud and BMI teams in order to spur collaboration and Get Stuff Done™.
  • Several improvements for stability/robustness, including bug fixes and improved test coverage, documentation, input validation, and systemd integration.
  • Created attractive documentation on ReadTheDocs.io to ease onboarding of new contributors.
  • Introduced fine-grained Network ACLs, which enables unprivileged providers to offer network services to HIL users. For example, Bare Metal Imaging.
  • A new key/value store for per-node metadata. Initially, this will be used to support Secure Cloud deployments, specifically documenting whitelisted/known-good measurements for firmware.
  • An API for setting the next boot device
  • Successfully deployed onto portions of the Engaging1 cluster.

Upcoming

Some features planned for the upcoming months include:

  • Improve network drivers: using SNMP may give compatibility to a wide range of switches.
  • Improved CLI with tab completion
  • Examining inclusion of a driver for Intel OmniPath
  • Complete Ironic/MaaS prototype, propose upstream changes
  • A reset_port function, which will document the initial port configuration in code and enable maintenance tasks like replacing switches or undoing manual adjustments.
  • Propose HIL as an OpenStack project
  • VPN replacement for headnode functionality: lets users run headnodes from anywhere instead of relying on HIL’s built-in headnode support
  • Client library to give applications a Python API for writing programs against HIL
  • Async API that allows scripts to check the status of running operations
  • Putting leasing script into production

 


Papers

  • J. Hennessey, C. Hill, I. Denhardt, V. Venugopal, G. Silvis, O. Krieger, and P. Desnoyers, “Hardware as a service – enabling dynamic, user-level bare metal provisioning of pools of data center resources.,” in 2014 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA, 2014. [Open BU, PDF].
  • J. Hennessey, S. Tikale, A. Turk, E. U. Kaynar, C. Hill, P. Desnoyers, and O. Krieger. 2016. HIL: Designing an Exokernel for the Data Center. In Proceedings of the Seventh ACM Symposium on Cloud Computing (SoCC ’16). DOI: 10.1145/2987550.2987588

Planning and Getting Involved

To get involved in this project, you can:

  1. Send email to hil-dev-list@bu.edu
  2. Join the #moc IRC channel on Freenode
  3. Start reading our documentation on github, and contribute either code (via pull requests) or issues