Hardware Isolation Layer (HIL, formerly HaaS)

 

Motivation

Hardware Isolation Layer (HIL, formerly known as HaaS) is an open-source bare metal isolation service that automates the allocation and management of isolated pools of non-virtualized compute resources for users. By taking a minimal approach and performing only network isolation, HIL allows mutually untrusting and incompatible bare metal services to be deployed in a data center. Resources can then be shifted between pools as supply and demand dictate, increasing utilization and efficiency. For example, separate pools allocated out of HIL are used for the production MOC OpenStack cloud, for staging areas, for various HPC clusters, and for research experiments such as EbbRT.

HIL is being developed as part of the MOC. HIL allows us to securely develop the MOC in the MGHPCC alongside the many production services deployed in this data center, and incrementally move resources into the MOC as demand warrants. It is also critical to the marketplace model that is central to the MOC; by decoupling the operators of hardware and services, HIL will enable a consumption-based marketplace for hardware.


Architecture/Development

The HIL architecture, though still being realized, is shown below.

[Figure: HIL architecture]

The HIL architecture is implemented by components linked via REST APIs to form a micro-kernel for the data center. These components (some of which are still under development) can be categorized as follows (a sketch of a pluggable driver interface appears after the list):

  • core HIL components: those parts strictly necessary to realize the HIL architecture
  • system drivers: pluggable implementations of a standard interface to a specific external system (e.g. network switch control)
  • optional services: HIL-related services which can be overridden on a per-allocation basis
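
As an illustration of the pluggable "system driver" model, here is a minimal sketch of what a switch driver interface might look like in Python. The class and method names (SwitchDriver, add_port_to_network, and so on) are assumptions made for illustration; they are not HIL's actual driver API.

    # Illustrative sketch of a pluggable switch driver interface.
    # Class and method names are assumptions, not HIL's actual API.
    from abc import ABC, abstractmethod


    class SwitchDriver(ABC):
        """Interface a switch-specific driver would implement."""

        @abstractmethod
        def add_port_to_network(self, port: str, network_id: str) -> None:
            """Attach a switch port to an isolated (e.g. VLAN-backed) network."""

        @abstractmethod
        def remove_port_from_network(self, port: str, network_id: str) -> None:
            """Detach a switch port from a network."""


    class BrocadeDriver(SwitchDriver):
        """Hypothetical Brocade driver (sketch only)."""

        def __init__(self, hostname: str, username: str, password: str):
            self.hostname, self.username, self.password = hostname, username, password

        def add_port_to_network(self, port, network_id):
            ...  # e.g. add the port to the VLAN backing network_id via the switch's API

        def remove_port_from_network(self, port, network_id):
            ...  # e.g. remove the port from that VLAN

Because the core depends only on such a narrow interface, support for a new switch family can be added without touching the rest of the system.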

Project Team

Core Project Team

  • Naved Ansari, Mass Open Cloud at Boston University
  • Ian Denhardt, Mass Open Cloud
  • Peter Desnoyers, Northeastern University 
  • Jason Hennessey, Boston University
  • Kyle Hogan, Boston University
  • Shuwen (Jethro) Sun
  • Sahil Tikale, Boston University

Contributors

  • Ryan Abouzahra, former USAF
  • Jonathan Bell, Boston University
  • Logan Bernard, Boston University
  • Rohan Garg, Northeastern University
  • Orran Krieger, Boston University
  • Zhaoliang Liu, former Northeastern University
  • Nick Matsuura, USAF
  • Apoorve Mohan, Northeastern University 
  • Andrew Mohn, former Boston University 
  • Kristi Nikola, Boston University 
  • George Silvis, Boston University 
  • Ron Unrau
  • Valerie Young, former Boston University
  • Hang Xu, Boston University
  • George Silvis III
  • Ravisantosh Gudimetla
  • Jonathan Bernard
  • Abhishek Raju
  • Zespre Schmidt (National Chiao Tong University, Taiwan)
  • Ritesh Singh

Team Lead

  • Jason Hennessey, Boston University [Email]

Progress through June 2016

  • Brocade support for Engage1 cluster
  • Implementing a driver mechanism for out-of-band management systems like IPMI; this is preparatory work for supporting others (such as Dell iDRAC) as well as recursive HIL (a sketch follows this list)
  • ATLAS team ran a successful prototype on top of HIL
  • basic auth
  • basic logging
  • Additional query functions to support easier use of HIL
  • Prototyped Ironic and MaaS managing nodes under HIL
  • Created Continuous Integration tests to improve code quality by testing every pull request on GitHub before it is reviewed/integrated
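
The out-of-band management driver mechanism above could, for example, take the shape sketched below: a small common interface that an IPMI backend (and later iDRAC or recursive-HIL backends) would implement. The class and method names are assumptions for illustration; only the ipmitool invocation reflects a real tool.

    # Illustrative sketch of an out-of-band management (OBM) driver interface.
    # Class and method names are assumptions, not HIL's actual classes.
    from abc import ABC, abstractmethod
    import subprocess


    class ObmDriver(ABC):
        """Operations HIL needs from any OBM backend."""

        @abstractmethod
        def power_cycle(self) -> None: ...

        @abstractmethod
        def power_off(self) -> None: ...


    class IpmiDriver(ObmDriver):
        """OBM driver that shells out to ipmitool (sketch only)."""

        def __init__(self, host: str, user: str, password: str):
            self._base = ["ipmitool", "-I", "lanplus",
                          "-H", host, "-U", user, "-P", password]

        def power_cycle(self) -> None:
            subprocess.check_call(self._base + ["chassis", "power", "cycle"])

        def power_off(self) -> None:
            subprocess.check_call(self._base + ["chassis", "power", "off"])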

Progress July – September 2016

  • Network (HIL abstraction) access control supported
  • Keystone authentication integrated (see the example after this list)
  • Additional query parameters/functions to support easier use of HIL
  • Prototyped Ironic and MaaS managing nodes under HIL
  • Documentation upgrade – developer guidelines to help new developers come on board
  • Incorporated changes to support production-level usage
  • Brocade SDN switches supported for Engage1 cluster
  • Implementing a driver mechanism for out-of-band management systems like IPMI; this is preparatory work for supporting others (such as Dell iDRAC) as well as recursive HIL
  • ATLAS team ran a successful prototype on top of HIL
  • Logging mechanism
  • Created Continuous Integration tests to improve code quality by testing every pull request on GitHub before it is reviewed/integrated
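
With Keystone authentication in place, a script can reach HIL's REST API through a token-bearing session, roughly as sketched below. The keystoneauth1 usage is standard; the HIL base URL and endpoint path are placeholders, not documented HIL endpoints.

    # Minimal sketch: calling a HIL REST endpoint with a Keystone-authenticated
    # session. The HIL base URL and endpoint path are placeholders.
    from keystoneauth1.identity import v3
    from keystoneauth1 import session

    auth = v3.Password(auth_url="http://keystone.example.org:5000/v3",
                       username="alice", password="secret",
                       project_name="hil-project",
                       user_domain_id="default", project_domain_id="default")
    sess = session.Session(auth=auth)

    # The session attaches the X-Auth-Token header automatically.
    resp = sess.get("http://hil.example.org/api/nodes/free")
    print(resp.json())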

Progress September 2016 – April 2017

  • A paper, “HIL: Designing an Exokernel for the Data Center,” was presented at ACM SoCC!
  • Merged 47 pull requests
  • Began weekly hacking sessions with the Secure Cloud and BMI teams in order to spur collaboration and Get Stuff Done™.
  • Several improvements for stability and robustness, including bug fixes and improved test coverage, documentation, input validation, and systemd integration.
  • Creating attractive documentation via ReadTheDocs.io to ease onboarding of new contributors
  • Introduced fine-grained network ACLs, which enable unprivileged providers to offer network services (such as Bare Metal Imaging) to HIL users.
  • A new key/value store for per-node metadata. Initially, this will be used to support Secure Cloud deployments, specifically to document whitelisted/known-good firmware measurements.
  • An API for setting the next boot device (see the sketch after this list)
  • Successfully deployed onto portions of the Engaging1 cluster.
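
To give a feel for how the per-node metadata store and the next-boot-device API might be driven from a script, here is a hedged sketch using plain HTTP calls. The endpoint paths, node name, and JSON fields are assumptions for illustration, not HIL's documented API.

    # Illustrative client calls for per-node metadata and next-boot device.
    # Endpoint paths and field names are assumptions, not HIL's documented API.
    import requests

    HIL = "http://hil.example.org/api"   # placeholder base URL

    # Record a known-good firmware measurement as node metadata.
    requests.put(f"{HIL}/node/node-42/metadata/firmware_hash",
                 json={"value": "sha256:..."}, timeout=10).raise_for_status()

    # Ask that the node boot from the network (PXE) on its next reset.
    requests.put(f"{HIL}/node/node-42/boot_device",
                 json={"bootdev": "pxe"}, timeout=10).raise_for_status()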

Upcoming

Some features planned for the upcoming months include:

  • Improve network drivers: using SNMP may provide compatibility with a wide range of switches.
  • Improved CLI with tab completion
  • Examining inclusion of a driver for Intel OmniPath
  • Complete Ironic/MaaS prototype, propose upstream changes
  • A reset_port function, which will capture the initial port configuration in code and enable maintenance such as replacing switches or undoing manual adjustments.
  • Propose HIL as an OpenStack project
  • VPN replacement for headnode functionality: lets users run headnodes from anywhere instead of relying on HIL’s built-in ones
  • Client library giving applications a Python API for writing programs against HIL (a speculative sketch follows this list)
  • An asynchronous API that lets scripts check the status of running operations
  • Putting leasing script into production
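
The planned client library might expose calls along the lines sketched below; the module, class, and method names are purely speculative and shown only to illustrate the intent of a Python API plus asynchronous status checks.

    # Speculative sketch of a HIL client library; names are assumptions,
    # not an existing package.
    import requests


    class HILClient:
        def __init__(self, endpoint: str):
            self.endpoint = endpoint
            self.session = requests.Session()

        def list_free_nodes(self):
            return self.session.get(f"{self.endpoint}/nodes/free").json()

        def connect_node(self, project: str, node: str):
            # Returns an identifier a script could poll for completion
            # once an asynchronous API exists.
            resp = self.session.post(
                f"{self.endpoint}/project/{project}/connect_node",
                json={"node": node})
            resp.raise_for_status()
            return resp.headers.get("Location")


    # Usage sketch:
    # client = HILClient("http://hil.example.org/api")
    # print(client.list_free_nodes())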

 


Papers

  • J. Hennessey, C. Hill, I. Denhardt, V. Venugopal, G. Silvis, O. Krieger, and P. Desnoyers, “Hardware as a service – enabling dynamic, user-level bare metal provisioning of pools of data center resources,” in 2014 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA, 2014. [Open BU, PDF]
  • J. Hennessey, S. Tikale, A. Turk, E. U. Kaynar, C. Hill, P. Desnoyers, and O. Krieger, “HIL: Designing an Exokernel for the Data Center,” in Proceedings of the Seventh ACM Symposium on Cloud Computing (SoCC ’16), 2016. DOI: 10.1145/2987550.2987588

Planning and Getting Involved

To get involved in this project, you can:

  1. Send email to hil-dev-list@bu.edu
  2. Join the #moc IRC channel on Freenode
  3. Start reading our documentation on GitHub, and contribute either code (via pull requests) or issues