Schedule (subject to change):
Livestreamed Here: https://www.youtube.com/@MassOpenCloudAlliance
Monday, March 20
8 am – 9 am: Check-in, Breakfast & Networking
9 am – 10:30 am: Welcome and MOC-A Updates
Introductory Remarks – Gloria Waters, Boston University
Workshop Overview – Orran Krieger, Boston University
10:30 am – 11 am: Break & Networking
Sharron Wall will share insights about MassTech’s Innovation Institute and its critical role in supporting the growth of priority industry clusters within the technology and innovation sectors of the Massachusetts economy. Working at the intersection of state, industry, university, nonprofit, and other key stakeholder communities, the Institute facilitates connections between research institutions and leading technology firms through strategic investments in R&D infrastructure that support ecosystem and industry cluster development.
MassTech’s Research and Development Matching Grant Fund investments in the Massachusetts Green High Performance Computing Center (MGHPCC) and the Massachusetts Open Cloud Initiative (MOC) demonstrate the state’s key investments in both physical and intellectual capital, which support growth in our technology and innovation economy and act as a catalyst for economic development initiatives across the state.
11:35 am – 1:15 pm: Microtalks I
OPE leverages modern open source technologies to create an open environment and platform in which educators can create, publish, and operationalize high-quality open source materials, while students require no more than access to a web browser to access them. To achieve this, we have built an open ownership model that starts with high-performance, open data centers providing the hardware resources. This model allows us to exploit Linux and build a rich environment of tools and services to support a novel approach to educational material.
In this talk, we will explore how Professor Jonathan Appavoo has utilized the OPE project to enhance the teaching and learning experience in his class. Furthermore, we will delve into the development work that underpins the project, including the engineering goals, contributions of interns at Red Hat, and milestones that have been reached so far. Lastly, we will outline our plans for the future to improve the platform and expand its capabilities. OPE welcomes contributions from anyone interested in improving education through open source technology.
ESI (Elastic Secure Infrastructure) has made great strides in enabling a multi-tenant bare metal cloud. Learn about the code we’ve developed, the capabilities we provide, and the MOC ESI installation that’s up and running right now!
Rudolph and Mo will provide an update on the progress the ChRIS project has made in the past year, provide a brief demo of a new ChRIS-based tool that exemplifies how ChRIS can power tools that integrate with existing radiology workflows, and discuss a couple of projects on our roadmap – one for enabling ChRIS on the edge, one for enabling secure compute within the MOC.
Unikernel Linux (UKL): Applying Unikernel Optimizations to a General Purpose OS – Larry Woodman, Red Hat & Eric Munson, Boston University
Specialized hardware has enabled us to fundamentally change the way we compute, communicate and store data. However, the challenge with tapping into this new computing paradigm is that developer effort can increase. Developers may need to significantly change their code for every combination of underlying hardware, target workload, system stack, and overhead budget (runtime, energy usage, etc.). This results in long timeframes for creating and maintaining code, and hence is impractical for a majority of use cases.
One possible approach to reducing this developer effort is co-design. Co-design allows us to create a development cycle that is similar in efficiency to that of software development. It focuses on building specialized hardware blocks and frameworks which have the capabilities and interfaces needed for software integration. Simultaneously, it focuses on developing software that is also designed to improve workload efficiency, and contains appropriate out of the box mechanisms for taking advantage of available specialized hardware. However, while co-design research is important for advancing specialized hardware, we are currently constrained in the extent to which this can be done in current production and research environments. This is because the infrastructure is shared and thus access to hardware is restricted. As a result, there is a deadlock – the current support for specialized hardware is not advanced enough to support co-design research, and there is not sufficient co-design research possible to advance available support for specialized hardware to address restrictions.
The CoDes research lab aims to break this deadlock through an on-premises incubation step. Located at Boston University as part of the Red Hat – BU collaboratory, CoDes provides the infrastructure and engineering foundation needed to support co-design based specialized hardware research. It substantially widens the pool of possible projects by minimizing the risk of this research, and by providing researchers with more control over the software and hardware stacks. Once projects reach a certain level of completeness and reliability, they can be transitioned into larger infrastructures and the research can continue in the shared environment. Moreover, the centralization and availability of project data in the CoDes ecosystem, coupled with the above engineering foundation, can i) reduce the overhead of research, ii) validate completed efforts, iii) help identify areas of further improvement/innovation, iv) lower the bar for entry in specialized hardware research, and v) open up new exciting avenues of research.
Exploring LSVD and D4N, Directory based D3N
This work presents BayOp, a generic controller that optimizes the efficiency of network applications by automatically controlling their performance and energy trade-offs. Given an objective, the controller adjusts hardware mechanisms that control interrupt coalescing and processor energy settings. We demonstrate that BayOp can search through a space of over 2 million possible configurations to yield settings that result in energy savings of over 50% for two different OSes.
We establish the potential to coordinate interrupt coalescing and processor energy settings through an extensive study of closed and open loop network applications across both OSes. By sweeping a broad range of configurations, we find that energy savings of over 80% are possible. Furthermore, our study reveals characteristic profiles of software stack behavior that are stable and similar across OSes, such that the problem can be generalized into a well-formed control problem.
Using the results of the study, we evaluate the accuracy and generality of a machine learning technique, Bayesian optimization, to select these settings. Based on these findings we develop the BayOp controller, which can dynamically adjust the settings of a server to adapt to changing offered loads and performance and energy goals while meeting different service-level agreement objectives.
1:15 pm – 2:45 pm: Lunch & Poster Session
2:45 pm – 3:20 pm: The Value of Hybrid Cloud with Red Hat – Blake Shiver, Red Hat
Blake currently leads the global Cloud Partners organization for Red Hat, Inc, with a mission to continue to drive Red Hat’s ambitious hybrid cloud strategy. Blake most recently served as Chief of Staff for the President and CEO of Red Hat. Prior to that, Blake led Sales Strategy and Planning for the global Sales and Customer Success organization, assisting with the design and operationalization of our global GTM transformation.
Throughout his career at Red Hat, Blake has been a key partner to the sales leadership team, supporting Red Hat’s strategic presence with customers and ecosystem partners, as well as internally leading strategic programs that range from technology acquisitions, to designing and implementing the business operations to support the Red Hat and IBM merger, to the build-out of our multi-product sales growth program that dates back nearly a decade. Blake has a passion for technology, business, and strategy, and he is energized by opportunities to innovate through open source collaboration models.
3:20 pm – 3:45 pm: “Collaboration with the MOC Alliance” – Jon Stumpf, Two Sigma
IT infrastructure is common across many industries. Many elements of these infrastructures are not business-differentiating, which creates the opportunity for collaboration around technologies, techniques, practice, and policy. We have enjoyed and benefited from simple conversations about our challenges and how MOC-A research is pushing the boundaries in our domains of interest. I will be sharing an overview of some of our collaborations.
3:45 pm – 4:15 pm: Break & Networking
4:15 pm – 5:15 pm: Microtalks II, Diversity of the MOC-A – Moderated by Will Tomlinson, Software & Application Innovation Lab (SAIL)
Hosted by the Software & Application Innovation Lab (SAIL), the Hariri Institute’s own professionally staffed software development entity, this session highlights the diversity of domain-research software products deployed on the New England Research Cloud (NERC). The session will showcase the unique collaboration between SAIL, the MOC-A, and university research, as Principal Investigators discuss their SAIL-developed research software applications and their broader impacts on society. Specifically, Boston University faculty members David Boas (Professor, Biomedical Engineering, College of Engineering), Naomi Caselli (Assistant Professor, Deaf Studies, Wheelock College of Education & Human Development), Douglas Densmore (Professor, Electrical & Computer Engineering, College of Engineering), and Hank Fien (Professor, Teaching & Learning, Wheelock College of Education & Human Development) will provide session attendees with an understanding of their research objectives and how the resources of SAIL and the MOC-A have contributed to their advancement.
David Boas, BIDS/BIDS-fNIRS
A tool that allows users from the fNIRS imaging community to easily create, manage, download, and share fNIRS datasets using a web interface. These datasets are hierarchically organized, composed of Subjects, (optionally) Sessions, Runs, and finally SNIRF files and BIDS text files, as specified by the SNIRF file format and the BIDS/BIDS-fNIRS formats. The interface provides readable, navigable views on the data and workflows for editing BIDS text files.
Naomi Caselli, ASL-LEX
ASL-LEX, a database of lexical and phonological properties of American Sign Language signs. This platform allows researchers to tag short videos with various information. In the most common use case, a researcher starts a dedicated instance of SignLab, uploads a series of videos with metadata for tagging, adds users to SignLab to assist in tagging, and then has the option of exporting the tagged data for further processing, facilitating collaborative and/or independent research.
Douglas Densmore, DAMP Lab
Web-based tool for microbiology and clinical services that would allow researchers, who currently submit experimental procedures to the DAMP Lab for Biological research purposes via email, to create and visualize their workflows, have access to information on the services they are choosing, the associated parameters of these services, and enable the submission of their job through an interactive user-interface.
Hank Fien, NCIL
A domain-specific data repository that will serve as a knowledge base, and a conduit for data contribution, from researchers and educators. Functionally, it will store, process, analyze, and redistribute educational (i.e., screening, identification, intervention) data towards enhancing reading achievement levels in children in grades K-3. The repository will be domain-specific in that it will be limited to educational intervention data only. However, the data itself will range across educational, cognitive, neuroimaging, qualitative, and other types.
Milson Munakami, Harvard University
5:15 pm – 5:30 pm: Closing Remarks – Michael Daitzman, MOC Alliance
Tuesday, March 21
8 am – 9 am: Breakfast & Networking
9 am – 9:15 am: Welcome Remarks
9:15 am – 10 am: Center Vision and Capabilities
Orran Krieger, Boston University & Peter Desnoyers, Northeastern University
10 am – 10:30 am: Industry Support Keynotes
10:30 am – 10:45 am: Break and Networking
10:45 am – 11:45 am: IUCRC Presentation
Mohan Kumar, National Science Foundation
11:45 am – 1:15 pm: Lunch & Poster Session
1:15 pm – 3 pm: Dual Track
MOC Alliance Deep Dives
The Open Cloud Testbed (OCT) is an infrastructure project supported by NSF. It integrates FPGAs into nodes accessible through the CloudLab framework. CloudLab provides bare-metal access to the servers in the testbed, a feature that is not available in other commercial and private production clouds. The integration of network-attached FPGAs in OCT opens up a wide range of research avenues for scientists and engineers. This tutorial will walk through the build and deployment process for FPGAs in OCT, explaining how researchers can access the necessary development tools to build bitstreams, and deploy them on FPGA hardware in OCT. A brief overview of network-attached FPGAs and their applications will also be provided. By the end of the tutorial, participants will have a thorough understanding of the FPGA build and deployment process in OCT, and will be able to apply this knowledge to their own research projects.
IUCRC Research Proposals
1:15 pm: Split Processes: Bringing the Process to the Data – Cooperman (NU), Desnoyers (NU), Krieger (BU)
The recently developed technology of split processes enables transparent checkpointing of processes over arbitrary network APIs, by checkpointing application software while isolating it from the underlying network libraries, in turn allowing process migration by checkpoint/resume.
Our current experiments use this to enable efficient runtime adaptation of scientific simulations, where the simulated 2- or 3-dimensional grid must be adapted to the non-grid (e.g. toroid) hardware interconnect.
Future work will apply this technique in container and data analytics environments to migrate processing closer to data.
1:35 pm: Storage as a network: extending distributed tracing to S3 – Sambasivan (Tufts), Desnoyers (NU), Krieger (BU)
Prior work by the PIs [HotStorage19, FAST21] has shown that information on future data access derived from scheduling information made available by data analysis frameworks such as Spark may be used to make informed prefetching and cache eviction decisions, yielding significant speedups even with small cache sizes which produce no measurable improvement when used with LRU replacement. Yet many predictable access patterns which could be exploited to improve performance are not captured in such an easy-to-use form, being embedded in arbitrary programs, scripts, or even business processes.
We propose to record these patterns using distributed tracing mechanisms, allowing future accesses to be inferred and used for predictive decisions. In particular we will modify the RGW S3 service to (a) convey OpenTelemetry identifiers and (b) store them as per-object metadata, and investigate (c) sampling-based mechanisms for retrieval of access traces and (d) optimization of prefetching and eviction based on extrapolated future behavior.
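To illustrate why knowledge of future accesses can help even a small cache, the following is a minimal sketch that compares LRU with a future-aware eviction rule on a cyclic scan. It is not the PIs' method: the future-aware policy here is Belady's classic MIN rule, used only as a stand-in for "eviction informed by predicted accesses", and the trace and cache capacity are hypothetical.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits under least-recently-used replacement."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return hits


def future_aware_hits(trace, capacity):
    """Count hits when evicting the key whose next use is farthest away.

    This is Belady's MIN rule; it assumes the full future trace is known.
    """
    cache = set()
    hits = 0
    for i, key in enumerate(trace):
        if key in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            def next_use(k):
                try:
                    return trace.index(k, i + 1)
                except ValueError:
                    return float("inf")    # never used again
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return hits


# Hypothetical workload: a repeated scan over 4 objects with room for only 3.
trace = ["a", "b", "c", "d"] * 5
print("LRU hits:", lru_hits(trace, 3))
print("Future-aware hits:", future_aware_hits(trace, 3))
```

On this scan, LRU always evicts the object that will be needed next, while the future-aware rule keeps most of the working set resident. A real system would only have predicted, not exact, future accesses.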
Emerging cloud and edge applications, such as AI and analytics services, 5G/6G Open RAN, Internet of Things (IoT), autonomous driving, and extended reality (XR), demand highly reliable and secure infrastructures. The key to ensuring reliability and security is to achieve timely and accurate observability of underlying system components and workloads, including CPUs, storage, and networks. With the growth of system complexity, existing metric monitoring solutions, such as Prometheus, suffer from significant scalability and query latency issues. We propose PrometheusSK to enhance data ingestion and metric query processes in Prometheus and scale it to monitor large-scale clusters in real time. PrometheusSK leverages sketches, which are high-fidelity approximate techniques with provable performance guarantees. We identify performance bottlenecks of existing Prometheus deployments and show the initial promise of sketch-enhanced metric monitoring in reducing these bottlenecks. As an application, we envision PrometheusSK enabling the detection of sophisticated cyberattacks targeting cloud deployments, such as auto-scaling and resource-sharing via containers.
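As a rough illustration of what a sketch is, here is a minimal Count-Min sketch, a well-known structure that counts events in fixed memory and never underestimates a count. It is a generic example and makes no claim about which sketches PrometheusSK uses or how it integrates them with Prometheus; the class name, parameters, and metric label strings are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Approximate counter using depth rows of width counters each."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Salt the hash with the row number so each row behaves like
        # an independent hash function.
        digest = hashlib.blake2b(
            key.encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum over
        # the rows is the tightest available upper bound.
        return min(
            self.rows[r][self._index(r, key)] for r in range(self.depth)
        )


# Hypothetical metric series names, for illustration only.
sketch = CountMinSketch()
for _ in range(500):
    sketch.add('http_requests{code="200"}')
for _ in range(7):
    sketch.add('http_requests{code="500"}')

print(sketch.estimate('http_requests{code="500"}'))
```

Memory use depends only on `width` and `depth`, not on the number of distinct keys, which is the property that makes sketches attractive for high-cardinality monitoring data.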
Serverless computing is becoming popular due to the convenience of not having to maintain servers/VMs, the auto-scalability provided by the framework, and the low cost. We envision developers building general applications and systems, in addition to compute-centered applications, using the serverless paradigm. However, to realize this vision, current serverless platforms must overcome limitations such as inefficiencies in long-term state access, the lack of communication support among function instances, and the high overhead for the chain of function calls. We aim to explore the serverless framework design necessary for the generalized serverless computing vision centered around a serverless file system (i.e., a file system constructed using the serverless model). In particular, we claim that additional stateful service from the serverless framework and fine-grained resource allocation per function are necessary, and we explore these new features to reinforce the framework. We first design the file system on a hybrid serverless and serverful model (i.e., VM) and perform a cost and performance analysis. Then we carefully analyze components of the file system more suited for serverful execution and explore a new serverless framework design that can make the file system more serverless-friendly with an appropriate resource allocation. We further generalize the new features and evaluate their utility in different applications from cost and performance perspectives.
From online purchase recommendations and news feeds, to continuous monitoring of large data centers, IoT, and controlling autonomous cars, data stream processing today lies at the heart of modern business analytics and prediction models. Inevitably, a large body of recent cloud computing research has focused on optimizing the performance of data stream applications. Yet, little effort has been devoted to understanding and improving their energy efficiency. In this talk, we argue that an OS-centric approach is essential to design optimizations across the stack. We will share recent research results that demonstrate how exploiting low-level Linux and hardware knowledge can dramatically improve performance while lowering energy consumption. Finally, we will share a vision for wider collaboration between researchers across all system layers in the pursuit of open-source stream processing optimization and innovation.
3 pm – 3:15 pm: Break & Networking
3:15 pm – 5 pm: Dual Track
MOC Alliance Deep Dives
If you could build the next Internet, what would it look like? FABRIC (https://fabric-testbed.net/) is developing an advanced national network infrastructure that will help network, security, and systems researchers do just that, and along the way help make scientific discoveries faster and easier by improving the underlying complex cyberinfrastructure and algorithms. In this tutorial, potential FABRIC early experimenters will enroll in FABRIC, learn how to use the FABRIC portal and manage their credentials through it, and create basic experiments using the FABRIC Jupyter Hub. Participants will learn to use FABRIC experimenter-facing features, including the FABlib Python API library and FABRIC experiment measurement capabilities.
This tutorial does not require prior familiarity with FABRIC, although a basic understanding of the Linux command line, SSH, Jupyter Notebooks, and Python is recommended. Attendees are required to bring their own laptops. They are strongly encouraged to complete FABRIC enrollment before the start of the tutorial.
IUCRC Research Proposals
A new model for data centers and cloud computing is disaggregation, where accelerators such as GPUs and FPGAs are directly connected to the network. While this configuration provides advantages in terms of more efficient data movement (accelerators can process data directly from the network rather than incurring additional latency introduced by requiring the data to first go through a CPU), it also introduces new security vulnerabilities. In particular, any network-attached device can launch a denial of service (DoS) attack by flooding the network with unwanted packets. In this research we plan to investigate Heavy Hitter Detection (HHD) and sketch-based polling in-band network telemetry to detect such a DoS attack. We will investigate implementing these algorithms on an FPGA to determine its ability to police the network and multi-tenant users on the same FPGA.
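For readers unfamiliar with heavy hitter detection, the sketch below shows one classic software algorithm for it, Misra-Gries, which finds every item that makes up more than 1/k of a stream using at most k - 1 counters. It is offered only as background: the proposal does not say which HHD algorithm will be used, an FPGA implementation would look very different, and the addresses and packet counts are made up.

```python
def misra_gries(stream, k):
    """Return candidate heavy hitters using at most k - 1 counters.

    Any item occurring more than len(stream) / k times is guaranteed
    to be among the returned keys; other keys may be false positives.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical packet stream: one flooding source mixed with many
# sources that each send a single packet.
stream = []
for i in range(40):
    stream += ["10.0.0.9", f"10.0.1.{i}"]
stream += ["10.0.0.9"] * 20

candidates = misra_gries(stream, k=4)
print(candidates)
```

Here the flooding source sends 60 of 100 packets, well above the 1/4 threshold, so it must appear among the candidates. A second pass, or a threshold on the counter values, would be needed to rule out false positives.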
Cloud architectures have enabled unprecedented scale and elasticity for modern applications. However, security and privacy concerns persist in cloud environments, and in fact are exacerbated by the immense amounts of sensitive data that cloud applications can access and unintentionally expose. Static data flow analyses have been used to great effect in other contexts — for instance, mobile applications — to detect unwanted, unintended, or malicious confidentiality and privacy violations. However, cloud applications present distinct, novel, and open challenges to existing analysis approaches.
We propose Interlock, a cloud-first static data flow analysis framework tailored for enforcing desired user confidentiality and privacy properties on cloud applications. Using a small set of seed annotations provided by developers, Interlock automatically summarizes data flow behavior for application components such as functions, state machines, storage buckets, and key value stores. These component summaries are composable, enabling scalable analysis across different instantiations of cloud applications. Data flow summaries can be checked against security policies specified by users as well as cloud operators, enabling continuous and proactive detection and remediation of unprotected data sources and unintended leakage channels. Interlock can also suggest remediation actions to tame detected security and privacy violations, for instance via IAM policy updates to mitigate unintended data flows. Interlock directly addresses the inherently distributed nature of cloud workloads, and adopts a cloud-tailored approach to security and privacy for modern applications.
The diversity of software and systems components and the fast-paced development cycles in the cloud exacerbate the challenge in detecting, locating, or solving problems related to performance, resilience, and security. Conventional methods that rely on human experts or that focus on a single specific problem are insufficient as they are prone to errors, costly, and not scalable. The proposed project aims to address the lack of effective cloud operations by incorporating intelligent analytics methods into real-world production systems. Specific objectives are to provide a range of highly automated operations functions that are integrated with user-friendly, easily accessible analytics to enhance performance, resilience, and security. To achieve these goals, the project pursues the following interacting lines of research: (1) Methods for providing feedback during coding stage to rapidly expose vulnerable, inefficient, or otherwise problematic code to developers; (2) software discovery during the deployment stage to identify known vulnerabilities, bugs, and other unwanted software as well as changes that correlate with underlying system-level behavior (such as delays or inefficiencies); and (3) runtime analytics to diagnose and mitigate the most challenging performance, security, and resilience problems in the cloud.
Fully Homomorphic Encryption (FHE)-based computing has emerged as a leading technology for enabling privacy-preserving computing (PPC) in cloud systems. FHE provides the strongest guarantees for data privacy as it operates on encrypted data. Unfortunately, processing encrypted data using FHE takes multiple orders of magnitude longer than processing unencrypted data, as FHE-based computing suffers from prohibitively high compute and memory requirements. In this project, over multiple years, we propose to explore the design of custom accelerators for accelerating FHE-based computing. To address FHE’s high compute requirements, we investigate a variety of techniques including custom hardware support for long words and modular arithmetic, scheduling, and dataflow management. To address memory bottlenecks in FHE-based computing, we pursue several algorithmic optimizations that reduce memory demands, and we explore novel memory architectures. We consider BGV, B/FV, and CKKS schemes, thus supporting operations on both integers and floating point numbers, and in turn supporting a variety of applications.
FPGAs are crucial in many data-movement-heavy domains and therefore are often central components in switches/routers, IoT devices, SmartNICs, and other system components. Effectively leveraging FPGA flexibility requires innovation at all levels of the stack. Both the application stack and the system stack need to support application specific tuning in order to give developers the control needed to make design tradeoffs. These tradeoffs in turn enable developers to meet their specific metric targets for success, such as performance, energy consumption, and resource usage. As a result, comprehensive, coherent, cohesive, and compatible research is needed to fully realize the capability of FPGAs. Moreover, this research should advance the usability of FPGAs as well in order to ensure practical development turnaround times.
The Programmable and Real-time Hardware Innovation (PHI) Lab at Boston University, led by PIs Herbordt, Mancuso, Liu, Athanassoulis, and West, is focused on this exact type of research. Our projects cover a vast array of topics in the data-intensive application, systems, and tooling research areas. And, they do so in a collaborative fashion in order to support future integration. In this presentation, we will touch on these research areas and discuss how we are using FPGAs to implement and accelerate important workloads, and the design techniques we are targeting in order to reduce the overhead of hardware development.
For data-intensive applications research, our projects build prototypes that demonstrate the performance and power-performance potential of having custom compute pipelines that are tailored to be application specific to the greatest extent possible. Additionally, these prototypes demonstrate novel functionality enabled by having programmable logic in the middle of traditional data flows (e.g., having programmable logic between CPU and DRAM that allows for near-data processing and less overall data movement). Examples of these projects include: Graph Neural Networks, Neural Network training, Molecular Modelling, Relational Memory, Recommender Systems, Data Systems, Multi-Party Computation, Telemetry, Compression, and safety-critical and automotive systems.
For systems research, our projects are aimed at building highly configurable controllers that streamline data movement in and out of the chip. This reduces the complexity of applications, reduces data movement overheads, and provides uniform abstractions that enable application portability across different FPGAs and environments. Examples of these projects include: SmartNICs, MPI offload, SDN, Programmable Data Planes, Memory Controllers, and PCIe Controllers.
For tooling research, our projects create tooling that minimizes developer effort required to express designs, while still providing the configuration options needed to customize hardware blocks. We couple these design techniques with Reinforcement Learning to further reduce developer expertise requirements by replacing them with automated tuning. Examples of these projects include High Level Synthesis and DISL (a hardware OS generator).
5 pm – 5:05 pm: Closing Remarks – Peter Desnoyers, Northeastern University
5:05 pm – 5:45 pm: Industry Feedback Session
Closed session for IUCRC industry attendees only
5:05 pm – 6:30 pm: Poster Session
6:30 pm – 8 pm: Reception (17th Floor, 665 Comm Ave)
Wednesday, March 22
8 am – 8:30 am: Breakfast and Networking
8:30 am – 10 am: Industry Feedback Session
10 am – 11 am: NSF Session with Industry
Industry and NSF Representatives Only
11 am – 11:30 am: IUCRC Closing Remarks
Birds of a Feather
Monday, March 20th
12:30 pm – 2 pm, Small Ballroom: Federated Authentication
Initially Poster Session, 3/20, 12:30 pm – 2 pm: Technology Choices Made for the NERC
ESI is a hardware isolation project built on top of OpenStack that allows multiple bare metal node owners to collaborate to form a single bare metal cloud. Owners have exclusive use of their nodes, and can also lease their nodes out to lessees. ESI is running in a production environment in the MOC, and the team is planning to install ESI in other research clouds as well. Join us to learn about the ESI project roadmap and what the community can expect from ESI in the coming months.
5:30 pm – 6:30 pm, Room 310: Can the MOC be the new version of the Library of Alexandria? How to make it fireproof? led by Larry Rudolph and Salil Vadhan
What are the opportunities for hosting, sharing, and analyzing datasets in the MOC? What privacy and security tools are needed to enable this?
Tuesday, March 21st
Over the last several years, Red Hat has begun to reevaluate the way it interacts with and contributes to education. Our many-faceted interactions with teaching and learning (Open Curriculum, Academy, Workforce Dev, Professional Training, Internships, Apprenticeships), along with our involvement in research, are growing and evolving, and we could use your input to help us learn how we can best interact, serve, and contribute. Please join us for an open and candid discussion on the future of Red Hat in Education and Research.
Time & Location To Be Determined
Most up-to-date details can be found here.
Monitoring of OpenStack and OpenShift Clusters
OpenStack and OpenShift clusters require proactive monitoring when operated as production resources. There are built-in mechanisms as well as additional tools to establish monitoring capabilities. During this session we would like to discuss what is currently available and what is lacking, as well as some of our experiences adopting existing tools. We would also like to discuss possible paths to bringing additional capabilities into our currently deployed monitoring tools.
We will discuss FPGA usage in OCT including current user experiences and future plans.
The MGHPCC hosts multiple systems research testbeds, such as the Open Cloud Testbed, CloudLab, and Chameleon, as well as networking research platforms and services such as FABRIC and Internet2. This BOF is focused on connecting all of them together.