MIT Cloud Workshop - 2018

2018 MIT IAP Cloud Workshop  - Agenda, Abstracts and Bios


Platinum Sponsors: Cavium, Samsung, and Western Digital

Room and Address: Kiva Room in Building 32 (Room 32G-449), MIT, Cambridge, MA

Time: 8:00AM–6:00PM (Badge Pick-up at 8:00AM; Advance Registration is Required)


8:00-8:25AM Badge Pick-up – Coffee/Tea and Breakfast Food/Snacks

8:25-8:30AM Welcome – Prof. Daniel Sanchez, MIT
8:30-9:00AM   Prof. Orran Krieger, Boston University, “Research in an Open Cloud Exchange”
9:00-9:20AM Elliott Delaye, Xilinx, “Integrating AI into Accelerated Cloud Applications”
9:20-9:40AM  Prof. Song Han, MIT, “Bandwidth-Efficient Deep Learning: Compression, Acceleration and Meta-learning”
9:40-10:00AM  Dr. Cyril Guyot, Western Digital Research, “Low-latency software architecture for distributed shared memory”
10:00-10:30AM Prof. Mothy Roscoe, ETH Zurich, “Enzian: A Research Computer”
10:30-11:00AM Lightning Round of Student Posters
11:00-12:30PM Lunch and Cloud Poster Viewing

12:30-1:00PM Prof. Daniel Sanchez, MIT, "Making Parallelism Pervasive with the Swarm Architecture"

1:00-1:20PM Dr. Jason Zebchuk, Cavium, “ThunderX2 for Cloud and HPC Applications”
1:20-1:40PM   Prof. Christina Delimitrou, Cornell, "Using Big Data to Navigate The Increasing Complexity of The Cloud"
1:40-2:00PM Dr. George Apostol, Samsung, "Data Center Demands for Autonomous Vehicle Development"
2:00-2:30PM Prof. Joel Emer, Nvidia/MIT, "Dissecting Spectre and Meltdown"
2:30-3:00PM Break  - Refreshments and Poster Viewing
3:00-3:30PM Prof. Nate Foster, Cornell, “Verifying Network Data Planes”
3:30-3:50PM Fredrik Kjolstad, MIT, “The Tensor Algebra Compiler”
3:50-4:10PM Prof. Tim Kraska, MIT,  “The End of Slow Networks: It's Time for a Redesign”
4:10-4:30PM Dr. Matt Perron, MIT, “Choosing and Configuring Your Analytics Database in the Cloud”
4:30-5:30PM Reception - Refreshments and Poster Awards

Dr. George Apostol, Samsung, “Data Center Demands for Autonomous Vehicle Development”
Abstract: The design of autonomous vehicles is a driving force in the industry today and is placing greater demands on the data center infrastructure needed to support the development efforts. Significant pressure is being put on all areas of data center architecture, stressing the overall ecosystem. From data generation, storage, processing, orchestration, and more, requirements are being increased by orders of magnitude at a very fast pace. This talk will quantify the challenges ahead in the development process, addressing the pain points of current architectures, with a view to the enhancements needed across multiple technologies to support these very complex workloads. Wholesale improvements are needed across data center technical disciplines, providing the opportunity for innovation among industry and academia toward a more robust data center environment.

Bio: George Apostol is Vice President, Data Center Solutions, in the Samsung Strategy and Innovation Center, a division of Samsung Electronics. Since May 2013, his role is to lead the development of new systems architectures and apply Samsung's world-class, market-leading, component technology to the implementation of revolutionary next-generation data centers. Prior to Samsung, he was Executive Vice President, Engineering and Operations and Chief Technology Officer for Exar Corporation; Vice President of Engineering and Chief Technology Officer for PLX Technology Inc; and executive management at several start-up companies. Mr. Apostol has over 30 years of experience in the design and delivery of a variety of complex electronic system products and has held senior engineering development and management positions at Cabletron Systems, TiVo, LSI Logic, Silicon Graphics, Sun Microsystems, and Xerox. With a strong background in systems design and ASIC design methodologies, he holds 12 US patents in the areas of bus interface, buffer management, and systems architecture; he has shipped multiple system products and systems-on-silicon; and has developed several ASIC design productivity tools. Mr. Apostol performed his academic research in magnetic resonance and knowledge-based imaging at the Harvard Medical School and the MIT Sloan School of Management and is a graduate of the Massachusetts Institute of Technology.

Elliott Delaye, Xilinx, “Integrating AI into Accelerated Cloud Applications”
Abstract: This talk will provide an overview of acceleration on the recently released Xilinx ML suite, which is deployed on the Amazon EC2 cloud.  We will review the architecture and demonstrate a cloud-based software stack to evaluate, develop, and deploy machine learning capabilities into your accelerated applications on the cloud.

Bio: Elliott is a Principal Engineer in the Machine Learning Platforms group at Xilinx with over 18 years of experience optimizing complex hardware and software systems. He has over 25 patents and patents pending in the areas of system design, deep learning and computer architecture. He received his M.S in Electrical and Computer Engineering from Carnegie Mellon University where he developed software for autonomous mobile robot mapping and localization. He also has a B.S. in Electrical and Computer Engineering and Computer Science from Carnegie Mellon University.

Prof. Christina Delimitrou, Cornell University, “Using Big Data to Navigate The Increasing Complexity of The Cloud”
Abstract: Cloud applications have recently undergone a major redesign, switching from monolithic implementations that encompass the entire functionality of the application in a single binary to large numbers of loosely-coupled microservices. This shift comes with several advantages, namely increased speed of deployment and ease of debugging; however, it also brings challenges. First, microservices shift the computation-to-communication ratio, putting more pressure on low-latency, high-bandwidth networks. Second, the dependencies between microservices make scheduling and resource management challenging, as any poorly-managed microservice can introduce end-to-end QoS violations. Manually discovering the impact of these dependencies becomes increasingly impractical as applications and systems scale. Beyond the cloud, microservices are also popular in IoT settings, which adds extra complexity due to the limited reliability and battery lifetime of edge devices. In this talk I will briefly discuss the assumptions about cloud design and management that microservices change, and then describe Seer, a new cloud manager that leverages practical learning techniques to uncover performance, efficiency, and security issues online, and manage end-to-end QoS in large-scale cloud and IoT microservices.

Bio: Christina Delimitrou is an Assistant Professor and the John and Norma Balen Sesquicentennial Faculty Fellow at Cornell University, where she works on computer architecture and computer systems. Specifically, Christina focuses on improving the resource efficiency of large-scale cloud infrastructures by revisiting the way these systems are designed and managed. She is the recipient of a Facebook Fellowship, 3 IEEE Micro Top Picks awards, and a Stanford Graduate Fellowship. Before joining Cornell, Christina received her PhD from Stanford University. She previously received an MS from Stanford and a diploma in Electrical and Computer Engineering from the National Technical University of Athens.

Prof. Joel Emer, Nvidia/MIT, "Dissecting Spectre and Meltdown”
Abstract: The relatively recent disclosures of the Spectre and Meltdown vulnerabilities have resulted in a flurry of patches and workarounds focused on those vulnerabilities and their variants. However, more fundamentally they have precipitated a major shift in our understanding of the relationship between architecture, implementation and security. In this talk, I will attempt to take a step back and look at a generalization of the schema employed by those attacks to illustrate the principal steps involved and the myriad of combinations of scenarios in which they might be manifest. Understanding those steps might also help focus on the new relationship between architecture and security and the means by which to mitigate such attacks. So, finally, I will describe some of our work on one such mitigation that disrupts the most common communication channel used by the attacks.

Bio: For nearly 40 years, Joel Emer has held various research and advanced development positions investigating processor microarchitecture and developing performance modeling and evaluation techniques. He has made architectural contributions to a number of VAX, Alpha and X86 processors and is recognized as one of the developers of the widely employed quantitative approach to processor performance evaluation. More recently, he has been recognized for his contributions in the advancement of simultaneous multithreading, processor reliability analysis, cache organization and spatial architectures. Currently he is a Senior Distinguished Research Scientist in Nvidia's Architecture Research group. In his spare time, he is a Professor of the Practice at MIT. Prior to joining Nvidia he worked at Intel where he was an Intel Fellow and Director of Microarchitecture Research. Even earlier, he worked at Compaq and Digital Equipment Corporation. He earned a doctorate in electrical engineering from the University of Illinois in 1979. He received a bachelor's degree with highest honors in electrical engineering in 1974, and his master's degree in 1975 -- both from Purdue University. Among his honors, he is a Fellow of both the ACM and IEEE, and he was the 2009 recipient of the Eckert-Mauchly award for lifetime contributions in computer architecture.

Prof. Nate Foster, Cornell University, “Verifying Network Data Planes”

Abstract: P4 is a new language for programming network data planes. The language provides domain-specific constructs for describing the input-output formats and functionality of packet-processing pipelines. Unfortunately, P4 programs can go wrong in a variety of interesting and frustrating ways, including reading uninitialized data, generating malformed packets, and failing to handle exceptions. In this talk, I will present the design and implementation of p4v, a tool for verifying P4 programs. The tool is based on classic software verification techniques due to Hoare and Dijkstra, but adds several important innovations: a novel mechanism for incorporating control-plane assumptions, and domain-specific optimizations, both of which are needed to scale up to large programs. I will discuss our experiences applying p4v to a variety of real-world programs, including switch.p4, a large program that implements the functionality of a conventional switch.

p4v is joint work with colleagues from Barefoot, Lugano, and Yale.
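One flavor of bug the abstract mentions, reading uninitialized data, can be illustrated with a toy checker. The sketch below is a loose analogue of what p4v does (p4v itself builds Hoare-style verification conditions over real P4 programs); the pipeline encoding and function name here are invented for illustration.

```python
def check_header_validity(pipeline):
    """Flag uses of headers that were never extracted (made valid).

    pipeline: list of (op, header) pairs, where op is "extract",
    "read", or "emit". This toy encoding stands in for a real P4
    parser/control pipeline.
    """
    valid = set()
    errors = []
    for op, header in pipeline:
        if op == "extract":            # the parser makes the header valid
            valid.add(header)
        elif op in ("read", "emit"):   # uses require a valid header
            if header not in valid:
                errors.append(f"{op} of uninitialized header '{header}'")
    return errors

# A buggy pipeline: reads ipv4 before extracting it.
bugs = check_header_validity([
    ("extract", "ethernet"),
    ("read", "ipv4"),       # bug: ipv4 not yet extracted
    ("extract", "ipv4"),
    ("emit", "ethernet"),
])
```

A real verifier must also reason about branches and control-plane-installed table entries, which is where the weakest-precondition machinery and control-plane assumptions in p4v come in.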

Biography: Nate Foster is an Associate Professor of Computer Science at Cornell University and a Principal Research Engineer at Barefoot Networks. The goal of his research is to develop languages and tools that make it easy for programmers to build secure and reliable systems. He currently serves as chair of the P4 Language Consortium steering committee and as a member of the ACM SIGCOMM Symposium on SDN Research (SOSR) steering committee. He received a PhD in Computer Science from the University of Pennsylvania, an MPhil in History and Philosophy of Science from Cambridge University, and a BA in Computer Science from Williams College. His awards include an NSF CAREER award and a Sloan Fellowship.

Cyril Guyot, Western Digital Research, “Low-latency software architecture for distributed shared memory”
Abstract: The low access latency of emerging NVM devices requires rethinking the design of the management software traditionally used in distributed environments. In combination with high-performance networks, the expected latency for accessing remote storage drops below 2 µs, shifting the critical-path overhead toward the software management layer. To reduce the software impact on performance and increase overall system scalability, we depart from the traditional client-server architecture and instead explore a design space based on one-sided (RDMA) operations. This presentation describes our distributed storage management layer and the performance tradeoffs incurred by our design decisions with respect to the consistency model and data placement.

Bio: Cyril's interests span distributed systems, machine learning, information theory and cryptography. In Western Digital's Research organization, he has been leading the Software Solutions and Algorithms team in developing novel algorithms and implementations for storage systems and persistent memory architectures.

Prof. Song Han, MIT, ”Bandwidth-Efficient Deep Learning: Compression, Acceleration and Meta-learning”
Abstract: As cloud AI services attract an increasing number of customers, the amount of hardware resources (GPU/TPU) will eventually bottleneck growth. Brute-force approaches demand a huge amount of machine power to perform training and inference, and a huge amount of manpower to design the neural network models, which is inefficient. In this presentation, I will talk about techniques that make both inference and training more efficient. We provide techniques to address three bottlenecks: saving memory bandwidth for inference through model compression, saving networking bandwidth for training through gradient compression, and saving engineer bandwidth for model design by using AI to automate the design of models.
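As a concrete illustration of the model-compression idea, here is a minimal sketch of magnitude-based weight pruning, the first stage of Deep Compression: small weights are zeroed so the model needs less memory bandwidth at inference time. The threshold selection below is simplified for illustration and is not the paper's exact procedure.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out roughly the smallest-magnitude fraction `sparsity` of weights."""
    k = int(len(weights) * sparsity)        # number of weights to drop
    if k == 0:
        return list(weights)
    # Use the k-th smallest magnitude as the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Half the weights are pruned; the large weights survive.
pruned = prune_by_magnitude([0.1, -0.5, 0.02, 0.9, -0.08, 0.3], sparsity=0.5)
```

In the full pipeline, pruning is followed by retraining, weight sharing, and Huffman coding; gradient compression applies the same sparsification idea to the gradients exchanged during distributed training.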

Bio: Song Han is starting in July 2018 as an assistant professor in the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology (MIT). Dr. Han received the Ph.D. degree in Electrical Engineering from Stanford University advised by Prof. Bill Dally.

Dr. Han's research focuses on energy-efficient deep learning, at the intersection of machine learning and computer architecture. He proposed Deep Compression, which can compress deep neural networks by an order of magnitude without losing accuracy. He designed EIE (Efficient Inference Engine), the first sparse neural network accelerator, which saves memory bandwidth and results in significant speedup and energy savings. His work has been featured by TheNextPlatform, TechEmergence, Embedded Vision, and O'Reilly. His research on model compression and hardware acceleration received the Best Paper Award at ICLR'16 and the Best Paper Award at FPGA'17, and also led to a startup, DeePhi Tech, which Song co-founded in 2016. Before joining Stanford, Song graduated from Tsinghua University.

Fredrik Kjolstad, MIT, ”The Tensor Algebra Compiler”
Abstract: Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering, and science. Increasingly often the tensors are sparse, which means most components are zeros. Programmers are left to write kernels for every operation, with different mixes of sparse and dense tensors in different formats. There are countless combinations, which makes it impossible to manually implement and optimize them all. The Tensor Algebra Compiler (TACO) is the first system to automatically generate kernels for any tensor algebra operation on sparse and dense tensors. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations. In this talk I will describe what makes TACO work and how it might be used in cloud systems.
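For a sense of the kernels involved, here is standard sparse matrix-vector multiply over a CSR matrix in plain Python, roughly the kind of loop nest TACO emits for one operation/format pairing. Writing one such kernel by hand is easy; writing one per combination of operation and format is the scaling problem TACO solves. This example is generic CSR code, not TACO output.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a sparse matrix A in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and vals hold their column indices and values.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for p in range(row_ptr[i], row_ptr[i + 1]):  # nonzeros of row i
            y[i] += vals[p] * x[col_idx[p]]
    return y

# The 2x3 matrix [[1, 0, 2], [0, 3, 0]] applied to x = [1, 1, 1].
y = csr_spmv([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Swap either operand's format (CSC, blocked, dense) or the operation (e.g., a three-tensor contraction) and the loop structure changes completely, which is why generating these kernels automatically matters.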

Bio: Fredrik Kjolstad is a PhD candidate at MIT advised by Saman Amarasinghe. He works on compiler techniques and programming language constructs for sparse computing, including the TACO compiler and the Simit programming language. He believes we should program with systems as well as objects, and that code should shape to data.

Prof. Tim Kraska, MIT, “The End of Slow Networks: It's Time for a Redesign”
Abstract: The next generation of high-performance RDMA-capable networks requires a fundamental rethinking of the design of modern distributed systems.

For example, current distributed databases are commonly designed under the assumption that the network is the dominant bottleneck. Consequently, these systems aim to avoid communication between machines at almost all cost, using techniques such as locality-aware partitioning schemes, semi-reductions for joins, and complicated preprocessing and load balancing steps. Even worse, this rule of thumb created mantras like “distributed transactions do not scale,” which to this day affect the way applications are designed.

The next generation of networks presents an inflection point in how we should design distributed systems. With these nascent network technologies, the assumption that the network is the bottleneck no longer holds. Even today, with InfiniBand FDR 4x, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel.
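The ballpark claim can be checked with back-of-the-envelope arithmetic using nominal datasheet rates (the figures below are rough reference numbers, not measurements from the talk):

```python
# InfiniBand FDR: 14.0625 Gb/s per lane, 4 lanes, 64b/66b encoding.
fdr_4x_gbps = 4 * 14.0625 * (64 / 66)   # usable line rate in Gb/s
fdr_4x_GBs = fdr_4x_gbps / 8            # ~6.8 GB/s of network bandwidth

# One memory channel: transfer rate (MT/s) times the 8-byte bus width.
ddr3_1600_GBs = 1600e6 * 8 / 1e9        # DDR3-1600 channel: 12.8 GB/s
ddr4_2400_GBs = 2400e6 * 8 / 1e9        # DDR4-2400 channel: 19.2 GB/s
```

So an FDR 4x link sits within a factor of two to three of a single memory channel, which is exactly the "same ballpark" regime in which network-avoidance stops being the dominant design rule.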

In this talk, I will first argue by using DBMSs as an example why existing distributed systems cannot take full advantage of high-performance networks and afterwards propose a new architecture for building data-centric systems for the next generation of networks. Finally, I will discuss initial results from a prototype implementation of our proposed architecture for OLTP and OLAP, showing remarkable performance improvements over existing designs.

Bio: Tim Kraska is an Associate Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory. Currently, his research focuses on building systems for machine learning, and using machine learning for systems. Before joining MIT, Tim was an Assistant Professor at Brown, spent time at Google Research, and was a PostDoc in the AMPLab at UC Berkeley after he got his PhD from ETH Zurich. Tim is a 2017 Alfred P. Sloan Research Fellow in computer science, received the 2017 VMware Systems Research Award, an NSF CAREER Award, an Air Force Young Investigator award, two Very Large Data Bases (VLDB) conference best-demo awards, and a best-paper award from the IEEE International Conference on Data Engineering (ICDE).


Prof. Orran Krieger, Boston University, “Research in an Open Cloud Exchange”
Abstract: While cloud computing is transforming society, today's public clouds are black boxes, implemented and operated by a single provider that makes all business and technology decisions. Can we architect a cloud that enables a broad industry and research community to participate in the business and technology innovation? Do we really need to blindly trust the provider for the security of our data and computation? Can we expose rich operational data to enable research and to help guide users of the cloud?

The Massachusetts Open Cloud (MOC) is a new public cloud project based on an alternative, marketplace-driven model of a public cloud, that of an Open Cloud eXchange (OCX), in which many stakeholders (including the research community) participate in implementing and operating an open cloud. Our vision is to create an ecosystem that brings the innovation of a broader community to bear on a healthier and more efficient cloud marketplace, where anyone can stand up a new hardware or software service and users can make informed decisions between them. The OCX model effectively turns the cloud into a production-scale laboratory for cloud research and innovation.

The MOC is a collaboration between the Commonwealth of Massachusetts, universities (Boston University, Northeastern, MIT, Harvard and UMass), and industry (in particular Brocade, Cisco, Intel, Lenovo, Red Hat, and TwoSigma). In this talk I will give an overview of the vision of this project, its enabling technologies and operational status, and some of the different research projects taking place.

Bio: Orran Krieger is the lead on the Massachusetts Open Cloud, Founding Director for the Cloud Computing Initiative (CCI) at BU, Resident Fellow of the Hariri Institute for Computing and Computational Science & Engineering, and a Professor of the Practice in the Department of Electrical and Computer Engineering at Boston University. Before coming to BU, he spent five years at VMware starting and working on vCloud. Prior to that he was a researcher and manager at IBM T. J. Watson, leading the Advanced Operating System Research Department. Orran did his PhD and MASc in Electrical Engineering at the University of Toronto.

Matt Perron, MIT, "Choosing and Configuring Your Analytics Database in the Cloud"
Abstract: The cloud offers a dizzying array of database services, compute instance types, and storage options. However, this range of options makes choosing the best service and configuration for your workload particularly challenging. In this work-in-progress, we compare the performance and cost of some of the most popular options for analytics, including Hive, Presto, Vertica, Redshift, and SparkSQL on AWS. We present a model that assists in choosing the best configuration for each system, and discuss the challenges of predicting the performance of different configurations and different systems.
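A toy version of the kind of cost model described, with made-up instance names, prices, and predicted runtimes, might look like:

```python
def cheapest_config(configs, deadline_hours):
    """Pick the lowest-cost configuration that meets a runtime deadline.

    configs: list of (name, price_per_hour, predicted_runtime_hours).
    Returns the winning name, or None if nothing meets the deadline.
    """
    feasible = [(price * hours, name)
                for name, price, hours in configs
                if hours <= deadline_hours]
    return min(feasible)[1] if feasible else None

# Hypothetical candidates: cost = hourly price x predicted runtime.
best = cheapest_config([
    ("redshift.2node", 2.40, 1.0),   # $2.40/h, finishes in 1 h -> $2.40
    ("presto.4node",   1.60, 2.0),   # cheaper per hour but slower -> $3.20
    ("spark.8node",    3.20, 0.5),   # pricier per hour but fastest -> $1.60
], deadline_hours=2.0)
```

The hard part, and the subject of the talk, is producing the predicted runtimes: they vary with system, instance type, data layout, and query mix, so the model's accuracy determines the quality of the choice.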

Bio: Matthew Perron is a first year PhD student in the Database Group at MIT working with Michael Stonebraker and Samuel Madden. His research interests include improvements to data warehousing for the cloud and query optimization. Perron is an Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellow and received Honorable Mention in the National Science Foundation Graduate Research Fellowship Program in 2017. Perron received his M.S. in Computer Science from Carnegie Mellon University in 2016 and his B.S. in Computer Science from Rochester Institute of Technology in 2013.

Prof. Mothy Roscoe, ETH Zurich, “Enzian: A Research Computer”

Abstract: Academic research in rack-scale and datacenter computing today is hamstrung by lack of hardware.  Cloud providers and hardware vendors build custom accelerators, interconnects, and networks for commercially important workloads, but university researchers are stuck with commodity, off-the-shelf parts.

Enzian is a research computer developed at ETH Zurich in collaboration with Cavium and Xilinx which addresses this problem.  An Enzian board consists of a server-class ARMv8 SoC tightly coupled and coherent with a large FPGA (eliminating PCIe), with about 0.5 TB DDR4 and about 600 Gb/s of network I/O either to the CPU (over Ethernet) or directly to the FPGA (potentially over custom protocols).  Enzian runs both Barrelfish and Linux operating systems.

Many Enzian boards can be connected in a rack-scale machine (either with or without a discrete switch), and the design is intended to allow many different research use-cases: zero-overhead run-time verification of software invariants, novel interconnect protocols for remote memory access, hardware enforcement of access control in a large machine, high-performance streaming analytics using a combination of software and configurable hardware, and much more. By providing a powerful and flexible platform for computer systems research, Enzian aims to enable more relevant and far-reaching work on future compute platforms.

Bio: Timothy Roscoe is a Full Professor in the Systems Group of the Computer Science Department at ETH Zurich, where he works on operating systems, networks, and distributed systems, including the Barrelfish research OS and the Strymon high-performance stream processor for datacenter monitoring.  He received a PhD in 1995 from the Computer Laboratory of the University of Cambridge, where he was a principal designer and builder of the Nemesis OS.  After three years working on web-based collaboration systems at a startup in North Carolina, Mothy joined Sprint's Advanced Technology Lab in Burlingame, California in 1998, working on cloud computing and network monitoring.  He joined Intel Research at Berkeley in April 2002 as a principal architect of PlanetLab, an open, shared platform for developing and deploying planetary-scale services.  In September 2006 he spent four months as a visiting researcher in the Embedded and Real-Time Operating Systems group at National ICT Australia in Sydney, before joining ETH Zurich in January 2007.  His current research interests include monitoring, modelling, and managing complex enterprise datacenters, and system software for modern hardware.  He was named Fellow of the ACM in 2013 for contributions to operating systems and networking research.

Prof. Daniel Sanchez, MIT, “Making Parallelism Pervasive with the Swarm Architecture”
Abstract: With Moore's Law coming to an end, architects must find ways to sustain performance growth without technology scaling. The most promising path is to build highly parallel systems that harness thousands of simple and efficient cores. But this approach will require new techniques to make massive parallelism practical, as current multicores fall short of this goal: they squander most of the parallelism available in applications and are too hard to program.

I will present Swarm, a new architecture that successfully parallelizes algorithms that are often considered sequential and is much easier to program than conventional multicores. Swarm programs consist of tiny tasks, as small as tens of instructions each. Parallelism is implicit: all tasks follow a programmer-specified total or partial order, eliminating the correctness pitfalls of explicit synchronization (e.g., deadlock, data races, etc.). To scale, Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover enough parallelism.
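To make the task model concrete, here is a sequential sketch (an illustration, not Swarm code) of timestamp-ordered tiny tasks running a Dijkstra-style shortest-path computation: each task carries a timestamp, and processing tasks in timestamp order makes stale updates harmless without explicit synchronization. Swarm executes such tasks speculatively in parallel in hardware; this toy simply runs them in order to show the programming model.

```python
import heapq

def sssp(graph, source):
    """Single-source shortest paths expressed as timestamp-ordered tasks.

    graph: {node: [(neighbor, edge_weight), ...]}.
    Each task is (timestamp, node); the timestamp is the tentative
    distance, so the earliest task for a node carries its final distance.
    """
    dist = {}
    tasks = [(0, source)]                 # task queue ordered by timestamp
    while tasks:
        ts, node = heapq.heappop(tasks)   # run the earliest task first
        if node in dist:
            continue                      # a stale task is simply a no-op
        dist[node] = ts
        for nbr, w in graph.get(node, []):
            heapq.heappush(tasks, (ts + w, nbr))  # spawn child tasks
    return dist

d = sssp({"a": [("b", 2), ("c", 5)], "b": [("c", 1)]}, "a")
```

In Swarm, the hardware plays the role of the priority queue, running many tasks ahead of the earliest one speculatively and aborting only those that turn out to conflict.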

Swarm builds on decades of work on speculative architectures and contributes new techniques to scale to large core counts, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered task commits. Swarm also incorporates new techniques to exploit locality and to harness nested parallelism, making parallel algorithms easy to compose and uncovering abundant parallelism in large applications.

Swarm accelerates challenging irregular applications from a broad set of domains, including graph analytics, machine learning, simulation, and databases. At 256 cores, Swarm is 53-561x faster than a single-core system, and outperforms state-of-the-art software-only parallel algorithms by one to two orders of magnitude. Besides achieving near-linear scalability, the resulting Swarm programs are almost as simple as their sequential counterparts, as they do not use explicit synchronization.

Bio: Daniel Sanchez is an Associate Professor of Electrical Engineering and Computer Science at MIT. His research interests include parallel computer systems, scalable and efficient memory hierarchies, architectural support for parallelization, and architectures with quality-of-service guarantees. He earned a Ph.D. in Electrical Engineering from Stanford University in 2012 and received the NSF CAREER award in 2015.

Dr. Jason Zebchuk, Cavium, “ThunderX2 for Cloud and HPC Applications”

Abstract:  We present ThunderX2, Cavium’s 2nd generation ARM Server, focusing on performance characteristics with HPC and Cloud applications.

Bio: Dr. Jason Zebchuk joined the Cavium hardware architecture team for server and embedded computing in 2013. He has worked on Octeon Fusion-M baseband processors and Octeon TX embedded processors. Currently he leads the development of SoC simulators for Octeon TX and Thunder processor chips. He received his Ph.D. in Computer Engineering from the University of Toronto in Canada.