2017 U of Washington IAP Cloud Workshop - Agenda, Abstracts and Bios

Husky Union Building (HUB), Room 334 - Friday, March 31, 2017

Platinum Sponsors: Cavium, Cisco, Huawei, Samsung, SanDisk

8:00-8:30AM   Badge Pick-up – Coffee/Tea and Breakfast Food/Snacks

8:25-8:30AM   Welcome – Prof. Arvind Krishnamurthy and Prof. Luis Ceze, UW
8:30-9:00AM   Doug Burger, Microsoft, "Running Hardware Microservices on Configurable Clouds"
9:00-9:30AM Tom Anderson, UW, "High Performance RPC Packet Processing with FlexNIC"
9:30-10:00AM Derek Chickles, Cavium, "Intelligent NICS for the Data Center"
10:00-10:30AM  Gabriel Southern, Intel, "FPGAs in the Data Center"
10:30-11:00AM  Lightning Round of Student Posters
11:00-12:30PM Lunch and Cloud Poster Viewing
12:30-1:00PM Prof. Arvind Krishnamurthy, UW, "What Can We Do with Programmable Switches?"
1:00-1:30PM   Prof. Franziska Roesner, UW, "Security and Privacy for Augmented Reality Platforms"
1:30-2:00PM   Prof. Luis Ceze, UW, "A DNA-based Archival Storage System"
2:00-2:30PM   Dr. Pankaj Mehra, SanDisk, "Evolutionary Changes with Revolutionary Implications: Persistent Memory in the Data Center"

2:30-3:00PM Break - Refreshments and Poster Viewing
3:00-3:30PM Prof. Michael Taylor, UW, "Specializing the Planet's Computation: ASIC Clouds"
3:30-4:00PM Kees Vissers, Xilinx, "A Framework for Reduced Precision Neural Networks on FPGAs"
4:00-4:30PM Dan Ports, UW, "Rethinking Distributed Systems for the Datacenter"
4:30-5:00PM Kim Hazelwood, Facebook, "Scalable Performance in Facebook Data Centers"
5:00-6:00PM Reception - Refreshments and Poster Awards


Abstracts and Bios

Arvind Krishnamurthy, UW, "What Can We Do with Programmable Switches?"
Abstract: Emerging networking architectures are allowing for flexible and reconfigurable packet processing at line rate. These emerging technologies address a key limitation with software defined networking solutions such as OpenFlow, which allow for custom handling of flows only as part of the switch's control plane. Many network protocols, such as those that perform resource allocation, require per-packet processing, which is feasible only if the switch's data plane can be customized to the needs of the protocol. These new technologies thus have the potential to address this limitation and truly enable a "Software Defined Data Plane" that provides greater performance and isolation for datacenter applications.


Despite their promising new functionality, flexible switches are not all-powerful; they have limited state, support limited types of operations, and limit per-packet computation in order to operate at line rate. Our work addresses these limitations by providing a set of general building blocks that mask them using approximation techniques, thereby enabling the implementation of realistic network protocols. Our work thus represents a first step towards understanding what is required of a "switch instruction set" in order to realize networking protocols that require in-network processing.

Bio: Arvind Krishnamurthy is a Professor of Computer Science and Engineering at the University of Washington. His research interests span all aspects of building practical and robust computer systems. His recent work is aimed at making improvements to the robustness, security, and performance of Internet-scale systems. Recent projects include Arrakis, Sapphire, approximately synchronous systems, transactional storage, and uProxy.


Dan Ports, UW, "Rethinking Distributed Systems for the Datacenter"
Bio: Dan Ports joined the University of Washington faculty as a Research Assistant Professor in 2015, after receiving his Ph.D. from MIT in June 2012 and completing a postdoc at UW Computer Science & Engineering. Previously, he received M.Eng. (2007) and S.B. degrees in Computer Science and Mathematics (2005) from MIT.


Dan leads distributed systems research in UW CSE’s Systems Lab. His ongoing research focuses primarily on building distributed systems for modern data-center-scale applications. He uses a variety of techniques to build practical systems that are faster, more reliable, easier to program, and more secure. To do so, Dan takes a broad view of the systems field, having worked in areas ranging from operating systems and distributed systems to networking, databases, architecture and security, and often finds interesting opportunities that lie at the intersections of these subareas.

Dan’s work has been recognized by the distributed systems and operating systems communities. He earned a Best Paper Award at NSDI 2015 for demonstrating the benefits of co-designing a distributed system with its underlying network using a new network-level primitive called Mostly-Ordered Multicast and the Speculative Paxos replication protocol, and the Jay Lepreau Best Paper Award at OSDI 2014 for Arrakis, a new operating system that removes barriers between increasingly sophisticated applications and the hardware on which they run.

In addition to his academic research, Dan was previously affiliated with VMware, where he developed the Overshadow research system for defending applications from compromised operating systems, and cache-aware scheduling algorithms for multiprocessors.


Derek Chickles, Cavium, "Intelligent NICS for the Data Center"

Abstract: Explore the advantages of offloading network control, switching, and security workloads to an intelligent adapter in a data center. By moving intelligence to the adapter, compute node resources are freed up to run the primary applications they were intended to run. Cavium's combination of specialized hardware and software integration in widely adopted ecosystems enables an easy transition with outstanding performance results.

Bio: Derek is a senior software manager on Cavium's LiquidIO Intelligent Adapter team. His primary focus is on building high-performance, scalable drivers and firmware for Cavium's line of intelligent adapters. Before joining Cavium in March 2010, he worked at Catapult Communications (now IXIA), where he led the platform engineering group. Prior to that, he worked at BNR/Nortel on cellular network switching software. Derek earned a BS in Computer Science from the University of Colorado, Boulder.


Doug Burger, Microsoft, "Running Hardware Microservices on Configurable Clouds"
Abstract: The cloud is an economic disruptor that is changing the shape of our industry. Concurrent with that disruption is an explosion of interest in non-CPU accelerators (FPGAs, GPUs, and ASICs) for many compute-intensive workloads. A key question is how these accelerators are incorporated into cloud architectures in a way that preserves manageability and clean abstractions at worldwide scale. I will describe our new approach to this problem, called Hardware Microservices, where network-addressable services are deployed with no software in the loop. Initially developed on our worldwide FPGA fabric, this architecture permits a general abstraction to design and deploy many accelerated services in the cloud, independent of the specific type of accelerator. This architecture is one good candidate for the structure of cloud acceleration as we move to the post-CPU era.

Bio: Doug Burger is one of the world's leading active researchers in computer architecture, with a broad set of important contributions to his credit. After receiving his PhD from the University of Wisconsin in 1998, he joined UT Austin as a professor, receiving tenure in 2004 and becoming a full professor in 2008. His work on Explicit Data Graph Execution (EDGE) represents the fourth major class of instruction-set architectures (after CISC, RISC, and VLIW). At UT Austin, he co-led the project that conceived and built the TRIPS processor, an ambitious multicore ASIC and working EDGE system, which remains one of the most complex microprocessor prototypes ever built in academia. A number of Doug's research contributions, such as non-uniform cache architectures (NUCA caches), are now shipping in Intel, ARM, and IBM microprocessors. He has been recognized as an IEEE Fellow and ACM Fellow, and in 2006 received the ACM Maurice Wilkes Award for his early contributions to the field. He is the co-inventor of more than fifty U.S. patents, including six with Bill Gates.


Doug joined Microsoft in 2008, believing that Microsoft is the right place to do amazing architecture work with huge impact. Since then, he has had significant influence on both the company's products and its technical strategy. He co-founded and co-leads Project Catapult, which set the goal of designing the right post-CPU acceleration architecture for next-generation hyperscale clouds. This work produced the Configurable Cloud architecture, based on network-attached FPGAs, that is now central to Microsoft's cloud strategy. This project has enabled teams across Microsoft to drive major advances in artificial intelligence/deep learning, Bing ranking, cloud networking, storage efficiency, security, and large-scale service acceleration.
His current group (Hardware, Devices, and Experiences) is highly interdisciplinary, working in areas as diverse as cloud acceleration, silicon architectures, mobile app ecosystem architecture, new optical devices, and machine learning.


Franzi Roesner, UW, "Security and Privacy for Augmented Reality Platforms"
Abstract: Augmented reality (AR) technologies -- which overlay digitally generated information on a user's perception of the real world -- are on the cusp of commercial viability, appearing in platforms like Microsoft HoloLens, Meta's glasses, and automotive windshields. Though these technologies bring great potential benefits, they also raise unforeseen computer security and privacy risks. This talk will explore the new security and privacy challenges that arise with AR systems and describe the work our group is doing to tackle those challenges.

Bio: Franziska (Franzi) Roesner is an Assistant Professor in Computer Science and Engineering at the University of Washington, where she co-directs the Security and Privacy Research Lab. Her research focuses on understanding and improving computer security and privacy for end users of existing and emerging technologies, including the web, smartphones, and emerging augmented reality (AR) and IoT platforms. She is the recipient of an NSF CAREER Award, a best paper award at the IEEE Symposium on Security & Privacy, and a William Chan Memorial Dissertation Award; her early work laying a research agenda for AR security and privacy was featured on the cover of the Communications of the ACM magazine. She is a member of the DARPA ISAT advisory group and the current program co-chair of USENIX Enigma. She received her PhD from the University of Washington in 2014 and her BS from the University of Texas at Austin in 2008.


Gabriel Southern, Intel, "FPGAs in the Data Center"
Abstract: Hardware accelerators are critical for processing the tremendous amount of data received and stored in modern data centers. FPGAs allow accelerators to be dynamically reconfigured to suit the varying needs of cloud computing workloads, but FPGAs are not yet widely used by cloud computing end users. This talk outlines Intel's vision for how Intel CPUs and FPGAs can work together in the data center. It also describes Intel's engagement with the academic community through the Heterogeneous Architecture Research Program to support novel research using the Xeon+FPGA platform.

Bio: Gabriel Southern is an engineer in Intel's Programmable Solutions Group. He joined Intel in 2016 and is working on developing solutions for FPGAs in the data center. Prior to joining Intel, Dr. Southern was a student at UC Santa Cruz where his research focused on performance analysis and simulation methodology. Dr. Southern received his Ph.D. from UC Santa Cruz, his M.S. from George Mason University, and his B.S. from the University of Virginia.

Kees Vissers, Xilinx, "A Framework for Reduced Precision Neural Networks on FPGAs"
Abstract: The ongoing research on Neural Networks includes work to reduce the computation and storage requirements for these networks. One of the promising opportunities is the reduction of the compute and storage down to a full binarization. In this talk, we will show a framework for implementing Neural Networks, including these Reduced Precision and Binarized Neural Networks, leveraging C/C++ implementations with High Level Synthesis. Recent research has shown that the accuracy of reduced precision networks is approaching the accuracy of similar networks with Floating Point, 16-bit or 8-bit operations. The results on accuracy exploiting re-training will be presented. The architecture and detailed implementation of the latest large networks will be presented. It will be shown that Tera-ops of performance can be achieved in modern FPGAs with limited power consumption.

Bio: Kees Vissers graduated from Delft University in the Netherlands. He worked at Philips Research in Eindhoven, the Netherlands, for many years. The work included digital video system design, HW-SW co-design, VLIW processor design, and dedicated video processors. He was a visiting industrial fellow at Carnegie Mellon University, where he worked on early High Level Synthesis tools. He was a visiting industrial fellow at UC Berkeley, where he worked on several models of computation and dataflow computing. He was a director of architecture at Trimedia, and CTO at Chameleon Systems. He is a recognized contributor to the chapter on VLIW processors in the book Computer Architecture: A Quantitative Approach, 5th edition, by John L. Hennessy and David A. Patterson.


He was a Board member of Beecube, which is now part of National Instruments. Today he is heading a team of researchers at Xilinx, including a significant part of the Xilinx European Laboratories. The research topics include next generation programming environments for processors and FPGA fabric, high-performance video systems, wireless applications and new datacenter applications. He has been instrumental in the architecture of Zynq and MPSoC and the novel programming environments, leveraging High-Level Synthesis technology. He is continuously driving new architectures and programming environments. His current research includes work on Neural Networks and reduced precision Neural Networks.


Kim Hazelwood, Facebook, "Scalable Performance in Facebook Data Centers"
Abstract: Facebook data centers handle traffic and store data for roughly one billion people per day. Designing the hardware, infrastructure, and efficiency tracking software needed to scale to this level has required significant innovation from the Facebook Infrastructure team. In this talk, I present some of the notable innovations in both data center infrastructure (power, cooling, and server/storage design) and performance analysis and tracking software that go into our eight global data centers. I also present our approach to ensuring efficient resource utilization by the massive software code bases in our products and services. Most of our major hardware designs have been released through the Open Compute Project, ensuring that the Facebook philosophy of "making the world more open and connected" also applies to our research and engineering innovations.

Bio: Kim Hazelwood is an engineering manager in Facebook's Infrastructure division, where she leads a performance analysis team that drives the data center server and storage roadmap. Her research interests include computer architecture, performance analysis, and binary translation tools. Prior to Facebook, Kim held positions including tenured Associate Professor at the University of Virginia, Software Engineer at Google, and Director of Systems Research at Yahoo Labs. She received a PhD in Computer Science from Harvard University in 2004, and is the recipient of an NSF CAREER Award, the Anita Borg Early Career Award, the MIT Technology Review Top 35 Innovators under 35 Award, and the ACM SIGPLAN 10-Year Test of Time Award. She has authored over 50 conference papers and one book.

Luis Ceze, UW, "A DNA-based Archival Storage System"
Abstract: Demand for data storage is growing exponentially, but the capacity of existing storage media is not keeping up. Using DNA to archive data is an attractive possibility because it is extremely dense, with a raw limit of 1 exabyte/mm^3 (10^9 GB/mm^3), and long-lasting, with an observed half-life of over 500 years.


In this talk, I will present an architecture for a DNA-based archival storage system, which leverages common biochemical techniques to provide random access. I will also describe a new encoding scheme that offers controllable redundancy, trading off reliability for density. I will discuss results from wetlab experiments that demonstrate feasibility, random access, and robustness of the proposed encoding method. I will end by highlighting trends in biotechnology that indicate the impending practicality of DNA storage, and suggest some other opportunities for building better computer systems by borrowing parts from biology.

Bio: Luis Ceze, Associate Professor, joined the Computer Science and Engineering faculty in 2007. He holds the Torode Family Career Development Professor endowed chair. His research focuses on the intersection of computer architecture, programming languages and operating systems (see the SAMPA research group). He recently started exploring using biology to make better computers (see the MISL page). He has several papers selected as IEEE Micro Top Picks and CACM Research Highlights. He participated in the Blue Gene, Cyclops, and PERCS projects at IBM. He is a recipient of an NSF CAREER Award, a Sloan Research Fellowship, a Microsoft Research Faculty Fellowship, and the 2013 IEEE TCCA Young Computer Architect Award. He is also a member of the DARPA ISAT advisory group. He co-founded Corensic, a UW CSE spin-off company.


Michael Taylor, UW, "Specializing the Planet's Computation: ASIC Clouds"
Abstract: As more and more services are built around the Cloud model, we see the emergence of planet-scale workloads (think Facebook's face recognition of uploaded pictures, or Apple's Siri voice recognition, or the IRS performing tax audits with neural nets) where datacenters are performing the same computation across many users. These scale-out workloads can easily leverage racks of ASIC servers containing arrays of chips that in turn connect arrays of replicated compute accelerators (RCAs) on an on-chip network. The large scale of these workloads creates the economic justification to pay the non-recurring engineering (NRE) costs of ASIC development and deployment. As a workload grows, the ASIC Cloud can be scaled in the datacenter by adding more ASIC servers.

Our research examines ASIC Clouds in the context of four key applications that show great potential for ASIC Clouds, including YouTube-style video transcoding, Bitcoin and Litecoin mining, and Deep Learning. We developed tools that consider all aspects of ASIC Cloud design in a bottom-up way, and methodologies that reveal how the designers of these novel systems can optimize TCO in real-world ASIC Clouds. Finally, we proposed a new rule that explains when it makes sense to design and deploy an ASIC Cloud, considering NRE.

Bio: I have been a professor in the Department of Computer Science and Engineering at the University of California, San Diego since 2005. I received a PhD in Electrical Engineering and Computer Science from MIT, and my research centers around computer architecture but spans the stack from VLSI to compilers. I was lead architect of the 16-core MIT Raw tiled multicore processor, one of the earliest multicore processors, which was commercialized into the Tilera TILE64 architecture. I co-authored the earliest published research on dark silicon, including a paper that derives the utilization wall that causes dark silicon, and a prototype massively specialized processor called GreenDroid. I also wrote a paper that establishes the definitive taxonomy, the Four Horsemen, for the semiconductor industry's approaches to dealing with the problem, and a follow-on paper on the Landscape of the Dark Silicon Design Regime. My research on dark silicon fed into the ITRS 2008 report that led Mike Mueller of ARM to coin the term "dark silicon". More recently, I wrote the first academic paper on Bitcoin mining chips.


Pankaj Mehra, SanDisk, "Evolutionary Changes, Revolutionary Implications: Persistent Memory in the Data Center"
Abstract: With byte-grain persistent memory once again imminent, it helps to consider the history and development of this idea over the past decade and a half. We look at developments at the memory, controller, and filesystem levels, but it is at the database and application level that the promise of this idea lies. I will show certain critical trends that clearly define the nature of the inflection we are about to witness and then outline the implications across a range of workloads and segments of the broader enterprise / data center ecosystem. The speculative portion of the talk will consider the possibility of memory disaggregating across future fabrics.

Bio: Pankaj has over 20 years of technical experience in developing and architecting scalable, intelligent information systems and services. At Western Digital, he is VP and Senior Fellow working closely with our customers to build accelerated solutions for data centers and applications, and is continuing to shape and evangelize memory and storage technologies.


Prior to joining Western Digital and SanDisk through acquisitions, Pankaj was SVP and Chief Technology Officer at Fusion-io, where he was named a top 50 CTO by ExecRank. He has also worked at Hewlett Packard, Compaq, and Tandem, and held academic and visiting positions at IBM, IIT Delhi, and UC Santa Cruz. He founded HP Labs Russia (2006), Whodini, Inc. (2010), and IntelliFabric, Inc. (2001), and is a contributing author to InfiniBand 1.0. He has held TPC-C and Terabyte Sort performance records, and his work has been recognized in awards from NASA and Sandia National Labs, among others. Pankaj was appointed Distinguished Technologist at Hewlett-Packard in 2004.


Pankaj received his Ph.D. in Computer Science from The University of Illinois at Urbana-Champaign.


Tom Anderson, UW, "High Performance RPC Packet Processing with FlexNIC"
Abstract: Application and service scale-out is putting renewed stress on operating system packet handling. TCP and application-level RPC packet handling accounts for a large and increasing fraction of aggregate data center processing time. Techniques such as TCP segment offload and RDMA can help, but are insufficient for the small, frequent, wide-scale interactions typical of most data center client-server communication patterns. The FlexNIC model assumes hardware support for protocol-independent reconfigurable match-action tables and a limited amount of per-flow state. Given this, we show that we can provide scalable and efficient implementations of a wide variety of protocols, including TCP congestion control and per-core RPC steering. Using an emulation-based methodology, we show that FlexNIC can increase per-core packet throughput by more than six times that of the Linux kernel TCP implementation.


Bio: Tom is the Warren Francis and Wilma Kolm Bradley Chair of Computer Science and Engineering at the University of Washington. His research interests span all aspects of building practical, robust, and efficient computer systems, including distributed systems, operating systems, computer networks, multiprocessors, and security. He is the winner of the USENIX Lifetime Achievement Award, the USENIX STUG Award, the IEEE Koji Kobayashi Computer and Communications Award, the ACM SIGOPS Mark Weiser Award, and the IEEE Communications Society William R. Bennett Prize. He is an ACM Fellow, past program chair of SIGCOMM and SOSP, and he has co-authored twenty-one award papers.