Project Brainwave: An AI Supercomputer for Real-time Inferencing at Cloud Scale, Eric S. Chung (Microsoft)

Thursday November 21, 2019
Location: Doherty Hall 1112
Time: 12:00PM-1:00PM

Abstract

The exponential growth of state-of-the-art DNNs (10x per year) – coupled with the diminishing returns of general-purpose architectures – has led to a Cambrian explosion of custom accelerators for AI. In 2016, Microsoft began deploying Project Brainwave, a 500-petaflop distributed supercomputer for real-time DNN inferencing capable of both low latency and high throughput without batching. The Brainwave Neural Processing Unit (NPU) runs on a Configurable Cloud architecture with FPGAs and leverages system-level optimizations (distributed on-chip model parallelism) and novel tensor data types co-designed with soft logic for very high serving efficiency. Brainwave's reconfigurability eliminates costly silicon changes to accommodate evolving state-of-the-art DNN models (e.g., CNNs, RNNs, Transformers) while still enabling orders-of-magnitude performance improvements over highly optimized software solutions. Today, Project Brainwave powers advanced features in Azure and the Bing search engine worldwide that would otherwise be undeployable on conventional hardware.

Bio

Dr. Eric Chung is a Principal Research Manager at Microsoft, where he currently leads a highly interdisciplinary organization in Azure operating at the intersection of novel algorithms, systems, and hardware. From 2013 to 2019, Eric and his team drove major strategic platforms into scale deployment, including Project Catapult for Microsoft's hyper-scale accelerated infrastructure in Azure, and Project Brainwave, a planetary-scale AI inferencing supercomputer for Bing, Azure, and other cloud services. Eric received his PhD from Carnegie Mellon University in 2011 and has co-authored over 50 major patents and publications. In 2017, Eric received the IEEE TCCA Young Computer Architect Award in recognition of his research contributions in enabling FPGAs as first-class scalable and high-performance computational engines.