FPGA-Accelerated Deep Neural Network Inference in the Data Center with Microsoft's Project Brainwave, Gabe Weisz (Microsoft)

Wednesday, November 18, 2020
Location: Zoom
Time: 1:30PM-2:30PM

Abstract

Deep neural networks have replaced other AI techniques for interpreting images, text, and speech in many scenarios because they are extremely effective, but at the cost of being computationally expensive. At Microsoft, we've deployed Project Brainwave's Neural Processing Unit (NPU) to FPGAs in data centers distributed around the globe, where they compute millions of neural network inferences every second as part of Bing search. Project Brainwave's NPU is a highly parameterized, software-programmable overlay specialized for neural network computations, achieving high-throughput, low-latency computation on neural network models that are far more complex than those used in typical benchmarks.

In this talk, I'll show how neural network models can be broken down into a small set of distinct computational operators, and how we map these operators to the Brainwave NPU. I'll also talk about how FPGAs are the perfect platform for the fast-changing world of deep neural networks, since their reconfigurability allows us to update our overlay in place to keep up with the state of the art and to efficiently support neural network topologies that did not exist when the FPGAs were installed.
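
To make the operator-decomposition idea concrete, here is a minimal Python sketch (not Brainwave's actual operator set or compiler output) showing how a tiny two-layer network can be flattened into a short list of primitive operators (matrix-vector multiply, bias add, pointwise nonlinearity) that a model compiler could then schedule onto specialized hardware one at a time:

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 8)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

# A toy two-layer MLP expressed as a flat list of primitive operators,
# each a function from the current activation vector to the next one.
program = [
    lambda x: W1 @ x,            # matrix-vector multiply
    lambda x: x + b1,            # bias (elementwise add)
    lambda x: np.maximum(x, 0),  # ReLU (pointwise nonlinearity)
    lambda x: W2 @ x,
    lambda x: x + b2,
]

x = rng.standard_normal(8)
for op in program:
    x = op(x)   # execute the operators in order, as a pipeline would
print(x)        # final 2-element output vector

The point of the sketch is only that a wide range of model topologies reduce to repeated applications of a small operator vocabulary, which is what makes a single hardware overlay able to serve many different models.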

Bio

Gabriel Weisz is a Principal Hardware Engineer at Microsoft, where he works on Project Brainwave's model compiler, which maps neural network models to the Brainwave NPU. He received his Ph.D. in 2015 from Carnegie Mellon University's Computer Science Department under the supervision of Professor James C. Hoe, and received his B.S. from Cornell Engineering in 1999 with a double major in computer science and electrical engineering. He spent the 11 years between his undergraduate and graduate studies as a co-founder and VP of technology at a healthcare software startup.