DNN inference optimization across the system stack

Tuesday November 13, 2018
Location: Scaife Hall 214
Time: 3:00PM-4:30PM

Abstract

Recent breakthroughs in deep learning have made Deep Neural Network (DNN) models a key component of many AI applications, ranging from speech recognition and translation to face recognition and object/human detection and tracking. These DNN models are highly resource-demanding in terms of computation cycles, memory footprint, and power and energy consumption, and are mostly trained and deployed in the cloud/datacenters. However, there is a growing demand for pushing the deployment of these AI applications from the cloud to a wide variety of edge and IoT devices that sit closer to the sources of data and information, for reasons such as better user experience (latency- and throughput-sensitive apps), data privacy and security, and limited or intermittent network bandwidth. Compared to datacenters, these edge devices are very resource constrained and may not even be able to host these computationally expensive DNN models. Great efforts have been made to optimize the serving/inference of these DNN models, both to enable their deployment on edge devices and to reduce resource consumption and cost in datacenters.

We will talk about several research and product efforts at Microsoft on optimizing the DNN inference pipeline that touch upon hardware accelerators, compilers, model architecture, application requirements, and system dynamics. We will discuss how these efforts optimize different layers of the DNN system stack. Moreover, we will show the importance of looking at the DNN system stack holistically in order to achieve better tradeoffs between model performance and resource constraints.

Bio

Dr. Di Wang is currently a researcher on the Ambient Intelligence team in Microsoft AI Perception and Mixed Reality. His research interests span computer systems, computer architecture, applied machine learning, VLSI design, energy-efficient systems design, and sustainable computing. He has applied his expertise on these topics to datacenters, IoT, storage systems, fault-tolerant systems, EDA tools, and recommendation systems. Di has authored over 30 publications in top conferences and journals and has received 4 best paper awards and 1 best paper nomination. His work has also been featured in CACM News and was chosen as the IEEE Sustainable Computing Register's Pick of the Month.

Di received his Ph.D. in Computer Science and Engineering from Penn State University in 2014, an M.S. in Computer Systems Engineering from the Technical University of Denmark (DTU) in 2008, and a B.E. in Computer Science and Technology from Zhejiang University in 2005.