Domain-Specific Overlays: Making FPGAs Software Programmable while Delivering High Execution Efficiency, Eriko Nurvitadhi (Intel)

Thursday, November 14, 2019
Location: Doherty Hall 1112
Time: 12:00PM-1:30PM

Abstract

The FPGA is a highly efficient general-purpose architecture due to its fine-grained spatial dataflow nature. However, this efficiency comes at the cost of programmability. While CPUs and GPUs are easy to program using high-level programming languages with fast compile times (e.g., C, OpenCL), FPGAs are traditionally programmed in lower-level, hardware-oriented register-transfer languages (e.g., Verilog) that require hardware expertise and are compiled using long-running FPGA EDA tools.

Ideally, we want FPGAs to be just as software-programmable as CPUs and GPUs, while delivering even better efficiency. To this end, “overlays” have been proposed to make FPGAs easier to program. An overlay implements a software-programmable computer architecture on the FPGA “soft logic” fabric, enabling pure software flows (e.g., OpenCL programs for a “soft” GPU overlay, Python tensor programs for a “soft” tensor processor overlay). Once an FPGA is programmed with an overlay bitstream, software developers can use it to implement various applications with a compile/debug/programming cycle at software speed, avoiding the need for long-running FPGA EDA tools in the loop. One may argue that an overlay architecture can also be implemented in “hard logic” as an ASIC for better efficiency (e.g., use a GPU instead of a “soft” GPU on an FPGA). Indeed, soft logic is less efficient than the exact same architecture implemented in hard logic. Nevertheless, since FPGA development takes much less effort (time, resources, etc.) than ASIC development, one can afford to specialize an FPGA overlay for the target application domain. A higher degree of specialization can lead to larger efficiency gains. As such, a specialized overlay on an FPGA could outperform a more general version of the architecture in hard logic. Hence, we believe domain-specific overlays can make FPGAs software-programmable while delivering high efficiency.

This talk will start with background on FPGA trends and overlays. Then, two software-programmable overlays specialized for AI will be presented. The first is a tensor processor (NPU) for applications in the AI domain, which offers extreme efficiency, performing better than a high-end GPU in real-time AI. The second is a soft GPU with specialized AI instructions (PDL-FGPU). While not as efficient as the NPU in AI, PDL-FGPU offers orders of magnitude better AI performance than a non-specialized baseline soft GPU, while still being able to run general non-AI data-parallel workloads. Finally, the talk will conclude with a discussion of potential future research directions.

Bio

Dr. Eriko Nurvitadhi is a senior research scientist in the Technology and Innovation office of the Network and Custom Logic Group at Intel. He manages FPGA-related research at Intel, covering internal as well as external academic programs (e.g., HARP, ISRAs, CONIX). His research interests are in hardware accelerators (e.g., FPGAs, ASICs) for AI and data analytics, with over 30 academic publications and 20 patents filed or issued in this area. He has contributed to Intel’s FPGA and ASIC solutions for AI. He received his PhD in Electrical and Computer Engineering from Carnegie Mellon University.