Processor Research Team

Developing Parallel-computing Models and Acceleration Technologies for Large-scale High-performance Computing

To achieve high-performance computing with the K computer, more than 80,000 networked computing nodes must cooperate by exchanging data with one another. However, overall performance can be degraded by the considerable overhead of global communication and synchronization among the nodes. We are developing computing accelerators that achieve large-scale processing with less performance degradation by introducing a new parallel computing model based on a “Data-Flow” model with localized communication and synchronization. We are also developing data-flow accelerators whose custom computing circuits are automatically generated by a high-level synthesis compiler for each target application. Such customized hardware structures allow us to achieve high performance even for applications that conventional CPUs handle poorly. These research results are helping to advance the use of the K computer and to explore new computing models and architectures for future supercomputers.
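The idea of localized synchronization in a data-flow model can be illustrated with a small software sketch. It is not the team's model or hardware. It only shows the general principle that a node fires once its own operands have arrived and then notifies only its direct consumers, with no global barrier. The function name `run_dataflow`, the graph layout, and the example nodes are all hypothetical.

```python
from collections import deque


def run_dataflow(graph, funcs, inputs):
    """Fire each node as soon as all of its own operands have arrived.

    graph:  node -> list of predecessor nodes (empty list for sources)
    funcs:  node -> function combining the predecessor values
    inputs: source node -> initial value
    Returns the computed values and the order in which nodes fired.
    """
    # Build the consumer list of every node from the predecessor lists.
    successors = {node: [] for node in graph}
    for node, preds in graph.items():
        for pred in preds:
            successors[pred].append(node)

    # Number of operands each node is still waiting for.
    pending = {node: len(preds) for node, preds in graph.items()}
    values = dict(inputs)
    ready = deque(node for node in graph if pending[node] == 0)
    order = []

    while ready:
        node = ready.popleft()
        if node not in values:
            values[node] = funcs[node](*(values[p] for p in graph[node]))
        order.append(node)
        # Local synchronization: only direct consumers are notified.
        for succ in successors[node]:
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)
    return values, order


def run_example():
    # Hypothetical four-node graph: double(sum(a, b)).
    graph = {"a": [], "b": [], "sum": ["a", "b"], "double": ["sum"]}
    funcs = {"sum": lambda x, y: x + y, "double": lambda s: 2 * s}
    return run_dataflow(graph, funcs, {"a": 3, "b": 4})
```

In this sketch the only coordination is the per-node operand counter, so the cost of synchronization depends on a node's fan-in rather than on the total number of nodes.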

Research Content

Low-power and high-performance numerical computing using our own hardware compiler to generate custom-computing acceleration
As the advancement of semiconductor technology under Moore’s Law slows, it will become difficult in the near future to improve computing performance with multi-core microprocessors. One promising solution is a reconfigurable custom computing machine, in which the software code of a target application is converted into customized accelerator hardware that is implemented and executed on field-programmable gate arrays (FPGAs).

To date, we have developed a high-level synthesis compiler that generates stream-computing hardware modules based on a data-flow computing model, as well as a system that executes high-performance computations with the generated modules implemented on FPGAs. In a tsunami simulation, for instance, FPGAs achieved twice the sustained performance and eight times the power efficiency of GPUs. These improvements were obtained by employing efficient subsystem structures tailored to the target application, including customized memory subsystems and data paths with more pipelines. In addition, we have developed a real-time data-compression hardware module using multiple FPGAs to enhance memory and network bandwidth for high-performance computing.
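As a rough software analogy for stream computing, the sketch below chains Python generators so that each value flows through every stage while each stage buffers only a small window. It assumes a hypothetical 3-point weighted-average stencil. It is not the team's compiler output or the tsunami code, and the names `stencil_stage` and `build_pipeline` are illustrative only.

```python
from collections import deque
from typing import Iterable, Iterator, List


def stencil_stage(stream: Iterable[float],
                  coeffs=(0.25, 0.5, 0.25)) -> Iterator[float]:
    """One pipeline stage: a 3-point weighted average over a sliding window.

    Only the last three values are buffered, loosely like a small shift
    register in hardware, rather than the whole input array.
    """
    window = deque(maxlen=3)
    for value in stream:
        window.append(value)
        if len(window) == 3:
            yield sum(c * x for c, x in zip(coeffs, window))


def build_pipeline(source: Iterable[float], depth: int) -> Iterator[float]:
    """Chain `depth` stencil stages so data streams through all of them."""
    stream = iter(source)
    for _ in range(depth):
        stream = stencil_stage(stream)
    return stream


def run_example() -> List[float]:
    data = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
    # Each stage shortens the stream by two elements (no boundary padding).
    return list(build_pipeline(data, depth=2))


if __name__ == "__main__":
    print(run_example())  # [1.5, 1.0]
```

Chaining stages this way means intermediate results are never written back to a full-size array. That is the property a streaming data path exploits to reduce pressure on memory bandwidth.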

Going forward, we will further advance these developments and also develop a new system that makes it easy to achieve high performance on very large-scale, complex computers. We aim to establish a new computing model and architecture for high-performance computing in the Post-Moore era.

What's New

15 Dec, 2017 Keynote Speech and Panelist at the 17th PC Cluster Symposium

Connections

FPGA Shell for HPC, Argonne National Laboratory
FPGA Overlay Architecture and HLS Compiler, Tohoku University and Nagasaki University

Team Leader
Kentaro Sano

Biography: Detail

Contact:
kentaro.sano[at]riken.jp
Team Website
Selected Publications

Annual Reports:
FY2014 RIKEN AICS Annual Report (PDF 1.07MB)
FY2013 RIKEN AICS Annual Report (PDF 9.35MB)
FY2012 RIKEN AICS Annual Report (PDF 45.2MB)
FY2011 RIKEN AICS Annual Report (PDF 158KB)