We develop high-performance, highly productive software stacks that simplify the development of optimized, fault-tolerant supercomputer applications for computational science. Our focus areas include large-scale data processing, heterogeneous computing, and fault-tolerant systems. We are developing a MapReduce runtime that is highly optimized for the intra- and inter-node architectures of the K computer, as well as for its petascale hierarchical storage systems. Another project aims to increase performance and productivity on large-scale heterogeneous systems. We are also investigating high-performance graph analytics on the K computer. In particular, our work on the Graph500 benchmark has already enabled us to achieve the highest performance in the world using the K computer.
Recent Achievements
Simplifying Development of Applications for Supercomputers
As part of our research on application frameworks, we have developed a framework for computational fluid dynamics applications that use the Adaptive Mesh Refinement (AMR) algorithm. AMR significantly reduces both compute and memory requirements by adapting the mesh granularity across the simulation domain, using fine cells only where the simulation requires them. In practice, however, the complexity of dynamically changing mesh granularity has limited AMR to a small set of relatively small-scale applications.
This complexity is particularly challenging on heterogeneous systems with GPU accelerators, where data movement incurs much greater overhead. To address it, we developed the Daino framework, which lets users write high-performance AMR applications in a simple manner. The framework extends an existing standard compiler and aggressively applies automated program transformations to optimize user applications for target architectures such as GPUs. As a result, an application developed with Daino is portable in both functionality and performance: because the user code is translated automatically, the application runs efficiently on different parallel systems without target-specific manual optimization. Our results demonstrate that a high-level framework such as Daino can achieve both high performance and high productivity, even on complex large-scale systems such as GPU-accelerated supercomputers.
Overview of the Daino framework [Wahib2016]
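The core idea of AMR described in this section, which is to spend fine cells only where the simulation needs them, can be illustrated with a minimal tree-based sketch. This is not Daino's interface or algorithm. The `Cell` type, the `refine` routine, and the `crosses_front` criterion are hypothetical and were chosen only for illustration. The criterion refines any cell crossed by an assumed circular front of radius 0.3. The sketch also leaves out everything a real AMR code needs, including the solver, ghost-cell exchange, 2:1 level balancing, and coarsening.

```python
# Hypothetical quadtree AMR sketch for illustration only (not Daino's API).
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Cell:
    """A square cell of a quadtree mesh over the unit square."""
    x0: float      # x of lower-left corner
    y0: float      # y of lower-left corner
    size: float    # edge length
    level: int     # refinement level (0 = root)
    children: list[Cell] = field(default_factory=list)

    def leaves(self) -> Iterator[Cell]:
        """Yield the leaf cells, i.e. the cells a solver would actually update."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def crosses_front(cell: Cell, cx: float = 0.5, cy: float = 0.5, r: float = 0.3) -> bool:
    """Hypothetical criterion: True if a circle of radius r passes through the cell."""
    # Distance from the circle centre to the nearest point of the cell.
    nx = min(max(cx, cell.x0), cell.x0 + cell.size)
    ny = min(max(cy, cell.y0), cell.y0 + cell.size)
    d_min = math.hypot(nx - cx, ny - cy)
    # Distance from the circle centre to the farthest corner of the cell.
    fx = max(abs(cell.x0 - cx), abs(cell.x0 + cell.size - cx))
    fy = max(abs(cell.y0 - cy), abs(cell.y0 + cell.size - cy))
    d_max = math.hypot(fx, fy)
    return d_min <= r <= d_max


def refine(cell: Cell, max_level: int, needs_refinement: Callable[[Cell], bool]) -> None:
    """Recursively split a cell into four children wherever the criterion holds."""
    if cell.level >= max_level or not needs_refinement(cell):
        return
    half = cell.size / 2
    cell.children = [
        Cell(cell.x0 + dx * half, cell.y0 + dy * half, half, cell.level + 1)
        for dy in (0, 1)
        for dx in (0, 1)
    ]
    for child in cell.children:
        refine(child, max_level, needs_refinement)


MAX_LEVEL = 6
root = Cell(0.0, 0.0, 1.0, 0)
refine(root, MAX_LEVEL, crosses_front)

n_amr = sum(1 for _ in root.leaves())
n_uniform = 4 ** MAX_LEVEL  # uniform grid at the finest resolution (64 x 64 cells)
print(f"adaptive leaves: {n_amr}, uniform cells: {n_uniform}")
```

Comparing `n_amr` with `n_uniform` shows where the savings come from. The adaptive leaf count grows with the length of the refined feature rather than with the area of the domain. The same irregular, data-dependent tree also makes AMR hard to map efficiently onto accelerators, because cell counts and neighbor relationships change whenever the mesh is refined.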