RIKEN Center for Computational Science


AICS at SC14 Supercomputing Conference (November 17-20, 2014) Booth #2431

SC14 Awards

RIKEN AICS will exhibit at SC14, which will be held in New Orleans, Louisiana, USA.

SC14 Official Website.

Please stop by the RIKEN AICS booth (#2431) to learn about the K computer's usage status, recent research highlights, and the FLAGSHIP 2020 Project, which is developing the post-K computer. We will also hold a series of short lectures. Check the program below and click a title for its abstract.

[News] Nov 20: K computer ranked #2 on HPCG!
[News] Nov 19: K computer ranked #2 on Graph 500 list!
[News] Nov 19: K computer recognized in Class 1 and 2 of the HPC Challenge Awards!

Press release: RIKEN, University of Tsukuba, Fujitsu

Short Lectures #2431
Tuesday, November 18th  2:30PM-5:00PM
2:30PM – Dr. Atsuya Uno (System Operation and Development Team, Operations and Computer Technologies Division, RIKEN AICS)
A Case Study of Power Estimation of Job Execution on the K computer
3:00PM – Dr. Haruyuki Kimura (Research Organization for Information Science and Technology (RIST), Kobe Center)
Introduction of HPCI Publication Database
3:30PM – Dr. ChungGang Li and Dr. Niclas Jansson (Complex Phenomena Unified Simulation Research Team, RIKEN AICS)
A Unified Framework for Large Scale Industrial Applications
4:00PM – Dr. Jaewoon Jung (Computational Biophysics Research Team, RIKEN AICS)
Development of GENESIS on K for Large Scale Molecular Dynamics Simulation
4:30PM – Dr. Hidetoshi Iwashita (Programming Environment Research Team, RIKEN AICS)
Coarray Features Contained in Parallel Language XcalableMP
Wednesday, November 19th  2:30PM-5:00PM
2:30PM – Mr. Koji Ueno (Tokyo Institute of Technology / HPC Programming Framework Research Team, RIKEN AICS)
#1 in Graph 500 (ISC'14)
Graph500 Challenge on K computer
3:00PM – Dr. Kohei Fujita (Computational Disaster Mitigation and Reduction Research Unit, RIKEN AICS)
GBP Finalist (SC14)
Current State of Earthquake Ground Motion Simulation on the K computer
3:30PM – Dr. Keigo Nitadori (Co-Design Team, Exascale Computing Project, RIKEN AICS)
GBP Finalist (SC14)
A Performance Comparison of MIC and GPU on a Hand-tuned N-body Code
4:00PM – Dr. Truong Vinh Truong Duy (Institute for Solid State Physics, The University of Tokyo)
OpenFFT: An Open-Source Package for 3-D FFTs with Minimal Volume of Communication
4:30PM – Dr. Kiyoshi Kumahata (Software Development Team, Operations and Computer Technologies Division, RIKEN AICS)
HPCG Performance Improvement on the K computer

Abstracts
Tuesday, November 18th

11/18 Tue. 2:30pm -

A Case Study of Power Estimation of Job Execution on the K computer
Dr. Atsuya Uno (System Operation and Development Team, Operations and Computer Technologies Division, RIKEN AICS)

The K computer is a distributed-memory parallel supercomputer system with 82,944 compute nodes. Like other large parallel computer systems, the K computer consumes a large amount of electricity, so we have to control its power consumption so that it does not exceed the upper power limit. In this presentation, I will talk about an early study of estimating the power consumption of each job using thermal sensor data.


11/18 Tue. 3:00pm -

Introduction of HPCI Publication Database
Dr. Haruyuki Kimura (Research Organization for Information Science and Technology (RIST), Kobe Center)

The HPCI Publication Database summarizes, in an integrated manner, all kinds of publication information on the research projects of the innovative High Performance Computing Infrastructure (HPCI), whose flagship is the K computer. The database can be viewed and searched from the HPCI portal site. Users of HPCI projects can also register publication information through the HPCI portal site at any time. Such an integrated publication database is a unique feature of the HPCI system in Japan. This talk presents the basic concepts of the HPCI Publication Database, the characteristics of the application, the user interface, and related topics.


11/18 Tue. 3:30pm -

A Unified Framework for Large Scale Industrial Applications
Dr. ChungGang Li and Dr. Niclas Jansson (Complex Phenomena Unified Simulation Research Team, RIKEN AICS)

We present our work on developing a unified simulation framework that enables efficient computation of time resolved approximations for complex multiphysics industrial applications. All discretized equations are mapped onto the same grid and solved as a unified continuum. To address the challenges of modern and emerging supercomputers, efficient data structures and communication patterns are needed. Here, we use a hierarchical Cartesian grid together with special techniques to accurately capture complex industrial geometries, and efficient halo-exchange algorithms. The applicability of our framework is demonstrated by several large scale industrial applications, for which the necessary simulation time, including preprocessing steps, can be reduced to within a day instead of weeks.


11/18 Tue. 4:00pm -

Development of GENESIS on K for Large Scale Molecular Dynamics Simulation
Dr. Jaewoon Jung (Computational Biophysics Research Team, RIKEN AICS)

We have developed a new method for hybrid parallelization (MPI+OpenMP) in the molecular dynamics (MD) program GENESIS, using a midpoint cell scheme and a three-dimensional decomposition scheme for the FFT. As in a usual domain-decomposition MD program, the global simulation space is divided into sub-domains assigned to different MPI processes. Each sub-domain is further divided into cells, which are used as interaction cell pairs for non-bonded calculations.
While keeping the advantages of the original midpoint method, our midpoint cell scheme has additional merits: (1) it is suitable for shared-memory parallelization, leading to efficient hybrid parallelization; (2) it improves locality by storing particle data cell-wise; and (3) it filters out unnecessary calculations in the midpoint checking procedure.
The new three-dimensional decomposition FFT scheme is useful in large-scale molecular dynamics simulations because charge information need not be communicated before or after the FFT. The parallel performance of GENESIS was tested on the K computer, showing scalability up to 32,768, 136,000 and 262,000 cores for systems of 1.3 million, 11.7 million, and 100 million atoms, respectively. One MD time step with long-range interactions could be carried out within 4 milliseconds even for systems of 1 million atoms with particle mesh Ewald (PME) electrostatics.


11/18 Tue. 4:30pm -

Coarray Features Contained in Parallel Language XcalableMP
Dr. Hidetoshi Iwashita (Programming Environment Research Team, RIKEN AICS)


Wednesday, November 19th

11/19 Wed. 2:30pm -

Graph500 Challenge on K computer
Mr. Koji Ueno (Tokyo Institute of Technology / HPC Programming Framework Research Team, RIKEN AICS)
#1 in Graph 500 (ISC'14)

Graph500 is a benchmark that ranks supercomputers by their graph-processing performance. Large-scale graph processing is becoming important in many fields, such as social network analysis, electronic design automation and protein-protein interaction analysis. The Graph500 benchmark performs distributed breadth-first search (BFS) on large graphs that have billions or even trillions of vertices. The K computer is ranked 1st in the Graph500 June 2014 list. This lecture describes how we achieved No. 1 on the Graph500 list using the K computer and presents the development of an efficient large-scale distributed BFS.
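For readers unfamiliar with BFS, the following is a minimal single-process sketch of a level-by-level breadth-first search that records each vertex's parent in the search tree. It is only an illustration and not the distributed implementation discussed in the lecture; the function name `bfs_parents`, the dictionary-based graph, and the choice to mark the root as its own parent are assumptions made for this example.

```python
def bfs_parents(adjacency, root):
    """Return a {vertex: parent} map for the BFS tree rooted at `root`.

    `adjacency` maps each vertex to an iterable of its neighbours.
    The root is recorded as its own parent (a choice made for this sketch).
    """
    parents = {root: root}
    frontier = [root]
    while frontier:
        next_frontier = []
        # Expand every vertex of the current level before moving on.
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in parents:      # first visit fixes the parent
                    parents[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parents


# Small undirected example graph; vertex 5 is unreachable from vertex 0.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
tree = bfs_parents(graph, 0)
```

Processing one whole frontier at a time is what makes the search "level-synchronous"; a distributed version would additionally have to partition the vertices and exchange frontier information between processes, which this sketch does not attempt.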


11/19 Wed. 3:00pm -

Current State of Earthquake Ground Motion Simulation on the K computer
Dr. Kohei Fujita (Computational Disaster Mitigation and Reduction Research Unit, RIKEN AICS)
GBP Finalist (SC14)

The computation of the earthquake response of shallow ground is a challenging problem in terms of capability computing, due to its large domain size with complex geometry and nonlinear material properties. In this talk, I will focus on the development of GAMERA, a fast and scalable ground motion simulation program based on nonlinear FEM with an unstructured mesh. By running GAMERA on 294,912 CPU cores of the K computer, we analyzed a problem with 10.7 billion degrees of freedom and 30,000 time steps, targeting a 2 km x 2 km block of Tokyo.


11/19 Wed. 3:30pm -

A Performance Comparison of MIC and GPU on a Hand-tuned N-body Code
Dr. Keigo Nitadori (Co-Design Team, Exascale Computing Project, RIKEN AICS)
GBP Finalist (SC14)

A performance comparison of various architectures using a hand-crafted N-body code is carried out. The benchmark code employs 4th-, 6th-, and 8th-order Hermite integrators in full double-precision arithmetic and was initially developed with Intel AVX intrinsics and OpenMP directives. Later, we ported it to HPC-ACE of the Fujitsu K/FX10, Haswell AVX2 (just recompiling), Intel MIC KNC in native mode, and finally GeForce Titan in CUDA with explicit host/device data transfers. With our best-effort tuning, roughly half of the peak performance of each architecture was achieved. A serious performance degradation at small N is observed in the MIC version, which is due to an OpenMP synchronization overhead of ~30 µs just for opening and closing a parallel region.


11/19 Wed. 4:00pm -

OpenFFT: An Open-Source Package for 3-D FFTs with Minimal Volume of Communication
Dr. Truong Vinh Truong Duy (Institute for Solid State Physics, The University of Tokyo)

The fast Fourier transform (FFT) is an essential primitive in various fields of science and engineering. We have developed an open-source package for 3-D FFTs with minimal volume of communication called OpenFFT. This is achieved by a decomposition method possessing two distinguishing features: adaptive decomposition and transpose order awareness. In the proposed method, the FFT data is decomposed on a row-wise basis that maps the multi-dimensional data into one-dimensional data and translates the corresponding coordinates from multiple dimensions into one dimension, so that the one-dimensional data can be divided and allocated equally to the processes using a block distribution. As a result, and unlike previous works in which the dimensions of decomposition are pre-defined, our method can adaptively decompose the FFT data on the lowest possible dimensions depending on the number of processes. In addition, this row-wise decomposition provides plenty of alternatives in data transpose, and different transpose orders result in different volumes of communication. We identify the best transpose orders with the smallest volumes of communication for the 3-D, 4-D, and 5-D FFTs by analyzing all possible cases. OpenFFT has been developed for 3-D FFTs based on our method using the 2-D domain decomposition. Numerical results show good performance and scaling properties of OpenFFT in comparison with other parallel packages.
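The idea of mapping multi-dimensional coordinates to a one-dimensional index and then dividing that index range into blocks can be pictured with a short sketch. It is an illustration under assumptions, not OpenFFT's code: the function names `flatten_index` and `block_range` are hypothetical, and giving the first ranks one extra element when the size is not evenly divisible is a choice made here.

```python
def flatten_index(coords, shape):
    """Map multi-dimensional coordinates to a row-major 1-D index."""
    index = 0
    for c, n in zip(coords, shape):
        index = index * n + c
    return index


def block_range(total, nprocs, rank):
    """Half-open [start, stop) range of 1-D indices owned by `rank`.

    Elements are spread as evenly as possible; when `total` is not a
    multiple of `nprocs`, the first `total % nprocs` ranks get one extra.
    """
    base, extra = divmod(total, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


shape = (4, 5, 6)                      # a small 3-D grid, 120 points
total = shape[0] * shape[1] * shape[2]
ranges = [block_range(total, 7, rank) for rank in range(7)]
```

Because the split is made on the flattened index rather than on a fixed pair of axes, the number of processes does not have to divide any particular dimension of the grid, which is the property the abstract refers to as adaptive decomposition.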


11/19 Wed. 4:30pm -

HPCG Performance Improvement on the K computer
Dr. Kiyoshi Kumahata (Software Development Team, Operations and Computer Technologies Division, RIKEN AICS)

HPCG is a new benchmark program for supercomputers. It measures the performance of solving a sparse linear system by the preconditioned conjugate gradient (PCG) method. As in actual applications, HPCG performance relies on the balance between computation, memory bandwidth and communication. We have been evaluating and improving HPCG performance on the K computer. As a result, our HPCG score was ranked 2nd on the list at ISC14, and its efficiency relative to peak performance was more than 4%, the most efficient result. In this lecture, I introduce our latest improvements to HPCG and their results on the K computer.
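As background, the PCG iteration itself can be written in a few lines. The sketch below is a generic textbook version and not the HPCG reference code or the tuned K computer implementation: it uses a dense matrix stored as nested lists and a simple Jacobi (diagonal) preconditioner, both chosen here only for brevity, and the name `pcg_jacobi` is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns (x, iterations). The preconditioner is the inverse of the
    diagonal of A (Jacobi), an assumption made for this sketch.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            return x, iteration
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# 1-D Laplacian test matrix (2 on the diagonal, -1 next to it).
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
b = matvec(A, [1.0] * n)                          # exact solution: all ones
solution, iterations = pcg_jacobi(A, b)
```

Each iteration consists of one matrix-vector product, a few dot products and vector updates, and one preconditioner application. With a sparse matrix these operations move a lot of data per floating-point operation, which is consistent with the abstract's remark that performance depends on the balance between computation, memory bandwidth and communication.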