Session 2
2.1 Project Talk: HPC libraries for solving dense symmetric eigenvalue problems “Comparison of library eigensolvers for dense symmetric matrices on K-computer, JUQUEEN, and JURECA”
Inge Gutheil (JSC)
In many applications, for example in Density Functional Theory (DFT) used in physics, chemistry, and materials science, the computation of eigenvalues and eigenvectors of dense symmetric matrices is an important issue. There are three modern libraries for the solution of this problem: EigenExa, ELPA, and Elemental. They behave differently on different computer architectures, and we will show which library should be preferred on each of the three computers, the K-computer, BlueGene/Q (JUQUEEN), and a cluster of Intel processors (JURECA).
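As a point of reference for the computation these libraries perform at scale, the following minimal serial sketch solves the same dense symmetric eigenproblem with NumPy's eigh; EigenExa, ELPA, and Elemental provide equivalent functionality on distributed-memory (block-cyclic or tile-based) matrix layouts, which is not shown here.

    # Minimal serial sketch of the dense symmetric eigenproblem A v = lambda v.
    # NumPy's eigh stands in for the distributed-memory solvers (EigenExa, ELPA,
    # Elemental) that the talk compares; it is not one of those libraries.
    import numpy as np

    n = 1000
    rng = np.random.default_rng(0)
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2.0                            # dense symmetric test matrix

    eigenvalues, eigenvectors = np.linalg.eigh(A)  # ascending eigenvalues, orthonormal vectors

    # Residual check: ||A V - V diag(lambda)|| should be at round-off level.
    print(np.linalg.norm(A @ eigenvectors - eigenvectors * eigenvalues))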
2.2 Project Talk: Shared Infrastructure for Source Transformation Automatic Differentiation “Handling Pointers and Dynamic Memory in Algorithmic Differentiation”
Sri Hari Krishna Narayanan (ANL)
Proper handling of pointers and the (de)allocation of dynamic memory in the context of an adjoint computation via source transformation has so far had no established solution that is both comprehensive and efficient. This talk gives a categorization of the memory references involving pointers to heap and stack memory, along with the principal options for recovering addresses in the reverse sweep. The main contributions are a code analysis algorithm to determine which remedy applies, memory mapping algorithms for the general case in which one cannot assume invariant absolute addresses, and an algorithm for the handling of pointers upon restoring checkpoints that reuses the memory mapping approach for the reverse sweep.
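To make the address-recovery problem concrete, the sketch below illustrates one generic remedy, relative addressing through a memory map: pointers are recorded as (allocation id, offset) pairs during the forward sweep and re-resolved against the addresses of the re-created allocations in the reverse sweep. This is an illustrative simplification, not the categorization or the algorithms presented in the talk.

    # Conceptual sketch of address recovery via a memory map: instead of taping
    # absolute addresses, a pointer is taped as (allocation_id, offset) and
    # translated back once the allocation has been re-created in the reverse
    # sweep. Illustrative only; not the algorithms from the talk.
    class MemoryMap:
        def __init__(self):
            self.base = {}  # allocation_id -> base address of the live allocation

        def register(self, alloc_id, base_address):
            self.base[alloc_id] = base_address  # called on every (re)allocation

        def to_relative(self, alloc_id, absolute_address):
            # Forward sweep: convert an absolute pointer to a tape-safe pair.
            return (alloc_id, absolute_address - self.base[alloc_id])

        def to_absolute(self, relative_pointer):
            # Reverse sweep: recover a valid address after reallocation.
            alloc_id, offset = relative_pointer
            return self.base[alloc_id] + offset

    mmap = MemoryMap()
    mmap.register(7, 0x1000)                 # forward sweep: allocation 7 lives at 0x1000
    taped = mmap.to_relative(7, 0x1010)      # tape (7, 16) instead of 0x1010

    mmap.register(7, 0x5000)                 # reverse sweep: allocation 7 re-created elsewhere
    print(hex(mmap.to_absolute(taped)))      # 0x5010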
2.3 Individual Talk: Exploring eigensolvers for large sparse non-Hermitian matrices
Hiroya Suno (RIKEN)
We are exploring ways of computing eigenvalues and eigenvectors of large sparse non-Hermitian matrices, such as those arising in Lattice Quantum Chromodynamics (lattice QCD) simulations. So far we have been exploring the Sakurai-Sugiura (SS) method, a method based on a contour integral, which allows us to compute the desired eigenvalues located inside a given contour in the complex plane, as well as the associated eigenvectors. We have tested the SS method on large sparse matrices of order up to about one billion and have been able to compute eigenvalues for several simple cases with a certain accuracy. We are now ready to explore other eigensolvers, such as ARPACK (Arnoldi Package) and ChASE (Chebyshev Accelerated Subspace iteration Eigensolver).
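The sketch below illustrates the contour-integral idea behind SS-type methods on a small dense example: a block of vectors is filtered by a quadrature approximation of the contour integral of the resolvent, and the eigenvalues inside the contour are extracted with a Rayleigh-Ritz step. The production setting replaces the dense solves with large sparse shifted linear solves distributed over many nodes; the block size, contour, and quadrature rule here are illustrative choices.

    # Small dense illustration of a contour-integral (Sakurai-Sugiura type) filter.
    # Production use replaces np.linalg.solve with large sparse shifted solves.
    import numpy as np

    rng = np.random.default_rng(1)
    n, m, N = 200, 20, 32                 # matrix order, block size, quadrature points
    A = rng.standard_normal((n, n))       # generic non-Hermitian test matrix
    V = rng.standard_normal((n, m))       # random starting block

    center, radius = 0.0, 3.0             # circular contour in the complex plane
    theta = 2.0 * np.pi * (np.arange(N) + 0.5) / N
    z = center + radius * np.exp(1j * theta)

    # Trapezoidal quadrature of (1/(2*pi*i)) * integral of (zI - A)^{-1} V dz.
    S = np.zeros((n, m), dtype=complex)
    for zj, tj in zip(z, theta):
        S += radius * np.exp(1j * tj) * np.linalg.solve(zj * np.eye(n) - A, V)
    S /= N

    U, s, _ = np.linalg.svd(S, full_matrices=False)
    Q = U[:, s > 1e-8 * s[0]]                         # basis of the filtered subspace
    evals, _ = np.linalg.eig(Q.conj().T @ A @ Q)      # Rayleigh-Ritz step
    print(np.sort_complex(evals[np.abs(evals - center) < radius]))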
2.4 Individual Talk: Towards Automated Load Balancing via Spectrum Slicing for FEAST-like solvers
Jan Winkelmann (JSC)
Subspace iteration algorithms accelerated by rational filtering, such as FEAST, have recently re-emerged as a research topic for solving interior eigenvalue problems. FEAST-like solvers are Rayleigh-Ritz solvers with rational filter functions and, as a result, require re-orthogonalization of long vectors only in rare cases. Application of the filter functions, the computationally most expensive part, offers three levels of parallelism: 1) multiple spectral slices, 2) multiple linear system solves per slice, and 3) multiple right-hand sides per system solve. While the second and third levels of parallelism are currently exploited, the first is often difficult to realize efficiently: an algorithmic procedure to load-balance multiple independent spectral slices is not yet available, so existing solvers must rely on the user’s prior knowledge. An automatic procedure to split a user-specified interval into multiple load-balanced slices would greatly improve the state of the art. We outline how both the algorithmic selection of filter functions and the choice of spectral slices can be at the center of load-balancing issues, and we present the tools and heuristics developed in an effort to tackle these problems.
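As an illustration of what load balancing across slices could look like, the sketch below uses one simple heuristic, assumed here for concreteness rather than taken from the talk: given an estimate of the cumulative eigenvalue count over the target interval (for instance from a stochastic density-of-states approximation), slice boundaries are placed so that each slice contains roughly the same number of eigenvalues and hence roughly the same amount of filtering work.

    # Illustrative slicing heuristic (assumed for this sketch, not taken from the
    # talk): split [a, b] so that each slice holds roughly the same estimated
    # number of eigenvalues, given ecount(t) ~ #{eigenvalues <= t}.
    import numpy as np

    def balanced_slices(ecount, a, b, n_slices, grid=2048):
        """Return slice boundaries in [a, b] with approximately equal counts."""
        t = np.linspace(a, b, grid)
        c = ecount(t)                                       # cumulative counts on a grid
        targets = c[0] + (c[-1] - c[0]) * np.arange(1, n_slices) / n_slices
        cuts = t[np.searchsorted(c, targets)]               # count crosses each target here
        return np.concatenate(([a], cuts, [b]))

    # Demo with an exactly known spectrum; in practice ecount would be an estimate.
    eigs = np.sort(np.random.default_rng(2).normal(size=5000))
    ecount = lambda t: np.searchsorted(eigs, t).astype(float)

    bounds = balanced_slices(ecount, -3.0, 3.0, n_slices=4)
    print(bounds)                    # unevenly spaced boundaries
    print(np.diff(ecount(bounds)))   # per-slice eigenvalue counts, roughly equal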
2.5 Individual Talk: Bidiagonalization with Parallel Tiled Algorithms
Julien Langou (INRIA)
In a recent paper, we considered algorithms for going from a “full” matrix to a condensed “band bidiagonal” form using orthogonal transformations, within the framework of “algorithms by tiles”. We considered many reduction trees and obtained conclusive results in parallel distributed experiments on a cluster of multicore nodes. We will present these results. Based on these encouraging results, we will discuss the following five open problems, which we believe are relevant to JLESC: (1) applying the same techniques to tridiagonalization methods for the symmetric eigenvalue problem; (2) the impact (storage and computation time) on the computation of the singular vectors (and of the eigenvectors in the symmetric eigenvalue problem case); (3) performing experiments on very large-scale machines to assess the scalability of the methods; (4) examining the trade-off between TS and TT kernels; (5) experimenting with a scalable parallel distributed solution for going from band bidiagonal (or band tridiagonal) to bidiagonal form.
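For readers unfamiliar with the first stage of the reduction, the sketch below shows a plain blocked reduction from a full matrix to band (block) upper bidiagonal form by alternating QR and LQ panel factorizations. It is only the mathematical skeleton: the tiled algorithms in the talk decompose these panel factorizations into TS/TT kernels and organize them with reduction trees, which is precisely where the open questions above arise.

    # Serial sketch of a blocked full -> band upper bidiagonal reduction using
    # alternating QR and LQ panel factorizations (mathematical skeleton only;
    # the tiled algorithms break these panels into TS/TT kernels).
    import numpy as np

    def to_band_bidiagonal(A, b):
        """Reduce A (m x n, m >= n) to upper band form of bandwidth b."""
        A = A.copy()
        m, n = A.shape
        for k in range(0, n, b):
            # QR on the block column zeroes everything below the diagonal block.
            Q, _ = np.linalg.qr(A[k:, k:k + b], mode="complete")
            A[k:, k:] = Q.T @ A[k:, k:]
            # LQ on the block row (QR of its transpose) zeroes everything to the
            # right of the band.
            if k + b < n:
                Q2, _ = np.linalg.qr(A[k:k + b, k + b:].T, mode="complete")
                A[k:, k + b:] = A[k:, k + b:] @ Q2
        return A

    bw = 20
    B = to_band_bidiagonal(np.random.default_rng(3).standard_normal((300, 200)), bw)
    print(np.max(np.abs(np.tril(B, -1))))      # strictly lower part: round-off
    print(np.max(np.abs(np.triu(B, bw + 1))))  # beyond the upper band: round-off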
2.6 Individual Talk: Targeting Interface Problems at Scale with Coupled Elliptic Solvers
Natalie Beams (UIUC)
The creation, adaptation, and maintenance of volume meshes that conform to the problem geometry is costly and represents a scaling challenge. Furthermore, semi-structured meshes offer distinct computational advantages over fully unstructured meshes. We present a family of methods that permits the enforcement of a broad class of boundary and interface conditions on surfaces that do not coincide with element boundaries, while maintaining high-order accuracy. Our methods leverage the advantages of finite element and integral equation methods to solve elliptic problems on a (potentially) structured volume mesh with embedded domains. A benefit of this approach is that standard, unmodified finite element basis functions can be used, in contrast to immersed finite element methods. Additionally, the computational mechanics for the integral equation portion of the coupled solution remain unchanged. One limiting factor in our methodology (and in many other simulations) is the dependence on a scalable fast multipole method. We discuss implications in a parallel setting and potential directions for collaborations in the Joint Lab.
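To make the scalability concern concrete, the fragment below evaluates the kind of pairwise kernel sum that the integral equation part of such a coupled solver needs, here a free-space Laplace potential, by direct summation. The direct sum costs O(MN) in time and memory, which is exactly what a fast multipole method reduces to roughly O(M + N); the geometry and densities are generic placeholders, not the project's actual discretization.

    # Direct evaluation of a 3D Laplace potential sum, the kind of kernel sum an
    # FMM accelerates from O(M*N) to roughly O(M + N). Generic placeholder data;
    # not the coupled solver's actual quadrature or discretization.
    import numpy as np

    rng = np.random.default_rng(4)
    sources = rng.uniform(size=(2000, 3))   # e.g., quadrature points on an embedded surface
    targets = rng.uniform(size=(1000, 3))   # e.g., volume mesh nodes
    charges = rng.standard_normal(2000)     # densities/weights at the source points

    # G(x, y) = 1 / (4*pi*|x - y|): free-space Green's function for the Laplacian in 3D.
    diff = targets[:, None, :] - sources[None, :, :]          # (M, N, 3) pairwise differences
    dist = np.linalg.norm(diff, axis=-1)
    potential = (charges / (4.0 * np.pi * dist)).sum(axis=1)  # O(M*N) work and memory
    print(potential[:5])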