Last update: Feb 26, 2018

Daichi Mukunoki

Work experience

Education

Research interests

Computer skills

Working Groups

Professional activities

Grants

Awards


Publications

Journal papers (with review)

Conference proceedings (with review)

Poster presentations (with review)

Conference proceedings and technical reports (without review)

Poster presentations (without review)

Talks

Others (non-public presentations, non-academic presentations, etc.)


MUBLAS

MUBLAS is an experimental implementation of BLAS kernels for NVIDIA GPUs. It currently includes the AXPY, SCAL, GEMV, TRMV, and GEMM routines; these are not fully implemented and are not necessarily faster than CUBLAS or other existing implementations. The GEMV and TRMV implementations are discussed in [1][2]. All routines support single, double, double-float (pseudo double), and double-double (pseudo quadruple) precision for both real and complex operations. The software is open source and is provided "as is".
[1] Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs, Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642-650 (2015).
[2] Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 377-384 (2016).
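
To illustrate what "double-double (pseudo quadruple)" precision means for a routine such as AXPY, here is a minimal host-side C++ sketch. It is not taken from MUBLAS and does not reflect its actual kernels or API: the `dd` struct and the functions `two_sum`, `two_prod`, `dd_add`, `dd_mul`, and `dd_axpy` are hypothetical names chosen for this example. It assumes IEEE 754 double arithmetic, a correctly rounded `std::fma`, and compilation without value-changing optimizations such as `-ffast-math`. A GPU version would typically run the loop body once per thread, but that mapping is left out here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical double-double value: represents hi + lo, where lo holds
// the rounding error that does not fit into hi.
struct dd {
    double hi;
    double lo;
};

// Knuth's TwoSum: returns (s, e) with s + e == a + b exactly (no overflow).
inline dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Error-free product via fused multiply-add: p + e == a * b exactly.
inline dd two_prod(double a, double b) {
    double p = a * b;
    double e = std::fma(a, b, -p);
    return {p, e};
}

// Double-double addition (the simpler, less accurate "sloppy" variant).
inline dd dd_add(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;
    return two_sum(s.hi, s.lo);  // renormalize
}

// Double-double multiplication; the lo*lo term is dropped as negligible.
inline dd dd_mul(dd a, dd b) {
    dd p = two_prod(a.hi, b.hi);
    p.lo += a.hi * b.lo + a.lo * b.hi;
    return two_sum(p.hi, p.lo);  // renormalize
}

// AXPY in double-double: y <- alpha * x + y, element by element.
void dd_axpy(dd alpha, const std::vector<dd>& x, std::vector<dd>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = dd_add(dd_mul(alpha, x[i]), y[i]);
    }
}
```

The point of the sketch is that each double-double operation costs several double operations, which is why such routines shift the balance between arithmetic and memory traffic compared with plain double precision.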

