Last update: Feb 26, 2018

Daichi Mukunoki

Name: Daichi MUKUNOKI (椋木　大地)
LinkedIn
ResearchGate
Title and Affiliation:
- Postdoctoral Research Fellow, Graduate School of Science, Tokyo Woman's Christian University
- Visiting Researcher, Large-scale Parallel Numerical Computing Technology Research Team, Research Division, RIKEN Advanced Institute for Computational Science
- Visiting Researcher, Architecture Development Team, FLAGSHIP 2020 Project, RIKEN Advanced Institute for Computational Science
E-mail: daichi [dot] mukunoki [at] riken [dot] jp

Work experience

October 2017 - present: Postdoctoral Research Fellow at Graduate School of Science, Tokyo Woman's Christian University
October 2017 - present: Visiting Researcher at RIKEN AICS
June 2014 - September 2017: at RIKEN AICS
- April 2017 - September 2017: Postdoctoral Researcher at Architecture Development Team, FLAGSHIP 2020 Project
- May 2015 - April 2017: Postdoctoral Researcher at Co-design Team, FLAGSHIP 2020 Project
- June 2014 - September 2017: Postdoctoral Researcher at Large-scale Parallel Numerical Computing Technology Research Team, Research Division
December 2013 - May 2014: Research Fellow (PD) of the Japan Society for the Promotion of Science (at University of Tsukuba)
April 2013 - November 2013: Research Fellow (DC2) of the Japan Society for the Promotion of Science (at University of Tsukuba)

Education

April 2011 - November 2013, Graduate School of Systems and Information Engineering, University of Tsukuba (Doctor of Philosophy in Engineering, November 2013)
April 2009 - March 2011, Graduate School of Systems and Information Engineering, University of Tsukuba (Master of Engineering, March 2011)
April 2006 - March 2009, School of Library and Information Science, University of Tsukuba (Bachelor of Library and Information Science, March 2009)
April 2001 - March 2006, Gifu National College of Technology (Associate's degree in Engineering, March 2006)

Research interests

High performance computing (HPC), Parallel computing, Accelerator computing (especially GPGPU), Extended-precision floating-point arithmetic, Implementation and performance tuning of linear algebra kernels on many-core processors and parallel computers

Computer skills

C/C++, CUDA, MPI, OpenMP, Python, Java, Fortran, PHP, SQL, LaTeX, UNIX system and network administration

Working Groups

Information Processing Society of Japan (IPSJ), SIG High Performance Computing (HPC)
Auto-Tuning Research Group (ATRG)

Professional activities

Program Committee: The 13th International Workshop on Automatic Performance Tuning (iWAPT2018) (2018)
Program Committee: The 19th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC '18) (2018)
Program Committee: Special Session: Auto-Tuning for Multicore and GPU (ATMG2017) (2017)
Program Committee: The 2nd International Workshop on GPU Computing and Applications (GCA'17) (2017)
Program Committee: The 12th International Workshop on Automatic Performance Tuning (iWAPT2017) (2017)
Program Committee: The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC '17) (2017)
Program Committee: The First International Workshop on GPU Computing and Applications (GCA'16) (2016)
Program Committee: The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC '16) (2016)
Program Committee: The 16th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC '15) (2015)
Program Committee: The 15th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC '14) (2014)

Grants

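April 2016 - March 2018: Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Young Scientists (B), #16K16062, "Study of Reduced-Length Floating-Point Representations for High-Performance and Energy-Efficient Computation"
April 2013 - March 2015: Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for JSPS Fellows, #251290, "Research on the Development of Triple- and Quadruple-Precision Linear Algebra Libraries for GPU Supercomputers"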

Awards

PRACE-ISC Research Poster Award 2017, ISC High Performance 2017, 2017 (Daichi Mukunoki and Toshiyuki Imamura, "Implementation & Evaluation of 2.5D Matrix Multiplication on K Computer").
IPSJ Yamashita SIG Research Award, Information Processing Society of Japan, 2016 (for "Automatic Thread Count Selection Method for Memory-Bound BLAS Kernels on NVIDIA GPUs", in Japanese).
IPSJ Computer Science Research Award for Young Scientists, Information Processing Society of Japan, 2013 (for "Implementation of Fast Sparse Matrix-Vector Multiplication for the CRS Format on GPUs", in Japanese).
IPSJ SIGARC Young Researcher Award, 194th IPSJ SIG-ARC, 2013 (for "Implementation and Evaluation of Sparse Iterative Solvers Using Quadruple-Precision Arithmetic on GPUs", in Japanese).

Publications

Journal papers (with review)

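Daichi Mukunoki and Daisuke Takahashi: Implementation and Performance Evaluation of Triple- and Quadruple-Precision Floating-Point Arithmetic on GPUs, IPSJ Transactions on Advanced Computing Systems, Vol. 6, No. 1, pp. 66-77, Jan. 2013 (in Japanese).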

Conference proceedings (with review)

Daichi Mukunoki and Toshiyuki Imamura: Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer, 12th International Conference on Parallel Processing and Applied Mathematics (PPAM2017), Sep. 2017 (accepted).
Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, and Masahiko Machida: Design Towards Modern High Performance LA Library Enabling Heterogeneity and Flexible Data Formats, International Conference on Parallel Computing (ParCo2017), Sep. 2017 (accepted).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 377-384, Sep. 2016.
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs, Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642-650, Mar. 2015.
Daichi Mukunoki and Daisuke Takahashi: Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs, Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science, Vol. 8384, pp. 632-642, Springer-Verlag, May 2014.
Daichi Mukunoki and Daisuke Takahashi: Optimization of Sparse Matrix-vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs, Proc. 13th International Conference on Computational Science and Its Applications (ICCSA 2013), Part V, Lecture Notes in Computer Science, Vol. 7975, pp. 211-223, Springer-Verlag, Jun. 2013.
Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs, Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12), pp. 1378-1386, May 2012.
Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs, Proc. 10th International Conference on Applied Parallel and Scientific Computing (PARA 2010), Part I, Lecture Notes in Computer Science, Vol. 7133, pp. 249-259, Springer-Verlag, 2012.
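Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Quadruple- and Octuple-Precision BLAS on GPUs, Proc. 2011 High Performance Computing and Computational Science Symposium (HPCS2011), pp. 148-156, Jan. 2011 (in Japanese).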

Poster presentations (with review)

Daichi Mukunoki and Toshiyuki Imamura: Implementation and Evaluation of 2.5D Matrix Multiplication on K Computer, ISC High Performance (ISC 2017), research poster session, Jun. 20, 2017.
Daichi Mukunoki and Toshiyuki Imamura: Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation, Proc. IEEE International Conference on Cluster Computing (Cluster 2016), pp. 144-145, Sep. 13, 2016 (extended abstract in conference proceedings).

Conference proceedings and technical reports (without review)

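Daichi Mukunoki and Toshiyuki Imamura: Implementation and Evaluation of Distributed Parallel Matrix Multiplication Using the 2.5D Algorithm on the K Computer, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2017-HPC-159, No. 1, pp. 1-6, Apr. 2017 (in Japanese).
Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, and Shin'ichi Oishi: Performance Evaluation of Verified Numerical Computation for Systems of Linear Equations on Large-Scale Parallel Computers, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2016-HPC-157, No. 1, pp. 1-7, Dec. 2016 (in Japanese).
Toshiyuki Imamura and Daichi Mukunoki: Implementation and Evaluation of an Eigenvalue Solver Optimized for Consumer-Range GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2016-HPC-157, No. 7, pp. 1-9, Dec. 2016 (in Japanese).
Daichi Mukunoki and Toshiyuki Imamura: A Study of Reduced-Length Floating-Point Formats, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2015-HPC-152, No. 4, pp. 1-10, Dec. 2015 (in Japanese).
Shinichi Sasaki, Toshiaki Hishinuma, Akihiro Fujii, Teruo Tanaka, Daichi Mukunoki, and Toshiyuki Imamura: Acceleration of Double-Double Precision Arithmetic on the K Computer and FX10, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2015-HPC-151, No. 15, pp. 1-7, Sep. 2015 (in Japanese).
Toshiyuki Imamura, Daichi Mukunoki, Susumu Yamada, and Masahiko Machida: Multi-GPU Implementation and Evaluation of the SYMV and GEMV Routines, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2015-HPC-151, No. 13, pp. 1-8, Sep. 2015 (in Japanese).
Narimasa Sasa, Susumu Yamada, Masahiko Machida, Daichi Mukunoki, and Toshiyuki Imamura: Accumulated Errors in Time-Evolution Problems Using FFT, Proc. JSIAM Annual Meeting 2015, Sep. 2015 (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Automatic Thread Count Selection Method for Memory-Bound BLAS Kernels on NVIDIA GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2015-HPC-150, No. 13, pp. 1-13, Jul. 2015 (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Auto-Tuning of GEMV Kernels on NVIDIA GPUs, Proceedings of the Conference on Computational Engineering and Science, Vol. 20, E-2-1, Jun. 2015 (in Japanese).
Toshiyuki Imamura, Daichi Mukunoki, Susumu Yamada, and Masahiko Machida: Performance Evaluation of the Fastest GPU Eigenvalue Solver through Selection of CUDA-BLAS and Other Libraries, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2015-HPC-148, No. 4, pp. 1-9, Feb. 2015 (in Japanese).
Daichi Mukunoki and Toshiyuki Imamura: Implementation and Evaluation of DGEMM Using Pseudo Double-Precision Arithmetic on Maxwell Architecture GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2014-HPC-147, No. 26, pp. 1-6, Dec. 2014 (in Japanese).
Toshiyuki Imamura, Daichi Mukunoki, Susumu Yamada, and Masahiko Machida: Implementation and Evaluation of CUDA-xSYMV, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2014-HPC-146, No. 14, pp. 1-12, Oct. 2014 (in Japanese).
Daichi Mukunoki and Daisuke Takahashi: Acceleration of Krylov Subspace Methods Using Quadruple-Precision Floating-Point Arithmetic on GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2013-HPC-140, No. 35, pp. 1-7, Jul. 2013 (in Japanese).
Daichi Mukunoki and Daisuke Takahashi: Implementation of Fast Sparse Matrix-Vector Multiplication for the CRS Format on GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2013-HPC-138, No. 5, pp. 1-7, Feb. 2013 (in Japanese).
Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Sparse Iterative Solvers Using Quadruple-Precision Arithmetic on GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2012-HPC-137 (2012-ARC-202), No. 37, pp. 1-8, Dec. 2012 (in Japanese).
Daichi Mukunoki and Daisuke Takahashi: A Study of Triple-Precision Floating-Point Arithmetic on GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2011-HPC-132 (2011-ARC-197), No. 23, pp. 1-9, Nov. 2011 (in Japanese).
Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Quadruple-Precision BLAS on GPUs, Proceedings of the Conference on Computational Engineering and Science, Vol. 15, No. 2, pp. 891-894, May 2010 (in Japanese).
Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Quadruple-Precision BLAS on GPUs, IPSJ SIG Technical Reports: High Performance Computing, Vol. 2009-HPC-123 (2009-ARC-186), No. 13, pp. 1-6, Nov. 2009 (in Japanese).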

Poster presentations (without review)

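Takeshi Ogita, Daichi Mukunoki, and Katsuhisa Ozaki: Advances in Verified Numerical Computation in the HPC Field, The 3rd CDMSI (Post-K Priority Issue 7) Symposium, Dec. 5, 2017 (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: A Study of Implementation Techniques for Linear Algebra Kernels on Pascal Architecture GPUs, GTC Japan 2016, Oct. 5, 2016 (in Japanese).
Yoshiharu Ohi, Yusuke Hirota, Daichi Mukunoki, and Toshiyuki Imamura: KMATHLIB -High Performance and Scalable Numerical Library for the K Computer-, JSIAM Annual Meeting 2016, Sep. 13, 2016.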
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS, The 6th AICS International Symposium, Feb. 22, 2016.
Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, and Shin'ichi Oishi: Performance Evaluation of Verified Computation for Linear Systems on Parallel Computers, 2nd Annual Meeting on Advanced Computing System and Infrastructure (ACSI2016), Jan. 19, 2016.
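Yoshiharu Ohi, Yusuke Hirota, Daichi Mukunoki, and Toshiyuki Imamura: Implementation of KMATHLIB, a Numerical Library Suite for the K Computer, JSIAM Annual Meeting 2015, Sep. 9, 2015 (in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Implementation and Evaluation of "MUBLAS", a Set of Memory-Bound Linear Algebra Kernels with Automatic Thread Count Selection on GPUs, GTC Japan 2015, Sep. 18, 2015 (in Japanese).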
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations, GPU Technology Conference (GTC 2015), Mar. 17, 2015.
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Implementation of Fast GEMV Whose Performance Does Not Depend on Matrix Shape on Kepler and Maxwell Architecture GPUs, Proc. Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015, Jan. 26, 2015 (in Japanese; extended abstract in conference proceedings).
Shinichi Sasaki, Akihiro Fujii, Teruo Tanaka, Daichi Mukunoki, and Toshiyuki Imamura: Acceleration of Double-Double Precision Arithmetic on the K Supercomputer, Proc. Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015, Jan. 26, 2015 (in Japanese; extended abstract in conference proceedings).
Toshiyuki Imamura, Daichi Mukunoki, Narimasa Sasa, Susumu Yamada, and Masahiko Machida: QP-Pack: A Pseudo Quadruple-Precision Extended Mathematics Package, Proc. Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015, Jan. 26, 2015 (in Japanese; extended abstract in conference proceedings).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Implementation of Fast SGEMV on Kepler Architecture GPUs, GTC Japan 2014, Jul. 16, 2014 (in Japanese).
Daichi Mukunoki and Daisuke Takahashi: Linear Algebra Operations using Quadruple-precision Arithmetic on GPU, GPU Technology Conference (GTC2014), Mar. 24, 2014.
Daichi Mukunoki and Daisuke Takahashi: Performance Comparison of Double, Triple and Quadruple Precision Real and Complex BLAS Subroutines on GPUs, Proc. ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way? (ATIP/A*CRC Workshop '12), pp. 788-790, May 7, 2012 (extended abstract in conference proceedings).

Talks

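Daichi Mukunoki: Implementation Techniques of Numerical Libraries for Next-Generation Computers, JSIAM Three-Activity-Group Joint "Applied Mathematics Seminar", Nishi-Waseda Campus, Waseda University, Shinjuku, Tokyo, Dec. 26, 2017 (in Japanese).
Daichi Mukunoki and Toshiyuki Imamura: A Study of Implementation Methods for Reduced-/Extended-Precision BLAS, Fifth Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2017), RIKEN AICS, Chuo-ku, Kobe, Mar. 27, 2017 (in Japanese).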
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Implementation Techniques for High Performance BLAS Kernels on Modern GPUs, SIAM Conference on Computational Science and Engineering (CSE17), Hilton Atlanta, Atlanta, Feb. 28, 2017.
Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, and Shin'ichi Oishi: Performance Evaluation of Verified Computation for Linear Systems on Supercomputer, SIAM: East Asian Section Conference (EASIAM 2016), University of Macau, Macau, Jun. 20-22, 2016.
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Dense Matrix-Vector Multiplication on CUDA, 2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2016), Mathematics Research Center, National Taiwan University, Taipei, Feb. 19, 2016 (Invited).
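Daichi Mukunoki and Daisuke Takahashi: Triple-Precision Arithmetic and Quadruple-Precision Sparse Iterative Solvers on GPUs, The 3rd Multiple-Precision Arithmetic Forum, Kogakuin University, Shinjuku, Tokyo, Mar. 8, 2013 (in Japanese).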
Daichi Mukunoki and Daisuke Takahashi: Iterative Method for Sparse Linear Systems using Quadruple Precision Operations on GPUs, SIAM Conference on Computational Science and Engineering (CSE13), The Westin Boston Waterfront, Boston, Massachusetts, Feb. 28, 2013.
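Daichi Mukunoki and Daisuke Takahashi: Quadruple-Precision Matrix Computation on GPUs, 2011 Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP Kagoshima 2011), Kagoshima Kenmin Koryu Center, Kagoshima, Jul. 27, 2011 (in Japanese).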

Others (non-public presentations, non-academic activities, etc.)

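Daichi Mukunoki: Finish Your Work Sooner with Good Programs, Sankei Shimbun morning edition, "Kagaku no Nakami" column (RIKEN, Kansai edition), Nov. 10, 2016 (newspaper column, in Japanese).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Implementation Techniques for Linear Algebra Kernels on Current and Future Many-Core Processors, ATμWS2016, Oct. 31, 2016 (oral presentation at a closed workshop, in Japanese).
Daichi Mukunoki: FY2015 Auto-Tuning Research Group Micro Workshop (ATμWS), Imamura Grant-in-Aid for Scientific Research (B) "Theory and Applications of Scalable Numerical Software for O(100 Million)-Core Environments", Research Plan, Mukunoki's Part, ATμWS2015, Oct. 19, 2015 (oral presentation at a closed workshop, in Japanese).
Daichi Mukunoki: From Game Consoles to Supercomputers: What Is Accelerator Technology?, RIKEN AICS Open House mini-lecture, Oct. 25, 2014 (lecture, in Japanese).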

MUBLAS

MUBLAS is an experimental implementation of BLAS kernels for NVIDIA GPUs. At present it includes the AXPY, SCAL, GEMV, TRMV, and GEMM routines; they are not fully implemented and are not necessarily faster than cuBLAS or other existing implementations. The GEMV and TRMV implementations are discussed in [1][2]. All routines support single, double, double-float (pseudo double), and double-double (pseudo quadruple) precision for both real and complex operations. MUBLAS is open-source software provided "as is".
[1] Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs, Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642-650 (2015).
[2] Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 377-384 (2016).
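
As an illustration of what the double-double (pseudo quadruple) format means, here is a minimal sketch in Python. It is not MUBLAS code. It assumes a value is stored as an unevaluated sum of two doubles `(hi, lo)` and uses Knuth's TwoSum error-free transformation for addition; the names `two_sum` and `dd_add` are hypothetical, and the actual MUBLAS kernels may use different algorithms.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum a + b
    and e is the rounding error, so that a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs.

    This is the simple ("sloppy") variant: the low words are added in
    plain double precision, then the result is renormalized.
    """
    s, e = two_sum(x[0], y[0])   # exact sum of the high words
    e += x[1] + y[1]             # fold in the low words
    return two_sum(s, e)         # renormalize into (hi, lo)


# 2**-60 is too small to affect 1.0 in plain double precision ...
plain = 1.0 + 2.0 ** -60

# ... but it is kept in the low word of the double-double result.
hi, lo = dd_add((1.0, 0.0), (2.0 ** -60, 0.0))
print(plain, hi, lo)
```

In plain double precision the small term is rounded away, whereas the double-double pair keeps it in `lo`. The same idea applied to pairs of single-precision values gives the double-float (pseudo double) format.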

MUBLAS version 1.5.38 mublas-1.5.38-release.tgz
MUBLAS version 1.5.31 mublas-1.5.31-release.tgz
MUBLAS version 1.5.24 mublas-1.5.24-release.tgz
MUBLAS version 1.5.14 mublas-1.5.14-release.tgz
MUBLAS version 1.4.28 mublas-1.4.28-release.tgz
MUBLAS-GEMV version 1.3.1 mublas-gemv-1.3.1-release.tgz
MUBLAS-GEMV version 1.3 mublas-gemv-1.3-release.tgz
