High Performance Computing I

This lecture provides an introduction to parallel computer architectures and standard programming interfaces for parallel numerical algorithms. We focus on a few numerical algorithms, such as dense matrix multiplication and LU decomposition, which allow us to approach the theoretical peak performance step by step.


  • Introduction to the programming language C++ with a special focus on numerical linear algebra
  • Memory hierarchies
  • Parallel computer architectures
    • shared memory with POSIX threads and OpenMP
    • distributed systems with MPI
    • GPUs


  • Monday, 2 pm to 6 pm, Helmholtzstraße 18, room E.20, and in the labs Helmholtzstraße 18, room E.44, and O27/211.
  • Friday, 2 pm to 4 pm, Helmholtzstraße 22, room E.04, and in the labs Helmholtzstraße 18, room E.44, and O27/211. Please note that Friday sessions are open-ended, i.e. we provide support in the labs until 6 pm.
  • The first lecture will be on 15 October 2018, 2 pm.


Linear algebra, calculus, numerical linear algebra, and programming are required. We expect some basic knowledge of C. Knowledge of C++ is not required as we provide an introduction to C++. Introduction to High Performance Computing is recommended but not strictly required. Note, however, that the lab assignments will probably take considerably more time without the associated introductory lecture of the bachelor program.


The exams will be held orally in March and April 2019. An oral exam takes about 45 minutes. You are free to choose English or German as the exam language. Active and successful participation in the labs (at least half of the quizzes passed) and a registration for the exam at HIS are required. The following days are offered for oral exams:

  • 21 March 2019 (no slots left)
  • 26 March 2019 (no slots left)
  • 9 April 2019 (no slots left)
  • 11 April 2019 (two slots in the afternoon left)

To register for one of the free slots (available before noon and in the afternoon), send an email to Andreas F. Borchert.


Resources and examples of the lectures will be posted here:

  • 15 October 2018: First steps with vectors in C: lecture notes, session
  • 19 October 2018: First steps with matrices in C: lecture notes, session
  • 22 October 2018: Selected BLAS level 1 functions, benchmarks and gnuplot: session
  • 26 October 2018: Simple cache optimizations: session
  • 29 October 2018: Simple cache optimizations for GEMM: session
  • 2 November 2018: Cache optimizations for GEMV: session
  • 5 November 2018: First steps with C++: session
  • 9 November 2018: C++ tools for managing memory buffers: slides, session
  • 12 November 2018: Packing matrix blocks for an efficient GEMM implementation: slides, session
  • 16 November 2018: GEMM micro kernel, GEMM macro kernel, GEMM frame routine: session
  • 19 November 2018: Generic classes, template functions, and static polymorphism: session
  • 23 November 2018: Function objects and lambda expressions: session
  • 26 November 2018: Unblocked LU factorization: session
  • 30 November 2018: More on vector and matrix classes: session
  • 3 December 2018: First steps with threads in C++: session
  • 7 December 2018: Mutex and condition variables: session
  • 10 December 2018: Thread pools (part one): session
  • 14 December 2018: Thread pools (part two): session
  • 17 December 2018: GEMM with AVX-optimized micro kernels: session
  • 21 December 2018: Another unblocked LU factorization, blocked LU factorization: session
  • 7 January 2019: Using MKL-BLAS for LU factorization, improved blocked LU factorization (divide and conquer): session
  • 11 January 2019: Introduction to OpenMP: session
  • 14 January 2019: Introduction to MPI: session
  • 18 January 2019: Transfer of vectors and matrices using MPI: session
  • 21 January 2019: Scatter and gather operations, asynchronous communication, two-dimensional grids: session
  • 25 January 2019: Distributed matrices: session
  • 28 January 2019: Distributed GEMM: session
  • 1 February 2019: Introduction to CUDA: slides, session
  • 4 February 2019: Virtual vs. physical GPU architecture, matrices: session
  • 8 February 2019: Global synchronization and two-dimensional aggregation: session (session extended on 12 February 2019)
  • 11 February 2019: Multigrid solver (part one): session
  • 15 February 2019: Multigrid solver (part two): session

Lab sessions

Lab sessions will be held twice per week. Each session comes with an online guide that summarizes the preceding lecture and provides practical exercises.

A registration at SLC for HPC I is necessary to participate in the lab sessions.

Some of the lab sessions come with a quiz, which is due a week later:


  • Quiz 1: 26 October 2018, due 2 November 2018: PDF
  • Quiz 2: 5 November 2018, due 12 November 2018: PDF
  • Quiz 3: 16 November 2018, due 23 November 2018: PDF
  • Quiz 4: 7 December 2018, due 14 December 2018: PDF
  • Quiz 5: 11 January 2019, due 18 January 2019: PDF


Dr. Andreas F. Borchert
Helmholtzstr. 20
Room 1.23

Dr. Michael Lehn
Helmholtzstr. 20
Room 1.09

M. Sc. Constantin Greif
Helmholtzstr. 20
Room 1.28