Seminar: Neural Networks

Seminar Supervisor

Prof. Dr. Evgeny Spodarev

Seminar Advisor

Albert Rapp, M. Sc.

Date and Place

Depending on the number and preferences of the participants, we will meet weekly or in blocks.
Place: TBA.


The level of difficulty varies between the topics. Participants are expected to be familiar with basic probability, statistics, basic analysis and measure theory. We assure participants that most of the knowledge beyond these prerequisites will be acquired during the seminar.

Intended Audience

Bachelor and Master students in any mathematical programme of studies. 


The first three talks will be about classical statistical learning techniques and are not immediately connected to neural networks (NNs). This may be somewhat surprising, but we believe it is important that every student understands the general concepts before we move on to highly specialized NNs. After all, NNs are just one of many statistical learning methods. The first three talks are therefore intended to introduce concepts such as overfitting, loss functions, cross-validation, etc. Be it NNs, random forests or basic linear regression, these concepts are important in any kind of statistical learning setting. We will also cover stochastic gradient descent, which is an important technique not only for NNs.

To make things feel less abstract, these concepts will be explained through the lens of (generalized) linear models and random forests. While these models are, of course, not as glamorous as NNs, they are much better understood, so that we can focus on learning the concepts in a “mathematically safer” setting.

  • Talk 1: Machine Learning Terminology with Linear Regression (B)
  • Talk 2: Machine Learning Terminology with Random Forests (B/M)
  • Talk 3: Stochastic Gradient Descent for Generalized Linear Models (M)
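To give a first taste of what Talk 3 covers, here is a minimal sketch of stochastic gradient descent for ordinary linear regression — our own toy example with made-up data, not taken from the seminar literature. Each step uses the gradient of the squared-error loss at a single randomly chosen observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + 1 + noise (illustrative only)
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0  # model parameters (slope and intercept)
lr = 0.05        # learning rate (step size)

# Stochastic gradient descent: each update uses one random observation.
for step in range(5000):
    i = rng.integers(len(y))
    err = w * X[i, 0] + b - y[i]
    w -= lr * err * X[i, 0]  # gradient of (pred - y)^2 / 2 w.r.t. w
    b -= lr * err            # gradient of (pred - y)^2 / 2 w.r.t. b

# w and b should end up close to the true values 2.0 and 1.0
print(w, b)
```

In a full generalized linear model, the error term would be replaced by the gradient of the corresponding log-likelihood; the structure of the update loop stays the same.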

The remaining talks of this course are closer to NNs. We begin this series of talks with perhaps the simplest NN, the so-called Perceptron. The next talk then considers shallow NNs, which are a natural extension of the Perceptron. Next, we take a look at the Kolmogorov-Arnold representation. While this is a somewhat old result, it never really entered the NN literature because it was deemed useless for NNs. A fairly recent paper questions this verdict, and we consider exactly this paper in the sixth talk.

  • Talk 4: The Perceptron (B)
  • Talk 5: Shallow NN (B/M)
  • Talk 6: Kolmogorov-Arnold Representation Revisited (M)
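To make Talk 4 concrete, the classical Perceptron learning rule can be sketched in a few lines. This is a hedged illustration with hand-made toy data, not an excerpt from the seminar literature: whenever a point is misclassified, the weight vector is nudged in the direction of that point.

```python
import numpy as np

def perceptron(X, y, epochs=20):
    """Rosenblatt's Perceptron learning rule for labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # point is misclassified
                w += yi * xi            # move the hyperplane towards it
                b += yi
    return w, b

# Linearly separable toy data (labels follow the sign of x1 + x2)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b = perceptron(X, y)
preds = np.sign(X @ w + b)  # all four points are classified correctly
```

For linearly separable data, this procedure terminates with a separating hyperplane; how this result is proved, and what happens without separability, is part of the talk.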

The last three talks in our seminar deal with deep NNs and their potential for overfitting (or lack thereof). We will consider two special types of NN, namely ReLU NNs, which use the so-called Rectified Linear Unit activation function, and convolutional NNs. The latter type has been used extensively for image recognition in the literature. Finally, folklore has it that deep NNs do not overfit. This stands in stark contrast to classical statistical learning theory, and it is not clear whether the claim is actually true. We therefore likely won't find a definitive answer in our last talk, but at least we can try to find out what the current literature knows about this surprising claim.

  • Talk 7: ReLU NN (B/M)
  • Talk 8: Convolutional NN (M)
  • Talk 9: Overfitting Revisited (M)
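To pin down the terminology for Talk 7, here is a minimal sketch (again our own toy construction, not from the seminar literature) of a forward pass through a fully connected ReLU network. The hand-crafted weights realize the function |x| = relu(x) + relu(-x), a standard small example of what ReLU networks can represent exactly.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """Forward pass of a fully connected ReLU network.

    ReLU is applied after every layer except the last,
    so the output layer stays linear.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)
    return weights[-1] @ a + biases[-1]

# A tiny hand-crafted network computing |x| = relu(x) + relu(-x)
weights = [np.array([[1.0], [-1.0]]), np.array([[1.0, 1.0]])]
biases = [np.zeros(2), np.zeros(1)]

print(forward(np.array([-3.0]), weights, biases))  # |-3| = 3
```

Talks 7-9 study what functions such networks can approximate at which rates, and why fitting them very closely to data does not always hurt generalization.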


To register for the seminar, please write an e-mail to Albert Rapp by March 15. In the e-mail, please state your name, matriculation number, your programme of studies and the courses you have taken in the area of probability or statistics. Also, please indicate which of the talks you are interested in.

Criteria to pass the seminar

Each student is expected to give a talk. Those who give a (good) talk and hand in a written summary will pass the seminar. Talks may be held in German or English. A preliminary version of the slides needs to be submitted two weeks before the talk.


[1] Lecture notes on mathematics for deep neural networks by Schmidt-Hieber

[2] The Elements of Statistical Learning by Hastie et al. (2017)

[3] The Kolmogorov-Arnold Representation revisited

[4] Johannes Schmidt-Hieber. “Nonparametric regression using deep neural networks with ReLU activation function.” Ann. Statist. 48 (4) 1875 - 1897, August 2020, Preprint version (should be up to date)

[5] Deep Learning: An Introduction for Applied Mathematicians by Higham and Higham, Preprint version

[6] Reconciling modern machine-learning practice and the classical bias–variance trade-off by Belkin et al.

[7] Modern Neural Networks Generalize on Small Data Sets by Olson et al.

[8] Benign overfitting in linear regression by Bartlett et al.

[9] Statistical analysis of stochastic gradient methods for generalized linear models by Toulis et al.

[10] Stochastic gradient descent methods for estimation with large data sets by Tran et al.

[11] An Introduction to Statistical Learning by James et al.

[12] Regression by Fahrmeir et al.

[13] A survey of the recent architectures of deep convolutional neural networks by Khan et al.

[14] Introduction to Convolutional Neural Networks by Wu



Seminar Supervisor

Prof. Dr. Evgeny Spodarev
Helmholtzstraße 18, Room 1.65
Office hours: by appointment
E-Mail: Evgeny.Spodarev(at)

Seminar Advisor

Albert Rapp, M. Sc.
Helmholtzstraße 18, Room 1.45
Office hours: by appointment
E-Mail: Albert.Rapp(at)


  • There will be an organizational meeting with all registered participants after the registration deadline. Time and date: TBA.