# Elements of Statistical Learning

## General Information

- This course will be offered online starting April 20th.
- For further information on how the lecture and exercises will be carried out, see below.

## Content

- After successful completion of the course, students are able to understand and apply basic concepts and methods of supervised and unsupervised statistical learning to large data sets (using R). They will have learned fundamental concepts in statistical learning, with a focus on the probabilistic formulation of the various learning problems, and they will have an overview of different methods and their applications. Furthermore, they can adapt learning algorithms to new models and analyze new data with them.
- A selection of topics:
  - statistical learning, supervised/unsupervised
  - assessing model accuracy, bias-variance trade-off
  - In the field of supervised learning we will study
    - high-dimensional regression & shrinkage methods
    - linear classification, logistic regression
    - resampling: cross validation, bootstrap
    - nonlinear classification, decision trees and random forests
    - boosting
  - In the field of unsupervised learning we will study
    - dimension reduction, principal component analysis
    - clustering, mixture models
  - data analysis in R

## Lecture

The lectures will work as follows:

- There will be a weekly chat session on Tuesday at 10:15. It will last at most until 11:45, depending on your questions. The lecturer will be online at least until 10:30 (even if no one attends/has questions). For the moment this chat will be in Zoom. Please enter the Lecture Hall below.
- Every week you are expected to read a certain part of the book "An Introduction to Statistical Learning" and watch the videos by the authors (available freely as a MOOC) related to this part.
- Short questions etc. can be discussed in the weekly chat. For longer questions it is preferable to use the forum of the lecture. It may be that questions raised in the forum will be answered in the online chat or in additional videos produced at most weekly.
- Questions related to a certain part of the book should be raised, at the latest, in the online chat at the end of the week in which this part of the book was the topic.
- Please discipline yourself and follow a weekly working routine as similar as possible to a normal semester.
- Any feedback is highly welcome! We also miss the personal contact with you! And we cannot see your happy or questioning faces to judge what was comprehensible and what not!

All materials will be available on Moodle.

## Exercise Class

The Exercise Class will work as follows:

- Every two weeks you are expected to solve an exercise sheet which will mostly contain exercises that have to be solved in R.
- There will be a biweekly chat session in Zoom (Lecture Hall) on Thursday at 9:00. During this chat session, you can ask questions regarding the solution of the exercise sheet. It will last at most until 10:00, depending on your questions. The exercise class teacher will be online at least until 9:15 (even if no one attends/has questions).
- The solutions to the exercise sheet will be uploaded the day before the chat session takes place, i.e. every second Wednesday.
- We highly recommend that you work through the R labs that correspond to the chapters covered in the Lecture and watch the videos by the authors related to each lab. You will find the labs in the book "An Introduction to Statistical Learning" as the second-to-last section of each chapter.
- Short questions etc. can be discussed in the biweekly chat. For longer questions, it is preferable to use the forum of the Exercise Class. Questions raised in the forum will be answered in the online chat or in additional videos produced at most weekly.
- Questions related to a certain lab should be raised, at the latest, in the online chat at the end of the two weeks in which this lab was the topic.

All materials will be available on Moodle.

## Literature

- The book the course is based on is available at https://ulm.ibs-bw.de/aDISWeb/app?service=direct/0/Home/$DirectLink&sp=SOPAC00&sp=SWI00002050&noRedir if you are in the university network (via VPN).
- There is an accompanying webpage https://www.statlearning.com/ where you can find additional materials.
- A MOOC (accompanying videos) by the authors of the book is available at https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/ or https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/

## Time

- Lecture: **Tuesday 10:15**
- Exercises: **Every second Thursday 9:00**

## Exam

Form, time and details of the exam will be announced as soon as it is fully clear when and under which conditions exams can again be held. A further factor determining the form of the exam is the number of students who intend to take the exam.

If the number of students permits this to be done efficiently (which is certainly the case if at most 20 students take the exam), we tend towards holding the exams orally (in a seminar room or possibly by video conference).

**ORAL EXAM DURATION: approx. 20min**

## Type

- Mathematics, B.Sc., Compulsory electives in Applied Mathematics
- Mathematical Biometry, B.Sc., Compulsory electives in Stochastics
- Mathematics and Management, B.Sc., Compulsory electives in Stochastics, Optimisation, Financial Mathematics
- Computational Science and Engineering, M.Sc., Compulsory elective

## Prerequisites

Foundations of Analysis and Linear Algebra, and a solid basis in probability theory and statistics (as provided by the modules Elementary Probability Theory and Statistics or Applied Stochastics 1 + 2).