# Data Analysis: Description, Inference and Causality

# News and current issues

- The second exam will probably take place on October 11, starting at 14:00.
- Inspection of the first exam takes place on August 31 from 10:00 until 11:00 in HeHo 18 room 1.20.

# General information

This course is held in English and is open only to master's students in the following programs (7 LP): M.Sc. Wirtschaftswissenschaften, M.Sc. Wirtschaftsmathematik, M.Sc. Wirtschaftsphysik, M.Sc. Wirtschaftschemie, M.Sc. Finance. For more information, please see the module descriptions.

This course consists of a weekly lecture and a weekly exercise session. In the exercises, students solve problem sets involving theoretical questions and programming exercises in **Stata**, and they replicate the main results of published papers.

You need to have handed in and correctly solved a minimum number of problem sets to be admitted to the exam. The grade will be based on an exam at the end of the term.

# Dates and responsible persons

Tuesday, 10:00-12:00, in N24/131

Thursday, 14:00-16:00, in N24/131

Lecture (2 SWS): Prof. Dr. Georg Gebhardt

Tutorial (2 SWS): Frederik Collin

# Content

Students of econometrics are typically presented with a bewildering range of statistical tools and methods, which differ in the assumptions they make, in their properties, and in what they let us learn from data. The reasoning behind the choices made often remains unexplained (why does OLS focus on the conditional mean and not on the median or the whole conditional distribution?) or is tied by rule of thumb to the type of data (fixed or random effects estimators for panel data). But real-world data often do not fall neatly into one of these categories, and even when they do, many approaches remain to choose from. In this course we review a broad range of approaches and try to shed some light on how applied researchers choose specifications and estimation methods. We do this by considering the following three categories:

Description: Often we do not want to learn everything we could from data. For example, we do not consider the whole distribution, but only (conditional) means. We do this because we want to reduce complexity, helping us to answer the question we are interested in. Typically, applied research treats descriptive statistics separately from inference, but in reality complexity reduction plays a major role when choosing an approach.
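As a small illustration of complexity reduction, the sketch below collapses a whole conditional distribution into two numbers per group, the conditional mean and the conditional median. It is written in Python purely for illustration (the course exercises use Stata), and the data are simulated: the "wage" variable, the two groups and the log-normal parameters are all hypothetical assumptions.

```python
import random
import statistics


def simulate_wages(n_per_group=5000, seed=0):
    """Draw hypothetical log-normal 'wages' for two made-up groups."""
    rng = random.Random(seed)
    log_means = {"low": 2.5, "high": 3.0}  # assumed parameters, not real data
    return {
        group: [rng.lognormvariate(mu, 1.0) for _ in range(n_per_group)]
        for group, mu in log_means.items()
    }


def summarize(data):
    """Reduce each group's distribution to its mean and its median."""
    return {
        group: {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
        }
        for group, values in data.items()
    }


if __name__ == "__main__":
    for group, stats in summarize(simulate_wages()).items():
        print(group, round(stats["mean"], 2), round(stats["median"], 2))
```

Because the simulated distribution is right-skewed, the mean and the median of each group differ noticeably, so the choice of summary already shapes what the description says.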

Inference: Statistical inference concerns itself with the problems that arise from the fact that we have only a finite sample available to address the questions we are interested in. We consider the properties of different estimation methods, ranging from nonparametric kernel methods through ordinary least squares to maximum likelihood methods. We will learn that we can make inferences with less data (more efficiently and potentially with less bias) if we are either able and willing to make additional assumptions (e.g. regarding the distribution of the population) or willing to learn less (e.g. only the conditional mean and not the whole conditional distribution). We will look at applied examples of how researchers have dealt with these trade-offs. Special consideration will be given to situations in which we do not have enough data to easily learn what we want to know (small sample problems).
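The efficiency gain from an additional distributional assumption can be sketched with a small Monte Carlo experiment. Assuming the population is normal, both the sample mean and the sample median estimate its center, but the mean should vary less across samples (asymptotically the variance ratio is about π/2). The sample size, number of replications and seed below are arbitrary choices for illustration, and Python is used only as a convenient sketch language.

```python
import random
import statistics


def sampling_variances(n=50, reps=2000, seed=1):
    """Return the Monte Carlo variance of the sample mean and sample median.

    Each replication draws n observations from a standard normal population.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


if __name__ == "__main__":
    var_mean, var_median = sampling_variances()
    print("variance of sample mean:  ", round(var_mean, 4))
    print("variance of sample median:", round(var_median, 4))
```

If the normality assumption fails, for example with heavy-tailed data, the ranking can reverse, which is one way to see the trade-off between assumptions and efficiency.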

Causality: Often (but not always) we want to make causal statements, e.g. that the road is wet because of rain and not the other way around. It turns out that causal statements require assumptions beyond those needed for inference, and a large sample is of no help in this case. However, different techniques require different assumptions. We will consider the most important techniques (e.g. control variables, instrumental variables, difference-in-differences and fixed effects), their assumptions and examples of applications.
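A minimal simulation can show why a large sample alone does not deliver a causal estimate. In the hypothetical data-generating process below, a confounder `z` drives both `x` and `y`, and the true causal effect of `x` on `y` is assumed to be 1. A naive regression of `y` on `x` converges to 2 rather than 1, while controlling for `z` (here by partialling it out of both variables) recovers the effect. All names, coefficients and the Python implementation are illustrative assumptions, not course material.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, z):
    """Residuals from an OLS regression of y on z (with intercept)."""
    b = slope(z, y)
    mz, my = statistics.fmean(z), statistics.fmean(y)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]


def simulate(n, seed=2):
    """Hypothetical process: z confounds x and y; true effect of x is 1."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, y, z


def estimates(n, seed=2):
    """Return (naive slope, slope after controlling for z)."""
    x, y, z = simulate(n, seed)
    naive = slope(x, y)
    # Partial z out of both x and y, then regress residual on residual.
    controlled = slope(residuals(x, z), residuals(y, z))
    return naive, controlled


if __name__ == "__main__":
    for n in (200, 20000):
        print(n, estimates(n))
```

Increasing `n` only makes the naive estimate more precisely wrong. The controlled estimate is correct here only because the simulation satisfies the assumption that `z` is the sole confounder and is observed.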

# Schedule

- Descriptive Statistics
- Estimators and Estimation with One Outcome Variable
- Multivariate Statistical Inference and Regression Analysis
- Methods of Causal Inference

# Material

All course material and documents will be provided via the Moodle learning platform. The password for joining the course will be announced in the first lecture.

# Literature

- Angrist, Joshua D. and Jörn-Steffen Pischke (2014) Mastering 'Metrics: The Path from Cause to Effect, Princeton University Press, Princeton, NJ
- Baum, Christopher F. (2006) An Introduction to Modern Econometrics Using Stata, Stata Press
- Casella, George and Roger L. Berger (2008) Statistical Inference, Duxbury, Cengage Learning: International edition of 2nd revised edition
- Goldberger, Arthur S. (1991) A Course in Econometrics, Harvard University Press, Cambridge, MA
- Wooldridge, Jeffrey M. (2010) Econometric Analysis of Cross Section and Panel Data, The MIT Press, Cambridge, MA