Seminar on Applied Data Science: Replication & Extension

Replicate top-tier papers. Develop extensions. Work in a reproducible manner.

In this seminar, you will reproduce a published empirical study (code and data), test it for robustness, and develop your own meaningful extension – using a clear research workflow and producing a result that you can present. You will learn how to integrate AI into your workflow effectively and document the process. You will also work in small, international teams on current topics. 

  • Portfolio output: Replication package + GitHub repository + presentation of results

  • Research skills: robustness checks, clear documentation, reproducible analyses

  • Practical & supervised: structured milestones and feedback

What is this seminar about?

Reproducibility is a core principle of good science – and, at the same time, one of the biggest practical hurdles in data-driven research. In this seminar, you will work in a research-oriented manner: you will replicate key findings from a published study, test their robustness and develop your own extension of the paper (e.g. new data, a new specification or methodological improvements). The end result will be a complete, transparent replication package. 

Registration (centralised): Please register via the seminar matching tool between 26 and 29 January 2026: Link to the tool
Tip: Set a high preference weighting if you are certain you wish to attend.

What you’ll actually be able to do by the end

After the seminar, you will not only be able to ‘carry out analyses’, but also to review, refine and document your research in a way that others can replicate.

Reproduce & Debug

Execute third-party code, identify discrepancies, and document decisions thoroughly.

Robustness Checks

Systematically test specifications, subsamples, and sensitivity, including plausibility checks.

Develop extensions

Add new data, time periods, or countries, or implement a methodological improvement.

Systematic workflow

Structured projects, version control with Git/GitHub, reproducible reports (Quarto/R Markdown).

Communication

Visualize, interpret, and present results clearly and persuasively.

Good Research Practice

Work transparently: traceable decisions, clear documentation, respectful discussion.

Your final output (to show):

  • a well-organised GitHub repository (code, documentation, reproducibility)

  • a complete replication package (executable, traceable)

  • a presentation including a discussion of your robustness checks and your extension

Process & Milestones

The seminar is organised into five stages. Each stage ends with a clear milestone to ensure you make steady progress.

  1. Registration (central)

    Please register for the seminar via the seminar matching tool (26–29 January 2026): Link to the tool

  2. Kick-off & paper matching on 12 March 2026
    Tool setup (GitHub/Quarto), expectation management, topic & paper selection → Here is a list of possible papers

  3. Reproduction (baseline) by mid-May 2026
    Reproduction of the main results, documentation of deviations, clarification of missing details.

  4. Robustness + extension by mid-June 2026
    Systematic robustness checks and development of your own extension (data or method).

  5. Finalisation & presentation in mid/late June 2026
    Present and discuss the replication and extension.

In this seminar, you will work in small, international teams with clear research questions and short feedback loops.

Important: Attendance at the kick-off and the presentation of results is essential!

What types of projects are possible?

You choose a research paper and decide early on what kind of extension you want to implement.

You will find an overview of suggested research papers in this list. You can also propose a research paper yourself. Three types of project are possible:

Robustness Replication

Alternative specifications, placebos, sensitivity, subsamples

Data expansion

New data source, new time period, different country / different group

Methodological Expansion

Improved identification, additional checks, alternative models

Tools & Workflow

In this seminar, you will work with the latest tools and as part of a team – just as is standard practice in research teams and data science roles.

  • Primarily R, Python optional (for sub-tasks/extensions)

  • Git/GitHub (version control, teamwork, traceability)

  • Quarto / R Markdown (reproducible reports)

  • Docker & GitHub Actions (reproducible environments, automated runs)

  • LLM/coding assistants (as a tool – not a black box)

Examination & assessment

Assessment is based on what matters in the seminar: verifiable results, accurate reproduction and clear communication.

Assessment:

  • 40% Replication Package
    (short report: approx. 3–4 pages + code/output + reproducibility)

  • 50% Presentation
    (20 minutes + Q&A)

  • 10% Participation & Collaboration
    (peer feedback, constructive discussion)

What matters is not ‘perfection’, but transparency: What worked? Where were there deviations? What decisions were made?
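
One possible way to make deviations visible is a small comparison log of published versus reproduced values. The following is a minimal sketch under stated assumptions: all numbers are invented, the statistic names and the function `deviation_log` are hypothetical, and the 1% relative tolerance is an arbitrary choice that each team would need to justify for its own paper.

```python
from math import isclose

# Hypothetical values for illustration only; not from any real paper.
published = {"treatment_effect": 0.120, "n_obs": 2500, "r_squared": 0.310}
reproduced = {"treatment_effect": 0.1203, "n_obs": 2450, "r_squared": 0.310}


def deviation_log(published, reproduced, rel_tol=0.01):
    """Return one record per published statistic with a match/deviation status."""
    log = []
    for name, target in published.items():
        value = reproduced.get(name)
        if value is None:
            # The statistic could not be reproduced at all.
            log.append({"statistic": name, "published": target,
                        "reproduced": None, "status": "missing"})
            continue
        matches = isclose(value, target, rel_tol=rel_tol)
        log.append({"statistic": name, "published": target,
                    "reproduced": value,
                    "status": "match" if matches else "deviation"})
    return log


log = deviation_log(published, reproduced)
for record in log:
    print(record)
```

Each record flagged as a deviation or as missing is a natural starting point for a short note in the report: what differs, what the likely cause is, and which decision was taken.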

Eligibility criteria – who is this suitable for?

Required:

  • Solid knowledge of R or Python (data preparation, regressions/models, visualisation)

  • Basic understanding of empirical research (interpretation of estimates)

  • Willingness to work with Git/GitHub (basic knowledge is sufficient)

Recommended:

  • Quarto/R Markdown or similar report workflows

  • Interest in causal inference / panel / DiD (depending on the paper)

✅ You’re in the right place if you can already carry out small analyses independently in R/Python.

⚠️ If you are just starting to learn R: please take a basic course or project course first.

FAQs

In this seminar, you will work in small teams (usually in pairs). Teamwork reflects real research practice and helps to divide up code, methods, and writing.

You do not need to bring new data. Many extensions use additional data sources, but they are not strictly necessary. What matters is that your project provides a clear, verifiable extension.

Dealing with incomplete information in the original paper is part of the learning objective: you will learn how to identify missing details, document decisions, and still arrive at a reproducible result in a transparent manner.

We primarily work in R. Python can be used for specific tasks or extensions, provided the workflow is documented in a way that allows it to be reproduced.