Institute of Distributed Systems, Ulm University

Fault-tolerant Distributed Systems

Summer Semester 2026

   
Title: Fault-tolerant Distributed Systems
German Title: Fehlertolerante Verteilte Systeme
Type: Lecture with Exercise, Module
Token / Number / Module number: FTDS / - / 74239
Semester hours / Credits: 4 SCH / 6 CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Tutor:
General schedule: Lecture: in person and online at the same time (hybrid), recorded; Tuesday 14:15 - 15:45, O28-H21, starting 14 April 2026; Thursday 12:30 - 14:00, O27-2203. Lab/Exercise: in person; held irregularly in place of lecture classes.
Learning platform: The course uses the e-learning system Moodle. Please subscribe to this course.
Grade bonus: A grade bonus of 0.3 or 0.4 is awarded if the lab is passed. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercises within the semester.
Exam dates: Oral exam by appointment with the examiner

Description and general information

Integration of module into courses of studies: Informatik, M.Sc., FSPO 2021, Technische und Systemnahe Informatik; Informatik, M.Sc., FSPO 2021, Verteilte Systeme; Medieninformatik, M.Sc., FSPO 2021, Technische und Systemnahe Informatik; Medieninformatik, M.Sc., FSPO 2021, Verteilte Systeme; Software Engineering, M.Sc., FSPO 2021, Technische und Systemnahe Informatik; Software Engineering, M.Sc., FSPO 2021, Verteilte und Eingebettete Systeme; Künstliche Intelligenz, M.Sc., FSPO 2021, Technische und Systemnahe Informatik; Informatik, M.Sc., FSPO 2022, Technische Informatik; Medieninformatik, M.Sc., FSPO 2022, Technische Informatik; Software Engineering, M.Sc., FSPO 2022, Technische Informatik
Modes of learning and teaching: Fault-tolerant Distributed Systems (lecture) (3 SCH), Fault-tolerant Distributed Systems (exercise) (1 SCH)
Module authority: Prof. Dr.-Ing. Franz J. Hauck
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Language: English
Turn / Duration: every summer term / 1
Requirements (content): Fundamental knowledge of distributed systems, e.g. from the module "Grundlagen Verteilter Systeme" or an equivalent module
Requirements (formal): none
Basis for: Master's thesis in the area of distributed systems
Learning objectives: Fault tolerance is a must for mission-critical systems, and it is also useful for all distributed software systems. In this module, students learn about multiple approaches that mask failures of applications running on standard hardware and networks by using special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and which individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms that the different approaches are based on, e.g. consensus protocols, conflict-free replicated data types, checkpointing and state transfer, including their constraints and requirements. Through the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.
Content: Terminology, System and Failure Models, Constraints, Architectural Considerations; Redundancy Concepts and Approaches, Failure Detection, Failure Recovery; Checkpointing, Event Sourcing: Case Studies, Practical Considerations; State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations; Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations; Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations; Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations; Testing, Failure Injection
Literature: G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed Systems: Concepts and Design. 5th ed., Pearson, 2012; P. Jalote: Fault Tolerance in Distributed Systems. Prentice Hall, 1994; various articles provided during the lecture
Grading procedure: The module examination consists of a graded oral examination. If the specified coursework is completed successfully, a grade bonus is awarded in accordance with §17 (3a) of the General Examination Regulations at the immediately following examination. The examination grade is improved by one grade level, but cannot become better than 1.0. An improvement from 5.0 to 4.0 is not possible.
Estimation of effort: Attendance: 60 h; preparation and follow-up: 120 h; total: 180 h