| Integration of module into courses of studies: |
Informatik, M.Sc., FSPO 2021 Technische und Systemnahe Informatik,
Informatik, M.Sc., FSPO 2021 Verteilte Systeme,
Medieninformatik, M.Sc., FSPO 2021 Technische und Systemnahe Informatik,
Medieninformatik, M.Sc., FSPO 2021 Verteilte Systeme,
Software Engineering, M.Sc., FSPO 2021 Technische und Systemnahe Informatik,
Software Engineering, M.Sc., FSPO 2021 Verteilte und Eingebettete Systeme,
Künstliche Intelligenz, M.Sc., FSPO 2021 Technische und Systemnahe Informatik,
Informatik, M.Sc., FSPO 2022 Technische Informatik,
Medieninformatik, M.Sc., FSPO 2022 Technische Informatik,
Software Engineering, M.Sc., FSPO 2022 Technische Informatik |
| Modes of learning and teaching: |
Fault-tolerant Distributed Systems (lecture) (3 SWS),
Fault-tolerant Distributed Systems (exercise) (1 SWS) |
| Module authority: |
Prof. Dr.-Ing. Franz J. Hauck |
| Lecturer: |
Prof. Dr.-Ing. Franz J. Hauck |
| Language: |
English |
| Turn / Duration: |
every summer term / 1 semester |
| Requirements (content-related): |
Fundamental knowledge of distributed systems, e.g. from the module "Grundlagen Verteilter Systeme" or equivalent modules |
| Requirements (formal): |
none |
| Basis for: |
Master's thesis in the area of distributed systems |
| Learning objectives: |
Fault tolerance is essential for mission-critical systems and beneficial for all distributed software systems. In this module, students learn about multiple approaches to masking application failures using standard hardware and networks together with special distributed algorithms. Students can describe and explain these approaches and identify the differences between them, in particular their respective advantages and disadvantages. They are able to judge which approach and which configuration is best suited to a given application scenario and failure model. Students understand the underlying mechanisms on which the different approaches are based, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, including their constraints and requirements. Through the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system. |
| Content: |
Terminology, System and Failure Models, Constraints, Architectural Considerations,
Redundancy Concepts and Approaches, Failure Detection, Failure Recovery,
Checkpointing, Event Sourcing: Case Studies, Practical Considerations,
State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations,
Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations,
Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations,
Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations,
Testing, Failure Injection |
| Literature: |
G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed Systems: Concepts and Design. 5th ed., Pearson, 2012,
P. Jalote: Fault Tolerance in Distributed Systems. Prentice Hall, 1994,
Various articles provided during the lecture |
| Grading procedure: |
The module examination consists of a graded oral examination. If the specified coursework is completed successfully, a grade bonus is awarded at the immediately following examination in accordance with §17 (3a) of the General Examination Regulations. The bonus improves the examination grade by one grade step, but not beyond 1.0. An improvement from 5.0 to 4.0 is not possible. |
| Estimation of effort: |
Attendance: 60 h
Preparation and follow-up: 120 h
Total: 180 h |