Fault-tolerant Distributed Systems - FTDS

 
Title: Fault-tolerant Distributed Systems
Type: Lecture with exercise, Lecture mainly implemented as online videos, Module with only this course
Token / Number / Module number: FTDS / CS6922.000 / 74239
Semester hours / Credits: 3L+1E / 6CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Tutor: Prof. Dr.-Ing. Franz J. Hauck
General schedule: Lecture Classes:
in presence and online at the same time (hybrid), recorded
Tuesday 14:15h - 15:45h, O28-H21, Starting April 16, 2024
Thursday 12:30h - 14:00h, O27-2203
Lab Classes:
in presence and online at the same time (hybrid)
irregular instead of lecture classes
Learning platform: For the course the e-learning system Moodle is used. Please register here.
Grade bonus: A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercise within the semester.
Exams: Oral exam by appointment with the lecturer.

Description and general information

 
Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme
Computational Science and Engineering, M.Sc.: Compulsory elective module

 
Course authority: Prof. Dr.-Ing. Franz J. Hauck  
Language: English  
Turn / Duration: every summer term / one semester  
Requirements (contentual): Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules  
Requirements (formal): -  
Learning objectives: Fault tolerance is a must for mission-critical systems, but also convenient for all distributed software systems. In this module, students learn about multiple approaches to masking failures of applications, based on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, that the different approaches are based on, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.  
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
 
Course assessment and exams: Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)  
Grading: Grade of the oral exam  
Estimation of effort: Active time (lecture, exercise, exam): 60h (2ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4ECTS)
Sum: 180h (6ECTS)
 
 

 
Title: Fault-tolerant Distributed Systems
Type: Lecture with exercise, Lecture mainly implemented as online videos, Module with only this course
Token / Number / Module number: FTDS / CS6922.000 / 74239
Semester hours / Credits: 3L+1E / 6CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Tutor: Prof. Dr.-Ing. Franz J. Hauck
General schedule:

Lecture Classes:
in presence and online at the same time (hybrid), recorded
Tuesday 14:15h - 15:45h, O28-H21, Starting April 18, 2023
Thursday 12:30h - 14:00h, O27-2203
Lab Classes:
in presence and online at the same time (hybrid)
irregular instead of lecture classes

Learning platform: For the course the e-learning system Moodle is used. Please register here.
Grade bonus: A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercise within the semester.
Exams: Oral exam by appointment with the lecturer.

Description and general information

 
Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme
Computational Science and Engineering, M.Sc.: Compulsory elective module

 
Course authority: Prof. Dr.-Ing. Franz J. Hauck  
Language: English  
Turn / Duration: every summer term / one semester  
Requirements (contentual): Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules  
Requirements (formal): -  
Learning objectives: Fault tolerance is a must for mission-critical systems, but also convenient for all distributed software systems. In this module, students learn about multiple approaches to masking failures of applications, based on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, that the different approaches are based on, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.  
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
 
Course assessment and exams: Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)  
Grading: Grade of the oral exam  
Estimation of effort: Active time (lecture, exercise, exam): 60h (2ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4ECTS)
Sum: 180h (6ECTS)
 
 

 
Title: Fault-tolerant Distributed Systems
Type: Lecture with exercise, Lecture mainly implemented as online videos, Module with only this course
Token / Number / Module number: FTDS / CS6922.000 / 74239
Semester hours / Credits: 3L+1E / 6CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Tutor: Prof. Dr.-Ing. Franz J. Hauck
General schedule: Lecture Classes:
in presence and online at the same time (hybrid), recorded
Tuesday 14:15h - 15:45h, O25-H7, Starting April 19, 2022
Thursday 12:30h - 14:00h, O27-2203
Lab Classes:
in presence and online at the same time (hybrid)
irregular instead of lecture classes
Learning platform: For the course the e-learning system Moodle is used. Please register here.
Grade bonus: A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercise within the semester.
Exams: Oral exam by appointment with the lecturer.

Description and general information

 
Integration into courses of studies: Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme
Computational Science and Engineering, M.Sc.: Compulsory elective module
 
Course authority: Prof. Dr.-Ing. Franz J. Hauck  
Language: English  
Turn / Duration: every summer term / one semester  
Requirements (contentual): Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules  
Requirements (formal): -  
Learning objectives: Fault tolerance is a must for mission-critical systems, but also convenient for all distributed software systems. In this module, students learn about multiple approaches to masking failures of applications, based on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, that the different approaches are based on, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.  
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
 
Course assessment and exams: Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)  
Grading: Grade of the oral exam  
Estimation of effort: Active time (lecture, exercise, exam): 60h (2ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4ECTS)
Sum: 180h (6ECTS)
 
 

This lecture is on schedule for Summer Semester 2020 despite the ongoing Corona crisis. Lecture classes are provided online and can be watched at any time. They are, however, synchronised with the labs. Lab classes take place at fixed times during the semester. If you are interested in participating, please subscribe to the corresponding Moodle course to get further information.

 
Title: Fault-tolerant Distributed Systems
Type: Lecture with exercise, Lecture mainly implemented as online videos, Module with only this course
Token / Number / Module number: FTDS / CS6922.000 / 74239
Semester hours / Credits: 3L+1E / 6CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Tutor: Gerhard Habiger, Muntazir Mehdi
General schedule: Lecture Classes:
online; starting on 20.04.2020
Lab Classes:
Thursday, 12:30 - 14:00, online; starting on 30.04.2020
Learning platform: For the course the e-learning system Moodle is used. Please register here.
Grade bonus: A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercise within the semester.
Exams: Oral exam by appointment with the lecturer.

Description and general information

 
Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme
Computational Science and Engineering, M.Sc.: Compulsory elective module

 
Course authority: Prof. Dr.-Ing. Franz J. Hauck  
Language: English  
Turn / Duration: every summer term / one semester  
Requirements (contentual): Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules  
Requirements (formal): -  
Learning objectives: Fault tolerance is a must for mission-critical systems, but also convenient for all distributed software systems. In this module, students learn about multiple approaches to masking failures of applications, based on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, that the different approaches are based on, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.  
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
 
Course assessment and exams: Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)  
Grading: Grade of the oral exam  
Estimation of effort: Active time (lecture, exercise, exam): 60h (2ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4ECTS)
Sum: 180h (6ECTS)
 
 

 
Title: Fault-tolerant Distributed Systems
Type: Lecture with exercise, Module with only this course
Token / Number / Module number: FTDS / CS6922.000 / 74239
Semester hours / Credits: 3L+1E / 6CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Tutor: Gerhard Habiger, Muntazir Mehdi
General schedule: Regular Classes:
Tuesday, 14:15 - 15:45, O27-2202; starting on 23.04.2019
Thursday, 12:30 - 14:00, O27-121; starting on 25.04.2019
Learning platform: For the course the e-learning system Moodle is used. Please subscribe here with the password announced in the first class. Classes are going to be recorded and made available in Moodle.
Grade bonus: A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercise within the semester.
Exams: Oral exam by appointment with the lecturer.

Description and general information

 
Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme

 
Course authority: Prof. Dr.-Ing. Franz J. Hauck  
Language: English  
Turn / Duration: every summer term / one semester  
Requirements (contentual): Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules  
Requirements (formal): -  
Learning objectives: Fault tolerance is a must for mission-critical systems, but also convenient for all distributed software systems. In this module, students learn about multiple approaches to masking failures of applications, based on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, that the different approaches are based on, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.  
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
 
Course assessment and exams: Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)  
Grading: Grade of the oral exam  
Estimation of effort: Active time (lecture, exercise, exam): 60h (2ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4ECTS)
Sum: 180h (6ECTS)
 
 

Title: Fault-tolerant Distributed Systems
Type: Lecture with exercise, Module with only this course
Token / Number / Module number: FTDS / CS6922.000 / 74239
Semester hours / Credits: 3L+1E / 6CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck
Tutor: Gerhard Habiger, Muntazir Mehdi
General schedule: Tuesday, 14:15 - 15:45, O27-2202; starting 17.04.2018
Thursday, 12:30 - 14:00, O27-121; starting 19.04.2018
Learning platform: For the course the e-learning system Moodle is used. Please register here with the password announced in the first class. Classes are going to be recorded and made available in Moodle.
Grade bonus: A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercise within the semester.
Exams: Oral exam by appointment with the lecturer.

Description and general information

Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme

Course authority: Prof. Dr.-Ing. Franz J. Hauck
Language: English
Turn / Duration: every summer term / one semester
Requirements (contentual): Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules
Requirements (formal): -
Learning objectives: Fault tolerance is a must for mission-critical systems, but also convenient for all distributed software systems. In this module, students learn about multiple approaches to masking failures of applications, based on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, that the different approaches are based on, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
Course assessment and exams: Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)
Grading: Grade of the oral exam
Estimation of effort: Active time (lecture, exercise, exam): 60h (2ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4ECTS)
Sum: 180h (6ECTS)

Title: Fault-tolerant Distributed Systems
Type: Lecture with exercise, Module with only this course
Token / Number / Module number: FTDS / CS6922.000 / 74239
Semester hours / Credits: 3L+1E / 6CP
Lecturer: Prof. Dr.-Ing. Franz J. Hauck, Dr. Jörg Domaschka
Tutor: David Mödinger, Gerhard Habiger, Eugen Frasch
General schedule: Tuesday, 14:15 - 15:45, O27-2202; starting 18.04.2017
Thursday, 12:30 - 14:00, O27-121; starting 20.04.2017
Learning platform: For the course the e-learning system Moodle is used. Please register here with the password announced in the first class. Classes are going to be recorded and made available in Moodle.
Grade bonus: A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, students must attend the exercises, submit the exercise assignments (empty sheets do not count as a submission), and present their own solution during the exercise within the semester.
Exams: Oral exam by appointment with the lecturer.

Description and general information

Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme

Course authority: Prof. Dr.-Ing. Franz J. Hauck
Language: English
Turn / Duration: every summer term / one semester
Requirements (contentual): Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules
Requirements (formal): -
Learning objectives: Fault tolerance is a must for mission-critical systems, but also convenient for all distributed software systems. In this module, students learn about multiple approaches to masking failures of applications, based on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their respective advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, that the different approaches are based on, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
Course assessment and exams: Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)
Grading: Grade of the oral exam
Estimation of effort: Active time (lecture, exercise, exam): 60h (2ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4ECTS)
Sum: 180h (6ECTS)