Fault-tolerant Distributed Systems - FTDS

Summer Semester 2019

Title:Fault-tolerant Distributed Systems
Type:Lecture with exercise, Module with only this course
Token / Number / Module number:FTDS / CS6922.000 / 74239
Semester hours / Credits:3L+1E / 6CP
Lecturer:Prof. Dr.-Ing. Franz J. Hauck
Tutor:Gerhard Habiger, Muntazir Mehdi
General schedule:Tuesday, 14:15 - 15:45, O27-2202; starting 23.04.2019
Thursday, 12:30 - 14:00, O27-121; starting 25.04.2019
Learning platform:The course uses the e-learning system Moodle. Please register there with the password announced in the first class. Classes will be recorded and made available in Moodle.
Grade bonus:A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, it is obligatory to attend the exercises, to submit the exercise assignments (empty sheets do not count as a submission), and to present your own solution during the exercise within the semester.
Exams:Oral exam by appointment with the lecturer.

Description and general information

Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme

Course authority:Prof. Dr.-Ing. Franz J. Hauck
Language:English
Turn / Duration:every summer term / one semester
Requirements (content):Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules
Requirements (formal):-
Learning objectives:Fault tolerance is a must for mission-critical systems, but it is also beneficial for all distributed software systems. In this module, students learn about multiple approaches to masking application failures that rely on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms that the different approaches are based on, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
Course assessment and exams:Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)
Grading:Grade of the oral exam
Estimation of effort:Active time (lecture, exercise, exam): 60h (2 ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4 ECTS)
Sum: 180h (6 ECTS)

Summer Semester 2018

Title:Fault-tolerant Distributed Systems
Type:Lecture with exercise, Module with only this course
Token / Number / Module number:FTDS / CS6922.000 / 74239
Semester hours / Credits:3L+1E / 6CP
Lecturer:Prof. Dr.-Ing. Franz J. Hauck
Tutor:Gerhard Habiger, Muntazir Mehdi
General schedule:Tuesday, 14:15 - 15:45, O27-2202; starting 17.04.2018
Thursday, 12:30 - 14:00, O27-121; starting 19.04.2018
Learning platform:The course uses the e-learning system Moodle. Please register there with the password announced in the first class. Classes will be recorded and made available in Moodle.
Grade bonus:A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, it is obligatory to attend the exercises, to submit the exercise assignments (empty sheets do not count as a submission), and to present your own solution during the exercise within the semester.
Exams:Oral exam by appointment with the lecturer.

Description and general information

Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme

Course authority:Prof. Dr.-Ing. Franz J. Hauck
Language:English
Turn / Duration:every summer term / one semester
Requirements (content):Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules
Requirements (formal):-
Learning objectives:Fault tolerance is a must for mission-critical systems, but it is also beneficial for all distributed software systems. In this module, students learn about multiple approaches to masking application failures that rely on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms that the different approaches are based on, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
Course assessment and exams:Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)
Grading:Grade of the oral exam
Estimation of effort:Active time (lecture, exercise, exam): 60h (2 ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4 ECTS)
Sum: 180h (6 ECTS)

Summer Semester 2017

Title:Fault-tolerant Distributed Systems
Type:Lecture with exercise, Module with only this course
Token / Number / Module number:FTDS / CS6922.000 / 74239
Semester hours / Credits:3L+1E / 6CP
Lecturer:Prof. Dr.-Ing. Franz J. Hauck, Dr. Jörg Domaschka
Tutor:David Mödinger, Gerhard Habiger, Eugen Frasch
General schedule:Tuesday, 14:15 - 15:45, O27-2202; starting 18.04.2017
Thursday, 12:30 - 14:00, O27-121; starting 20.04.2017
Learning platform:The course uses the e-learning system Moodle. Please register there with the password announced in the first class. Classes will be recorded and made available in Moodle.
Grade bonus:A grade bonus of 0.3 or 0.4 is given if the lab is passed successfully. To pass the lab, it is obligatory to attend the exercises, to submit the exercise assignments (empty sheets do not count as a submission), and to present your own solution during the exercise within the semester.
Exams:Oral exam by appointment with the lecturer.

Description and general information

Integration into courses of studies:

Informatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Informatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Informatik, Lehramt Staatsexamen: Wahl
Medieninformatik, M.Sc.: Kernfach Technische und Systemnahe Informatik
Medieninformatik, M.Sc.: Vertiefungsfach Verteilte Systeme
Software Engineering, M.Sc.: Kernfach Technische und Systemnahe Informatik
Software Engineering, M.Sc.: Vertiefungsfach Verteilte und Eingebettete Systeme

Course authority:Prof. Dr.-Ing. Franz J. Hauck
Language:English
Turn / Duration:every summer term / one semester
Requirements (content):Fundamental knowledge of distributed systems, e.g. from the module Grundlagen Verteilter Systeme or equivalent modules
Requirements (formal):-
Learning objectives:Fault tolerance is a must for mission-critical systems, but it is also beneficial for all distributed software systems. In this module, students learn about multiple approaches to masking application failures that rely on standard hardware and networks and on special distributed algorithms. Students can describe and explain these approaches and can identify the differences between them, especially their advantages and disadvantages. They are able to judge which approach and individual configuration is best suited for a given application scenario and failure model. Students understand the underlying mechanisms that the different approaches are based on, e.g. consensus protocols, conflict-free replicated data types, checkpointing, and state transfer, including their constraints and requirements. With the presented case studies and the hands-on lab exercises, students recognize how these mechanisms can be combined into a running fault-tolerant system.
Content:
  • Terminology, System and Failure Models, Constraints, Architectural Considerations
  • Redundancy Concepts and Approaches, Failure Detection, Failure Recovery
  • Checkpointing, Event Sourcing: Case Studies, Practical Considerations
  • State Machine Replication: Consensus, Deterministic Execution, Deterministic Scheduling, Case Studies, Practical Considerations
  • Data-driven Replication: k-out-of-n Systems, DSM, Transactional Systems, Case Studies, Practical Considerations
  • Master-Slave Replication: Fault Detection, Update Strategies, Deterministic Execution, Case Studies, Practical Considerations
  • Eventual Consistency: Master-Master Replication, CRDT, Case Studies, Practical Considerations
  • Testing, Failure Injection
Literature:
  • G. Coulouris, J. Dollimore, T. Kindberg, G. Blair: Distributed systems. Concepts and design. 5th ed., Pearson, 2012.
  • P. Jalote: Fault tolerance in distributed systems. Prentice Hall, 1994.
  • Various articles provided during the lecture
Course assessment and exams:Oral exam; no course certificate; grade bonus if the lab is passed successfully (modalities will be announced at the beginning)
Grading:Grade of the oral exam
Estimation of effort:Active time (lecture, exercise, exam): 60h (2 ECTS)
Self-study with post-processing of the lecture, exercise assignments, exam preparation: 120h (4 ECTS)
Sum: 180h (6 ECTS)