- Wiley
Building Dependable Distributed Systems
- English
The book contains eight chapters. The first chapter introduces the basic concepts and terminology of dependable distributed computing and provides an overview of the primary means of achieving dependability. The second chapter describes in detail the checkpointing and logging mechanisms, which are the most commonly used means of achieving a limited degree of fault tolerance. Such mechanisms also serve as the foundation for more sophisticated dependability solutions. Chapter three covers work on recovery-oriented computing, which focuses on practical techniques that reduce fault detection and recovery times for Internet-based applications. Chapter four outlines replication techniques for data and service fault tolerance, with particular attention to optimistic replication and the CAP theorem. Chapter five explains a few seminal works on group communication systems. Chapter six introduces the distributed consensus problem and covers a number of Paxos family algorithms in depth. Chapter seven introduces the Byzantine generals problem and its latest solutions, including the seminal Practical Byzantine Fault Tolerance (PBFT) algorithm and a number of its derivatives. The final chapter covers the latest research results on application-aware Byzantine fault tolerance, an important step toward the practical use of Byzantine fault tolerance techniques.
Wenbing Zhao received his PhD in electrical and computer engineering from the University of California, Santa Barbara, in 2002. Currently, he is an Associate Professor in the Department of Electrical and Computer Engineering at Cleveland State University. Dr. Zhao has more than 80 academic publications to his credit, and three of his recent research papers in the area of dependable distributed computing have won best paper awards. Dr. Zhao also has a U.S. patent on consistent time service for fault-tolerant distributed systems.
List of Tables
Acknowledgements
Preface
References
1 Introduction to Dependable Distributed Computing
1.1 Basic Concepts and Terminologies
1.2 Means to Achieve Dependability
References
2 Logging and Checkpointing
2.1 System Model
2.2 Checkpoint-Based Protocols
2.3 Log Based Protocols
References
3 Recovery-Oriented Computing
3.1 System Model
3.2 Fault Detection and Localization
3.3 Microreboot
3.4 Overcoming Operator Errors
References
4 Data and Service Replication
4.1 Service Replication
4.2 Data Replication
4.3 Optimistic Replication
4.4 CAP Theorem
References
5 Group Communication Systems
5.1 System Model
5.2 Sequencer Based Group Communication System
5.3 Sender Based Group Communication System
5.4 Vector Clock Based Group Communication System
References
6 Consensus and the Paxos Algorithms
6.1 The Consensus Problem
6.2 The Paxos Algorithm
6.3 Multi-Paxos
6.4 Dynamic Paxos
6.5 Fast Paxos
6.6 Implementations of the Paxos Family Algorithms
References
7 Byzantine Fault Tolerance
7.1 The Byzantine Generals Problem
7.2 Practical Byzantine Fault Tolerance
7.3 Fast Byzantine Agreement
7.4 Speculative Byzantine Fault Tolerance
References