This book is Volume 2 of a three-part series on active/active systems. It describes techniques that can be used today to extend the interval between system failures from years to centuries, often at little or no additional cost. As our daily lives and corporate well-being become more dependent upon computers, system reliability grows increasingly important. Frequent system outages are no longer acceptable, and in many cases failure intervals must now be measured in centuries. Starting with a summary of Volume 1, this volume reviews techniques for achieving extraordinary availabilities. These techniques use active/active architectures, in which multiple independent nodes cooperate in a common application and share a common distributed database. Should a node fail, all that is required is to switch the users on that node to a surviving node. Equally important to achieving high availability is the ability to upgrade the system hardware and software without denying service to the users, and the procedures for doing this within an active/active system are described. The secret to high availability is to let it fail, but fix it fast. This volume explores the server, database, and network redundancy techniques that make such fast recovery possible, along with the cost considerations involved in such redundant architectures.