An implementation of a fault-tolerance system with dual redundant servers

Number: 2360
English
Degree:
Author: Tabarak Dhia'a Abdul-Hussein
Supervisor: Dr. Mumtaz Mohammed Ali
Year: 2009

Fault-tolerance technology has become an important branch of computer systems and is widely used in many other fields. Fault-tolerance schemes can increase the availability and reliability of network service systems. This thesis implements a fault-tolerant Web server system that bypasses software and hardware failures. It is based on the Dual-Server Hot-Standby principle, in which an efficient, transparent fail-over scheme provides fault-tolerant network service. The goal is to build a fault-tolerant real-time system for the Internet Information Services (IIS 7.0) Web server. The system is developed in the Visual C#.NET 2008 programming language under the Windows Vista operating system. Fault tolerance is achieved through service replication, with backup service offered by a Hot-Standby Backup Server. In service replication, a primary server is active while a backup (secondary) server waits in an inactive state, and the two ends exchange heartbeat messages. Under normal conditions, the primary server periodically sends its status to the secondary. When the primary exhibits unexpected behavior, such as the primary IIS server halting or the link failing, the secondary server takes over the identity of the primary and restores normal operation of the whole system. The primary server is responsible for monitoring the IIS status and for exchanging heartbeat messages and control signals with the secondary. In turn, the secondary server is responsible for checking the IIS application status and the primary server status, listening for the primary's on/off messages, and handling heartbeat signaling and monitoring, so that it is ready to take over the Web hosting service whenever needed. For recovery after a failure, the Backward Error Recovery method is used to restore state and resume service.
Finally, Monitor and Configuration modules were designed to make the system easy to administer and configure. In addition, the IIS modules take on other responsibilities, such as creating sites, logging requests, queuing the available websites, and monitoring the IIS status. Each of these functions is performed by its corresponding module.
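
The heartbeat exchange and takeover described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation, which was written in C#; it is a simplified Python model written under stated assumptions. The names `Heartbeat`, `SecondaryMonitor`, `timeout_s`, and `iis_running` are hypothetical, and timestamps are passed in explicitly instead of using real sockets or timers, so that the logic can be followed without a network.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STANDBY = "standby"  # secondary is waiting (inactive)
    ACTIVE = "active"    # secondary has taken over the service


@dataclass
class Heartbeat:
    """Status message the primary sends periodically (fields are assumed)."""
    sent_at: float      # sender's timestamp, in seconds
    iis_running: bool   # whether the primary's Web service reports healthy


class SecondaryMonitor:
    """Hot-standby side: tracks primary heartbeats and decides on takeover."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s   # assumed silence threshold before takeover
        self.role = Role.STANDBY
        self.last_seen = now         # time the last heartbeat arrived

    def on_heartbeat(self, hb: Heartbeat, now: float) -> None:
        """Record a heartbeat; take over if the primary reports its service halted."""
        self.last_seen = now
        if not hb.iis_running:
            self._take_over()

    def tick(self, now: float) -> Role:
        """Called periodically; take over if the primary has been silent too long."""
        if self.role is Role.STANDBY and now - self.last_seen > self.timeout_s:
            self._take_over()
        return self.role

    def _take_over(self) -> None:
        # A real system would also claim the primary's identity (for example
        # its address) and start serving requests; this sketch only flips the role.
        self.role = Role.ACTIVE


if __name__ == "__main__":
    monitor = SecondaryMonitor(timeout_s=3.0)
    monitor.on_heartbeat(Heartbeat(sent_at=1.0, iis_running=True), now=1.0)
    print(monitor.tick(now=3.5).value)  # still within the timeout window
    print(monitor.tick(now=4.5).value)  # primary silent for longer than timeout_s
```

The sketch covers the two failure cases named in the abstract: a halted service on the primary (reported in the heartbeat itself) and a failed link (detected as missing heartbeats). The timeout value is arbitrary here; in practice it would be chosen relative to the heartbeat period.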