Research and Implementation of a Disaster-Recovery-Oriented Failure Detection Algorithm
|School||National University of Defense Science and Technology|
|Course||Computer Science and Technology|
|Keywords||Disaster Recovery; Information System; Failure Detection; High Availability|
With the rapid development of information systems and network technology, information systems and networks have gradually become a cornerstone of the work of key sectors. When centrally stored data and large-scale networks are struck by natural or man-made disasters, the loss is often irreparable, so resisting disasters and ensuring continued business operation has become a focus of attention. Failure detection is one of the key technologies of a disaster recovery system: fast, efficient, and accurate failure detection is the premise and guarantee of effective disaster recovery. This thesis studies failure detection for WAN-based disaster recovery systems. It first analyzes the distributed system model, the fault model of distributed systems, the failure detection model, and the evaluation criteria for failure detection algorithms. It then analyzes failure detection algorithms from the high-availability and grid fields, as well as failure detection algorithms that use domain knowledge across multiple nodes. The work is mainly reflected in the following aspects:
(1) The failure detection model for disaster recovery is analyzed in depth, and a number of issues that the model needs to address are identified. A failure detection algorithm called HB-DR (Heartbeat for Disaster Recovery) is designed to address these issues. Under this algorithm, nodes probe each other at regular intervals; based on node and network status, the algorithm outputs a value ρ that represents system availability, and ρ is compared with a failure threshold to determine whether the system is available. The algorithm can be used for availability detection in disaster recovery systems. Simulation tests show that the algorithm mitigates the loss of accuracy caused by delayed or lost messages.
(2) Based on the characteristics of disaster recovery system requirements and scale, and on the HB-DR algorithm itself, a design for a large-scale failure detection system is proposed. To improve scalability, the design uses a neighbor chain model combined with group notification to reduce the load on the system.
(3) The HB-DR algorithm is integrated into the Linux-HA software to build a failure detection simulation environment. Resources for disaster recovery are detected and managed in this simulation environment, and the algorithm is tested.
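
The idea in contribution (1), periodic heartbeats that yield an availability value ρ which is compared with a failure threshold, can be illustrated with a minimal sketch. The abstract does not give the HB-DR formula for ρ, so the definition used here (the fraction of expected heartbeats received within a sliding window) is an assumption made purely for illustration, and the names `HeartbeatDetector`, `interval`, `window`, and `threshold` are hypothetical rather than taken from the thesis or from Linux-HA.

```python
from collections import deque


class HeartbeatDetector:
    """Toy heartbeat-based failure detector (illustrative, not HB-DR itself)."""

    def __init__(self, interval, window=5, threshold=0.5):
        self.interval = interval    # expected seconds between heartbeats
        self.window = window        # number of recent intervals considered
        self.threshold = threshold  # assumed failure threshold for rho
        self.arrivals = deque()     # arrival times of received heartbeats

    def heartbeat(self, now):
        """Record a heartbeat received from the monitored node at time `now`."""
        self.arrivals.append(now)

    def rho(self, now):
        """Assumed availability value: received / expected heartbeats in the window."""
        horizon = now - self.window * self.interval
        # Discard heartbeats that are older than the sliding window.
        while self.arrivals and self.arrivals[0] <= horizon:
            self.arrivals.popleft()
        return min(1.0, len(self.arrivals) / self.window)

    def available(self, now):
        """The node is considered available while rho stays at or above the threshold."""
        return self.rho(now) >= self.threshold


detector = HeartbeatDetector(interval=1.0, window=5, threshold=0.5)
for t in (1.0, 2.0, 3.0, 4.0, 5.0, 7.0):  # the heartbeat expected at t=6 is lost
    detector.heartbeat(t)
print(detector.rho(7.0), detector.available(7.0))    # one lost message is tolerated
print(detector.rho(10.0), detector.available(10.0))  # prolonged silence trips the threshold
```

Because the decision rests on a graded value rather than on a single missed heartbeat, one delayed or lost message lowers ρ without immediately declaring a failure, which is the kind of behavior the abstract attributes to HB-DR.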