Research on Failure Detection and Data Migration Techniques in Intelligent Network Storage System
|School||South China University of Technology|
|Course||Computer System Architecture|
|Keywords||failure detection; accrual failure detector; data migration; Intelligent Network Storage System; INSS|
With the development of computer and communication technologies, data generation and sharing are growing exponentially, and storage technology is becoming an essential component of IT infrastructure. Traditional centralized storage systems can no longer meet the needs of large-scale storage applications and are gradually being replaced by distributed storage systems. Distributed storage systems offer clearly superior capacity, I/O throughput, and scalability, but they are built from inexpensive components whose reliability is unstable, and component failures can cause data loss with disastrous consequences. Designing an efficient fault-tolerant mechanism for distributed storage systems is therefore significant. Based on the Intelligent Network Storage System (INSS), this thesis proposes a fault-tolerant mechanism with two core components: failure detection and data migration. The mechanism is designed to detect node status quickly and accurately and to automatically transfer the data stored on a suspected failing node before it crashes, ensuring the data reliability and business continuity of the storage system.

The main task of this thesis is twofold: to design a new accrual (cumulative) failure detector that improves the accuracy of failure detection and shortens detection time to meet the QoS requirements of different applications; and to launch the data migration module based on the output of the failure detector, transferring data from suspected faulty nodes to other nodes in a way that achieves load balancing after the data is relocated.
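To make the idea of an accrual detector feeding a migration trigger concrete, the following is a minimal sketch. It is not the detector designed in this thesis: it assumes the commonly used simplification that heartbeat gaps are exponentially distributed, and the class name, the `should_migrate` helper, and the threshold value are all hypothetical.

```python
import math
from collections import deque


class AccrualFailureDetector:
    """Illustrative accrual detector (an assumption, NOT the thesis's detector).

    Instead of a binary alive/dead verdict it outputs a suspicion level that
    keeps growing the longer a heartbeat is overdue.
    """

    def __init__(self, window_size=100):
        # Sliding window of recent heartbeat inter-arrival times.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def suspicion(self, now):
        """Return -log10 of P(gap > elapsed) under an exponential model."""
        if not self.intervals:
            return 0.0  # Not enough history to suspect anything yet.
        mean = max(sum(self.intervals) / len(self.intervals), 1e-9)
        elapsed = now - self.last_arrival
        # P(gap > elapsed) = exp(-elapsed / mean), so -log10 of it is:
        return elapsed / (mean * math.log(10))


def should_migrate(detector, now, threshold=2.0):
    """Hypothetical trigger: start migration once suspicion crosses a threshold."""
    return detector.suspicion(now) >= threshold


detector = AccrualFailureDetector()
for t in (0.0, 1.0, 2.0, 3.0):   # Regular heartbeats, one per second.
    detector.heartbeat(t)

print(should_migrate(detector, 3.5))   # Heartbeat only slightly overdue.
print(should_migrate(detector, 10.0))  # Heartbeat long overdue.
```

Because the output is a continuous suspicion level, different applications can choose different thresholds: a lower threshold reacts faster at the cost of more false suspicions, which is one way to express per-application QoS requirements.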
The main research work and achievements are as follows:
(1) Established fault-tolerant node groups and combined them with a level-based detection method to develop a flexible, scalable failure propagation model that alleviates the bottleneck at the leader node.
(2) According to the definition of the accrual failure detector, designed a new failure detector, D-FD, and theoretically proved that D-FD satisfies strong completeness and eventual strong accuracy.
(3) Established a data valuation model to help identify hot data and migrate it with priority.
(4) Established an association between failure detection and data migration, and designed a timing selection algorithm that starts the data migration module when a node or link is suspected of failure.
(5) Established a load-based data migration model and designed a data allocation algorithm to guarantee load balancing across nodes after migration.
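Items (3) and (5) can be illustrated together with a small sketch. The abstract does not give the thesis's valuation model or allocation algorithm, so the code below substitutes a plain greedy heuristic: blocks are migrated in descending order of a given value score, and each block goes to the currently least-loaded target node. The function name, the tuple layout, and the load units are assumptions made for illustration only.

```python
import heapq


def allocate(blocks, target_loads):
    """Greedy sketch (an assumption, not the thesis's algorithm).

    blocks:       list of (block_id, load, value) tuples; `value` is a
                  hypothetical hotness score, higher means migrate earlier.
    target_loads: dict mapping each healthy node to its current load.
    Returns (plan, final_loads), where plan lists (block_id, node) in
    migration order.
    """
    if not target_loads:
        raise ValueError("no target nodes available")

    # Min-heap keyed on load, so the least-loaded node is always on top.
    heap = [(load, node) for node, load in target_loads.items()]
    heapq.heapify(heap)

    plan = []
    # Hot (high-value) data first, mirroring the priority migration idea.
    for block_id, load, _value in sorted(blocks, key=lambda b: b[2], reverse=True):
        node_load, node = heapq.heappop(heap)
        plan.append((block_id, node))
        heapq.heappush(heap, (node_load + load, node))

    final_loads = {node: load for load, node in heap}
    return plan, final_loads


blocks = [("a", 4, 0.9), ("b", 3, 0.5), ("c", 2, 0.1)]
plan, final_loads = allocate(blocks, {"n1": 0, "n2": 5})
print(plan)         # [('a', 'n1'), ('b', 'n1'), ('c', 'n2')]
print(final_loads)  # both nodes end at load 7
```

A greedy least-loaded assignment is simple and keeps loads close together, but it is not guaranteed to be optimal; a real allocation algorithm may also weigh factors such as node capacity and network cost.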