Research of Data De-duplication Techniques in Intelligent Network Storage System
|School||South China University of Technology|
|Course||Computer System Architecture|
|Keywords||Intelligent Network Storage System; Data De-duplication; Metadata; Locality; Performance|
In the 21st century, the continuous development of data-centric information service technologies, such as digital libraries, e-commerce, scientific computing, electronic documents, and multimedia information storage, has placed enormous demands on the capacity and reliability of data storage systems. At present, backing up critical data with a data backup system is an effective way to improve data reliability. However, as data grows rapidly, a backup system accumulates more and more redundant data over time and faces heavy storage pressure. Studying data reduction techniques that remove redundant data from storage systems therefore has practical significance for reducing the waste of resources related to storage space, management, and energy consumption.

Existing data backup technologies, including incremental backup and differential backup, can hardly solve the problem of the rapid expansion of backup data. As a new data reduction technology, data de-duplication lowers the cost of a storage system by removing redundant data and thereby reducing the storage capacity it uses. For this reason, and on the basis of an in-depth analysis and study of data de-duplication, this thesis introduces data de-duplication into the Intelligent Network Storage System (INSS) proposed by our research group. It analyzes the low storage performance, caused by massive metadata management and frequent disk I/O, that results from introducing data de-duplication, and proposes optimization schemes that remove redundant data and reduce storage cost while improving the storage performance of INSS.

The main task of this thesis is to research and design the data de-duplication module in INSS and to improve the performance of data de-duplication.
The main research work and achievements are as follows:
(1) Designed and implemented an object-based memory allocation mechanism to reduce memory fragmentation and improve the efficiency of memory allocation.
(2) Proposed a parallel, hierarchical scheme that handles service logic processing in the data de-duplication module using thread pool and finite state machine techniques.
(3) Designed an index structure based on a hash table with an elimination mechanism, to insert, retrieve, and delete metadata efficiently in memory.
(4) Improved the storage container allocation algorithm based on stream-informed segment layout (SISL) technology. The algorithm reduces the cost of storage container metadata, improves the utilization of storage containers, and maintains the locality of block metadata accesses.
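
As an illustration of the index described in item (3), the following is a minimal sketch and not the implementation used in INSS. It assumes that chunks are identified by SHA-1 fingerprints and that the elimination mechanism behaves like least-recently-used (LRU) eviction; the abstract specifies neither. The names `FingerprintIndex`, `deduplicate`, and `container_id` are hypothetical.

```python
import hashlib
from collections import OrderedDict


class FingerprintIndex:
    """In-memory hash index mapping chunk fingerprints to container IDs.

    Assumption: LRU eviction stands in for the elimination mechanism;
    the abstract does not name the actual policy.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()  # fingerprint -> container_id

    def lookup(self, fingerprint):
        """Return the container ID for a fingerprint, or None if absent."""
        container_id = self._entries.get(fingerprint)
        if container_id is not None:
            # A hit marks the entry as most recently used.
            self._entries.move_to_end(fingerprint)
        return container_id

    def insert(self, fingerprint, container_id):
        """Add or refresh an entry, evicting the oldest one when full."""
        self._entries[fingerprint] = container_id
        self._entries.move_to_end(fingerprint)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop least recently used

    def delete(self, fingerprint):
        """Remove an entry; return True if it was present."""
        return self._entries.pop(fingerprint, None) is not None


def deduplicate(chunks, index, container_id=0):
    """Return (unique_chunks, duplicate_count) for an iterable of bytes."""
    unique, duplicates = [], 0
    for chunk in chunks:
        fingerprint = hashlib.sha1(chunk).digest()
        if index.lookup(fingerprint) is None:
            # First time this content is seen: store it and index it.
            index.insert(fingerprint, container_id)
            unique.append(chunk)
        else:
            duplicates += 1
    return unique, duplicates
```

Bounding the index keeps metadata memory use fixed, at the cost that an evicted fingerprint can no longer be detected as a duplicate from memory alone; a real system would typically fall back to an on-disk index in that case.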