Research on Multilevel Cache Technology of Mass Network Storage System
|School||South China University of Technology|
|Course||Applied Computer Technology|
|Keywords||Mass network storage system; hierarchical architecture model; cache and buffer; locality strength; page size; file size; access frequency; management algorithm|
As the value of data applications has attracted more and more attention, the storage and management of mass data have played an increasingly important role in the development of computer system performance. Researchers in academia, engineering and industry have made substantial contributions to the theories, technologies and applications of storage architectures such as Direct Attached Storage (DAS), Network Attached Storage (NAS), Storage Area Network (SAN) and their derivatives.

However, two fundamental problems remained to be solved in the field of mass network storage systems: the architecture design of the storage network and the performance optimization of mass data access. Surveying the models and products mentioned above from the perspective of system architecture revealed several defects: (1) the interfaces between different architecture levels were not clearly defined, making implementation, compatibility and evolution costly; (2) hierarchical optimization methods had not been adapted to the mass network storage environment. To deal with these defects, research on the hierarchical architecture and access performance optimization of mass network storage was essential both in theory and in practice.

From the hierarchical architecture perspective, this thesis studied a series of principles systematically and in detail. These principles included: a hierarchical architecture of the mass storage system and its implementation; a quantitative way of evaluating locality strength and its realization; the periodic pattern of page accesses and an application based on this pattern and on page miss cost; and the relationship between the static distribution of file sizes and the dynamic access pattern of data requests in the network environment, together with the utilization of this relationship.
The innovations based on the above research are as follows:

(1) A Hierarchical Mass Network Storage Architecture (HMNSA) with five levels, and a method of multilevel acceleration with caches between these levels, were proposed. The five levels were the Storage Application Layer, Storage Presentation Layer, Storage Connection Layer, Storage Network Layer and Storage Physical Layer. With distinct interfaces between adjacent layers, various existing storage technologies could be integrated compatibly into a hybrid mass network storage system. Based on this hierarchical architecture, this thesis designed and implemented a prototype mass network storage system named the Intelligent Network Disk Storage System (INDSS). The feasibility and validity of the architecture were verified. Furthermore, the performance optimization approach was studied. Since the traditional multilevel cache between the adjacent levels of CPU, internal storage and external storage had proved efficient and effective for system performance, this thesis extended the multilevel cache method to the adjacent layers of the above hierarchical mass network storage architecture. The network storage service chain (storage service clients, storage network and storage transaction servers) could be accelerated by these multiple caches.

(2) A quantitative approach to locality strength and its application in the Storage Presentation Layer were studied, and a Locality Strength Algorithm (LSA) was proposed. Data held in local memory was acquired from remote storage nodes, and some space in the cache buffer was allocated to this data to speed up access to it. The allocation and occupation of this cached data among the multiple processes of a modern multi-client operating system was a key factor in run-time efficiency. This thesis indicated that locality strength was an applicable index for the allocation. The locality strength indexes could be calculated from existing information.
Based on these indexes, the LSA reduced the peak memory occupation of some processes and decreased the frequency of page thrashing in the Storage Presentation Layer.

(3) The periodic page access pattern and the associated miss costs in the Storage Network Layer were studied, and a Periodicity and Miss Cost (PMC) algorithm was proposed. A data block request was satisfied by the storage network when it missed in local memory. In our IND system, the distributed intelligent nodes in the storage network were capable of data caching. Our research on cache management showed that lowering the average page miss cost was a valid alternative to increasing the cache hit rate. The average miss cost was determined by the latencies of the individual blocks. Since many blocks are accessed several times while their access costs fluctuate, PMC tries to keep pages with high access costs in the cache as long as possible for future reuse, preventing repeated expensive operations. As a result, the average system response time was improved.

(4) The distribution of file sizes and access frequencies, especially in the network environment, was analyzed, and a Reallocation based on Distribution and Visit Frequency (RDVF) algorithm in the Storage Physical Layer was proposed. The necessity and feasibility of optimization for swap files, small files and frequently visited files were studied. Combining these results with new device technologies, a hybrid storage structure in the Storage Physical Layer was illustrated. Experiments and simulation results showed that RDVF increased the transfer rate while reducing the average response time, enhancing the performance of the mass network storage system.
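
The idea in item (3), retaining pages whose misses are expensive rather than simply maximizing hit rate, can be illustrated with a minimal Python sketch. This is not the thesis's PMC algorithm, whose details are not given in this abstract; it is a GreedyDual-style cost-aware cache used as a stand-in. The class name `CostAwareCache`, the `fetch` callback and the `miss_cost` parameter are all hypothetical names introduced for illustration, and the periodicity component of PMC is not modeled.

```python
class CostAwareCache:
    """GreedyDual-style cache (illustrative stand-in, not the thesis's PMC).

    Each cached page holds a credit equal to the current inflation value
    plus the assumed cost of fetching it again. The page with the lowest
    credit is evicted, so expensive pages tend to stay cached longer.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.inflation = 0.0   # rises on every eviction, ageing old credits
        self.credit = {}       # page_id -> current credit
        self.data = {}         # page_id -> cached page contents

    def get(self, page_id, fetch, miss_cost):
        """Return (contents, hit). `fetch` loads the page on a miss;
        `miss_cost` is the assumed cost (e.g. latency) of that fetch."""
        hit = page_id in self.data
        if not hit:
            if len(self.data) >= self.capacity:
                self._evict()
            self.data[page_id] = fetch(page_id)
        # Refresh the credit on both hits and misses.
        self.credit[page_id] = self.inflation + miss_cost
        return self.data[page_id], hit

    def _evict(self):
        # Evict the page with the smallest credit and raise the inflation
        # value to that credit, so untouched pages lose relative priority.
        victim = min(self.credit, key=self.credit.get)
        self.inflation = self.credit[victim]
        del self.credit[victim]
        del self.data[victim]
```

With a capacity of two pages, one page with an assumed miss cost of 10 and two pages with an assumed miss cost of 1, the cache evicts a cheap page first and the expensive page survives for its next access. The inflation value lets an expensive page that is never touched again eventually age out instead of occupying the cache indefinitely.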