Cloud storage system for mass data
|School||Nanjing University of Technology and Engineering|
|Keywords||Huge amounts of data; Cloud Computing; Cloud Storage; GlusterFS; Nutch; Hadoop; Mahout; Text Clustering|
With the development of the Internet, the mobile Internet, and the Internet of Things, the number of online users keeps increasing and data volumes are growing explosively. The era of massive data has arrived, especially in the Internet, telecommunications, finance, and other industries. Such vast amounts of data raise two problems. First, the data has outgrown the load capacity of a single machine, so building a large-scale, efficient, easily expandable, and highly reliable storage system is an urgent need. Second, information is critical in the information society, and an important trend in massive data is the growth of socialized data, usually called unstructured data (for example, text, images, audio, and video). How to extract useful information from such vast amounts of data has become a major challenge for the Internet in recent years.

Based on these problems, this work studies mass data storage and mass data mining. Because network data takes many forms, the study takes the way researchers manage literature as its example and narrows the mass data source to electronic document data on the network. On this basis, a cloud storage system for mass literature data is successfully built on cloud storage and cloud computing platforms, and the system implements the management and analysis of literature data. The system first requires the user to register. The user can then upload documents (PDF files), which are stored in the cloud, and manage the uploaded literature, for example by adding or deleting documents. The system also provides literature information retrieval and cluster analysis.