
Research on Key Issues of Constructing the Web of Linked Data

Author ZhangXiaoHui
Advisor ZuoRuiHua
School Beijing University of Technology
Course Applied Computer Technology
Keywords linked data, coreference resolution, cloud computing, similarity, Markov logic network
CLC TP391.1
Type PhD thesis
Year 2013

Linked data is a method for publishing and sharing data on the Internet based on semantic technology. The Semantic Web not only expresses data on the Internet in a machine-understandable way, but also creates links between the data to construct a huge web of data with rich links. The web of data enables people to obtain knowledge and information from the Internet more intelligently and precisely. The biggest differences between the web of data and the traditional Internet are the objects being linked and the types of links. In the web of data, the linked object changes from an HTML document to a URI referring to a specific thing, and hypertext links become typed RDF links carrying explicit semantics. When information is shared on the traditional Internet, the granularity is often too coarse and the semantics of the data are lost. Linked data offers a good solution to these problems and will promote the evolution of the traditional Internet of linked documents into the web of data.

Although webs of data are already being constructed in multiple domains using linked data technology, the in-depth development and application of linked data still face many problems and bottlenecks. First, the lack of data sources leads to slow growth in data scale. Second, object coreference, which is widespread in heterogeneous semantic datasets, hinders the automated building of rich links between datasets. This thesis focuses on the application model of linked data and the technology of coreference resolution. The main work and research results include:

(1) By introducing cloud computing into the building of the web of data, an application model of linked data based on cloud computing is proposed, and the architecture of a cloud-based linked data platform is designed to support this model. The platform supplies the variety of services needed for sharing linked data. This lowers the technical threshold for ordinary data owners who want to share data as linked data, and supports the building of a linked data sharing community across the Internet.

(2) A study of coreference resolution methods based on similarity models indicates that traditional methods have deficiencies in computing property weights and in processing multi-valued properties. A new weight calculation method is proposed, based on the distribution characteristics of property values as described by Rényi entropy, and the similarity calculation between the values of multi-valued properties is improved. Experiments on open-source RDF datasets demonstrate the advantage of the method presented in this chapter.

(3) A coreference resolution method based on Markov logic networks is proposed. A conversion model from the schema of semantic data to a Markov logic network, and the corresponding grounding method, are designed. In addition, some datasets are too large to be used directly to construct the ground Markov logic network, so an optimized pre-matching method is presented to narrow the matching range. The experiments show that the Markov logic network based method performs better when processing datasets containing rich semantic constraints.

(4) By studying the elastic scaling mechanism of resources in the cloud computing environment, an elastic coreference resolution system for the cloud computing environment is designed based on the methods proposed in the above two chapters. The system automatically selects the appropriate coreference resolution method according to the characteristics of the dataset. The jobs in the system are parallelized to make full use of the computing resources. A dynamic resource scheduling model is designed based on the mechanisms of dynamic cluster and buffer pool. In addition, the corresponding elastic resource scaling strategy and job scheduling algorithm are presented. Finally, the system is deployed on OpenStack, an open-source cloud computing management software, and its performance is validated through tests.
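
To make item (2) more concrete, the following is a minimal Python sketch of how property weights might be derived from the Rényi entropy of property values and combined with a set-based similarity for multi-valued properties. The abstract does not give the thesis's actual formulas, so everything specific here is an assumption made for illustration: the entropy order `alpha=2`, the rule that weights are entropies normalized to sum to 1, the best-match averaging for multi-valued properties, exact string equality as the value comparison, and all function names.

```python
import math
from collections import Counter


def renyi_entropy(values, alpha=2.0):
    """Renyi entropy (natural log) of the empirical distribution of `values`."""
    if alpha < 0 or alpha == 1.0:
        raise ValueError("alpha must be >= 0 and different from 1")
    if not values:
        return 0.0
    counts = Counter(values)
    total = sum(counts.values())
    power_sum = sum((c / total) ** alpha for c in counts.values())
    # H_alpha = log(sum_i p_i^alpha) / (1 - alpha); clamp -0.0 to 0.0.
    return max(0.0, math.log(power_sum) / (1.0 - alpha))


def property_weights(records, alpha=2.0):
    """Assumed scheme: weight each property by its entropy, normalized to sum to 1.

    A property whose values vary widely across instances (high entropy) is
    treated as more discriminative than one whose value is nearly constant.
    """
    by_property = {}
    for record in records:
        for prop, vals in record.items():
            by_property.setdefault(prop, []).extend(vals)
    entropies = {p: renyi_entropy(v, alpha) for p, v in by_property.items()}
    norm = sum(entropies.values())
    if norm == 0:
        # All properties are constant: fall back to uniform weights.
        return {p: 1.0 / len(entropies) for p in entropies}
    return {p: h / norm for p, h in entropies.items()}


def multi_value_similarity(a, b):
    """Symmetric best-match average between two lists of property values.

    Exact equality stands in for a real string or literal similarity measure.
    """
    if not a or not b:
        return 0.0

    def sim(x, y):
        return 1.0 if x == y else 0.0

    forward = sum(max(sim(x, y) for y in b) for x in a) / len(a)
    backward = sum(max(sim(y, x) for x in a) for y in b) / len(b)
    return (forward + backward) / 2.0


def instance_similarity(r1, r2, weights):
    """Weighted sum of per-property similarities over the shared properties."""
    shared = set(r1) & set(r2)
    return sum(
        weights.get(p, 0.0) * multi_value_similarity(r1[p], r2[p]) for p in shared
    )


# Hypothetical toy instances: each maps a property name to a list of values.
records = [
    {"name": ["Alice"], "type": ["Person"]},
    {"name": ["Bob"], "type": ["Person"]},
    {"name": ["Carol"], "type": ["Person"]},
]
weights = property_weights(records)
```

In this toy data the `type` property has the same value on every instance, so its entropy is zero and it contributes nothing to the similarity score, while `name` carries all of the weight. A pair of instances would then be proposed as coreferent when `instance_similarity` exceeds a threshold, which is left as a tuning parameter here.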
