Research on Key Issues of Deep Web Semantic Annotation Based on Ontology Learning
|Course||Management Science and Engineering|
|Keywords||Deep Web; Ontology; Querying Interface; Schema Extraction; Semantic Annotation|
With the rapid development of the Internet and related Web technologies, information on the Web keeps "deepening": a large amount of it is hidden in online databases distributed across the Internet. Traditional search engines cannot access this content, and users can only obtain results by submitting keywords through a querying interface; this information is therefore called the Deep Web. The Deep Web has become an important source of information for users, and integrating Deep Web information so that it can be accessed quickly and accurately has become an urgent problem.

Deep Web semantic annotation is an important step in the result data processing module of a Deep Web information integration system, and Deep Web querying interface schema extraction is an important basis for semantic annotation. This paper therefore studies Deep Web querying interface schema extraction and result semantic annotation in depth. Ontology is introduced into the annotation process and new solutions are put forward. A Deep Web search engine prototype system is designed and built on the basis of this research. The main contributions of this paper are as follows:

(1) The Deep Web information integration framework and domestic and international research on Deep Web semantic annotation are introduced. The shortcomings of traditional semantic annotation are analyzed. The concept of ontology and the principles and methods of Deep Web domain ontology construction are briefly introduced.

(2) A Deep Web querying interface extraction method based on a hierarchical model is proposed to address the fact that existing interface extraction methods ignore structural and semantic information. The method first mines the layout features of the interface elements and uses spatial hierarchical clustering to extract the schema tree of the querying interface. The positional and semantic relations between labels and controls are then used to match a semantic label to each node of the schema tree.

(3) An ontology-based Deep Web semantic annotation method is proposed to address insufficient annotation ability and inconsistent annotation results. First, the data units are aligned into groups according to their features, and multiple basic annotation methods are combined to annotate the groups. Then a mapping between the result schema and the domain ontology is established to obtain integrated and unified annotation results. Finally, cross annotation across different Deep Web sources in the same domain is used to improve accuracy and coverage.

(4) A domain-oriented Deep Web search engine prototype system is designed and implemented.

Finally, experiments on the proposed methods are carried out with the data set provided by UIUC, and the results verify that the methods are feasible and effective.
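The two steps named in (2), spatial hierarchical clustering of interface elements and position-based label matching, can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the `Element` and `Node` types, the use of top-left pixel coordinates, single-linkage Euclidean distance, and the "nearest label to the left or above" rule are all assumptions made here for illustration, and the semantic side of the matching is left out.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """A label or control on a query form (hypothetical representation)."""
    name: str   # illustrative identifier
    kind: str   # "label" or "control"
    x: float    # left edge in pixels (assumed layout feature)
    y: float    # top edge in pixels (assumed layout feature)


@dataclass
class Node:
    """A schema-tree node: a group of elements plus its child groups."""
    members: List[Element]
    children: List["Node"] = field(default_factory=list)


def distance(a: Node, b: Node) -> float:
    # Single linkage: smallest Euclidean distance between any two members.
    return min(
        math.hypot(p.x - q.x, p.y - q.y)
        for p in a.members
        for q in b.members
    )


def build_schema_tree(elements: List[Element]) -> Node:
    """Agglomerative clustering: repeatedly merge the two closest groups."""
    if not elements:
        raise ValueError("at least one element is required")
    nodes = [Node([e]) for e in elements]
    while len(nodes) > 1:
        pairs = (
            (i, j)
            for i in range(len(nodes))
            for j in range(i + 1, len(nodes))
        )
        i, j = min(pairs, key=lambda p: distance(nodes[p[0]], nodes[p[1]]))
        merged = Node(nodes[i].members + nodes[j].members, [nodes[i], nodes[j]])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def match_labels(elements: List[Element]) -> Dict[str, str]:
    """Assign each control the nearest label lying to its left or above it."""
    labels = [e for e in elements if e.kind == "label"]
    matches: Dict[str, str] = {}
    for control in (e for e in elements if e.kind == "control"):
        candidates = [
            lab for lab in labels
            if lab.x <= control.x and lab.y <= control.y
        ]
        if candidates:
            best = min(
                candidates,
                key=lambda lab: math.hypot(control.x - lab.x, control.y - lab.y),
            )
            matches[control.name] = best.name
    return matches
```

Calling `build_schema_tree` on the elements of a form yields a binary tree whose upper levels separate visually distant groups of fields, and `match_labels` returns a mapping from control names to label names. A real extractor would also need rendered bounding boxes, a stopping threshold for merging, and the semantic relations the abstract mentions.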