Research of Text Categorization on Food Complaint Documentation Based on Ontology
|School||Northeast Normal University|
|Course||Applied Computer Technology|
|Keywords||Incremental window, Similarity computing, Domain ontology, Text classification|
With the rapid development of computer and network technology, the information on the Internet is increasing exponentially, and text is one of its most important parts. How to extract useful information from this mass of text has long been an important problem in information processing. Text classification is a basic technology in information retrieval and text mining: given a set of category labels, it assigns a text to a category according to its content. It has become a key technology of great practical value and an effective means of organizing and managing data.

As a knowledge representation model, an ontology provides rich semantic knowledge, and the relationships between its concepts can support inference. At the same time, as a set of concepts in a domain, an ontology can provide good category tags, which may ease the difficulty of collecting a training set when there are too many category tags. In this paper, with the help of experts in the food industry and through market research, a dairy ontology is built manually with Protégé 3.4.2, an ontology construction tool developed by Stanford University.

The paper also proposes an improved Core Window-based Model similarity computing method, namely the incremental window similarity computing method, and combines it with the ontology. Classification is realized by dynamically changing the width of the window according to the category tags provided by the domain ontology, which avoids the influence of the window's length on the similarity value. A set of experiments shows that, by avoiding this influence, the incremental window similarity computing method is superior to the other methods.
In the experiments, the proposed method, the traditional tf-idf method, and the Core Window-based Model similarity computing method are each combined with the ontology to form a classifier. The comparison shows that the incremental window similarity computing method improves classification precision, recall, and F1-measure considerably.
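The three evaluation measures named above can be illustrated with a minimal Python sketch. It uses the standard per-category definitions of precision, recall, and F1; the function name `per_class_scores`, the category labels, and the sample predictions are hypothetical and are not taken from the paper's experiments.

```python
def per_class_scores(gold, predicted, label):
    """Return (precision, recall, F1) for one category label."""
    pairs = list(zip(gold, predicted))
    # True positives: documents of this category that were assigned to it.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    # False positives: documents of other categories assigned to it.
    fp = sum(1 for g, p in pairs if g != label and p == label)
    # False negatives: documents of this category assigned elsewhere.
    fn = sum(1 for g, p in pairs if g == label and p != label)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier output for six complaint texts.
gold = ["milk", "milk", "yogurt", "cheese", "yogurt", "milk"]
predicted = ["milk", "yogurt", "yogurt", "cheese", "milk", "milk"]

for label in ("milk", "yogurt", "cheese"):
    p, r, f = per_class_scores(gold, predicted, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Computing the scores per category, as here, makes it possible to compare classifiers on each ontology category tag separately before averaging.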