Classification Algorithm training based on the EP

Author WenZuoDi
Tutor FanMing
School Zhengzhou University
Course Computer Software and Theory
Keywords Machine learning; data mining; classification; emerging pattern
Type Master's thesis
Year 2006

Data mining, also known as knowledge discovery in databases (KDD), refers to "mining" knowledge from the data in very large databases by nontrivial methods. Classification, an important topic in data mining, was studied earlier in statistics, machine learning, neural networks, expert systems, and other fields. However, most of these algorithms are memory-resident and typically assume a small dataset. As data grow in volume and dimensionality, building effective classifiers for large databases remains a challenge.

Classification methods based on emerging patterns (EPs) were proposed to classify large datasets. EPs, a new kind of knowledge pattern introduced by G. Dong and J. Li in 1999, capture the inherent distinctions between different classes of data, which makes them useful for building accurate classifiers. CAEP, proposed by Li, Dong and Ramamohanarao in 1999, is the first EP-based classification algorithm. It was followed by a series of EP-based classifiers such as BCEP, the JEP-classifier, and DeEPs. EP-based classifiers have been shown to outperform some classic classification methods, such as decision trees.

This dissertation proposes a novel EP-based classification method called classification by emerging patterns with adjustable weights (CEPAW). In the training phase, CEPAW uses a special kind of EP, called essential emerging patterns (eEPs), and aggregates the differentiating power of the eEPs to construct classifiers. To aggregate this differentiating power, each eEP is associated with an adjustable weight that is chosen by training. Training is divided into two phases. In the first phase, eEPs are mined and an initial classifier is constructed; our algorithm differs from previous EP-based algorithms in both the kind of EP used and the scoring function. In the second phase, the weights of the eEPs are adjusted by training. First, all eEPs are given equal weights.
We then classify the training samples iteratively with the initial classifier and adjust the weights of the eEPs according to the classification results until the accuracy can no longer be increased.

To estimate the accuracy of our algorithm, our experimental study carried

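The abstract describes EPs only informally. Under the usual definition from Dong and Li, the growth rate of an itemset X from a background dataset D1 to a target dataset D2 is supp_D2(X) / supp_D1(X), taken as 0 when both supports are 0 and as infinity when only supp_D1(X) is 0; X is an EP when its growth rate is at least a threshold ρ. The Python sketch below computes growth rates and enumerates EPs by brute force. The support threshold min_supp, the minimality filter, and the names support, growth_rate and mine_eps are assumptions for illustration only: the abstract does not say how eEPs are defined or mined, so this is not the thesis's mining algorithm.

```python
from itertools import combinations
from math import inf


def support(itemset, dataset):
    """Fraction of transactions in dataset that contain every item of itemset."""
    if not dataset:
        return 0.0
    items = frozenset(itemset)
    return sum(1 for t in dataset if items <= t) / len(dataset)


def growth_rate(itemset, d1, d2):
    """Growth rate of itemset from background d1 to target d2."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0 and s2 == 0:
        return 0.0
    if s1 == 0:
        return inf  # jumping EP: the itemset occurs only in the target class
    return s2 / s1


def mine_eps(d1, d2, rho=2.0, min_supp=0.1, max_len=2):
    """Brute-force EP search from d1 to d2 (hypothetical illustration).

    Keeps itemsets with growth rate >= rho and target support >= min_supp,
    skipping any itemset that has a proper subset already kept (assumed
    minimality filter). Exponential in max_len; for toy data only.
    """
    items = sorted(set().union(*d2))
    kept = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if any(p < x for p in kept):
                continue  # a smaller EP already covers this itemset
            if support(x, d2) >= min_supp and growth_rate(x, d1, d2) >= rho:
                kept.append(x)
    return kept


# Toy usage: two tiny classes of transactions.
d1 = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"b", "c"})]       # background
d2 = [frozenset({"a", "d"}), frozenset({"a", "b", "d"}), frozenset({"c", "d"})]  # target
print(mine_eps(d1, d2))  # → [frozenset({'d'})]
```

Here {a} has growth rate 1 and is rejected, while {d} never occurs in d1 and is kept as a jumping EP; every larger itemset containing d is then skipped by the minimality filter.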
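The second training phase can be sketched as follows. The abstract gives neither CEPAW's scoring function nor its weight-update rule, so both are assumptions here: each eEP X of class c contributes w_X · GR(X)/(GR(X)+1) · supp_c(X) to the score of class c when X is contained in a sample (the per-pattern term used in CAEP's aggregate score, without CAEP's score normalization), and each misclassified sample multiplies the weights of its true-class eEPs by (1 + eta) and those of the wrongly predicted class by (1 - eta). The names EEP, strength, classify, train_weights and eta are hypothetical. As in the abstract, all weights start equal and training stops once a pass over the training samples no longer increases accuracy.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class EEP:
    items: frozenset  # the pattern's items
    label: str        # class in which the pattern is emerging
    growth: float     # growth rate (inf for a jumping EP)
    supp: float       # support in its own class


def strength(p):
    """Assumed differentiating power: GR/(GR+1) * support, as in CAEP."""
    ratio = 1.0 if p.growth == inf else p.growth / (p.growth + 1.0)
    return ratio * p.supp


def classify(sample, patterns, weights, labels):
    """Return the class with the highest weighted eEP score (ties: first label)."""
    scores = {c: 0.0 for c in labels}
    for i, p in enumerate(patterns):
        if p.items <= sample:
            scores[p.label] += weights[i] * strength(p)
    return max(labels, key=lambda c: scores[c])


def accuracy(data, patterns, weights, labels):
    hits = sum(classify(s, patterns, weights, labels) == y for s, y in data)
    return hits / len(data)


def train_weights(data, patterns, labels, eta=0.25, max_rounds=50):
    """Adjust eEP weights until training accuracy stops increasing."""
    weights = [1.0] * len(patterns)  # all eEPs start with equal weights
    best_w, best_acc = list(weights), accuracy(data, patterns, weights, labels)
    for _ in range(max_rounds):
        for s, y in data:
            pred = classify(s, patterns, weights, labels)
            if pred == y:
                continue
            for i, p in enumerate(patterns):
                if p.items <= s:
                    if p.label == y:
                        weights[i] *= 1.0 + eta  # reward true-class patterns
                    elif p.label == pred:
                        weights[i] *= 1.0 - eta  # penalize the wrong winner
        acc = accuracy(data, patterns, weights, labels)
        if acc <= best_acc:
            break  # accuracy can no longer be increased: stop
        best_w, best_acc = list(weights), acc
    return best_w, best_acc


# Toy usage with two hand-made eEPs.
labels = ["A", "B"]
patterns = [EEP(frozenset({"x"}), "A", 3.0, 0.5),  # strength 0.375
            EEP(frozenset({"y"}), "B", inf, 0.5)]  # strength 0.5
data = [(frozenset({"x", "y"}), "A"), (frozenset({"y"}), "B"), (frozenset({"x"}), "A")]
print(train_weights(data, patterns, labels))  # → ([1.25, 0.75], 1.0)
```

With equal weights the sample {x, y} is misclassified as B; one pass raises the class-A pattern's weight and lowers the class-B one, accuracy rises from 2/3 to 1.0, and the next pass brings no further gain, so training stops. Returning the best weights seen, rather than the last ones, means a non-improving pass never degrades the returned classifier.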