
Data mining technology and classification algorithm

Author LiuGang
Tutor GuoJinGeng
School PLA Information Engineering University
Course Computer Software and Theory
Keywords Data mining; KDD (Knowledge Discovery in Databases); SJEP-based classifiers; knowledge patterns
CLC TP311.13
Type PhD thesis
Year 2004
Downloads 3357
Quotes 16

Data mining is a technique for analyzing and understanding large volumes of source data and revealing the knowledge hidden in them. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because huge amounts of data are now widely available and there is an imminent need to turn such data into valuable information. Over the past decade or more, the concepts and techniques of data mining have been put forward, and in the last few years some of them have been studied in greater depth. Data mining integrates techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing and visualization. In essence, data mining is a high-level analysis technology with a strong orientation toward business benefit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions. Like other new techniques, however, data mining must develop gradually, from concept creation, through recognition of its importance, wide discussion and a few trial uses, to large-scale application. Most experts consider that it is currently in the phase of wide discussion. It still needs theoretical study and algorithmic exploration. Although some results have been achieved, many theoretical problems remain under ongoing research. In addition, data mining originates from real applications and must be combined with specific business application logic to solve specific problems, because different business fields have different mining needs and targets.
Successful data mining systems are excellent combinations of data mining techniques and business logic, rather than tools designed merely to make the development of data mining applications convenient.
A data-rich but information-poor situation led to the emergence of data mining, and within a few years many people in different fields became interested in it. Classification, an important field in data mining, was studied earlier in statistics, machine learning, neural networks and expert systems. But most algorithms are memory-resident and typically assume a small data size. As data volume and dimensionality grow, building an efficient classifier for large databases becomes a challenge.
Jumping emerging patterns (JEPs), a new kind of knowledge pattern, were recently proposed to capture crucial differences between a pair of datasets, and several JEP-based classifiers have been built. Previous studies show that these JEP-based classifiers have good overall predictive accuracy and scale well with data volume and dimensionality. But they suffer from the large number of mined JEPs, which makes the classifiers complex. In this thesis, we propose a special type of JEP, the most significant jumping emerging patterns (SJEPs), which are believed to have strong discriminating power and to be sufficient for building accurate classifiers. The thesis presents a novel algorithm that efficiently mines the SJEPs of both data classes, because existing algorithms cannot mine such SJEPs directly. It also describes how to build a new classifier (SJEP_Classifier) based on SJEPs. Compared with previous JEP-based classifiers, the classifier based exclusively on SJEPs uses far fewer JEPs, yet it achieves almost the same or higher predictive accuracy and completes the learning phase in a very short time (usually a few seconds).
Our experimental results show that our classifier also generally outperforms both CBA and C4.5 in terms of average accuracy.
In conclusion, this thesis analyzes the application architecture of data mining systems, creates new theoretical mining models, and designs a new classifier (SJEP_Classifier) based on SJEPs.
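To make the notion of a jumping emerging pattern concrete, the following is a minimal Python sketch, not the thesis's algorithm. It relies on the usual definition of a JEP as an itemset whose support is non-zero in one dataset and zero in the other. Treating an SJEP as a minimal JEP whose support reaches a threshold is an assumption made here for illustration, since the abstract does not give the formal definition. The brute-force enumeration, the aggregate-support scoring rule, the names `support`, `mine_minimal_jeps` and `predict`, the parameters `min_support` and `max_len`, and the toy data are all hypothetical.

```python
from itertools import combinations


def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def mine_minimal_jeps(target, background, min_support=0.0, max_len=3):
    """Brute-force search for minimal JEPs of `target` against `background`.

    A pattern is kept if its support in `target` is non-zero and at least
    `min_support`, and its support in `background` is exactly zero.
    Assumption: the minimality and threshold conditions stand in for the
    thesis's notion of "most significant" patterns.
    """
    items = sorted(set().union(*target))
    found = []
    # Enumerate candidates from short to long so minimal patterns come first.
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            # A superset of an already accepted pattern is not minimal.
            if any(p <= candidate for p in found):
                continue
            s = support(candidate, target)
            if s > 0 and s >= min_support and support(candidate, background) == 0:
                found.append(candidate)
    return found


def predict(instance, patterns_by_class, data_by_class):
    """Pick the class whose matching patterns have the largest summed support.

    Assumption: this aggregate-support score is one simple scoring rule,
    not necessarily the one used by SJEP_Classifier.
    """
    scores = {}
    for label, patterns in patterns_by_class.items():
        scores[label] = sum(
            support(p, data_by_class[label]) for p in patterns if p <= instance
        )
    return max(scores, key=scores.get)


# Hypothetical toy data: each transaction is a set of single-letter items.
data = {
    "A": [frozenset("ab"), frozenset("abc"), frozenset("ac")],
    "B": [frozenset("ad"), frozenset("bc"), frozenset("bd")],
}
patterns = {
    "A": mine_minimal_jeps(data["A"], data["B"], min_support=0.5),
    "B": mine_minimal_jeps(data["B"], data["A"], min_support=0.5),
}
```

On this toy data the patterns found for class A are {a, b} and {a, c}, and the only pattern for class B is {d}, so `predict(frozenset("ac"), patterns, data)` returns "A". Exhaustive enumeration is exponential in the number of items, which is why a dedicated mining algorithm such as the one the thesis proposes is needed for real datasets.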
