
The Research of Data Mining Technology Based on the Web

Author MaLiNa
Tutor LiuHong
School Shandong Normal University
Course Management Science and Engineering
Keywords software agent, multi-agent system, information retrieval, data mining, web
CLC TP311.12
Type Master's thesis
Year 2002
Downloads 346
Quotes 2

Network technology has penetrated every aspect of society. With the rapid worldwide growth of networks, more and more information resources, such as databases and information systems, are coming online, making the Internet the richest and most wide-ranging database in the world. The WWW has given us a brand-new networked world, but it has also placed us in an enormous labyrinth. Faced with the vast and complex space of the Web, the central problem is how to mine knowledge efficiently and quickly from the tremendous volume of HTML documents on the net. Web-based data mining is a good approach to this problem and has recently attracted the interest of many researchers.

Acquiring low-level data is the precondition for mining higher-level knowledge, so network information retrieval has become an urgent issue for Web-based data mining. This thesis therefore studies Web-based data mining with a focus on Web information retrieval. Aiming at an intelligent information retrieval system that captures users' interests with good precision and performance, we study several information retrieval techniques in depth. Existing retrieval techniques satisfy part of users' information needs, but more often than not they also drown users in the ocean of information the systems return. Information retrieval currently has many research hotspots. Building on studies of the models and structure of retrieval systems, we explore them further and draw our conclusions from experiments.

There are two kinds of retrieval model, Full Text Retrieval and Content Retrieval, and the vector space model (VCM) of the latter is a widely used method with comparatively good results. The main strength of VCM lies in its knowledge representation: it expresses documents as vectors in a vector space and turns the similarity problem into one of distance between vectors, thereby reducing the complexity of document matching. However, its effectiveness cannot be taken as absolute, and in this thesis we demonstrate its shortcomings through experiments. C. E. Shannon founded information theory to describe the process of information transmission, in which uncertainty is eliminated. Based on the work of other researchers, the thesis introduces information theory into the TF-IDF method of VCM and forms a new TF-IDF-IG method that incorporates information gain throughout. Experiments show that the new method not only preserves the document differences captured by traditional TF-IDF but also captures finer proportional distinctions than the old one. We therefore conclude that the new method reduces the uncertainty and ambiguity of TF-IDF in many respects without additional work.

Knowledge refinement is the key step of knowledge acquisition, and machine learning is an effective way for computers to acquire knowledge. Within machine learning, supervised artificial neural networks (ANN) can learn more accurate knowledge through a fuzzy structure, and so are well suited to handling fuzzy knowledge by describing and computing it implicitly. The fuzzy relation between terms and document categories is hard to describe or compute precisely, but fuzzy knowledge can be worked out by fuzzy means, so the thesis introduces ANN into VCM to form a combined method, VCM-ANN. VCM-ANN works on the principle of VCM: the ANN adjusts the fuzzy knowledge of VCM and stores it in the connections between the nodes of an FTART arm network. The thesis shows that VCM-ANN is more precise than TF-IDF-IG, overcomes part of what the latter cannot handle, and requires less computation. VCM-ANN captures the relation between terms and document categories more accurately than TF-IDF-IG, even though the information it captures is fuzzy.

In the 1990s, research on agents flourished, and agent-oriented computing is regarded as another breakthrough in software development and a
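
To make the vector space model and the role of information gain more concrete, the following is a minimal Python sketch, not the thesis's implementation. It builds standard TF-IDF vectors, compares documents by cosine similarity, and computes the textbook information gain of a term with respect to document categories. The function names (`tfidf_vectors`, `cosine`, `information_gain`) are hypothetical, and the abstract does not state how the thesis combines information gain with the TF-IDF weight, so that step is deliberately left out.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} TF-IDF vector."""
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times log inverse document frequency.
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(term, docs, labels):
    """Reduction in category entropy from knowing whether `term` occurs."""
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(docs)
    return (
        entropy(labels)
        - (len(with_t) / n) * entropy(with_t)
        - (len(without_t) / n) * entropy(without_t)
    )
```

A term that occurs in every document receives a TF-IDF weight of zero and an information gain of zero, while a term that occurs only in one category's documents has a high information gain. This is one way to see why an information-gain factor could sharpen the distinctions that plain TF-IDF makes between documents.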
