Analysis and Implementation of Data Management Based on ETL

Author WangBing
Tutor JiangNingKang
School East China Normal University
Course Software Engineering
Keywords Data cleansing; Data replication; ODI
CLC TP311.13
Type Master's thesis
Year 2008
With the rapid development of computer networks and database technology, and with the growing variety of ways in which people access data, data resources have become increasingly rich and the volume of data has increased dramatically. Universities, as an important part of society, have seen a tremendous change in their level of informatization and networking: many departments now rely, to varying degrees, on computer software to assist their work, and use it to improve business processes and office efficiency. However, the growing number of different types of information and data brings many problems for database management, mainly in two areas: data cleaning and data replication. For example, how can data errors be corrected so as to avoid wrong decisions and reduce decision-making risk? How can departments exchange and share information flexibly while the data is still managed and used in a unified way? At present, the main approach is to clean these data and replicate them synchronously. Cleaning yields data that is credible, secure, and consistent; the cleaned data is then loaded into a public database by a data replication tool, so that the various departments of the university can share data resources.
This thesis introduces the principles of data cleaning and data replication based on ETL (Extract, Transform, Load) and applies them in practice. The main work is as follows: (1) it introduces the current state of data cleaning and data replication technology, at home and abroad, and its applications; (2) it identifies the data quality and data consistency problems that exist among the data sources of the university's departments; (3) it analyzes these data quality problems and designs cleaning and replication strategies for them; (4) it describes how to use the data cleaning and replication tool Oracle Data Integrator (ODI) to extract data from the various data sources, clean it according to predetermined rules, and then load it into the target database (i.e., the public database), so that data resources can be shared; (5) it notes that the strategy for guarding against suspicious data during cleaning, and the balance between efficiency and performance in data replication, need further discussion.
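The extract, clean, and load flow described in item (4) can be illustrated with a minimal sketch. It does not use ODI: it assumes Python's standard sqlite3 module as a stand-in for the departmental sources and the public database. The table name students, its columns, the four cleaning rules, and the helper make_demo_source are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Illustrative sketch only: sqlite3 stands in for the real sources and target.
# The "students" table and the cleaning rules below are assumptions.

def extract(source):
    """Extract: read raw rows from one departmental source database."""
    return source.execute(
        "SELECT student_id, name, email FROM students"
    ).fetchall()

def clean(rows):
    """Clean: apply predetermined rules before data reaches the target."""
    seen = set()
    cleaned = []
    for student_id, name, email in rows:
        if student_id is None or not str(student_id).strip():
            continue  # rule 1: reject rows that have no key
        key = str(student_id).strip()
        if key in seen:
            continue  # rule 2: keep only the first row for a duplicated key
        seen.add(key)
        name = " ".join((name or "").split())          # rule 3: normalize whitespace
        email = (email or "").strip().lower() or None  # rule 4: normalize email
        cleaned.append((key, name, email))
    return cleaned

def load(target, rows):
    """Load: write cleaned rows into the shared (public) database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS students "
        "(student_id TEXT PRIMARY KEY, name TEXT, email TEXT)"
    )
    target.executemany("INSERT OR REPLACE INTO students VALUES (?, ?, ?)", rows)
    target.commit()

def run_etl(sources, target):
    """Run extract -> clean -> load; return (raw row count, loaded row count)."""
    raw = []
    for source in sources:
        raw.extend(extract(source))
    cleaned = clean(raw)
    load(target, cleaned)
    return len(raw), len(cleaned)

def make_demo_source(rows):
    """Hypothetical helper: build an in-memory source database for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (student_id TEXT, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    return conn

if __name__ == "__main__":
    dept_a = make_demo_source([
        (" 1001 ", "  Li   Hua ", "LiHua@Example.edu "),
        (None, "No Key", "x@example.edu"),
    ])
    dept_b = make_demo_source([
        ("1001", "Li Hua", "lihua@example.edu"),
        ("1002", "Zhang Wei", None),
    ])
    public_db = sqlite3.connect(":memory:")
    print(run_etl([dept_a, dept_b], public_db))
```

In this sketch the rules are applied in memory between extraction and loading, so rows that fail a rule never reach the public database. A production tool such as ODI would express comparable rules declaratively and would typically record rejected rows for review rather than silently discarding them.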

