Web Community Cores and Complete Communities Discovery Strategy
|School|Dalian University of Technology|
|Course|Applied Computer Technology|
|Keywords|Web Mining, Web Community, Community Core, Dense Graph|
The most striking feature of the Web is its self-organization: pages concerning a common interest are tightly connected by hyperlinks and therefore form dense subgraphs of the Web graph, called communities. A Web community is defined as a set of Web pages created by people who share the same interest. Web communities are valuable for research on the Web. First, communities reflect the social behavior of Web users as well as the evolution history and interrelations of the Web. Second, communities are medium-sized units that provide the most reliable resources on a given topic. Third, communities reflect the topic distribution of the Web and thus offer an effective way to improve the efficiency of information search. Finally, studying communities gives deeper insight into the self-organization and semantics of the Web, which can help provide better services to people in the future. In addition, automatically discovering communities can help users acquire information more effectively.
Web communities can be located through complete bipartite graphs (also known as community cores). Each community almost surely contains at least one core. Focusing on the problem of extracting such community cores from the Web, this paper proposes an effective algorithm, C&C, based on combination and consolidation, that extracts all cores embedded in Web graphs. Experiments on large real-world data collections demonstrate that C&C is efficient and effective for community core extraction. First, all of the various emerging cores can be identified. Second, detecting all embedded cores of different sizes requires only a single pass of C&C. Finally, C&C requires no user-specified parameters.
With all community cores available, a two-step heuristic algorithm is proposed to identify Web communities. First, community sketches are drawn by gradually merging overlapping community cores. Then, the communities are completed by extending them to include highly referenced members.
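The sketch below illustrates the notions above under clearly stated assumptions. It does not reproduce C&C, whose combination-and-consolidation steps are not detailed here. Instead it shows three simpler pieces: a check that a set of "fan" pages and "center" pages forms a complete bipartite core, the first heuristic step (merging cores that share at least one page into a community sketch), and the second step (adding outside pages that are linked to by at least `min_refs` members). The graph representation (a dict mapping each page to the set of pages it links to), the function names, the pairwise merging rule, and the `min_refs` threshold are all hypothetical choices made for illustration, and "highly referenced" is interpreted here as "linked to by many members".

```python
from itertools import product

# Hypothetical representation: each page maps to the set of pages it links to.


def is_core(graph, fans, centers):
    """True if every fan links to every center, i.e. a complete bipartite subgraph."""
    return all(c in graph.get(f, set()) for f, c in product(fans, centers))


def merge_overlapping_cores(cores):
    """Step 1 (sketch): repeatedly merge cores that share at least one page."""
    groups = [set(fans) | set(centers) for fans, centers in cores]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i] & groups[j]:
                    groups[i] |= groups.pop(j)  # absorb group j into group i
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return groups


def extend_community(graph, members, min_refs):
    """Step 2 (sketch): add outside pages linked to by at least min_refs members."""
    counts = {}
    for m in members:
        for target in graph.get(m, set()):
            if target not in members:
                counts[target] = counts.get(target, 0) + 1
    return members | {p for p, n in counts.items() if n >= min_refs}


# Toy example (hypothetical data).
graph = {
    "f1": {"c1", "c2", "x"},
    "f2": {"c1", "c2", "x"},
    "f3": {"c2", "c3"},
    "f4": {"c2", "c3"},
}
cores = [({"f1", "f2"}, {"c1", "c2"}), ({"f3", "f4"}, {"c2", "c3"})]
sketches = merge_overlapping_cores(cores)
print(sorted(sketches[0]))
print(sorted(extend_community(graph, sketches[0], min_refs=2)))
```

In the toy graph the two cores share the center `c2`, so they merge into one sketch, and page `x`, which is linked to by two members, is added when `min_refs=2`. The quadratic pairwise merge keeps the example short; a union-find structure would scale better on large graphs.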
Experiments on large real-world data collections demonstrate that the proposed algorithm can effectively identify communities that satisfy two properties: the relationships among members within a community are close, and the boundaries between communities are sparse.
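One way to make the two properties above measurable is to count, for a candidate community, the links among its members and the links crossing its boundary. The sketch below does this for a directed graph stored as a dict mapping each page to the set of pages it links to. The names `link_profile` and `internal_density` are hypothetical, and these counts are an illustration only; they are not claimed to be the evaluation measure used in the experiments.

```python
def link_profile(graph, community):
    """Return (internal, boundary) link counts for a community.

    internal: links whose source and target are both members.
    boundary: links with exactly one endpoint inside the community.
    Self-links are skipped.
    """
    internal = boundary = 0
    for src, targets in graph.items():
        for dst in targets:
            if src == dst:
                continue
            src_in, dst_in = src in community, dst in community
            if src_in and dst_in:
                internal += 1
            elif src_in or dst_in:
                boundary += 1
    return internal, boundary


def internal_density(graph, community):
    """Fraction of possible ordered member pairs that are linked."""
    n = len(community)
    if n < 2:
        return 0.0
    internal, _ = link_profile(graph, community)
    return internal / (n * (n - 1))


# Toy example (hypothetical data): {a, b, c} is fully linked, with one link to d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"e"},
    "e": {"d"},
}
print(link_profile(graph, {"a", "b", "c"}), internal_density(graph, {"a", "b", "c"}))
```

A community that matches the description above would show a high internal density together with few boundary links relative to its internal links.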