
Protein Structure Classification Algorithms Based on Sequence Similarity

Author LiYuGang
Tutor LiuZhiYong
School Institute of Computing Technology
Course Computer System Architecture
Keywords Sequence alignment; optimization of algorithm performance; Support Vector Machine
CLC TP301.6
Type PhD thesis
Year 2004
Downloads 478
Quotes 3

An important research topic in bioinformatics is understanding the meaning and function of each protein encoded in the genome. One of the most successful approaches to this problem is protein classification. A long-standing central question is how to improve computing efficiency and reduce memory requirements without degrading the quality of the results too much. Focusing on this problem, we take algorithms and parallel computer architecture as the central topics of our research. The main contributions of this thesis are as follows.

1) Based on the support vector machine (SVM) algorithm, a piece sequence evolution distance kernel is proposed. Because each sequence is compared with the 'center' sequence of a family rather than with every sequence in it, a significant speedup is achieved. Meanwhile, because corresponding parts of the two sequences are compared instead of the two whole sequences, sensitivity is maintained. The results show that this method is slightly more precise than the SVM-pairwise method, which is one of the most accurate methods. Moreover, it is significantly more computationally efficient than the latter: about 10 times faster on average in experiments classifying 54 protein families.

2) Focusing on the parallelism and locality of the CoSMPs (cluster of SMPs) architecture, the main factors that influence performance are analyzed, and the problems of how to parallelize and optimize applications are investigated. The merits and demerits of two programming models, the MPI model and the MPI + SMP directive model, are examined. Methods for improving performance and parallelizing algorithms on a cluster of SMPs are then proposed.

3) A high-performance parallel Smith-Waterman algorithm for protein classification is presented. This method reduces the space complexity from O(mn) to O(m) while nearly doubling the running time.

4) Using the strategy of divide and conquer, a scalable parallel algorithm of
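
The space reduction mentioned for the Smith-Waterman contribution can be illustrated with a minimal serial sketch. It is not the thesis's parallel algorithm: it only computes the best local-alignment score, and it assumes a simple match/mismatch score with a linear gap penalty (the function names and default parameter values are hypothetical, chosen for illustration) rather than a protein substitution matrix. The full-matrix version stores O(mn) cells; the rolling-row version keeps only two rows. Recovering the alignment itself in linear space needs extra recomputation, which is presumably where a near-doubling of running time would come from, and that step is not shown here.

```python
def sw_score_full_matrix(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local score, storing the whole O(mn) matrix."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,                      # start a new local alignment
                          h[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          h[i - 1][j] + gap,      # gap in b
                          h[i][j - 1] + gap)      # gap in a
            best = max(best, h[i][j])
    return best


def sw_score_linear_space(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Same score, keeping only two rows: O(min(m, n)) memory."""
    # Put the shorter sequence along the row to minimize memory.
    if len(b) > len(a):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            sub = match if ch_a == ch_b else mismatch
            curr[j] = max(0,
                          prev[j - 1] + sub,
                          prev[j] + gap,
                          curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr  # the previous row is all the next row depends on
    return best
```

Both functions return the same score for any pair of sequences; for example, `sw_score_linear_space("XXACGTXX", "YYACGTYY")` returns 8 under the assumed default scores, since the four matching residues `ACGT` each contribute 2.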
