Research on Predicting Intrinsic Disorder Protein Structure Based on Supervision Manifold Learning Algorithm
|School||Harbin Engineering University|
|Course||Control Theory and Control Engineering|
|Keywords||intrinsic disorder protein; support vector machines (SVM); locally linear embedding (LLE); Adaboost algorithm; amino acid sequence|
With the steady progress of the Human Genome Project (HGP), more and more protein sequences have been determined. However, determining the structure of proteins and other biological macromolecules by experiment is tedious and takes a great deal of time and effort. It is therefore valuable to study protein structure and function by theoretical computation and to use the results to guide experiments. Starting from the protein's primary sequence, this thesis uses a multiple-classifier combination algorithm to predict intrinsically disordered protein structure. The central work is as follows:
1. Two kinds of sequence sets are constructed: ordered proteins and intrinsically disordered proteins. Because the content of amino acid residues differs among intrinsically disordered sequences of different lengths, the disorder data set is divided into two subsets: long (> 30 amino acid residues) and short (≤ 30 amino acid residues).
2. The amino acid sequences are quantified with a sliding window, using the single-peptide and dipeptide structural attributes of the sequence together with its hydrophobic physical attributes, and predictor models are built with a support vector machine (SVM) using the RBF kernel function. The window lengths for the long and short sequences are determined by 5-fold cross-validation, which also fixes the SVM parameters: the kernel parameter gamma and the penalty coefficient (cost).
3. In feature extraction, the data matrix obtained from the sliding window easily leads to the curse of dimensionality, so dimensionality reduction is needed to project the data from the high-dimensional space to a low-dimensional space. The thesis analyses the general dimensionality-reduction methods, including principal component analysis (PCA), a linear method, and kernel principal component analysis (KPCA), a nonlinear method built on PCA. On this basis, the manifold learning algorithm locally linear embedding (LLE) is introduced to predict intrinsically disordered protein structure. The effectiveness of PCA, LLE and KPCA is compared experimentally; LLE gives the best performance, which leads to the conclusion that there is a locally linear relationship between amino acid residues.
4. To improve the prediction accuracy for intrinsically disordered proteins, an SVM predictor fusion method based on the Adaboost algorithm is proposed for recognising intrinsically disordered protein structure. The thesis covers the basic concept of predictor fusion, the system framework, and the methods for designing member predictors and fusion algorithms such as Adaboost. The experimental results show that the accuracy of the multiple-predictor fusion algorithm is greater than that obtained with an individual member predictor.
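The sliding-window quantification mentioned in step 2 can be illustrated with a minimal sketch. It is not the thesis's actual encoding: the function name `window_features`, the default window length of 7, the truncation of windows at the sequence ends, and the choice of the Kyte-Doolittle hydropathy scale are all assumptions made for illustration, and the dipeptide attributes are omitted for brevity. Each residue is described by the amino acid composition of its window plus the window's mean hydrophobicity.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, window=7):
    """Return one 21-dimensional vector per residue.

    The first 20 entries are the fractions of each amino acid inside the
    window centred on the residue; the last entry is the mean hydropathy.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    features = []
    for i in range(len(sequence)):
        # Windows are truncated at the sequence ends instead of padded
        # (an illustrative choice).
        segment = sequence[max(0, i - half): i + half + 1]
        counts = Counter(segment)
        composition = [counts[aa] / len(segment) for aa in AMINO_ACIDS]
        # Non-standard residues contribute 0.0 to the hydropathy mean.
        mean_hydro = sum(HYDROPHOBICITY.get(aa, 0.0) for aa in segment) / len(segment)
        features.append(composition + [mean_hydro])
    return features


if __name__ == "__main__":
    vectors = window_features("MKTAYIAKQR", window=5)
    print(len(vectors), len(vectors[0]))
```

Stacking these per-residue vectors gives the kind of data matrix that step 3 then reduces in dimension before classification; with dipeptide attributes added, the dimension grows quickly, which is the motivation for that reduction step.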