Structural Characterizations of Peptides and Their Applications in Quantitative Sequence-Activity Relationships of Antimicrobial Peptides
|School||Hunan Agricultural University|
|Keywords||Antimicrobial peptide; QSAM; Geostatistics; SVM; Descriptor|
Antimicrobial peptides (AMPs) are a class of peptides, typically composed of 20-50 amino acid residues, that show antimicrobial activity. Thousands of AMPs have been isolated and purified from bacteria, fungi, insects, and other organisms. AMPs have a small molecular mass, good thermal stability, and broad-spectrum antimicrobial activity; some also show antiviral and antitumor activity. Their mechanism of action differs from that of traditional antibiotics, which readily induce resistance. Because AMPs have broad application prospects in agriculture (disease-resistant transgenic plants), in medicine (overcoming the growing problem of antibiotic resistance and developing new antiviral and anticancer drugs), and in other fields, they have attracted great attention at home and abroad. However, compared with traditional antibiotics, the antibacterial activity of most AMPs is not ideal, and producing them in quantity is costly. The primary structure of a peptide or protein determines its spatial structure and function; moreover, the primary structure is simple to obtain, whereas higher-order structure is not. It is therefore worthwhile to use a QSAM (Quantitative Sequence-Activity Model) in place of a QSAR (Quantitative Structure-Activity Relationship) model to modify peptides purposefully and to design new peptide molecules. For an AMP of 30 amino acid residues there are, in theory, 20^30 possible sequences (excluding non-natural amino acids), so it is clearly impossible to synthesize and assay them all.
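As a rough check on the size of the design space for a 30-residue peptide built from the 20 natural amino acids, the count is 20^30, or about 1.07 × 10^39 sequences. The short sketch below computes it; the function name sequence_space_size is only illustrative and does not come from the original work.

```python
def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of `length` residues over the alphabet."""
    return alphabet_size ** length


# 30-residue peptides over the 20 natural amino acids
count = sequence_space_size(30)
print(f"{count:.3e} possible sequences")
print(len(str(count)), "decimal digits")
```

Exhaustive synthesis is out of reach at this scale, which is why a predictive model is used to pick a small number of candidates.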
The ultimate goal of QSAM for AMPs is to build a model from a small amount of experimental data, use it to predict a small number of highly active peptides, and then synthesize those peptides and validate them experimentally. The accuracy of the model on independent (external) predictions is therefore the key to success or failure. Peptide QSAM involves three key steps: obtaining descriptors, selecting descriptors, and choosing a regression model. The relationship between peptide descriptors and activity is complex and non-linear, and traditional models such as multiple linear regression and partial least squares regression cannot capture it. The support vector machine (SVM), which comes from statistical learning theory and is based on structural risk minimization, copes better with local minima, over-fitting, and non-linearity, and generalizes well. We therefore use SVM, in its regression form (support vector regression, SVR), as the basic modeling tool. Irrelevant and redundant descriptors reduce prediction accuracy, and descriptor selection is usually coupled with the choice of regression model. Stepwise linear regression does not filter descriptors effectively for QSAM models. Our laboratory developed a non-linear variable selection method, multi-round optimization based on SVR, which eliminates the worst descriptor in each round according to the minimum mean squared error (MSE) criterion. When the number of descriptors is very large, however, multi-round optimization is extremely time-consuming. Our laboratory therefore also developed a rapid, SVR-based non-linear variable selection method for high-dimensional data, which handles this problem better.
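The idea of removing the worst descriptor round by round under an MSE criterion can be sketched as a greedy backward elimination. This is a minimal illustration under stated assumptions, not the laboratory's implementation: a leave-one-out k-nearest-neighbour regressor stands in for SVR so that the example needs only the standard library, the stopping rule (stop when no deletion lowers the MSE) is assumed, and the names loo_mse and multi_round_elimination are hypothetical.

```python
import random


def loo_mse(X, y, cols, k=3):
    """Leave-one-out MSE of a k-NN regressor restricted to columns `cols`.

    Stand-in for a cross-validated SVR; any non-linear learner could be used.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # Squared distances from sample i to every other sample
        neighbours = sorted(
            (sum((X[i][c] - X[j][c]) ** 2 for c in cols), y[j])
            for j in range(n) if j != i
        )[:k]
        pred = sum(target for _, target in neighbours) / len(neighbours)
        total += (y[i] - pred) ** 2
    return total / n


def multi_round_elimination(X, y, score=loo_mse):
    """Each round, delete the descriptor whose removal gives the lowest MSE.

    Assumed stopping rule: stop when no single deletion improves the MSE.
    """
    cols = list(range(len(X[0])))
    best = score(X, y, cols)
    while len(cols) > 1:
        trials = [
            (score(X, y, [c for c in cols if c != drop]), drop)
            for drop in cols
        ]
        trial_mse, drop = min(trials)
        if trial_mse >= best:
            break
        cols.remove(drop)
        best = trial_mse
    return cols, best


# Synthetic demo data (not from the study): column 0 drives y, columns 1-2 are noise
rng = random.Random(0)
X = [[i / 10, rng.uniform(0, 2), rng.uniform(0, 2)] for i in range(20)]
y = [row[0] ** 2 for row in X]
kept, mse = multi_round_elimination(X, y)
print("kept descriptor columns:", kept, "LOO MSE:", round(mse, 4))
```

Each round re-scores the model once per remaining descriptor, so the cost grows roughly quadratically with the number of descriptors. This is consistent with the remark that the method becomes very slow for thousands of descriptors and needs a faster pre-screening step.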
This work therefore focuses on obtaining descriptors, that is, on the structural characterization of peptides. Traditional amino acid descriptors such as Z-scales, ISA-ECI, and MS-WHIM scores cannot characterize the contextual associations within a peptide sequence, which greatly influence peptide activity, and they are also unstable in models. We therefore built two new peptide structure descriptors, GS-AA531 and GS-AA531-MSCC, that take the whole amino acid sequence of the peptide into account. The amino acid index database lists 531 physicochemical properties for each amino acid residue. For a system of equal-length peptides (length n), each peptide can be characterized residue by residue with these properties (AA531), giving n×531 descriptors. GS-AA531 is a descriptor based on the geostatistics (GS) semivariogram, which reflects the correlation structure of the sequence: for a peptide of length n, each property is characterized by (n-1) semivariances, giving (n-1)×531 features. MSCC (Multi-scale Component and Correlation) reflects the composition and correlation of peptide sequences at multiple scales. GS-AA531-MSCC combines GS-AA531 and MSCC.
For the mast cell degranulation peptide analogue data set (25 peptides of 14 residues each), we characterized peptide structure with AA531, GS-AA531, and GS-AA531-MSCC, obtaining 7434, 6903, and 7372 descriptors per peptide, respectively. High-dimensional rapid non-linear feature screening retained 45, 15, and 16 descriptors, and further screening by multi-round optimization retained 20, 12, and 11. For the resulting SVR models, the coefficient of determination R² was 0.959, 0.997, and 0.995, and the external (independent prediction) Q²ext was 0.357, 0.693, and 0.620, respectively. GS-AA531 and GS-AA531-MSCC thus performed clearly better than AA531.
For the CameL-s data set (101 peptides of 15 residues each), we characterized peptide structure with GS-AA531 and GS-AA531-MSCC, obtaining 7434 and 7910 descriptors per peptide, respectively. High-dimensional rapid non-linear feature screening retained 22 and 18 descriptors, and further screening by multi-round optimization retained 17 and 13. For the resulting SVR models, R² was 0.717 and 0.726, and Q²ext was 0.716 and 0.708, respectively. The SVR models built on GS-AA531 and GS-AA531-MSCC were clearly superior to the reference models reported in the literature.
The QSAM studies of these two AMP data sets show that GS-AA531 and GS-AA531-MSCC are new and effective methods for the structural characterization of peptides, and that GS-AA531 is the more robust of the two. GS-AA531, combined with high-dimensional rapid non-linear feature screening and multi-round optimization, has good prospects for peptide QSAM.
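The (n-1) semivariances per property that make up GS-AA531 can be illustrated with a short sketch. The abstract does not give the exact formula, so this assumes the classical empirical semivariogram along the sequence, gamma(h) = sum over i of (z_i - z_(i+h))^2 / (2(n-h)) for lags h = 1 to n-1. The Kyte-Doolittle hydropathy scale stands in for one of the 531 properties, and the 14-residue sequence is arbitrary; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here as one example property
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def semivariogram(values):
    """Empirical semivariances for lags h = 1 .. n-1 along a sequence."""
    n = len(values)
    gammas = []
    for h in range(1, n):
        pairs = n - h  # number of residue pairs separated by h positions
        squared = sum((values[i] - values[i + h]) ** 2 for i in range(pairs))
        gammas.append(squared / (2 * pairs))
    return gammas


def gs_descriptor(sequence, scales):
    """Concatenate the semivariances of every property scale: (n-1) per scale."""
    features = []
    for scale in scales:
        features.extend(semivariogram([scale[aa] for aa in sequence]))
    return features


peptide = "ACDEFGHIKLMNPQ"  # arbitrary 14-residue example, not from the data set
features = gs_descriptor(peptide, [KYTE_DOOLITTLE])
print(len(features), "features from one property")
```

With all 531 properties, a 14-residue peptide would give 13 × 531 = 6903 features and a 15-residue peptide 14 × 531 = 7434, matching the GS-AA531 descriptor counts reported for the two data sets. Because each feature depends on pairs of residues at a fixed separation, the descriptor carries sequence-context information that per-residue descriptors lack.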