Application of the Penalized Cox Model and the Elastic Net in Survival Analysis of High-dimensional Data
|Course||Epidemiology and Biostatistics|
|Keywords||high-dimensional biological data; survival analysis; L2-Cox model; L1-Cox model; EN-Cox model|
Objective: DNA microarray and protein mass spectrometry technologies produce high-dimensional gene and protein data that can be used to predict outcomes in cancer patients, but the Cox proportional hazards model has not been used for this purpose. This paper explores the advantages and disadvantages of the penalized Cox model and the elastic net through a simulation study and an analysis of the van 't Veer (2002) breast cancer data. The aim is to reveal the relationship between the biological data and the time to death or to another endpoint, and so to support more accurate diagnosis and prognosis and better treatment.
Methods: The basic principles of the penalized Cox model (both the L2-penalized and the L1-penalized Cox model) and of the elastic net are introduced. Data with high dimension, strong correlation, and small sample size are simulated, the van 't Veer (2002) breast cancer data set is analyzed, and the predictive performance of the models is assessed. The simulation and the analysis were carried out in R.
Results: Predictive performance was evaluated with R². The simulations show that as the variance of the data increased, more variables were selected, R² became larger, and the model fit improved. As the censoring proportion increased, the predictive efficiency of all methods decreased, which indicates that predictive ability is affected by the censoring proportion.
Conclusion: The L2-Cox and L1-Cox models are both methods for high-dimensional data. The L2-Cox model does not reduce dimension, but it handles collinearity well. The L1-Cox model reduces the dimension of high-dimensional data, but it handles collinearity less well. The EN-Cox model combines the advantages of the two: it handles collinearity effectively, reduces dimension, and improves on the L1-Cox model, which makes it an ideal model for small-sample, high-dimensional survival data.
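To make the distinction between the three models concrete, the following is a minimal sketch of the objective each one maximizes: the Cox log partial likelihood minus a penalty on the coefficients. It is an illustration only, not the thesis's implementation (the analysis was done in R). The function names, the toy data, and the parameterization `lam * (alpha * L1 + (1 - alpha) * L2)` are assumptions made here; other software scales the terms differently (R's glmnet, for example, halves the L2 term). The sketch also assumes no tied event times.

```python
import math


def log_partial_likelihood(beta, times, events, X):
    """Cox log partial likelihood, assuming no tied event times."""
    # Linear predictor x_i' beta for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:  # censored subjects contribute only through risk sets
            # Risk set: subjects still under observation at time t_i.
            risk = [math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i]
            total += eta[i] - math.log(sum(risk))
    return total


def elastic_net_penalty(beta, lam, alpha):
    """alpha = 1 gives the L1 penalty, alpha = 0 the L2 penalty (assumed scaling)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)


def penalized_objective(beta, times, events, X, lam, alpha):
    """Quantity maximized over beta by the L2-, L1-, or EN-Cox model."""
    return log_partial_likelihood(beta, times, events, X) - elastic_net_penalty(
        beta, lam, alpha
    )


# Hypothetical toy data: 3 subjects, 2 covariates, event indicator 1 = death observed.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 0, 1]
toy_X = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
toy_beta = [0.4, -0.2]

for name, alpha in [("L2-Cox", 0.0), ("L1-Cox", 1.0), ("EN-Cox", 0.5)]:
    value = penalized_objective(toy_beta, toy_times, toy_events, toy_X, 0.1, alpha)
    print(name, value)
```

Because the L1 term is not differentiable at zero, maximizing with `alpha > 0` can set coefficients exactly to zero, which is the dimension reduction described in the conclusion. The L2 term only shrinks coefficients toward zero, which stabilizes estimates for strongly correlated covariates.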