Non-normal Confirmatory Factor Analysis in the Application of the Whole Gene Effect
|Course||Epidemiology and Biostatistics|
|Keywords||SNPs; non-normality; S-B scaled estimation; confirmatory factor analysis|
In the post-genomic era, single nucleotide polymorphisms (SNPs) have become a focus of biomedical research: they are the most common form of human sequence variation, they are widely distributed across human DNA, and their detection can be automated. Statistical methods suited to SNP data have in turn become an active topic of international research. Latent variable models (latent structure models) have been introduced to analyze haplotypes and the overall effect of high-dimensional, correlated SNPs. These models, however, assume that both the observed and the latent variables are normally distributed, whereas SNP data violate normality under any quantitative genetic coding. This paper therefore proposes Satorra-Bentler (S-B) scaled estimation for fitting confirmatory factor models, in order to analyze the overall effect and correlation of SNPs that do not follow a normal distribution.
The paper introduces the theory of the confirmatory factor model in detail, including an overview of the model, parameter estimation, evaluation of model fit, and model modification. It pays particular attention to three methods of parameter estimation: maximum likelihood (ML) estimation, Browne's asymptotically distribution-free method, and S-B scaled estimation. A comparison of these methods finds that S-B scaled estimation is the most suitable for SNP data.
Based on this theory, an example is given using SNP data provided by GAW17. The study selects 13 SNPs located in 6 genes on chromosome 2. The results are: for maximum likelihood estimation, χ²/df = 3.59 and RMSEA = 0.061; for the S-B scaled method, χ²/df = 2.89 and RMSEA = 0.052. These results suggest that the S-B scaled method yields better fit indices than ML, so that a better-fitting model can be obtained for SNP data with S-B scaled estimation.
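The reported fit indices can be cross-checked with a short calculation. The sketch below uses the common RMSEA point estimate, sqrt(max(χ²/df − 1, 0) / (N − 1)), and the fact that the S-B scaled statistic is the ML statistic divided by a scaling constant c. The sample size N = 697 is an assumption (the abstract does not state it; it is the number of unrelated individuals usually cited for the GAW17 data), and the function and variable names are illustrative only, not taken from the paper.

```python
import math


def rmsea(chi2_over_df: float, n: int) -> float:
    """RMSEA point estimate from the chi-square/df ratio and sample size n.

    Uses the (n - 1) convention: sqrt(max(chi2/df - 1, 0) / (n - 1)).
    """
    return math.sqrt(max(chi2_over_df - 1.0, 0.0) / (n - 1))


# Assumption: N is not given in the abstract; 697 is the commonly cited
# number of unrelated individuals in the GAW17 data set.
N_ASSUMED = 697

# Chi-square/df ratios reported in the abstract.
ml_ratio = 3.59   # maximum likelihood
sb_ratio = 2.89   # S-B scaled

rmsea_ml = rmsea(ml_ratio, N_ASSUMED)
rmsea_sb = rmsea(sb_ratio, N_ASSUMED)

# Because T_SB = T_ML / c with the same df, the ratio of the two
# chi-square/df values gives the implied scaling constant c.
implied_scaling = ml_ratio / sb_ratio

print(f"RMSEA (ML):  {rmsea_ml:.3f}")
print(f"RMSEA (S-B): {rmsea_sb:.3f}")
print(f"Implied S-B scaling constant c: {implied_scaling:.3f}")
```

Under the assumed N, both RMSEA values round to the figures quoted in the abstract. An implied scaling constant above 1 is what one would expect when the data depart from normality and the ML chi-square is inflated.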
In addition, because the correlation coefficients among the 6 genes are very large, the 6 genes can be treated as first-order factors in a second-order confirmatory factor analysis, which gives a simple, well-fitting second-order model. Using the SNP data provided by GAW17, we first computed latent variable scores for the 6 genes and then tested each gene against infection with a t test; the results show that all 6 genes affect infection. We can therefore speculate that the 13 SNP sites in the 6 genes may be pathogenic sites for the infection. A t test between the second-order factor and disease infection (t = 3.657, P < 0.001) leads to the conclusion that the genes have an overall effect on infection.
The discussion briefly summarizes the main content of the study and compares maximum likelihood estimation with S-B scaled estimation, and the higher-order confirmatory factor model with the first-order confirmatory factor model. The advantages and shortcomings of the study and the prospects for further research are also discussed.