The Statistical Inference for Count Data with Zero-inflation and Over-dispersion
|School||Kunming University of Science and Technology|
|Keywords||zero-inflation; ZIP model; ZINB model; score test; over-dispersion; influence diagnostics; missing data|
Count data are a type of discrete data that arise widely in daily life and in research. Poisson regression, which has been widely used in past practice and study, is a well-established approach to analysing such data.

However, excess zeros relative to the ordinary Poisson distribution are often encountered in many application fields. Failure to account for the excess zeros may cause biased parameter estimates and misleading inferences. To account for the preponderance of zero counts, a class of zero-inflated Poisson (ZIP) mixture regression models, which mix the Poisson distribution with a degenerate component of point mass at zero, can be used. The choice of model therefore depends on whether zero-inflation is present in the count data. In this paper, a score test is proposed for testing zero-inflation. If zero-inflation is present, the ZIP model is more appropriate; otherwise, an ordinary Poisson regression model is sufficient and more convenient.

In addition, in many practical applications, zero-inflation and lack of independence may arise simultaneously because of a hierarchical study design or the data collection procedure. An ordinary single-level model can then no longer account for such data. To address this, a two-level ZIP regression model with random coefficients is proposed in this paper and a Bayesian approach is developed.

Moreover, the ZIP parameter estimates can be severely biased if the non-zero counts are over-dispersed relative to the Poisson distribution. In that case a zero-inflated negative binomial (ZINB) regression model, which mixes the negative binomial distribution with a degenerate component of point mass at zero, may be more appropriate than the ZIP model. A test for over-dispersion is therefore indispensable for model choice. In this paper, a score test is proposed for testing over-dispersion.
If over-dispersion is indeed present in the count data, the ZINB model can be chosen; otherwise, the ZIP model can be chosen as the alternative.

Unfortunately, in many application fields, missing data are encountered alongside zero-inflation and over-dispersion, and they greatly complicate the analysis. Many methods have been proposed to deal with such data and to obtain accurate parameter estimates and sound inference. However, those methods all assume that the data are missing at random (MAR) and come from the same multi-level distribution. In practice, data are sometimes missing because the value exceeds what can be measured. In this paper, we present a Data Augmentation (DA) method that does not rely on the MAR assumption and can model the missing data mechanisms and the covariate structure. The method uses the Gibbs sampler as a tool for incorporating these structures and mechanisms. A simulation shows that our method is more appropriate when the data are not missing at random.

Finally, we discuss our models further and outline prospects for future work.
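
As a rough illustration of the zero-inflation score test mentioned above, the following Python sketch evaluates the ZIP probability mass function and a score statistic for the simplest case with no covariates. The abstract does not give the form of the proposed test, so the statistic below is an assumption: it is the classical score test for zero-inflation in a Poisson model (usually attributed to van den Broek, 1995), reduced to an intercept-only model. The names `zip_pmf` and `zero_inflation_score_test` and the example data are hypothetical.

```python
import math


def zip_pmf(y, lam, omega):
    """P(Y = y) under a ZIP model: a point mass at zero with weight omega
    mixed with a Poisson(lam) distribution with weight 1 - omega."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return omega * (y == 0) + (1.0 - omega) * poisson


def zero_inflation_score_test(counts):
    """Score test of H0: omega = 0 (no zero-inflation).

    Assumption: intercept-only Poisson model, so the fitted mean under H0
    is simply the sample mean. Returns the statistic and its p-value from
    the chi-square distribution with 1 degree of freedom.
    """
    n = len(counts)
    mean = sum(counts) / n                      # MLE of lambda under H0
    if mean <= 0:
        raise ValueError("all counts are zero; the test is undefined")
    p0 = math.exp(-mean)                        # Poisson P(Y = 0) under H0
    n0 = sum(1 for y in counts if y == 0)       # observed number of zeros
    score = n0 / p0 - n                         # score for omega at omega = 0
    info = n * (1.0 - p0) / p0 - n * mean       # information adjusted for lambda
    stat = score ** 2 / info
    # For chi-square with 1 df, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical data with many more zeros than a Poisson fit would predict.
example_counts = [0, 0, 0, 0, 0, 0, 4, 4, 4, 4]
example_stat, example_p = zero_inflation_score_test(example_counts)
print(example_stat, example_p)
```

A large statistic (small p-value) points toward the ZIP model; otherwise the ordinary Poisson model is retained. With covariates, the fitted means would come from a Poisson regression and the information term would need the corresponding regression adjustment, which this sketch omits.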