A Graphical Technique for Determining the Number of Components in a Mixture of Normals
Author: Kathryn Roeder
Journal: Journal of the American Statistical Association
(Taylor & Francis, available online 1994)
Volume/Issue: Volume 89, Issue 426
Pages: 487-495
ISSN:0162-1459
Year: 1994
DOI:10.1080/01621459.1994.10476772
Publisher: Taylor & Francis Group
Keywords: Cluster; Diagnostic plot; Finite mixture; Gaussian process; Mode
Data source: Taylor
Abstract:
When a population is assumed to be composed of a finite number of subpopulations, a natural model to choose is the finite mixture model. It will often be the case, however, that the number of component distributions is unknown and must be estimated. This problem can be difficult; for instance, the density of two mixed normals is not bimodal unless the means are separated by at least 2 standard deviations. Hence modality of the data per se can be an insensitive approach to component estimation. We demonstrate that a mixture of two normals divided by a normal density having the same mean and variance as the mixed density is always bimodal. This analytic result and other related results form the basis for a diagnostic and a test for the number of components in a mixture of normals. The density is estimated using a kernel density estimator. Under the null hypothesis, the proposed diagnostic can be approximated by a stationary Gaussian process. Under the alternative hypothesis, components in the mixture will express themselves as major modes in the diagnostic plot. A test for mixing is based on the amount of smoothing necessary to suppress these large deviations from a Gaussian process.
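The analytic claim in the abstract can be checked numerically in a simple case. The following is a minimal sketch, not the paper's diagnostic: it assumes two equal-variance normal components with hypothetical parameters (weights 0.5, means -0.5 and 0.5, standard deviation 1), uses the true densities rather than a kernel density estimate, and counts modes as strict local maxima on a grid. The names `normal_pdf` and `count_modes` are illustrative helpers, not taken from the article.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

def count_modes(y):
    """Count strict interior local maxima of a curve sampled on a grid."""
    interior = y[1:-1]
    return int(np.sum((interior > y[:-2]) & (interior > y[2:])))

# Hypothetical example parameters: the means are only 1 standard deviation
# apart, so the mixture density itself is unimodal.
p, mu1, mu2, sd = 0.5, -0.5, 0.5, 1.0

x = np.linspace(-6.0, 6.0, 2001)
mixture = p * normal_pdf(x, mu1, sd) + (1 - p) * normal_pdf(x, mu2, sd)

# Normal density with the same mean and variance as the mixture.
mix_mean = p * mu1 + (1 - p) * mu2
mix_var = sd**2 + p * (1 - p) * (mu1 - mu2) ** 2
reference = normal_pdf(x, mix_mean, np.sqrt(mix_var))

# Ratio of the mixture density to the moment-matched normal density.
ratio = mixture / reference

print("modes in mixture density:", count_modes(mixture))
print("modes in ratio:", count_modes(ratio))
```

For these parameters the mixture density has a single mode while the ratio has two, which is the behavior the abstract describes. With real data the density would have to be estimated, and the abstract's test is based on how much smoothing is needed to suppress such modes; that step is not shown here.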