Information Ratios for Validating Mixture Analyses
Authors:
Michael P. Windham,
Adele Cutler
Journal:
Journal of the American Statistical Association
(Taylor & Francis; available online 1992)
Volume/Issue:
Volume 87,
Issue 420
Pages: 1188-1192
ISSN: 0162-1459
Year: 1992
DOI: 10.1080/01621459.1992.10476277
Publisher: Taylor & Francis Group
Keywords: Bootstrapping; Cluster analysis; Cluster validity; EM algorithm; Fisher information
Data source: Taylor
Abstract:
Determining the number of components in a mixture of distributions is an important but difficult problem. This article introduces a procedure called minimum information ratio estimation and validation (MIREV), which is based on a ratio of Fisher information matrices. The smallest eigenvalue of the information ratio matrix is used to determine the number of components, and a measure of uncertainty may be obtained using a bootstrap technique. Simulations illustrate the effectiveness of the procedure. For mixtures of exponential families, an expression for the observed information ratio matrix provides insight into the success of the procedure.

Cluster analysis attempts to identify and characterize subpopulations believed to be present in a population. A wide variety of methods are available, including criterion optimization, hierarchical methods, and various heuristic methods. Criterion optimization techniques, such as mixture analysis, fuzzy clustering, and partitioning methods, are popular because they allow a great deal of flexibility in defining when objects are similar. However, they typically assume models with a known number of subpopulations. When the number is unknown, the investigator usually obtains several solutions and must decide among them, a decision that is difficult to justify without an objective procedure for comparing clustering results. Although numerous measures have been proposed to evaluate the quality of clustering results in general, and the number of clusters in particular, these measures are difficult to interpret and often unreliable. The MIREV procedure works extremely well for some examples; further research is required to establish the conditions under which it can be expected to produce reliable results.
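The two numerical ingredients named in the abstract, the smallest eigenvalue of a ratio of Fisher information matrices and a bootstrap measure of uncertainty, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes (the abstract does not say this) that the ratio is formed as I_c^{-1} I_o, where I_c is the complete-data and I_o the observed-data information for a fitted mixture, both symmetric positive definite. Under the missing-information principle, the eigenvalues of that matrix then lie in [0, 1]. The helper names `min_information_ratio` and `bootstrap_statistic` are hypothetical. The EM fit and the information-matrix formulas are not shown, and the rule for comparing the statistic across candidate numbers of components is not reproduced here.

```python
import numpy as np


def min_information_ratio(info_observed: np.ndarray, info_complete: np.ndarray) -> float:
    """Smallest eigenvalue of info_complete^{-1} @ info_observed.

    ASSUMPTION: the 'information ratio matrix' is taken to be I_c^{-1} I_o,
    with both matrices symmetric positive definite.
    """
    # Factor I_c = L L^T, then form L^{-1} I_o L^{-T}, which is similar to
    # I_c^{-1} I_o but symmetric, so its eigenvalues are real.
    L = np.linalg.cholesky(info_complete)
    L_inv = np.linalg.inv(L)
    sym = L_inv @ info_observed @ L_inv.T
    sym = (sym + sym.T) / 2.0  # remove rounding asymmetry
    return float(np.linalg.eigvalsh(sym)[0])  # eigvalsh returns ascending order


def bootstrap_statistic(data: np.ndarray, statistic, n_boot: int = 200, seed: int = 0) -> np.ndarray:
    """Evaluate `statistic` on n_boot resamples of `data` (rows drawn with replacement).

    Hypothetical helper: in a MIREV-style analysis, `statistic` would refit the
    mixture by EM and return the minimum information ratio (not shown here).
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    values = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample indices
        values[b] = statistic(data[idx])
    return values


if __name__ == "__main__":
    # Toy matrices, not from the article: I_o = diag(2, 0.5), I_c = diag(4, 1).
    print(min_information_ratio(np.diag([2.0, 0.5]), np.diag([4.0, 1.0])))
```

The Cholesky route is used instead of forming `inv(I_c) @ I_o` directly because the resulting matrix stays symmetric, so `eigvalsh` gives real, sorted eigenvalues. The spread of the bootstrap values (for example, their standard deviation or percentiles) would then serve as the uncertainty measure.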