|
31. |
Outlier Resistant Alternatives to the Ratio Estimator |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1174-1182
Jean-Philippe Gwet,
Louis-Paul Rivest,
|
Abstract:
Many techniques have been suggested to lower the impact of outliers on sample survey estimates. Outliers can be downweighted by winsorization; that is, by replacing extreme data points by a data-dependent or a predetermined value before calculating estimates. Another approach is to reduce the weight of outliers, from the inverse sampling fraction to 1, in the estimation of population characteristics. In this article the problem of estimating the population mean using auxiliary information in the presence of outliers is considered. A resistant version of the ratio estimator is introduced. It is constructed with an M or a GM estimator of the slope of the regression model through the origin, which is implicitly called on when considering the ratio estimator. The asymptotic biases and asymptotic variances of the proposed alternatives to the ratio estimator are calculated with respect to the randomization of the sampling plan. The selection of a resistant estimator is seen to involve a trade-off between bias and variance. Often, some bias is the price paid to reduce the variance. A mean squared error estimator is proposed. A model-based estimator proposed by Chambers, reducing the weights given to extreme observations to 1, is also studied. A conditional investigation of the bias, given the proportion of outliers in the sample, is carried out. It reveals that the unconditional unbiasedness of the ratio estimator is, in the presence of outliers, deceptive. Its conditional bias varies substantially with the difference between the sample proportion and the population proportion of outliers. It can be severe if the proportion of outliers in the sample is much larger than in the population. The conditional bias of resistant estimators is, on the other hand, more stable. It does not depend as much on the proportion of outliers in the sample. Monte Carlo comparisons of the ratio estimator with resistant alternatives are presented for two populations.
These simulations show that in the presence of outliers, the mean squared error of resistant estimators can be substantially smaller than that of the ratio estimator. They also show that resistant confidence intervals are interesting alternatives to intervals based on the ratio estimator.
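As a rough illustration of the idea in this abstract, the sketch below contrasts the classical ratio slope with a resistant slope obtained by iterative reweighting. It is not the authors' estimator: the Huber weight function, the tuning constant 1.345, the MAD-based scale, and all function names are assumptions made for illustration. It also assumes strictly positive x values and the usual working model behind the ratio estimator, a regression through the origin with error variance proportional to x.

```python
import math
import statistics


def ratio_slope(y, x):
    """Classical ratio slope: sum(y) / sum(x)."""
    return sum(y) / sum(x)


def huber_slope(y, x, c=1.345, iters=50):
    """Huber-type M-estimate of the slope through the origin (illustrative).

    Working model (assumed): y_i = b * x_i + sqrt(x_i) * e_i, so the
    standardized residual is (y_i - b * x_i) / sqrt(x_i). With all weights
    equal to 1 the update below reduces to the ratio slope.
    """
    b = ratio_slope(y, x)
    for _ in range(iters):
        r = [(yi - b * xi) / math.sqrt(xi) for yi, xi in zip(y, x)]
        # Robust scale from the median absolute residual (assumed choice).
        scale = statistics.median(abs(ri) for ri in r) / 0.6745
        if scale == 0:
            break
        # Huber weights: 1 for small residuals, shrinking for large ones.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        b = sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * xi for wi, xi in zip(w, x))
    return b


def mean_estimate(slope, pop_mean_x):
    """Estimate of the population mean of y given the known mean of x."""
    return slope * pop_mean_x


if __name__ == "__main__":
    # Hypothetical sample: y = 2x except for one gross outlier.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2, 4, 6, 8, 10, 12, 14, 80]
    print("ratio slope    :", ratio_slope(y, x))
    print("resistant slope:", huber_slope(y, x))
```

On the hypothetical sample, the single outlier pulls the ratio slope well above 2, while the reweighted slope stays close to the slope of the remaining points. This mirrors the bias and variance trade-off described in the abstract, but it says nothing about design-based properties.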
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476275
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
32. |
Characterizing Linear Birth and Death Processes |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1183-1187
Lorrie L. Hoffman,
|
Abstract:
This research determined the manner of convergence of certain Markov processes to their steady state limiting distributions. This article looks at linear birth and death processes with birth rate at each state determined by the immigration constant a and the natural growth multiplier b, and with death rate at each state determined by the fixed execution constant c and the natural declination multiplier d. All parameters are nonnegative. There is a reflective barrier at state 0. It is shown that when the natural growth multiplier is less than the declination parameter a limiting distribution exists, that is, when the multiplier difference is negative. We define a modal indicator as the ratio of the sum of the death parameters c and d diminished by immigration a to the multiplier difference. It is shown that when the modal indicator is negative then the mode occurs at state 0. When the indicator is an integer then the process is bimodal with the mode at that integral value and at the next larger integer. When the indicator is not an integer then the mode occurs at the first integral value greater than the modal indicator. Additionally, bounds for the birth probabilities and the tail probabilities are derived. These equations are applied to an example in the area of computer performance analysis. The objective of studying the characteristics of the limiting distribution is to understand the difficulties involved when simulating these processes. The most extensively studied of these types of Markov processes is the M/M/1 process (Poisson arrivals to one server having exponential service times). This is a simple case of a linear birth and death process where b = d = 0. Researchers have suggested speeding convergence by initializing the process in a state other than 0. This article reveals that zero is a good choice for the M/M/1 process, but it is not the best choice for the general linear birth and death process.
Also, practitioners have devised empirical methods to decide when the number of iterations N is sufficient to declare convergence. This article presents bounds on the tail probabilities in order to guide the selection of N at the onset of simulation. The sense of the theorems proved below can be captured by these statements:
1. The natural growth multiplier b must be less than the declination parameter d to insure convergence.
2. If the modal indicator is relatively large then initializing the process in the modal state might be wise.
3. Large values of the declination parameter d relative to the growth multiplier b will result in a better behaved process.
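The modal indicator described in this abstract can be checked numerically with a small sketch. The parameterization below is an assumed reading of the abstract, not taken from the paper: birth rate a + b·n in state n, death rate c + d·n in states n ≥ 1, and no deaths in state 0. The function names, the truncation at 200 states, and the example parameter values are all hypothetical.

```python
import math


def modal_indicator(a, b, c, d):
    """(c + d - a) / (b - d), following the verbal definition in the abstract."""
    if b >= d:
        raise ValueError("this sketch requires b < d (negative multiplier difference)")
    return (c + d - a) / (b - d)


def stationary_distribution(a, b, c, d, n_states=200):
    """Truncated stationary distribution from the detailed-balance recursion.

    Assumed rates: birth a + b*n in state n, death c + d*n in state n >= 1,
    so pi[n] / pi[n-1] = (a + b*(n-1)) / (c + d*n).
    """
    if b >= d:
        raise ValueError("this sketch requires b < d (negative multiplier difference)")
    weights = [1.0]
    for n in range(1, n_states):
        weights.append(weights[-1] * (a + b * (n - 1)) / (c + d * n))
    total = sum(weights)
    return [w / total for w in weights]


def mode_of(pi):
    """Index of the first state with maximal probability."""
    return pi.index(max(pi))


if __name__ == "__main__":
    # Hypothetical parameters with a non-integer modal indicator.
    a, b, c, d = 4.5, 1, 1, 2
    m = modal_indicator(a, b, c, d)
    pi = stationary_distribution(a, b, c, d)
    print("modal indicator:", m)
    print("numerical mode :", mode_of(pi), "ceil(indicator):", math.ceil(m))
```

Under these assumed rates, the probability ratio between consecutive states stays at or above 1 exactly while n is at most one more than the indicator, which is consistent with the three cases listed in the abstract. The M/M/1 case (b = d = 0) falls outside this sketch because the multiplier difference is then zero.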
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476276
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
33. |
Information Ratios for Validating Mixture Analyses |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1188-1192
Michael P. Windham,
Adele Cutler,
|
Abstract:
Determining the number of components in a mixture of distributions is an important but difficult problem. This article introduces a procedure called minimum information ratio estimation and validation (MIREV), which is based on a ratio of Fisher information matrices. The smallest eigenvalue of the information ratio matrix is used to determine the number of components. A measure of uncertainty may be obtained using a bootstrap technique. Simulations illustrate the effectiveness of the procedure. For mixtures of exponential families, an expression for the observed information ratio matrix provides insight into the success of the procedure. Cluster analysis attempts to identify and characterize subpopulations believed to be present in a population. A wide variety of methods are available, including criterion optimization, hierarchical methods, and various heuristic methods. Criterion optimization techniques, such as mixture analysis, fuzzy clustering, and partitioning methods, are popular because they allow a great deal of flexibility in defining when objects are similar. However, they typically assume models with a known number of subpopulations. When the number is unknown, the investigator usually obtains several solutions and must decide between them. The decision is difficult to justify without an objective procedure for comparing clustering results. Although numerous measures have been proposed to evaluate the quality of clustering results in general and the number of clusters in particular, these measures are difficult to interpret and often unreliable. The MIREV procedure works extremely well for some examples. Further research is required to establish the conditions under which the procedure can be expected to produce reliable results.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476277
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
34. |
The Use of Names for Linking Personal Records |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1193-1204
Howard B. Newcombe,
Martha E. Fair,
Pierre Lalonde,
|
Abstract:
The skill of a human who searches large files of personal records depends much on prior knowledge of how the names vary in successive documents pertaining to the same individuals (e.g., as with ANTHONY–TONY, JOSEPH–JOE, WILLIAM–BILL). Now, an essentially exact procedure enables computers to make similar use of an accumulated memory of their own past experiences when searching for, and linking, records that relate to particular persons. This knowledge is further applied to quantify the benefits from various refinements of the rules by which the discriminating powers of names are calculated when they do not precisely agree or are substantially dissimilar. Of the six refinements tested, by far the most important is the recently developed exact approach for calculating the ODDS associated with comparisons of names that are possible synonyms.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476278
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
35. |
Comment |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1204-1206
Max G. Arellano,
|
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476279
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
36. |
Rejoinder |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1207-1208
Howard B. Newcombe,
Martha E. Fair,
Pierre Lalonde,
|
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476280
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
37. |
Power Calculations for General Linear Multivariate Models Including Repeated Measures Applications |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1209-1226
Keith E. Muller,
Lisa M. Lavange,
Sharon Landesman Ramey,
Craig T. Ramey,
|
Abstract:
Recently developed methods for power analysis expand the options available for study design. We demonstrate how easily the methods can be applied by (1) reviewing their formulation and (2) describing their application in the preparation of a particular grant proposal. The focus is a complex but ubiquitous setting: repeated measures in a longitudinal study. Describing the development of the research proposal allows demonstrating the steps needed to conduct an effective power analysis. Discussion of the example also highlights issues that typically must be considered in designing a study. First, we discuss the motivation for using detailed power calculations, focusing on multivariate methods in particular. Second, we survey available methods for the general linear multivariate model (GLMM) with Gaussian errors and recommend those based on F approximations. The treatment includes coverage of the multivariate and univariate approaches to repeated measures, MANOVA, ANOVA, multivariate regression, and univariate regression. Third, we describe the design of the power analysis for the example, a longitudinal study of a child's intellectual performance as a function of mother's estimated verbal intelligence. Fourth, we present the results of the power calculations. Fifth, we evaluate the tradeoffs in using reduced designs and tests to simplify power calculations. Finally, we discuss the benefits and costs of power analysis in the practice of statistics. We make three recommendations:
1. Align the design and hypothesis of the power analysis with the planned data analysis, as best as practical.
2. Embed any power analysis in a defensible sensitivity analysis.
3. Have the extent of the power analysis reflect the ethical, scientific, and monetary costs.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476281
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
38. |
Regression with Missing X's: A Review |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1227-1237
Roderick J. A. Little,
|
Abstract:
The literature of regression analysis with missing values of the independent variables is reviewed. Six classes of procedures are distinguished: complete case analysis, available case methods, least squares on imputed data, maximum likelihood, Bayesian methods, and multiple imputation. Methods are compared and illustrated when missing data are confined to one independent variable, and extensions to more general patterns are indicated. Attention is paid to the performance of methods when the missing data are not missing completely at random. Least squares methods that fill in missing X's using only data on the X's are contrasted with likelihood-based methods that use data on the X's and Y. The latter approach is preferred and provides methods for elaboration of the basic normal linear regression model. It is suggested that more widely distributed software is needed that advances beyond complete-case analysis, available-case analysis, and naive imputation methods. Bayesian simulation methods and multiple imputation are reviewed; these provide fruitful avenues for future research.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476282
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
39. |
Book Reviews |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1238-1250
|
Abstract:
Human Rights and Statistics: Getting the Record Straight. Thomas B. Jabine and Richard P. Claude (eds). Philadelphia: University of Pennsylvania Press, 1992. xvii + 458 pp. $36.95.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476283
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
40. |
Publications Received |
|
Journal of the American Statistical Association,
Volume 87,
Issue 420,
1992,
Page 1250-1251
|
ISSN:0162-1459
DOI:10.1080/01621459.1992.10476284
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
|