11. Capturing the Intangible Concept of Information
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1243-1254
Ehsan S. Soofi

Abstract:
The purpose of this article is to discuss the intricacies of quantifying information in some statistical problems. The aim is to develop a general appreciation for the meanings of information functions rather than their mathematical use. This theme integrates fundamental aspects of the contributions of Kullback, Lindley, and Jaynes and bridges chaos to probability modeling. A synopsis of information-theoretic statistics is presented in the form of a pyramid with Shannon at the vertex and a triangular base that signifies three distinct variants of quantifying information: discrimination information (Kullback), mutual information (Lindley), and maximum entropy information (Jaynes). Examples of capturing information by the maximum entropy (ME) method are discussed. It is shown that the ME approach produces a general class of logit models capable of capturing various forms of sample and nonsample information. Diagnostics for quantifying information captured by the ME logit models are given, and decomposition of information into orthogonal components is presented. Basic geometry is used to display information graphically in a simple example. An overview of quantifying information in chaotic systems is presented, and a discrimination information diagnostic for studying chaotic data is introduced. Finally, some brief comments about future research are given.
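Not from the article itself; a minimal numpy sketch of two of the quantities the abstract names, using Jaynes's classic die example (the function name `maxent_die` and the mean value 4.5 are illustrative choices, not taken from the paper). The maximum entropy distribution on faces 1..6 with a fixed mean has exponential form, and Kullback's discrimination information measures its divergence from the uniform reference.

```python
import numpy as np

faces = np.arange(1, 7)

def maxent_die(target_mean, lo=-10.0, hi=10.0, tol=1e-12):
    """Maximum entropy distribution on faces 1..6 with a fixed mean.

    The ME solution has the form p_i proportional to exp(-lam * i); the
    multiplier lam is found by bisection, since the implied mean is
    strictly decreasing in lam.
    """
    def mean_for(lam):
        w = np.exp(-lam * faces)
        return (faces * w).sum() / w.sum()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid                      # mean too high: increase lam
        else:
            hi = mid
    w = np.exp(-0.5 * (lo + hi) * faces)
    return w / w.sum()

p = maxent_die(4.5)                       # skewed toward the high faces
q = np.full(6, 1.0 / 6.0)                 # uniform reference distribution

# Kullback's discrimination information between p and the uniform q
kl = float(np.sum(p * np.log(p / q)))
```

The bisection exploits monotonicity of the constrained mean in the Lagrange multiplier, so no optimization library is needed for this one-constraint case.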
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476865
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

12. Flexible Discriminant Analysis by Optimal Scoring
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1255-1270
Trevor Hastie, Robert Tibshirani, Andreas Buja

Abstract:
Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are “optimal” for separating the groups. With two such functions, one can produce a classification map that partitions the reduced space into regions that are identified with group membership, and the decision boundaries are linear. This article is about richer nonlinear classification schemes. Linear discriminant analysis is equivalent to multiresponse linear regression using optimal scorings to represent the groups. In this paper, we obtain nonparametric versions of discriminant analysis by replacing linear regression by any nonparametric regression method. In this way, any multiresponse regression technique (such as MARS or neural networks) can be postprocessed to improve its classification performance.
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476866
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

13. Generalized S-Estimators
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1271-1281
Christophe Croux, Peter J. Rousseeuw, Ola Hössjer

Abstract:
In this article we introduce a new type of positive-breakdown regression method, called a generalized S-estimator (or GS-estimator), based on the minimization of a generalized M-estimator of residual scale. We compare the class of GS-estimators with the usual S-estimators, including least median of squares. It turns out that GS-estimators attain a much higher efficiency than S-estimators, at the cost of a slightly increased worst-case bias. We investigate the breakdown point, the maxbias curve, and the influence function of GS-estimators. We also give an algorithm for computing GS-estimators and apply it to real and simulated data.
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476867
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

14. Bootstrap Methods for Finite Populations
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1282-1289
James G. Booth, Ronald W. Butler, Peter Hall

Abstract:
We show that the familiar bootstrap plug-in rule of Efron has a natural analog in finite population settings. In our method a characteristic of the population is estimated by the average value of the characteristic over a class of empirical populations constructed from the sample. Our method extends that of Gross to situations in which the stratum sizes are not integer multiples of their respective sample sizes. Moreover, we show that our method can be used to generate second-order correct confidence intervals for smooth functions of population means, a property that has not been established for other resampling methods suggested in the literature. A second resampling method is proposed that also leads to second-order correct confidence intervals and is less computationally intensive than our bootstrap. But a simulation study reveals that the second method can be quite unstable in some situations, whereas our bootstrap performs very well.
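Not from the article; a minimal numpy sketch of the Gross-style special case the abstract says the article generalizes, where the population size N is an integer multiple of the sample size n (the function name `fp_bootstrap_se` is an illustrative choice).

```python
import numpy as np

def fp_bootstrap_se(sample, N, B=2000, seed=0):
    """Finite-population bootstrap SE of the sample mean, in the
    Gross-style special case where N is an integer multiple of n
    (the article extends this to non-integer multiples).
    """
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    assert N % n == 0, "this sketch covers only integer multiples of n"
    pseudo_pop = np.tile(sample, N // n)   # empirical population of size N
    rng = np.random.default_rng(seed)
    means = [rng.choice(pseudo_pop, size=n, replace=False).mean()
             for _ in range(B)]
    return float(np.std(means))

# Example: n = 100 drawn from a population of N = 1000
rng = np.random.default_rng(0)
s = rng.normal(50.0, 10.0, size=100)
se = fp_bootstrap_se(s, N=1000)
```

Resampling without replacement from the replicated empirical population builds in the finite-population correction that ordinary i.i.d. resampling misses.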
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476868
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

15. A Comparison of Certain Bootstrap Confidence Intervals in the Cox Model
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1290-1302
Deborah Burr

Abstract:
We study bootstrap confidence intervals for three types of parameters in Cox's proportional hazards model: the regression parameter, the survival function at fixed time points, and the median survival time at fixed values of a covariate. Several types of bootstrap confidence intervals are studied, and the type of interval is determined by two factors. One factor is the method of drawing the bootstrap sample. We consider three such methods: (1) ordinary resampling from the empirical cumulative distribution function, (2) resampling conditional on the covariates, and (3) resampling conditional on the covariates and the censoring pattern. Another factor is the method of forming the confidence interval from a given sample; the methods considered are the percentile, hybrid, and bootstrap-t. All the methods of forming confidence intervals are compared to each other and to the standard asymptotic method via a Monte Carlo study. The data sets for this Monte Carlo study are simulated conditionally on the covariates and the censoring pattern, the situation appropriate for the third method of resampling. One conclusion drawn from the Monte Carlo study is that the asymptotic method is best for the regression parameter, but not for the survival function or the median survival time. Conclusions about the bootstrap methods include the surprising result that, overall, the second method of drawing the samples outperforms the third method. Also, there is an interaction effect between the two factors, method of drawing the sample and method of forming the interval, especially for estimation of the regression parameter. Finally, the bootstrap-t intervals are consistently outperformed by at least one of the two more rudimentary types of bootstrap interval.
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476869
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

16. The Stationary Bootstrap
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1303-1313
Dimitris N. Politis, Joseph P. Romano

Abstract:
This article introduces a resampling procedure called the stationary bootstrap as a means of calculating standard errors of estimators and constructing confidence regions for parameters based on weakly dependent stationary observations. Previously, a technique based on resampling blocks of consecutive observations was introduced to construct confidence intervals for a parameter of the m-dimensional joint distribution of m consecutive observations, where m is fixed. This procedure has been generalized by constructing a “blocks of blocks” resampling scheme that yields asymptotically valid procedures even for a multivariate parameter of the whole (i.e., infinite-dimensional) joint distribution of the stationary sequence of observations. These methods share the construction of resampling blocks of observations to form a pseudo-time series, so that the statistic of interest may be recalculated based on the resampled data set. But in the context of applying this method to stationary data, it is natural to require the resampled pseudo-time series to be stationary (conditional on the original data) as well. Although the aforementioned procedures lack this property, the stationary procedure developed here is indeed stationary and possesses other desirable properties. The stationary procedure is based on resampling blocks of random length, where the length of each block has a geometric distribution. In this article, fundamental consistency and weak convergence properties of the stationary resampling scheme are developed.
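Not from the article; a minimal numpy sketch of the resampling scheme the abstract describes: blocks with uniformly random starts and geometric lengths, with circular wrap-around (the function name and the choice p = 0.1 are illustrative assumptions).

```python
import numpy as np

def stationary_bootstrap(x, p=0.1, seed=None):
    """One stationary-bootstrap pseudo-series of the same length as x.

    Blocks start at uniformly random positions and have independent
    geometric lengths with mean 1/p; indices wrap circularly, so the
    resampled series is stationary conditional on the data.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    out = np.empty(n)
    i = 0
    while i < n:
        start = rng.integers(n)           # uniform random block start
        length = rng.geometric(p)         # geometric block length, mean 1/p
        for j in range(min(length, n - i)):
            out[i] = x[(start + j) % n]   # circular wrap-around
            i += 1
    return out

# Example: standard error of the mean of a weakly dependent (MA(1)) series
rng = np.random.default_rng(0)
e = rng.standard_normal(501)
series = e[1:] + 0.5 * e[:-1]
boot_means = [stationary_bootstrap(series, p=0.1, seed=b).mean()
              for b in range(500)]
se = float(np.std(boot_means))
```

Because block boundaries are random rather than fixed, averaging over blocks of mean length 1/p preserves short-range dependence in each pseudo-series while keeping the resampled process stationary.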
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476870
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

17. Simulation-Extrapolation Estimation in Parametric Measurement Error Models
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1314-1328
J. R. Cook, L. A. Stefanski

Abstract:
We describe a simulation-based method of inference for parametric measurement error models in which the measurement error variance is known or at least well estimated. The method entails adding additional measurement error in known increments to the data, computing estimates from the contaminated data, establishing a trend between these estimates and the variance of the added errors, and extrapolating this trend back to the case of no measurement error. We show that the method is equivalent or asymptotically equivalent to method-of-moments estimation in linear measurement error modeling. Simulation studies are presented showing that the method produces estimators that are nearly asymptotically unbiased and efficient in standard and nonstandard logistic regression models. An oversimplified but fairly accurate description of the method is that it is method-of-moments estimation using Monte Carlo-derived estimating equations.
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476871
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

18. Fast Very Robust Methods for the Detection of Multiple Outliers
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1329-1339
A. C. Atkinson

Abstract:
A few repeats of a simple forward search from a random starting point are shown to provide sufficiently robust parameter estimates to reveal masked multiple outliers. The stability of the patterns obtained is exhibited by the stalactite plot. The robust estimators used are least median of squares for regression and the minimum volume ellipsoid for multivariate outliers. The forward search also has potential as an algorithm for calculation of these parameter estimates. For large problems, parallel computing provides appreciable reduction in computational time.
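Not the article's algorithm verbatim (which uses least median of squares and minimum-volume-ellipsoid fits); a minimal plain-least-squares sketch of the forward-search idea under those simplifying assumptions, with `detect_outliers` and its LMS-like median filter as illustrative constructions.

```python
import numpy as np

def forward_search(X, y, m0, rng):
    """One forward search: start from a random subset of size m0, fit
    least squares on the current subset, then grow the subset to the
    observations with the smallest squared residuals. Returns the
    squared residuals from the last subset fit."""
    n = len(y)
    subset = rng.choice(n, size=m0, replace=False)
    for m in range(m0, n):
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        resid2 = (y - X @ beta) ** 2
        subset = np.argsort(resid2)[:m + 1]   # grow by one observation
    return resid2

def detect_outliers(X, y, m0=5, repeats=5, k=5, seed=0):
    """A few repeats from random starts; keep the run whose residuals
    have the smallest median (an LMS-like criterion) and flag the k
    observations with the largest residuals in that run."""
    rng = np.random.default_rng(seed)
    best = min((forward_search(X, y, m0, rng) for _ in range(repeats)),
               key=lambda r: np.median(r))
    return np.argsort(best)[-k:]

# Example: five gross outliers hidden in a simple regression
rng = np.random.default_rng(1)
n = 50
x1 = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x1])
y = 1.0 + 2.0 * x1 + rng.normal(0.0, 0.5, n)
y[:5] += 20.0                              # contaminate observations 0..4
flagged = detect_outliers(X, y, k=5)
```

The median filter over repeats discards runs whose random start landed on contaminated points, which is what makes a handful of cheap searches sufficient.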
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476872
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

19. Wavelet Methods for Curve Estimation
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1340-1353
A. Antoniadis, G. Gregoire, I. W. McKeague

Abstract:
The theory of wavelets is a developing branch of mathematics with a wide range of potential applications. Compactly supported wavelets are particularly interesting because of their natural ability to represent data with intrinsically local properties. They are useful for the detection of edges and singularities in image and sound analysis and for data compression. But most of the wavelet-based procedures currently available do not explicitly account for the presence of noise in the data. A discussion of how this can be done in the setting of some simple nonparametric curve estimation problems is given. Wavelet analogies of some familiar kernel and orthogonal series estimators are introduced, and their finite sample and asymptotic properties are studied. We discover that there is a fundamental instability in the asymptotic variance of wavelet estimators caused by the lack of translation invariance of the wavelet transform. This is related to the properties of certain lacunary sequences. The practical consequences of this instability are assessed.
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476873
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor

20. Semiparametric Regression in Likelihood-Based Models
Journal of the American Statistical Association, Volume 89, Issue 428, 1994, Pages 1354-1365
Sally Hunsberger

Abstract:
A weighted likelihood is used to estimate the parameters in a semiparametric model involving two covariates and allowing an association between the covariates. The development is for arbitrary but specified densities of the observations. The estimators are consistent and asymptotically normal. Hypothesis testing of the parametric component can be performed using a Wald test. Simulations and analysis of data with Bernoulli observations demonstrate the estimators' application. Speckman developed kernel estimators where the conditional density of the observations is normal with p parametric covariates. Speckman's estimators and the new estimators are asymptotically equivalent, with the bias of Speckman's estimators being smaller. As an example, we study the relationship between a binary response indicating the occurrence of an intraoperative cardiac complication (ICC) in vascular surgery patients and two risk factors: duration of the operation (OR) and ASA score, which is an evaluation of the patient's overall health prior to surgery. ASA score is modeled in the parametric portion, because it appears valid to assume that ASA is linearly related to the logit of the probability of an ICC. The functional relationship between OR duration and the logit of the probability of an ICC is unknown, so it is modeled nonparametrically.
ISSN: 0162-1459
DOI: 10.1080/01621459.1994.10476874
Publisher: Taylor & Francis Group
Year: 1994
Source: Taylor