|
21. |
The Little Bootstrap and other Methods for Dimensionality Selection in Regression: X-Fixed Prediction Error |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 738-754
Leo Breiman,
|
Abstract:
When a regression problem contains many predictor variables, it is rarely wise to try to fit the data by means of a least squares regression on all of the predictor variables. Usually, a regression equation based on a few variables will be more accurate and certainly simpler. There are various methods for picking “good” subsets of variables, and programs that do such procedures are part of every widely used statistical package. The most common methods are based on stepwise addition or deletion of variables and on “best subsets.” The latter refers to a search method that, given the number of variables to be in the equation (say, five), locates that regression equation based on five variables that has the lowest residual sum of squares among all five-variable equations. All of these procedures generate a sequence of regression equations, the first based on one variable, the next based on two variables, and so on. Each member of this sequence is called a submodel, and the number of variables in the equation is the dimensionality of the submodel. A complex problem is determining which submodel of the generated sequence to select. Statistical packages use various ad hoc selection methods, including F to enter, F to delete, Cp, and t-value cutoffs. Our approach to this problem is through the criterion that a good selection procedure selects dimensionality so as to give low prediction error (PE), where the PE of a regression equation is its expected squared error over the points in the X design. Because the true PE is unknown, the use of this criterion must be based on PE estimates. We introduce a method called the little bootstrap, which gives almost unbiased estimates for submodel PEs and then uses these to do submodel selection. Comparison is made to Cp and other methods by analytic examples and simulations. Little bootstrap does well; Cp and, by implication, all selection methods not based on data reuse give highly biased results and poor subset selection.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475276
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
22. |
A Resampling Procedure for Complex Survey Data |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 755-765
R. R. Sitter,
|
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475277
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
23. |
Testing the Rank and Definiteness of Estimated Matrices with Applications to Factor, State-Space and ARMA Models |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 766-776
Len Gill,
Arthur Lewbel,
|
Abstract:
Consider any consistent, asymptotically normal estimate ǎ of an arbitrary rectangular or square matrix A. This article derives an explicit test for the rank of A and a related test of (semi) definiteness of A. Potential applications include testing for identification of structural models, testing for the number of state variables in state-space models (including tests for the order of autoregressive moving average (ARMA) processes), consumer demand analysis applications, and testing for the number of factors in factor analysis and related procedures. The test is based on the Gaussian elimination Lower-Diagonal-Upper triangular (LDU) decomposition. The test is illustrated with an empirical application to testing the order of ARMA processes.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475278
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
24. |
ARMA Covariance Structures with Time Heteroscedasticity for Repeated Measures Experiments |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 777-784
James Rochon,
|
Abstract:
Rochon and Helms (1989) presented a model for analyzing repeated measures experiments. The general linear model was used to assess the influence of covariate information, and ARMA time series models were put forward to characterize the covariance matrix among the repeated measures. Practical experience has suggested, however, that the ARMA assumption of constant variances and autocovariances over time is too restrictive for many applications. For example, observations may be relatively stable toward the beginning of the study but become more variable toward the end. This article presents a modification to this structure, which provides for heteroscedasticity over time. Maximum likelihood (ML) estimation procedures are considered, and the estimators are found to enjoy optimal large sample properties. A scoring algorithm is described for iterating to a solution of the ML equations. The model is illustrated with data from a clinical trial investigating human erythropoietin for treating anemia in end-stage renal disease.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475279
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
25. |
Residual Diagnostics for Mixture Models |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 785-794
Bruce G. Lindsay,
Kathryn Roeder,
|
Abstract:
A sample is commonly modeled by a mixture distribution if the observations follow a common distribution, but the parameter of interest differs between observations. For example, we observe the lengths but not the ages of a sample of fish. It may be reasonable to assume that length is normally distributed about an unknown mean that depends on the age of the fish. Provided there is more than one age class in the sample, the data are distributed as a mixture of normals. In this article we assume that the data are a random sample from a mixture of exponential family distributions and that for each observation the parameter of interest is sampled independently from an unknown mixing distribution Q. The adequacy of a fitted mixture model can be assessed by examining residuals based on the ratio of the observed to expected fit. Residuals based on the homogeneity model (in which Q is a one-point distribution) display a convexity property when the data follow a mixture model; this becomes the basis for diagnostic plots to detect the presence of mixing. Similar results also are obtained from smoothed residuals; thus the diagnostic also can be applied to sparse or continuous data. The nonparametric maximum likelihood estimate Q̂ of the distribution Q is known to be discrete. Smoothed residuals obtained from the fitted mixed model provide information about the number of support points in Q̂. This facilitates the use of the EM algorithm to find Q̂. The residuals evaluated at Q̂ determine whether or not the maximum likelihood estimate is unique and hence interpretable. Simulated and actual data sets are analyzed to illustrate the power and the utility of these procedures.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475280
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
26. |
Diagnostics for Overdispersion |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 795-804
Lisa M. Ganio,
Daniel W. Schafer,
|
Abstract:
Diagnostic tools are proposed for assessing the dependence of extrabinomial or extra-Poisson variation on explanatory variables and for comparing several common models for overdispersion. These tools are based on tests for regression terms in the dispersion parameter of a generalized linear model, using double exponential family and “pseudolikelihood” formulations. Score tests do not require the full fitting of models for variation and lead to easy graphical and numerical procedures based on squared residuals (deviance or Pearson). Robust modifications of these are motivated by Levene's test in linear models. The diagnostic tools are intended primarily to ensure prudent modeling of the variance to make correct inferences about parameters in the mean.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475281
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
27. |
Existence and Uniqueness of the Maximum Likelihood Estimator for a Multivariate Probit Model |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 805-811
Emmanuel Lesaffre,
Heinz Kaufmann,
|
Abstract:
The multivariate probit model (MPM) is a particular case of the class of correlated prediction models. A correlated prediction model is especially useful when prediction or classification is envisaged into diagnostic classes that are combinations of binary responses. The parameter vector consists of a “location” part and an “association” part. The location part accounts for the effect the regressors have on the marginal probabilities of the binary responses. The association part corrects these probabilities, taking into account that the responses are related. This article investigates conditions for the existence and unicity of the maximum likelihood estimator (MLE) of the parameter vector. It turns out that the existence and uniqueness of the MLE for the location parameters when the association parameters are known are related to those of the multigroup logistic model. Necessary and sufficient conditions are given for the existence of the MLE of the association part. On the other hand, the conditions for the unicity of the MLEs of the association parameters are much more complicated and not yet established. Finally, the article shows that for an MPM the estimates of the regression parameters for the location part exist and are unique if and only if they exist and are unique for each marginal univariate probit model. This result provides practical guidelines to detect early divergence. Good starting values are essential; this problem is touched on briefly. The theoretical results are illustrated by a medical example.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475282
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
28. |
A Generalizable Formulation of Conditional Logit with Diagnostics |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 812-816
Ehsan S. Soofi,
|
Abstract:
The conditional logit model is a multinomial logit model that permits the inclusion of choice-specific attributes. This article shows that the conditional logit model will maximize entropy given a set of attribute-value preserving constraints. A correspondence between the maximum entropy (ME) and maximum likelihood (ML) estimates for logit probabilities is established. Some easily computable and useful diagnostics for logit analysis are provided, and it is shown that an evaluation of the relative importance of attributes can be made using the ME formulation. The ME formulation is also generalized to accommodate initial choice probabilities into the logit model. An example is given.
KEY WORDS: Choice models; Entropy; Kullback-Leibler discrimination information function; Relative importance.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475283
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
29. |
The Analysis of Repeated Categorical Measurements Subject to Nonignorable Nonresponse |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 817-824
Mark R. Conaway,
|
Abstract:
Conditional likelihood methods have been proposed for analyzing the repeated measurement of categorical responses. This article extends the methods to include partially classified subjects. The extension allows for an ignorable or nonignorable nonresponse mechanism and uses standard statistical software for the computations. Two examples of incomplete longitudinal categorical data illustrate the method.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475284
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
30. |
Testing Hypotheses about an Identified Treatment when there are Multiple Endpoints |
|
Journal of the American Statistical Association,
Volume 87,
Issue 419,
1992,
Page 825-831
Eugene M. Laska,
Dei-In Tang,
Morris J. Meisner,
|
Abstract:
The problem of comparing an identified treatment with K other treatments is considered in a multivariate setting. Many formulations of composite alternative hypotheses are possible. For example, one might wish to examine whether the identified treatment is superior to the other treatments on all components of the response vector (i.e., is uniformly best) or whether the identified treatment is better than each treatment on at least one component (i.e., is admissible). For testing whether the identified treatment is uniformly best, the known optimality of the min test in the univariate case is extended to the multivariate case. If the distribution is multivariate normal, then the min test is shown to be a likelihood ratio test. For testing whether the identified treatment is admissible, a min test based on the Bonferroni inequality is suggested. For the multivariate normal with unknown covariance matrix, the likelihood ratio test is also a min test, but it has less stable power characteristics than does the Bonferroni-based test.
ISSN:0162-1459
DOI:10.1080/01621459.1992.10475285
Publisher: Taylor & Francis Group
Year: 1992
Source: Taylor
|
|