The Little Bootstrap and other Methods for Dimensionality Selection in Regression: X-Fixed Prediction Error

 

Author: Leo Breiman

Journal: Journal of the American Statistical Association (Taylor & Francis; available online 1992)

Volume/Issue: Volume 87, Issue 419

Pages: 738-754

ISSN: 0162-1459

Year: 1992

DOI: 10.1080/01621459.1992.10475276

Publisher: Taylor & Francis Group

Keywords: Best subsets; Mallows's Cp; Subset selection; Variable selection

Data source: Taylor

Abstract:

When a regression problem contains many predictor variables, it is rarely wise to try to fit the data by means of a least squares regression on all of the predictor variables. Usually, a regression equation based on a few variables will be more accurate and certainly simpler. There are various methods for picking “good” subsets of variables, and programs that carry out such procedures are part of every widely used statistical package. The most common methods are based on stepwise addition or deletion of variables and on “best subsets.” The latter refers to a search method that, given the number of variables to be in the equation (say, five), locates the regression equation based on five variables that has the lowest residual sum of squares among all five-variable equations. All of these procedures generate a sequence of regression equations, the first based on one variable, the next based on two variables, and so on. Each member of this sequence is called a submodel, and the number of variables in the equation is the dimensionality of the submodel. A difficult problem is determining which submodel of the generated sequence to select. Statistical packages use various ad hoc selection methods, including F-to-enter, F-to-delete, Cp, and t-value cutoffs. Our approach to this problem is through the criterion that a good selection procedure selects dimensionality so as to give low prediction error (PE), where the PE of a regression equation is its expected squared error over the points in the X design. Because the true PE is unknown, the use of this criterion must be based on PE estimates. We introduce a method called the little bootstrap, which gives almost unbiased estimates of submodel PEs and then uses these to do submodel selection. Comparison is made to Cp and other methods by analytic examples and simulations. Little bootstrap does well; Cp and, by implication, all selection methods not based on data reuse give highly biased results and poor subset selection.
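The submodel-sequence idea described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's little bootstrap; it only shows the baseline setup the paper argues against: forward stepwise addition generates a nested sequence of submodels of dimension 1, 2, ..., and Mallows's Cp (Cp = RSS_p / sigma2_full - n + 2p) picks the dimensionality. The function names and simulated data are illustrative assumptions.

```python
# Illustrative sketch (assumed names/data): stepwise submodel sequence + Cp selection.
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of a least squares fit on the chosen columns."""
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ beta
    return float(resid @ resid)

def forward_stepwise(X, y):
    """Nested sequence of submodels: one submodel per dimensionality 1..k."""
    remaining, chosen, sequence = list(range(X.shape[1])), [], []
    while remaining:
        best = min(remaining, key=lambda j: rss(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
        sequence.append(list(chosen))
    return sequence

def select_by_cp(X, y):
    """Pick the submodel minimizing Mallows's Cp = RSS_p / sigma2_full - n + 2p."""
    n, k = X.shape
    sigma2_full = rss(X, y, list(range(k))) / (n - k)  # full-model variance estimate
    sequence = forward_stepwise(X, y)
    cps = [rss(X, y, cols) / sigma2_full - n + 2 * len(cols) for cols in sequence]
    return sequence[int(np.argmin(cps))], cps

# Simulated example: only the first 3 of 10 predictors carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.normal(size=100)
best_cols, cps = select_by_cp(X, y)
print("selected predictors:", sorted(best_cols))
```

The paper's point is that criteria like the Cp rule above, which reuse the same residual sums of squares that drove the subset search, give biased PE estimates; the little bootstrap replaces them with a data-reuse estimate of prediction error over the fixed X design.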

 
