Variable Selection in Nonparametric Regression with Categorical Covariates

 

Authors: Peter Bickel, Ping Zhang

 

Journal: Journal of the American Statistical Association (available online 1992)
Volume/Issue: Volume 87, Issue 417

Pages: 90-97

 

ISSN: 0162-1459

 

Year: 1992

 

DOI: 10.1080/01621459.1992.10475179

 

Publisher: Taylor & Francis Group

 

Keywords: Cross-validation; Model selection; Prediction

 

Data source: Taylor

 

Abstract:

This article extends the problem of variable selection to a nonparametric regression model with categorical covariates. Two selection criteria are considered: the cross-validation (CV) criterion and the accumulated prediction error (APE) criterion. We find that, asymptotically, the CV criterion performs well only when the true model is infinite-dimensional, while the APE criterion is appropriate when the true model is finite-dimensional. This is very similar to the case of the linear regression model. A simulation study reveals some interesting small-sample properties of these criteria. To be more specific, suppose that we have observations (X1, Y1), …, (Xn, Yn) that are iid random vectors and X = (X(1), X(2), …), where the X(i)'s are categorical. We allow Y to be of any type. Now a new observation X has arrived and we want to predict the corresponding Y. Such a framework is more appropriate than regression with fixed covariates in situations where the covariates are observational rather than controlled. For instance, Y could be the time from HIV infection to developing clinical AIDS, and the covariates (mostly categorical or reducible to categorical) could be observations from blood tests, a physical examination, or further personal information, such as sexual practices obtained from an interview. To take another example, Y could be the premium of an insurance policy, with the covariates being the customer's general demographic information. Our goal is to select a subset of covariates that best predicts Y. We define the true model dimension as d0 if the regression function E(Y | X(1), X(2), …) is a d0-variate function. The main conclusions of the article are: (1) The popular CV criterion performs well only when d0 = ∞. (2) There exist other criteria that are more appropriate than CV when d0 < ∞. (3) There is no difference between conditional and unconditional prediction errors, as far as asymptotics are concerned. (4) The selection range has to depend on the sample size.
In fact, we argue that, for a given sample size n, we should only select models whose number of covariates does not exceed the order of o(log n). (5) A simulation study indicates that the CV criterion has nice small-sample properties.

 



