|
11. |
The Errors-in-Variables Problem: Considerations Provided by Radiation Dose-Response Analyses of the A-Bomb Survivor Data |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 351-359
Donald A. Pierce,
Daniel O. Stram,
Michael Vaeth,
Daniel W. Schafer,
Abstract:
Some basic issues in the errors-in-variables problem are discussed, in terms of considerations that arose in analyses of radiation effects on atomic bomb survivors. The setting essentially involves generalized linear models for the response variables, a very nonnormal distribution of the true covariable, and multiplicative errors in the observed covariable. Consideration is given to distinctions between structural and functional modeling. It is argued that careful attention to the apparent distribution of true covariables is critical in either case, and a quasi-structural approach to functional models is suggested. The focus is on the case in which the expected response is linear in the true covariable and strong assumptions are tentatively made about the model for covariate errors. For settings such as just described, which differ from that of much of the classical work in the area, it is emphasized that an attractive approach is based on weighted regression of the response on the expected values of the true covariable, given the observed values.
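The approach recommended in this abstract — weighted regression of the response on E(true covariable | observed covariable) — is often called regression calibration, and a small simulation shows why it helps. Everything below (a lognormal true dose, the specific variances, ordinary rather than weighted least squares) is an illustrative assumption of this sketch, not the paper's analysis of the survivor data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
mu_x, s2_x = 0.0, 1.0   # assumed: true dose X is lognormal, log X ~ N(mu_x, s2_x)
t2 = 0.25               # assumed: multiplicative error, log W = log X + N(0, t2)

log_x = rng.normal(mu_x, np.sqrt(s2_x), n)
log_w = log_x + rng.normal(0.0, np.sqrt(t2), n)
y = 2.0 + 3.0 * np.exp(log_x) + rng.normal(0.0, 1.0, n)  # response linear in true dose

# E[X | W] under the lognormal model: log X | log W is normal with
#   mean  mu_x + rho * (log W - mu_x)   and   var  s2_x * t2 / (s2_x + t2),
# where rho = s2_x / (s2_x + t2), so E[X | W] = exp(mean + var / 2).
rho = s2_x / (s2_x + t2)
cond_mean = mu_x + rho * (log_w - mu_x)
cond_var = s2_x * t2 / (s2_x + t2)
e_x_given_w = np.exp(cond_mean + cond_var / 2)

b_naive = np.polyfit(np.exp(log_w), y, 1)[0]  # regress on the error-prone dose
b_cal = np.polyfit(e_x_given_w, y, 1)[0]      # regress on E[X | W]
```

With these settings the naive slope is badly attenuated, while the calibrated slope sits near the true value 3: because the response is linear in the true covariable, E[Y | W] = 2 + 3·E[X | W] exactly, which is the point the abstract emphasizes.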
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475214
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
12. |
Statistical Analysis of the Time Dependence of HIV Infectivity Based on Partner Study Data |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 360-372
Stephen C. Shiboski,
Nicholas P. Jewell,
Abstract:
Statistical analyses of data from studies of human immunodeficiency virus (HIV) transmission in partners of infected individuals often focus on estimation of the per contact probability of virus transmission, or infectivity. Of particular interest is evaluating whether the infectivity changes during the course of a partnership and identifying factors that influence the infectiousness of the initially infected partner (called the index case) and the susceptibility of the uninfected partner. Estimation and inference are complicated by limitations in partner study data, which may include unknown time of infection for either or both partners and inaccurate or incomplete information on the number and frequency of contacts. Using techniques from survival analysis, we extend earlier work of Jewell and Shiboski by developing semiparametric models for partner study data that allow variation in the infectivity according to time since infection of the index case. These models provide a unifying framework for investigations of infectivity based on data from various types of partner studies. The necessary statistical methodology requires analysis of binary regression models with complementary log-log links, where components of the regression function are subject to smoothness or isotonicity constraints. The methods are illustrated on data sets from studies of heterosexual transmission.
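In the constant-infectivity special case the complementary log-log structure is easy to see: P(infected | n contacts) = 1 − (1 − λ)^n, so log(−log(1 − P)) = log n + log(−log(1 − λ)), a binary regression with cloglog link and offset log n. The sketch below fits only this one-parameter case; the function name and the grid-search fitting are illustrative stand-ins, not the authors' semiparametric estimators with smoothness or isotonicity constraints.

```python
import math

def fit_constant_infectivity(data, grid=10_000):
    """Maximum likelihood for a constant per-contact infectivity lam under
    P(infected | n contacts) = 1 - (1 - lam)**n.
    data: iterable of (n_contacts, infected) pairs, infected in {0, 1}.
    A dependency-free grid search; a real fit would use Newton's method."""
    best_lam, best_ll = None, -math.inf
    for j in range(1, grid):
        lam = j / grid
        ll = 0.0
        for n_contacts, infected in data:
            if infected:
                ll += math.log(1.0 - (1.0 - lam) ** n_contacts)
            else:
                ll += n_contacts * math.log(1.0 - lam)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam
```

For instance, with 63 of 100 partnerships infected after 100 contacts each, the fitted λ is about 0.0099, matching the closed form 1 − (37/100)^(1/100).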
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475215
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
13. |
Votes or Competitions Which Determine a Winner by Estimating Expected Plurality |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 373-375
Warren W. Esty,
Abstract:
A voting model is considered in which the philosophy of the traditional plurality method of selecting a winner is accepted, but it is desired to compensate for the fact that voters may not have been able to evaluate all of the nominees; for example, voters voting for "best foreign film" from a list of nominees, consumers buying (that is, voting for) a product of a certain brand from a selection of available brands, and respondents selecting an answer to a multiple-choice question for which the possible alternative responses are not the same for all respondents. The model can be interpreted as a generalization, to selections from possibly more than two nominees, of the basic Bradley-Terry odds-ratio model of paired comparisons. Nominees k correspond to probabilities p_k, and the probability that a voter votes for an evaluated nominee k is p_k divided by the sum of the p_i's of the nominees evaluated by the voter. This article's major contribution is a convenient matrix representation of the log-likelihood function (2.3) and its gradient (2.5), from which maximum-likelihood estimates can be obtained easily using the method of steepest ascent. The resulting estimates can be used to determine a winner in several ways, to rank the nominees, and to estimate selection probabilities for voters considering any combination of nominees. An algorithm for identifying and eliminating degenerate special cases is given. A program that performs the calculations is available from the author. Interestingly, given the maximum likelihood estimator (MLE) vector, choosing the "winner" as the nominee with the highest estimated p_k is not equivalent to choosing the "winner" as the nominee that would have received the highest expected number of votes among only the voters who already voted, if they had all evaluated all the nominees.
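The selection model here — a voter who evaluated the set S votes for nominee k with probability p_k divided by the sum of the p_i over S — has a log-likelihood that is concave in the log-parameters, so plain gradient ascent suffices for a sketch. The code below uses that parameterization instead of the paper's matrix formulation (2.3)/(2.5); names and step-size settings are illustrative.

```python
import math

def fit_subset_choice_model(votes, n_items, iters=2000, lr=0.5):
    """Fit p_k in the model P(vote = k | evaluated set S) = p_k / sum_{i in S} p_i.
    votes: list of (chosen_index, evaluated_indices) pairs.
    Gradient ascent on theta with p_k = exp(theta_k), normalized at the end."""
    theta = [0.0] * n_items
    for _ in range(iters):
        grad = [0.0] * n_items
        for k, S in votes:
            denom = sum(math.exp(theta[i]) for i in S)
            grad[k] += 1.0                      # d/d theta_k of theta_k
            for i in S:
                grad[i] -= math.exp(theta[i]) / denom  # d/d theta_i of -log denom
        theta = [t + lr * g / len(votes) for t, g in zip(theta, grad)]
    total = sum(math.exp(t) for t in theta)
    return [math.exp(t) / total for t in theta]
```

Choosing the winner as the nominee with the largest fitted p_k is then one option; as the abstract notes, this can differ from maximizing the expected vote count had every voter evaluated every nominee.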
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475216
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
14. |
Calibration Estimators in Survey Sampling |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 376-382
Jean-Claude Deville,
Carl-Erik Särndal,
Abstract:
This article investigates estimation of finite population totals in the presence of univariate or multivariate auxiliary information. Estimation is equivalent to attaching weights to the survey data. We focus attention on the several weighting systems that can be associated with a given amount of auxiliary information and derive a weighting system with the aid of a distance measure and a set of calibration equations. We briefly mention an application to the case in which the information consists of known marginal counts in a two- or multi-way table, known as generalized raking. The general regression estimator (GREG) was conceived with multivariate auxiliary information in mind. Ordinarily, this estimator is justified by a regression relationship between the study variable y and the auxiliary vector x. But we note that the GREG can be derived by a different route by focusing instead on the weights. The ordinary sampling weight of the kth observation is 1/π_k, where π_k is the inclusion probability of k. We show that the weights implied by the GREG are as close as possible, according to a given distance measure, to the 1/π_k while respecting side conditions called calibration equations. These state that the sample sum of the weighted auxiliary variable values must equal the known population total for that auxiliary variable. That is, the calibrated weights must give perfect estimates when applied to each auxiliary variable. That is a consistency check that appeals to many practitioners, because a strong correlation between the auxiliary variables and the study variable means that the weights that perform well for the auxiliary variable also should perform well for the study variable. The GREG uses the auxiliary information efficiently, so the estimates are precise; however, the individual weights are not always without reproach. For example, negative weights can occur, and in some applications this does not make sense.
It is natural to seek the root of the dissatisfaction in the underlying distance measure. Consequently, we allow alternative distance measures that satisfy only a set of minimal requirements. Each distance measure leads, via the calibration equations, to a specific weighting system and thereby to a new estimator. These estimators form a family of calibration estimators. We show that the GREG is a first approximation to all other members of the family; all are asymptotically equivalent to the GREG, and the variance estimator already known for the GREG is recommended for use in any other member of the family. Numerical features of the weights and ease of computation become more than anything else the bases for choosing between the estimators. The reasoning is applied to calibration on known marginals of a two-way frequency table. Our family of distance measures leads in this case to a family of generalized raking procedures, of which classical raking ratio is one.
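For the chi-square distance Σ_k (w_k − d_k)²/d_k, the calibration equations have the closed-form GREG solution w_k = d_k(1 + x_k′λ), which the sketch below computes directly (variable names are illustrative; the other distance measures in the family require an iterative solve):

```python
import numpy as np

def calibrate_weights(d, X, totals):
    """Chi-square-distance calibration: return weights w, as close as possible
    to the design weights d = 1/pi_k, satisfying X.T @ w == totals exactly.
    d: (n,) design weights; X: (n, p) auxiliary variables;
    totals: (p,) known population totals of the auxiliary variables."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    # Lagrange multipliers lam solve (X' diag(d) X) lam = totals - X' d
    lam = np.linalg.solve(X.T @ (d[:, None] * X),
                          np.asarray(totals, dtype=float) - X.T @ d)
    return d * (1.0 + X @ lam)
```

The calibrated weights reproduce each auxiliary total exactly — the consistency check the abstract describes — but nothing prevents an individual w_k from going negative, which is the dissatisfaction motivating the alternative distance measures.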
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475217
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
15. |
“Equivalent Sample Size” and “Equivalent Degrees of Freedom” Refinements for Inference Using Survey Weights under Superpopulation Models |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 383-396
Richard F. Potthoff,
Max A. Woodbury,
Kenneth G. Manton,
Abstract:
A number of procedures have been proposed to attack different inference problems for data drawn from a survey with a complex sample design (i.e., a design that entails unequal weighting). Most procedures either are based on finite-population assumptions or require the specification of an explicit model using a superpopulation rationale. Herein we propose some relatively simple approximate procedures that are based on a superpopulation model. They provide valid variance estimators, test statistics, and confidence intervals that allow for sample design effects as expressed by design weights and other weights. The procedures do not rely on conditioning on model elements such as covariates to adjust for design effects. Instead, we obtain estimators by rescaling sample weights to sum to the equivalent sample size (equal to sample size divided by design effect). Using weighted estimators for superpopulation models, we obtain approximations to confidence bounds on the mean for simple sampling situations as well as for cluster sampling, post-stratification, and stratified sampling. We also obtain approximate tests of hypotheses for one-way analysis of variance and k × 2 homogeneity testing. For all of these, further refinements based on the concept of equivalent degrees of freedom are provided. Additionally, a general method for determining and using poststratification weights is described and illustrated. The procedures in this article are better justified than the common expedient of making proportional adjustments so that the weights add to the sample size.
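The core rescaling step is small enough to show directly. The sketch below takes Kish's approximation n·Σw²/(Σw)² for the design effect of unequal weighting — an assumption of this sketch, since the article's definition and its equivalent-degrees-of-freedom refinements go further:

```python
import numpy as np

def rescale_to_equivalent_n(w):
    """Rescale survey weights so they sum to the equivalent sample size
    n_eff = n / deff, with deff taken as Kish's n * sum(w^2) / (sum w)^2.
    Returns (rescaled weights, n_eff)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    deff = n * np.sum(w ** 2) / np.sum(w) ** 2
    n_eff = n / deff
    return w * (n_eff / w.sum()), n_eff
```

Equal weights give deff = 1 and n_eff = n; highly unequal weights shrink n_eff below n, in contrast to the "common expedient" of rescaling the weights to add to the raw sample size.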
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475218
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
16. |
The Analysis of Retrospectively Ascertained Data in the Presence of Reporting Delays |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 397-406
Mei-Cheng Wang,
Abstract:
Suppose the progress of a disease consists of two chronologically ordered events, termed the starting event and the failure event. In retrospective sampling, the sampling scheme under which observations in the data set are identified retrospectively, individuals who experienced the starting event but not the failure event are excluded and thus are truncated from the data set, and only those who experienced both the starting event and the failure event before a given time are observed. The problem of reporting delays arises when some of the failure events are not reported before the given time and thus the corresponding cases also are excluded from the data set. In survival studies failure time data sometimes are collected under the retrospective sampling scheme subject to reporting delays. This article explores nonparametric and semiparametric methods of dealing with this type of data. The results generalize some existing nonparametric and semiparametric methods for analyzing right-truncated data when reporting delays are absent. Estimation of the expected number of events is studied in detail, interpretation of the proposed estimates is discussed, and an analysis of the blood transfusion data from the Centers for Disease Control is presented.
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475219
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
17. |
Tree-Structured Methods for Longitudinal Data |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 407-418
Mark Robert Segal,
Abstract:
The thrust of tree techniques is the extraction of meaningful subgroups characterized by common covariate values and homogeneous outcome. For longitudinal data, this homogeneity can pertain to the mean and/or to covariance structure. The regression tree methodology is extended to repeated measures and longitudinal data by modifying the split function so as to accommodate multiple responses. Several split functions are developed based either on deviations around subgroup mean vectors or on two-sample statistics measuring subgroup separation. For the methods to be computationally feasible, it is necessary to devise updating algorithms for the split function. This has been done for some commonly used covariance specifications: independence, compound symmetry, and first-order autoregressive models. Data analytic issues, such as handling missing values and time-varying covariates and determining appropriate tree size, are discussed. An illustrative example concerning immune function loss in a cohort of human immunodeficiency virus (HIV)-seropositive gay men also is presented.
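For the independence covariance specification, the split function reduces to within-node sums of squares around the subgroup mean vectors. The brute-force sketch below recomputes the deviation at every candidate threshold; the article's computational contribution is precisely the O(1) updating formulas that this sketch omits. Names are illustrative.

```python
import numpy as np

def best_split(x, Y):
    """Best threshold on covariate x for repeated-measures responses Y (n, T),
    minimizing total within-node squared deviation around the two subgroup
    mean vectors (independence working-covariance split; recomputed, not
    incrementally updated)."""
    order = np.argsort(x)
    x, Y = x[order], Y[order]

    def dev(block):
        return np.sum((block - block.mean(axis=0)) ** 2)

    best_dev, best_threshold = np.inf, None
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # cannot split between tied covariate values
        total = dev(Y[:i]) + dev(Y[i:])
        if total < best_dev:
            best_dev, best_threshold = total, (x[i - 1] + x[i]) / 2
    return best_threshold
```

On toy data with two clear response clusters, the selected threshold falls between the clusters regardless of which time point drives the separation.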
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475220
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
18. |
Comparison of Model Misspecification Diagnostics Using Residuals from Least Mean of Squares and Least Median of Squares Fits |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 419-424
R.D. Cook,
D.M. Hawkins,
S. Weisberg,
Abstract:
This article explores model misspecification diagnostics based on least squares and least median of squares fits. It shows that in some circumstances, least median of squares methods (or any other estimator with the exact fit property) fail to reveal an incorrectly specified mean function, but least squares methods succeed.
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475221
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
19. |
Bootstrap Critical Values for Testing Homogeneity of Covariance Matrices |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 425-429
Ji Zhang,
Dennis D. Boos,
Abstract:
Bartlett's modified likelihood ratio statistic Λ is often suggested in multivariate analysis for testing equality of covariance matrices. Unfortunately, the χ²-approximation to the null distribution of −2 log Λ is useful only when the data are normally distributed. This article presents a pooled bootstrap procedure that replaces the χ²-approximation and makes Bartlett's statistic a useful tool for data analysis. This procedure also applies to most quadratic form test statistics.
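The pooled bootstrap idea can be sketched in a few lines: center each sample at its own mean, pool the centered observations, and resample groups of the original sizes from the pool to approximate the null distribution of −2 log Λ. The statistic below uses ML covariance estimates, and the choices of B, seed, and group sizes are illustrative, not the authors' recommendations.

```python
import numpy as np

def neg2_log_lambda(groups):
    """-2 log Lambda for equality of covariance matrices (ML estimates)."""
    n_total = sum(len(g) for g in groups)
    covs = [np.cov(g, rowvar=False, bias=True) for g in groups]
    pooled = sum(len(g) * c for g, c in zip(groups, covs)) / n_total
    return n_total * np.log(np.linalg.det(pooled)) - sum(
        len(g) * np.log(np.linalg.det(c)) for g, c in zip(groups, covs))

def pooled_bootstrap_pvalue(groups, stat=neg2_log_lambda, B=200, seed=0):
    """Pooled bootstrap p-value: resample groups of the original sizes from
    the pool of mean-centered observations and compare `stat` with its
    observed value."""
    rng = np.random.default_rng(seed)
    observed = stat(groups)
    pool = np.vstack([g - g.mean(axis=0) for g in groups])
    sizes = [len(g) for g in groups]
    exceed = sum(
        stat([pool[rng.integers(0, len(pool), n)] for n in sizes]) >= observed
        for _ in range(B))
    return (exceed + 1) / (B + 1)
```

For example, two bivariate samples whose covariance matrices differ by a large factor yield a very small p-value, while samples from a common distribution do not systematically reject.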
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475222
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|
20. |
Boundary Estimation |
|
Journal of the American Statistical Association,
Volume 87,
Issue 418,
1992,
Pages 430-438
E. Carlstein,
C. Krishnamoorthy,
Abstract:
A data set consists of independent observations taken at the nodes of a grid. An unknown boundary partitions the grid into two regions. All the observations coming from a particular region share a common distribution, but the distributions are different for the two different regions. These two distributions are entirely unknown and need not differ in their means, medians, or any other measure of “level.” The grid is of arbitrary dimension, and its mesh is rectangular. Our objective is to estimate the boundary without making any distributional assumptions. We propose a class of estimators and obtain strong consistency for them (including rates of convergence and a bound on the error probability). The boundary estimate is selected from an appropriate collection of candidate boundaries, which must be specified by the user. The candidate boundaries as well as the true boundary must satisfy certain intuitively natural regularity assumptions, including a “smoothness” condition. The boundary estimation problem has applications in diverse fields, including quality control, epidemiology, forestry, marine science, meteorology, and geology. Our method provides (as special cases) estimators for the change point problem, the epidemic change model, templates, linear bisection of the plane, and Lipschitz boundaries. Each of these examples is explicitly analyzed. A simulation study provides numerical evidence that the boundary estimators work well; in this simulation, the two distributions actually share the same mean, median, variance, and skewness. Finally, as an illustration, a boundary estimate is calculated on a data grid of cancer mortality rates in the United States.
ISSN: 0162-1459
DOI: 10.1080/01621459.1992.10475223
Publisher: Taylor & Francis Group
Year: 1992
Data source: Taylor
|