Applications of multiple imputation in medical studies. Multiple imputation mi rubin, 1987 is a widely used method for handling missing data. The idea of multiple imputation for missing data was first proposed by rubin 1977. The missingdata mechanism has three classifications rubin, 1976. Inference from multiple imputation for missing data using. With multiple imputation, m 1 plausible sets of replacements are generated for the missing values, thereby generating. We describe how mi methods for fullcohort studies can be adapted to account for the sampling designs of nested casecontrol and casecohort studies. Chapter 6 more topics on multiple imputation and regression modelling. Imputation similar to single imputation, missing values are imputed. This is also true of the multiple imputation methods available in the recently released missing data module for spss or.
The key step in rubin s 1978 multiple imputation is. Statistical analysis multiply imputed data was used. In order to deal with the problem of increased noise due to imputation, rubin 1987 developed a method for averaging the outcomes across multiple imputed data sets to account for this. As early as the 1970s, rubin 1978 proposed the theory of multiple imputation. Multiple imputation for missing data statistics solutions. New computational algorithms and software described in a recent book schafer, 1997 allow us to create proper multiple imputations in complex multivariate settings. The problem multiple imputation was designed to address missing values are a problem in many data sets and seem especially common in the medical and social sciences. Regardless of the nature of the post imputation phase, mi inference treats missing data as an explicit source of random variability and. Parameter estimates after multiple imputation were derived based on rubin s combination rule rubin 1978, 1987, 1996. Rubin 1978 suggested to take several independent realizations of imputation mechanism, and provided the ways to combine the estimates to obtain the point estimates and standard errors valid under proper imputation assumptions. Quantifying the impact of fixed effects modeling of.
Statistical analysis with missing data wiley series in. Multiple imputation was designed to handle the problem of missing data in public use data. Let us first introduce the basics of rubins multiple imputation rubin, 1987. The second major algorithm is called fully conditional. A commercial software program using the mcmc algorithm is sas proc mi sas, 2011. A cautionary tale allison summarizes the basic rationale for multiple imputation. Multiple imputation using chained equations for missing. His original goal was to impute mcompleted datasets for public usage in the context of public surveys in which a response rate of less than 60 percent for any variable was. This is the original version of rubin s 1978, 1987 multiple imputation.
It should be noted that this volume is not intended to be the exclusive source of the multiple imputation software. The technique of multiple imputation, which originated in early 1970 in application to survey nonresponse rubin1976, has. Multiple imputation by ordered monotone blocks with. We illustrate rr with a ttest example in 3 generated multiple imputed datasets in spss. Multiple imputation for multivariate missingdata problems. Descriptive statistics em algorithm for data with missing values statistical assumptions for multiple imputation missing data patterns imputation methods monotone methods for data sets. Missing data analysis with the mahalanobis distance. The pool function averages the estimates of the complete data model, computes the total variance over the repeated analyses by rubin s rules rubin, 1987, p. The full text of this article is available in pdf format.
We consider three imputation approaches suitable for use. A multivariate technique for multiply imputing missing. Introduction the general statistical theory and framework for managing missing information has been well developed since rubin 1987 published his pioneering treatment of multiple imputation methods for nonresponse in surveys. Multiple imputation is a statistical technique for handling incomplete data and for delivering an analysis that makes use of all possible information rubin, 1977 1978.
It can be used for multiple imputation of missing data of several variables with no particular structure. In recent years, the parametric multiple imputation method proposed by rubin 1978, 1987 has become one of the most popular methods for handling missing data. Using multiple imputation to address missing values of. Therefore this handout will focus on multiple imputation. Multiple imputation for missing data is an attractive method for handling missing data in multivariate analysis. Multiple imputation by ordered monotone blocks with application to the anthrax vaccine research program fan li, michela baccini, fabrizia mealli, elizabeth r zell, constantine e frangakis, donald b rubin 1 abstract. Under the multiple imputation paradigm of rubin 1978, the imputer generates copies of the missing data from pz. Creation of the multiply imputed values is a key step in a multiple imputation analysis. I know that i can use rubin s rules implemented through any multiple imputation package in r to pool means and standard. Using widely available commercial software the only method for. Multiple imputation allows the uncertainty due to imputation to be reflected in the analysis rubin, 1978, 1987.
Inference from multiple imputation for missing data using mixtures. Schafer 1997, van buuren and oudshoom 2000 and raghunathan et al. Rubin, educational testing service a general attack on the problem of non response in sample surveys is outlined from the. Most popular statistical software packages have options for multiple imputation, which require little. Multiple imputation provides a useful strategy for dealing with data sets with missing values. This method does not require any direct assumption on joint distribution of the variables and it is presently implemented in standard statistical software splus, stata. The software described in this manual is furnished under a license agreement or nondisclosure agreement. Such methods include multiple imputation rubin, 1978 and the expectation maximisation em algorithm dempster et al.
Rubin multiple imputation was designed to handle the problem of missing data in publicuse data bases where the database constructor and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access only to completedata software and possess limited knowledge of specific reasons and models for nonresponse. The focus of this thesis will be on multiple imputation but both methods, among others, will be outlined. His original goal was to impute mcompleted datasets for public usage in the context of public surveys in which a response rate of less than 60 percent for any variable was rare. For this situation and objective, i believe that multiple imputation by the database constructor is the method of choice. Abstract multiple imputation was designed to handle the problem of missing data in publicuse data bases where the database constructor and the ultimate user are distinct entities. Rubin db 1978 multiple imputation in sample surveys a phenomenological bayesian approach to nonresponse. It was derived using the bayesian paradigm rubin 1987 1996. Reporting the use of multiple imputation for missing data in higher education research. Rpackage norm currently implements this version of multiple imputation schafer, 1997. Multiple imputation using chained equations for missing data in timss. The multiple imputation framework suggested by rubin 1978, 1987a, 1996 is an attractive option if a data set is to be used by multiple researchers with differing levels of statistical expertise. This provides for an interesting alternative when there is a concern that single imputation.
In this chapter, we will deal with some specific topics when you perform regression modeling in multiple imputed datasets. Multiple imputation mi is a way to deal with nonresponse bias missing research data that. Multiple imputation rubin 1978, 1987 has come a long way. According to rubin 1978, the multiple imputation estimator denoted. Multiple imputation using chained equations for missing data in. Using amelia in r, i obtained multiple imputed datasets.
Reporting the use of multiple imputation for missing data. I examine two approaches to multiple imputation that have been incorporated into widely available software. Therefore, multiple imputation by the emb algorithm can be considered to be proper imputation in rubin s sense 1987. His original goal was to impute m completed datasets for public usage in the context of public surveys in which a response rate of less than 60 percent for any variable was.
The general, very simplified, procedure as outlined by rubin, 1987 is a series of steps. A free r software package called mimix that implements our methods is. Forty years after donald rubin s seminal paper rubin, 1978 which introduced the concept of multiple imputation, the approach has been shown to be useful in many contexts going far beyond the classical item nonresponse in cross sectional surveys for which it was origi. The development of diagnostic techniques for multiple imputation, though, has been retarded. In the late 1970s, rubin 1978 proposed the theory of multiple imputation. All multiple imputation methods follow three steps. Rubin db 1978 multiple imputation in sample surveysa phenomological. Rubin s rule to combine multiply imputed results according to rubin 1987, for statistical results estimated using multiply imputed data, the nal point estimate is the average of the mcomplete. Under the multiple imputation paradigm of rubin 1978, the imputer. Multiple imputation mi rubin, 1978, 1987a2004, 1996. This chapter is a followup on the previous chapter 5 about data analysis with multiple imputation.
Rubin 1976 was the rst to introduce the concept of the mechanism of missing. In a 2000 sociological methods and research paper entitled multiple imputation for missing data. Multiple imputation for missingness due to nonlinkage and. The following is the procedure for conducting the multiple imputation for missing data that was created by rubin. Because the mi procedure does not adequately perform imputation for the data, this method is not described in detail. The first traditional algorithm is based on markov chain monte carlo mcmc. We assume that the imputer uses a parametric regression model, though we expect that these results would extend to imputation via a nonparametric method such as hot deck with the approximate bayesian bootstrap rubin and schenker, 1986. For nearly two decades i have been advocating and developing the use of multiple imputation to address aspects of this problem. An object of class mipo, which stands for multiple imputation pooled outcome. Results in simulation 1, the screeningfirst imputation approach is consistent with the datagenerating process and is expected to perform well. Using multiple imputation, we create two or more completed datasets, do the usual analysis on each completed dataset, then draw inferences based on both the within and between imputation variability. Multiple imputation of missing data in nested casecontrol.