Imputation based on Heckman model for multilevel data. — mice.impute.2l.2stage.heckman • miceheckman

Imputes outcome and predictor variables that follow an MNAR mechanism according to Heckman's model and come from a multilevel dataset, such as individual participant data.

Usage

mice.impute.2l.2stage.heckman(
  y,
  ry,
  x,
  wy = NULL,
  type,
  pmm = FALSE,
  ypmm = NULL,
  meta_method = "reml",
  ...
)

Arguments

y: Vector to be imputed
ry: Logical vector of length length(y) indicating the subset y[ry] of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing values (FALSE) in y.
x: Numeric design matrix with length(y) rows with predictors for y. Matrix x may have no missing values.
wy: Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.
type: Type of each variable in the prediction model. 0: No predictor, 1: Predictor in both the outcome and selection models, -2: Cluster id (study id), -3: Predictor only in the selection model, -4: Predictor only in the outcome model.
pmm: Logical (TRUE or FALSE) indicating whether predictive mean matching is applied. It can be applied only to missing continuous variables.
ypmm: Vector of donor values of y used for predictive mean matching. If ypmm is not provided, the observed values of y are used.
meta_method: Meta-analysis estimation method for the random effects: "ml" (maximum likelihood), "reml" (restricted maximum likelihood) or "mm" (method of moments).
...: Other named arguments. Not used.
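
The type codes listed above are normally set through the predictor matrix passed to mice() rather than by calling the imputation function directly. The following is a minimal sketch of how such a matrix could be prepared. The data frame dat and its columns (study, x1, z1, y) are hypothetical and exist only for illustration; the choice of z1 as a selection-only predictor is likewise an assumption, not a recommendation from this page.

```r
library(mice)

# Hypothetical clustered data: 'study' is the cluster id and 'y' is the
# incomplete variable (all names here are illustrative assumptions).
set.seed(1)
dat <- data.frame(
  study = rep(1:4, each = 25),
  x1    = rnorm(100),
  z1    = rnorm(100),
  y     = rnorm(100)
)
dat$y[sample(100, 30)] <- NA

# Start from the default predictor matrix (rows: variables to impute,
# columns: predictors), then recode the row that describes 'y'.
pred <- make.predictorMatrix(dat)
pred["y", "study"] <- -2  # cluster id (study id)
pred["y", "x1"]    <-  1  # predictor in both the outcome and selection models
pred["y", "z1"]    <- -3  # predictor only in the selection model

pred["y", ]

# The matrix would then be handed to mice(), for example:
# imp <- mice(dat, predictorMatrix = pred, method = "2l.2stage.heckman")
# (left commented out: it requires the package providing this method)
```

Only the row for y is recoded here, because that row is what the imputation function receives as its type argument when y is imputed.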

Value

Vector with imputed data, of type binary or continuous

Details

Imputes systematically and sporadically missing binary and continuous univariate variables that follow an MNAR mechanism according to the Heckman selection model and come from a clustered dataset. The imputation method uses a two-stage approach in which the Heckman model parameters at the cluster level are estimated using the copula method.
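
For orientation, the classical Heckman selection model for a continuous variable can be written as below. This is the textbook bivariate-normal formulation, given as a sketch; the notation (index i for individuals, j for clusters, x and z for the outcome and selection predictors) is an assumption made here, and the copula-based estimation used by the function may parameterize the model differently.

```latex
\begin{aligned}
y_{ij}^{*} &= x_{ij}^{\top}\beta_j + \varepsilon_{ij}
  && \text{(outcome equation)} \\
r_{ij}^{*} &= z_{ij}^{\top}\gamma_j + \eta_{ij}
  && \text{(selection equation)} \\
y_{ij} &\text{ is observed only if } r_{ij}^{*} > 0, \\
\begin{pmatrix} \varepsilon_{ij} \\ \eta_{ij} \end{pmatrix}
  &\sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \sigma_j^{2} & \rho_j \sigma_j \\ \rho_j \sigma_j & 1 \end{pmatrix}
  \right)
\end{aligned}
```

In this formulation the missingness is MNAR whenever the correlation \rho_j is non-zero, because the probability of observing y then depends on the unobserved part of y itself. Predictors of type -3 would enter only z, and predictors of type -4 only x.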

Note

Binary variables with missing values should be included as two-level factors. When no cluster variable is defined in the predictor matrix as "-2", the imputation method is based on a simple Heckman model, i.e. without taking into account the hierarchical structure. The imputation method also falls back on the simple Heckman model when the Heckman model cannot be estimated at the study level.
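
As a small illustration of the first point, a binary variable stored as numeric codes can be converted to a two-level factor before imputation. The data frame and the labels "no" and "yes" below are hypothetical; this sketch uses base R only.

```r
# Hypothetical binary variable coded 1/2 with missing values
dat <- data.frame(hyp = c(1, 2, NA, 1, 2, NA, 1))

# Convert the numeric codes to a two-level factor; NA values stay missing
dat$hyp <- factor(dat$hyp, levels = c(1, 2), labels = c("no", "yes"))

str(dat$hyp)
```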

Author

Julius Center Methods Group UMC, 2022

Examples

# example code
library(mice)
#> 
#> Attaching package: ‘mice’
#> The following object is masked from ‘package:stats’:
#> 
#>     filter
#> The following objects are masked from ‘package:base’:
#> 
#>     cbind, rbind
pred <- make.predictorMatrix(nhanes)
pred[, "age"] <- -3
mice(nhanes, pred = pred, meth = "2l.2stage.heckman")
#> 
#>  iter imp variable
#>   1   1  bmi
#> No group variable has been provided, the Heckman imputation model will be applied globally to the dataset.
#> The Heckman model cannot be estimated marginally, so systematically missing groups will be imputed with the Heckman model based on the full dataset.
#>   hyp
#> No group variable has been provided, the Heckman imputation model will be applied globally to the dataset.
#> The Heckman model cannot be estimated marginally, so systematically missing groups will be imputed with the Heckman model based on the full dataset.
#> Error in mice.impute.2l.2stage.heckman(y = c(2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1), ry = c(`1` = FALSE, `2` = TRUE, `3` = TRUE, `4` = FALSE, `5` = TRUE, `6` = FALSE, `7` = TRUE, `8` = TRUE, `9` = TRUE, `10` = FALSE, `11` = FALSE, `12` = FALSE, `13` = TRUE, `14` = TRUE, `15` = TRUE, `16` = FALSE, `17` = TRUE, `18` = TRUE, `19` = TRUE, `20` = TRUE, `21` = FALSE, `22` = TRUE, `23` = TRUE, `24` = TRUE, `25` = TRUE), x = structure(c(1, 2, 1, 3, 1, 3, 1, 1, 2, 2, 1, 2, 3, 2, 1, 1, 3, 2, 1, 3, 1, 1, 1, 3, 2, -31632.6652863162, 22.7, -52357.3794743866, -64121.0089968081, 20.4, -51517.1201986113, 22.5, 30.1, 22, -57679.0213659062, -31630.9844455429, -66641.78670761, 21.7, 28.7, 29.6, -52077.2930598841, 27.2, 26.3, 35.3, 25.5, -51517.120092814, 33.2, 27.5, 24.9, 27.4, 113, 187, 187, 229, 113, 184, 118, 187, 238, 206, 113, 238, 206, 204, 204, 186, 284, 199, 218, 187, 184, 229, 131, 284, 186), dim = c(25L, 3L), dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25"), c("age", "bmi", "chl"))),     wy = c(`1` = TRUE, `2` = FALSE, `3` = FALSE, `4` = TRUE,     `5` = FALSE, `6` = TRUE, `7` = FALSE, `8` = FALSE, `9` = FALSE,     `10` = TRUE, `11` = TRUE, `12` = TRUE, `13` = FALSE, `14` = FALSE,     `15` = FALSE, `16` = TRUE, `17` = FALSE, `18` = FALSE, `19` = FALSE,     `20` = FALSE, `21` = TRUE, `22` = FALSE, `23` = FALSE, `24` = FALSE,     `25` = FALSE), type = c(age = -3, bmi = 1, chl = 1)): There is insufficient information to impute the Heckman model at the marginal or study level.