Seeing What’s Missing

visualization of incomplete and imputed data

Hanne Oberman

Utrecht University

NAs in R

c(1, 3, NA) |> mean()
[1] NA


c(1, 3, NA) |> mean(na.rm = TRUE)
[1] 2
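
Before dropping NAs with `na.rm = TRUE`, it can help to count them first. A minimal base-R sketch; the vector `x` is illustrative and simply reuses the values from the slide.

```r
# Illustrative vector, same values as on the slide
x <- c(1, 3, NA)

# is.na() flags each missing element
is.na(x)

# Count and proportion of missing values
n_missing <- sum(is.na(x))
prop_missing <- mean(is.na(x))

# na.rm = TRUE drops the NA before averaging the observed values
mean_observed <- mean(x, na.rm = TRUE)
```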


Warning

Warning message: Removed 1 row containing missing values or values outside the scale range (geom_point()).
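
A warning like this typically appears when ggplot2 draws a layer whose data contain NAs. A minimal sketch, assuming the ggplot2 package is installed; the data frame `df` is made up for illustration and is not the data behind the slide.

```r
library(ggplot2)

# Hypothetical data with one missing y value
df <- data.frame(x = c(1, 2, 3), y = c(2, NA, 4))

# When this plot is drawn, the incomplete row is dropped with a warning
p <- ggplot(df, aes(x, y)) +
  geom_point(size = 3)

# na.rm = TRUE drops the same row but silences the warning
p_quiet <- ggplot(df, aes(x, y)) +
  geom_point(size = 3, na.rm = TRUE)

# Either way, only the complete rows end up on the canvas
n_plotted <- sum(complete.cases(df))
```

Silencing the warning does not bring the missing row back; it only hides the fact that it was dropped.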

Ignoring NAs

Missing data are…

a pervasive problem
informative!
visualizable & analyzable

R set-up

library(dplyr)
library(ggplot2)
library(mice)
library(ggmice)
set.seed(123)

Data generation

# Simulate n respondents: R use on a 1-5 scale,
# and happiness that increases with R use
n <- 20
dat <- tibble(
  r_use = sample(1:5, size = n, replace = TRUE),
  happy = rnorm(n, mean = 6) + 0.5 * r_use
)

Complete data

ggmice(dat, aes(r_use, happy)) + 
  geom_point(size = 3) 

Incomplete data

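# Add one respondent who uses R more than anyone observed (r_use = 7)
# but whose happiness score is missing
dat <- rbind(dat, c(r_use = 7, happy = NA))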
ggmice(dat, aes(r_use, happy)) + 
  geom_point(size = 3)
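
Besides the scatter plot, the missing data pattern itself can be inspected. A hedged sketch, assuming the mice and ggmice packages are installed; it rebuilds the slides' `dat` so that it runs on its own, then uses `md.pattern()` from mice and `plot_pattern()` from ggmice.

```r
library(dplyr)
library(mice)
library(ggmice)
set.seed(123)

# Recreate the incomplete data from the slides
n <- 20
dat <- tibble(
  r_use = sample(1:5, size = n, replace = TRUE),
  happy = rnorm(n, mean = 6) + 0.5 * r_use
)
dat <- rbind(dat, c(r_use = 7, happy = NA))

# Number of missing values per variable
n_missing <- colSums(is.na(dat))
n_missing

# Tabulate the missing data pattern without plotting (mice)
md.pattern(dat, plot = FALSE)

# Visualize the same pattern as a ggplot object (ggmice)
plot_pattern(dat)
```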

Plausible values

ggmice(dat, aes(r_use, happy)) + 
  geom_point(size = 3) +
  geom_smooth(method = "lm", fullrange = TRUE)

Regression imputation

# norm.predict: impute the value predicted by a linear regression (no added noise)
imp <- mice(dat, method = "norm.predict", printFlag = FALSE)
ggmice(imp, aes(r_use, happy)) + 
  geom_point(size = 3) +
  geom_smooth(method = "lm", fullrange = TRUE)
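
To make the idea concrete, deterministic regression imputation can be approximated by hand: fit a linear model on the observed rows and fill the gap with its prediction. This base-R sketch assumes a data frame shaped like the slides' `dat`; the names `fit`, `miss`, `pred`, and `dat_imputed` are illustrative, and the result is meant to convey the idea rather than reproduce the exact output of `norm.predict`.

```r
set.seed(123)

# Recreate data shaped like the slides' `dat` (base R only)
n <- 20
r_use <- sample(1:5, size = n, replace = TRUE)
happy <- rnorm(n, mean = 6) + 0.5 * r_use
dat <- data.frame(r_use = c(r_use, 7), happy = c(happy, NA))

# Fit the regression on the observed rows (lm() drops NA rows by default)
fit <- lm(happy ~ r_use, data = dat)

# Predict the missing outcome from its observed predictor
miss <- is.na(dat$happy)
pred <- predict(fit, newdata = dat[miss, , drop = FALSE])

# Fill in the prediction: the imputed point lies exactly on the fitted line
dat_imputed <- dat
dat_imputed$happy[miss] <- pred
```

Because the imputed value sits exactly on the line, it carries no residual noise and so understates the uncertainty about the missing score.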

Multiple imputation

# norm: Bayesian linear regression; imputed values are drawn with random variation
imp <- mice(dat, method = "norm", printFlag = FALSE)
ggmice(imp, aes(r_use, happy)) + 
  geom_point(size = 3) +
  geom_smooth(method = "lm", fullrange = TRUE)
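
Multiply imputed data are also analyzable: the same model can be fitted to every completed dataset and the estimates combined. A hedged sketch, assuming the mice package is installed and using its `with()` and `pool()` workflow; the regression `happy ~ r_use` is an assumed analysis model chosen to match the plots, not one stated on the slides.

```r
library(dplyr)
library(mice)
set.seed(123)

# Recreate the incomplete data from the slides
n <- 20
dat <- tibble(
  r_use = sample(1:5, size = n, replace = TRUE),
  happy = rnorm(n, mean = 6) + 0.5 * r_use
)
dat <- rbind(dat, c(r_use = 7, happy = NA))

# Multiple imputation with the default number of imputations (m = 5)
imp <- mice(dat, method = "norm", printFlag = FALSE)

# Fit the same regression in each completed dataset
fit <- with(imp, lm(happy ~ r_use))

# Combine the per-dataset estimates with Rubin's rules
pooled <- pool(fit)
pooled_summary <- summary(pooled)
pooled_summary
```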

Informative missingness


What is your gender?

Missing data are…

a pervasive problem
informative!
visualizable & analyzable

Thank you!