
R mice leaves missing values when I use a where-matrix - Stack Overflow


I have a large data frame with many variables measured at three time points: t1, t2 and t3. I only want to impute missing values for time points that a participant answered at all, that is, where answered_t1, answered_t2 or answered_t3 equals 1. I tried to specify this with the "where" argument, but mice still leaves missing values in cells that should have been imputed. Either I am doing something wrong, or the mice algorithm is behaving unexpectedly. Here is a simplified example:

# Load necessary library
library(mice)

# Set seed for reproducibility
set.seed(42)

# Number of participants
n <- 100

# Answered indicators (no missing data)
answered_t1 <- rbinom(n, 1, 0.8)
answered_t2 <- rbinom(n, 1, 0.7)
answered_t3 <- rbinom(n, 1, 0.6)

# Create function to generate variable with ~20% missing data
generate_var <- function(answered) {
  var <- rnorm(n)
  var[!answered] <- NA  # Set entire time point to NA if not answered
  missing_idx <- sample(which(answered == 1), size = round(0.2 * sum(answered)))
  var[missing_idx] <- NA
  return(var)
}

# Generate variables according to answered indicators
TU1_t1 <- generate_var(answered_t1)
TU1_t2 <- generate_var(answered_t2)
TU1_t3 <- generate_var(answered_t3)

TU2_t1 <- generate_var(answered_t1)
TU2_t2 <- generate_var(answered_t2)
TU2_t3 <- generate_var(answered_t3)

# Create the data frame
df <- data.frame(
  TU1_t1, TU1_t2, TU1_t3,
  TU2_t1, TU2_t2, TU2_t3,
  answered_t1, answered_t2, answered_t3
)

# Create the predictor matrix and specify that "answered" variables are not used as predictors
pred <- make.predictorMatrix(df)
pred[, grep("answered", colnames(pred))] <- 0

# Create a "where" matrix to specify where imputation should occur
where <- is.na(df)

# Only allow imputation where the corresponding "answered" variable is 1
where[, "TU1_t1"] <- where[, "TU1_t1"] & df$answered_t1 == 1
where[, "TU1_t2"] <- where[, "TU1_t2"] & df$answered_t2 == 1
where[, "TU1_t3"] <- where[, "TU1_t3"] & df$answered_t3 == 1
where[, "TU2_t1"] <- where[, "TU2_t1"] & df$answered_t1 == 1
where[, "TU2_t2"] <- where[, "TU2_t2"] & df$answered_t2 == 1
where[, "TU2_t3"] <- where[, "TU2_t3"] & df$answered_t3 == 1

# Perform multiple imputation using mice with the "where" matrix
imp <- mice(df, m = 5, predictorMatrix = pred, where = where, printFlag = FALSE)

# Check the completed data for a case where there is a missing value that should be imputed but isn't
completed_data <- complete(imp)
print(completed_data[3, ])
where[3, ]
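
Rather than inspecting a single row, it may help to count every cell that was flagged in the "where" matrix but is still missing after imputation. The sketch below is only an illustration: `count_unimputed` is a hypothetical helper (not part of mice), and the toy objects are made up so the block runs with base R alone.

```r
# Hypothetical helper (not a mice function): per column, count the cells that
# were flagged for imputation in `where` but are still NA in the completed data.
count_unimputed <- function(completed, where) {
  still_na <- is.na(as.matrix(completed)) & where
  colSums(still_na)
}

# Toy illustration (made-up data, base R only)
toy_completed <- data.frame(a = c(1, NA, 3), b = c(NA, 2, NA))
toy_where <- matrix(
  c(FALSE, TRUE,  FALSE,   # column a: only row 2 was flagged
    TRUE,  FALSE, FALSE),  # column b: only row 1 was flagged
  nrow = 3, dimnames = list(NULL, c("a", "b"))
)
toy_counts <- count_unimputed(toy_completed, toy_where)
print(toy_counts)
```

Applied to the example above, the call would be `count_unimputed(completed_data, where)`. If any count is non-zero, one thing worth checking is whether the affected rows also have missing values in predictor columns that the "where" matrix excluded from imputation.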

Does anyone know why these cells are left unimputed?
