For a study, I need to generate five complete data sets for each of 100 incomplete data sets using the mice package in R.
This code works correctly for a single data set, df1:
df1_imp <- mice(df1, m = 5, method = 'logreg', print = F)
The five completed data sets can then be accessed as follows:
dataset1 <- complete(df1_imp, 1)
dataset2 <- complete(df1_imp, 2)
dataset3 <- complete(df1_imp, 3)
dataset4 <- complete(df1_imp, 4)
dataset5 <- complete(df1_imp, 5)
Fine. However, I have 100 incomplete data sets, each of which will yield 5 complete data sets (500 in total). How can I access all 500 data sets so that I can analyze them?
My list of data sets, dfs (a sample of three; each must produce 5 complete data sets, so 3 x 5 = 15):
list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1,
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA,
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA,
1, 0, 0, 0, 1, 1, 0), dim = 6:5))
asked Jan 30 at 7:04 by MetehanGungor
1 Answer
In complete(), select action='all', and set include=FALSE to exclude the un-imputed dataset. For simulation studies you may want to specify a seed.
> library(mice)
> seed. <- 42
> lapply(raw_data, mice, m=5, method='pmm', seed=seed., printFlag=FALSE) |>
+ lapply(complete, action='all', include=FALSE)
[[1]]
$`1`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`2`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`3`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`4`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 1 1
6 0 0 1 0 1
$`5`
V1 V2 V3 V4 V5
1 1 1 0 0 0
2 0 0 0 1 1
3 0 1 1 0 1
4 1 1 0 1 1
5 0 0 1 0 1
6 0 0 1 0 1
attr(,"class")
[1] "mild" "list"
[[2]]
$`1`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`2`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`3`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`4`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 0 1
5 1 1 0 0 1
6 0 0 1 1 1
$`5`
V1 V2 V3 V4 V5
1 1 0 0 1 0
2 1 0 0 0 1
3 0 0 1 1 1
4 1 0 1 1 1
5 1 1 0 0 1
6 0 0 1 1 1
attr(,"class")
[1] "mild" "list"
[[3]]
$`1`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`2`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`3`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`4`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
$`5`
V1 V2 V3 V4 V5
1 1 1 0 NA 0
2 0 0 0 1 0
3 1 1 1 0 0
4 0 0 1 1 1
5 0 0 1 NA 1
6 0 0 0 1 0
attr(,"class")
[1] "mild" "list"
Warning messages:
1: Number of logged events: 30
2: Number of logged events: 30
3: Number of logged events: 2
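The nested list printed above can also be stored and indexed, or flattened into one list with one element per completed data set. The following is a minimal sketch, not a definitive implementation: the names completed and flat, the "set<i>_imp<k>" naming scheme, and the explicit as.data.frame() conversion are assumptions made for illustration, and the data are the three sample matrices from the question.

```r
library(mice)

# Sample data from the question: three 6 x 5 matrices containing NAs
raw_data <- list(
  structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
              0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
              1, 0, 1, 1, 0, NA, NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
              1, 1, 0, NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5)
)

m <- 5  # number of imputations per incomplete data set

# Impute each data set and keep all m completed versions of it.
# Warnings about logged events are expected with this tiny sample.
completed <- lapply(raw_data, function(d) {
  imp <- mice(as.data.frame(d), m = m, method = "pmm",
              seed = 42, printFlag = FALSE)
  complete(imp, action = "all", include = FALSE)
})

# Completed data set k of incomplete data set i is completed[[i]][[k]]
second_of_first <- completed[[1]][[2]]

# Flatten to a single list (here 3 x 5 = 15 data frames) with
# hypothetical names of the form "set<i>_imp<k>"
flat <- unlist(lapply(completed, unclass), recursive = FALSE)
names(flat) <- paste0("set", rep(seq_along(completed), each = m),
                      "_imp", rep(seq_len(m), times = length(completed)))
```

With 100 incomplete data sets in place of raw_data, flat would hold 500 data frames, so an analysis function can be applied to all of them with a single lapply(flat, ...) call.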
Notes
- For a serious simulation study, you probably need to set m= somewhat higher, see an earlier answer.
- In your example, imputation of the third dataset fails due to collinearities. You can investigate by setting printFlag=TRUE and not piping into complete.
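One way to follow up on the second note is to keep the mids object for the third dataset instead of piping it into complete(), and then look at its loggedEvents component. This is only a sketch under stated assumptions: the names third, imp3 and remaining_na are made up for illustration, and the matrix is converted with as.data.frame() before imputation.

```r
library(mice)

# Third sample matrix from the question (column 4 has two NAs)
third <- as.data.frame(
  structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
              1, 1, 0, NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5)
)

# Keep the mids object rather than the completed data
imp3 <- mice(third, m = 5, method = "pmm", seed = 42, printFlag = FALSE)

# Data frame of events logged during imputation (NULL if there were none)
imp3$loggedEvents

# Count the values that are still missing in the first completed data set
remaining_na <- colSums(is.na(complete(imp3, 1)))
remaining_na
```

Setting printFlag = TRUE in the same call additionally prints the iteration history while mice runs.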
Data:
> dput(raw_data)
list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1,
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA,
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA,
1, 0, 0, 0, 1, 1, 0), dim = 6:5))
Use lapply to process each of your 100 incomplete datasets. Something like dfs_imp_all <- lapply(dfs, mice, m = 5, method = 'logreg', print = FALSE) [untested code]. dfs_imp_all will be a list of 100 elements. Each element will contain the 5 imputed datasets for the corresponding element of dfs. – Limey, commented Jan 30 at 8:27