Creating 5 complete data sets from one incomplete data set in a simulation study [mice package in R]

For a study, I need to generate five complete data sets for each of 100 incomplete data sets using the mice package in R.

This code works correctly for a single data set, df1:

df1_imp <- mice(df1, m = 5, method = 'logreg', print = F)

Then we can access the five completed data sets as follows:

dataset1 <- complete(df1_imp, 1)
dataset2 <- complete(df1_imp, 2)
dataset3 <- complete(df1_imp, 3)
dataset4 <- complete(df1_imp, 4)
dataset5 <- complete(df1_imp, 5)

Fine. However, I have 100 incomplete data sets. Each will yield 5 complete data sets (500 in total). How can I access these 500 data sets so that I can analyze them?

My data set list, dfs (a sample of 3 sets; each must produce 5 complete data sets, 3 x 5 = 15):

list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1, 
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA, 
1, 0, 0, 0, 1, 1, 0), dim = 6:5))

asked Jan 30 at 7:04 by MetehanGungor

1 There is no need to view the data, apart from checking imputation qc. Run analysis on all datasets, then use pool to get summary result over all the datasets. See these links: rmisstasticlify.app/tutorials/… and stackoverflow/questions/51370292/… – zx8754 Commented Jan 30 at 8:10
4 Use (eg) lapply to process each of your 100 incomplete datasets. Something like dfs_imp_all <- lapply(dfs, mice, m = 5, method = 'logreg', print = FALSE) [untested code]. dfs_imp_all will be a list of 100 elements. Each element will contain the 5 imputed datasets for the corresponding element of dfs. – Limey Commented Jan 30 at 8:27
Thank you @Limey. If you post this as an answer, I can accept it as the correct answer. – MetehanGungor Commented Jan 31 at 8:35
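
The workflow suggested in the first comment (analyse every imputed data set, then pool the results) might look like the sketch below. The toy data frame, the variable names x and y, and the linear model are assumptions made purely for illustration; with(), pool() and summary() are the usual mice functions for this step.

```r
library(mice)

# Hypothetical toy data: y depends on x, with 20 values of y set to missing
set.seed(123)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x + rnorm(n)
toy$y[sample(n, 20)] <- NA

# Impute, fit the same model on each of the m completed data sets, then pool
imp <- mice(toy, m = 5, method = "pmm", seed = 42, printFlag = FALSE)
fit <- with(imp, lm(y ~ x))
pooled <- pool(fit)
summary(pooled)
```

With a list of imputation objects, the last three steps could be wrapped in lapply() so that each incomplete data set gets its own pooled result.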

1 Answer

In complete(), set action='all' to return all imputed data sets at once, and include=FALSE to exclude the original (un-imputed) data set. For simulation studies you may also want to specify a seed so that the results are reproducible.

> library(mice)
> seed. <- 42
> lapply(raw_data, mice, m=5, method='pmm', seed=seed., printFlag=FALSE) |> 
+   lapply(complete, action='all', include=FALSE)
[[1]]
$`1`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`2`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`3`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`4`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`5`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  0  1
6  0  0  1  0  1

attr(,"class")
[1] "mild" "list"

[[2]]
$`1`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`2`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`3`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`4`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`5`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  1  1
5  1  1  0  0  1
6  0  0  1  1  1

attr(,"class")
[1] "mild" "list"

[[3]]
$`1`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`2`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`3`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`4`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`5`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

attr(,"class")
[1] "mild" "list"

Warning messages:
1: Number of logged events: 30 
2: Number of logged events: 30 
3: Number of logged events: 2
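
The result is a nested list: one element per incomplete data set, each holding its m completed data sets. A minimal sketch of how a single data set could be picked out, or how everything could be flattened into one list for analysis, is given below. The simulated dfs, the helper make_incomplete() and the naming scheme set<i>_imp<j> are hypothetical and only serve the illustration.

```r
library(mice)

# Hypothetical stand-in for the list of incomplete data sets
set.seed(1)
make_incomplete <- function(n = 50) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- d$x1 + d$x2 + rnorm(n)
  d$y[sample(n, 10)] <- NA
  d
}
dfs <- replicate(3, make_incomplete(), simplify = FALSE)

# One imputation object per incomplete data set, then all m completed sets of each
completed <- lapply(dfs, mice, m = 5, method = "pmm", seed = 42, printFlag = FALSE) |>
  lapply(complete, action = "all", include = FALSE)

# Third completed version of the second incomplete data set
d_2_3 <- completed[[2]][[3]]

# Flatten to a single list of length(dfs) * 5 data frames with readable names
flat <- unlist(completed, recursive = FALSE)
names(flat) <- paste0("set", rep(seq_along(completed), each = 5),
                      "_imp", rep(1:5, times = length(completed)))
```

With 100 incomplete data sets, flat would hold 500 data frames, and something like lapply(flat, some_analysis) would run an analysis function (here a placeholder name) on each of them.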

Notes

For a serious simulation study, you probably need to set m somewhat higher; see an earlier answer.
In your example, imputation of the third data set fails due to collinearities (its V1 and V2 columns are identical), so the NA values in V4 remain. You can investigate by setting printFlag=TRUE and not piping the result into complete.
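
One way to follow up on the second note, sketched under the assumption that the installed mice version records its logged events in the loggedEvents element of the returned object, is to keep the imputation object for the third data set and inspect it directly. The names m3 and imp3 are introduced here for illustration only.

```r
library(mice)

# Third matrix from the question's data
m3 <- structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
                  NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5)

# Keep the imputation object instead of piping it into complete()
imp3 <- mice(m3, m = 5, method = "pmm", seed = 42, printFlag = TRUE)

# Table of logged events (NULL if nothing was logged)
imp3$loggedEvents
```

The logged events should indicate which variables were dropped from the imputation model and for what reason, which helps to confirm whether collinearity is the cause.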

Data:

> dput(raw_data)
list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1, 
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA, 
1, 0, 0, 0, 1, 1, 0), dim = 6:5))
