
Creating 5 complete data sets from one incomplete data set in a simulation study [mice package in R] - Stack Overflow


For a simulation study, I need to use the mice package in R to generate five complete data sets from each of 100 incomplete data sets.

This code works correctly for a single data set, df1:

library(mice)
df1_imp <- mice(df1, m = 5, method = 'logreg', printFlag = FALSE)

The five completed data sets can then be accessed as follows:

dataset1 <- complete(df1_imp, 1)
dataset2 <- complete(df1_imp, 2)
dataset3 <- complete(df1_imp, 3)
dataset4 <- complete(df1_imp, 4)
dataset5 <- complete(df1_imp, 5)
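The five separate assignments can instead be collected into a single list, which scales better when there are many data sets. Below is a minimal sketch. It assumes a hypothetical df1 built from the first incomplete data set in the list further down. The seed value of 42 is arbitrary. printFlag = FALSE is the full name of the argument that silences the iteration log.

```r
library(mice)

# Hypothetical df1: the first incomplete data set from the question, as a data frame
df1 <- as.data.frame(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
  0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5))

# Five imputations; the (arbitrary) seed makes the run reproducible
df1_imp <- mice(df1, m = 5, method = "logreg", printFlag = FALSE, seed = 42)

# One list holding completed data sets 1..m, replacing dataset1 ... dataset5
datasets <- lapply(seq_len(df1_imp$m), function(i) complete(df1_imp, i))

datasets[[3]]  # same data frame as complete(df1_imp, 3)
```

Indexing datasets[[i]] then replaces the numbered variable names, and the list can be passed straight to lapply() for analysis.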

That works. However, I have 100 incomplete data sets, and each will yield 5 complete data sets (500 in total). How can I access these 500 data sets? I need them for the analysis.

My list of data sets, dfs (three are shown here; each must produce 5 complete data sets, so 3 x 5 = 15):

list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1, 
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA, 
1, 0, 0, 0, 1, 1, 0), dim = 6:5))
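The same approach extends to the whole list by applying mice() to each element and then complete() to each result. Below is a minimal sketch using the three example data sets; with 100 data sets it works unchanged. It makes a few assumptions:

* The labels "set1" to "set3" are hypothetical.
* pmm with seed 42 mirrors the answer below.
* Every data set gets the same seed, which is a simplification.

```r
library(mice)

# The three example data sets from the question (the real study has 100)
dfs <- list(
  structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
    0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA,
    NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5),
  structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA,
    1, 0, 0, 0, 1, 1, 0), dim = 6:5)
)
names(dfs) <- paste0("set", seq_along(dfs))  # hypothetical labels

# One mids object per incomplete data set; extra arguments are passed on to mice()
imps <- lapply(dfs, mice, m = 5, method = "pmm", printFlag = FALSE, seed = 42)

# Nested list: completed[[i]][[j]] is imputation j of data set i
completed <- lapply(imps, complete, action = "all")

# Flat list of 3 x 5 = 15 data frames, named "set1.1", ..., "set3.5"
flat <- unlist(completed, recursive = FALSE)
```

The nested form keeps each group of five imputations together, which is what pooling needs. The flat form is convenient when every completed data set is treated the same way.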

Asked Jan 30 at 7:04 by MetehanGungor
  • There is no need to view the data, apart from quality-checking the imputations. Run the analysis on all data sets, then use pool() to get a summary result over all of them. See these links: rmisstasticlify.app/tutorials/… and stackoverflow/questions/51370292/… – zx8754 Commented Jan 30 at 8:10
  • Use (e.g.) lapply to process each of your 100 incomplete data sets. Something like dfs_imp_all <- lapply(dfs, mice, m = 5, method = 'logreg', printFlag = FALSE) [untested code]. dfs_imp_all will be a list of 100 elements. Each element will be a mids object holding the 5 imputations for the corresponding element of dfs; use complete() to extract the completed data sets. – Limey Commented Jan 30 at 8:27
  • Thank you @Limey. If you post this as an answer, I may accept it as the correct answer. – MetehanGungor Commented Jan 31 at 8:35
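Following the first comment, the analysis can be run on every completed data set and the results pooled, without inspecting the data sets one by one. Below is a minimal sketch with mice's with() and pool(). The nhanes example data (shipped with mice) and the model chl ~ age + bmi are stand-ins, not the question's data or analysis.

```r
library(mice)

# nhanes ships with mice; it stands in for one of the question's data sets
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 123)

# Fit the same model to each of the m completed data sets (returns a mira object)
fit <- with(imp, lm(chl ~ age + bmi))

# Combine the m sets of estimates with Rubin's rules
pooled <- pool(fit)
summary(pooled)
```

For 100 incomplete data sets, the with() and pool() steps could be wrapped in lapply() over the list of mids objects, giving one pooled result per simulated data set.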

1 Answer


In complete(), set action = 'all' to get every imputed data set as a list, and include = FALSE to exclude the original, un-imputed data. For simulation studies you may want to specify a seed so that the imputations are reproducible.

> library(mice)
> seed. <- 42
> lapply(raw_data, mice, m=5, method='pmm', seed=seed., printFlag=FALSE) |> 
+   lapply(complete, action='all', include=FALSE)
[[1]]
$`1`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`2`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`3`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`4`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  1  1
6  0  0  1  0  1

$`5`
  V1 V2 V3 V4 V5
1  1  1  0  0  0
2  0  0  0  1  1
3  0  1  1  0  1
4  1  1  0  1  1
5  0  0  1  0  1
6  0  0  1  0  1

attr(,"class")
[1] "mild" "list"

[[2]]
$`1`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`2`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`3`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`4`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  0  1
5  1  1  0  0  1
6  0  0  1  1  1

$`5`
  V1 V2 V3 V4 V5
1  1  0  0  1  0
2  1  0  0  0  1
3  0  0  1  1  1
4  1  0  1  1  1
5  1  1  0  0  1
6  0  0  1  1  1

attr(,"class")
[1] "mild" "list"

[[3]]
$`1`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`2`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`3`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`4`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

$`5`
  V1 V2 V3 V4 V5
1  1  1  0 NA  0
2  0  0  0  1  0
3  1  1  1  0  0
4  0  0  1  1  1
5  0  0  1 NA  1
6  0  0  0  1  0

attr(,"class")
[1] "mild" "list"

Warning messages:
1: Number of logged events: 30 
2: Number of logged events: 30 
3: Number of logged events: 2 
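complete() also offers layouts other than action = 'all'. Below is a minimal sketch on the first example data set (named df1 here for illustration), with pmm and seed 42 as above. It shows two variations:

* include = TRUE adds the original data as element "0".
* action = "long" stacks all imputations into one data frame with .imp and .id columns, which suits grouped analyses.

```r
library(mice)

# Hypothetical df1: the first incomplete data set from the question
df1 <- as.data.frame(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
  0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5))
imp <- mice(df1, m = 5, method = "pmm", printFlag = FALSE, seed = 42)

# include = TRUE keeps the original (un-imputed) data as element "0"
all_incl <- complete(imp, action = "all", include = TRUE)
names(all_incl)

# action = "long" stacks the m completed sets into one data frame,
# adding .imp (imputation number) and .id (row id) columns
long <- complete(imp, action = "long")
head(long)
```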

Notes

  1. For a serious simulation study, you probably need to set m somewhat higher; see an earlier answer.
  2. In your example, imputation of the third data set fails because of collinearity (its columns V1 and V2 are identical), so V4 keeps its missing values. You can investigate by setting printFlag=TRUE and not piping the result into complete().
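The logged events behind the warnings can also be read directly from the mids object rather than from the printed log. Below is a minimal sketch on the third example data set, named df3 here for illustration, with pmm and seed 42 as in the answer. loggedEvents records actions mice took, such as dropping a collinear predictor. Counting the NAs left in a completed data set shows which variables were not imputed.

```r
library(mice)

# Hypothetical df3: the third example data set; its columns V1 and V2 are identical
df3 <- as.data.frame(structure(c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
  1, 1, 0, NA, 1, 0, 1, NA, 1, 0, 0, 0, 1, 1, 0), dim = 6:5))
identical(df3$V1, df3$V2)

imp3 <- mice(df3, m = 5, method = "pmm", printFlag = FALSE, seed = 42)

# One row per logged event (iteration, imputation, variable, method, removed terms)
imp3$loggedEvents

# Missing values that survive imputation indicate a failed imputation
colSums(is.na(complete(imp3, 1)))
```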

Data:

> dput(raw_data)
list(structure(c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
0, 1, 1, 0, 1, NA, 1, NA, 0, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 
1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 
NA, 0, 1, 0, 1, 1, 1, 1, 1), dim = 6:5), structure(c(1, 0, 1, 
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, NA, 1, 0, 1, NA, 
1, 0, 0, 0, 1, 1, 0), dim = 6:5))
