最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - Conditionally remove missing cases in long format - Stack Overflow

programmeradmin2浏览0评论

I have a long-format dataset with 3 assessment points, e.g.:

dataset = data.frame(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  assessment = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
  scoreA = c(7,9,5,NA,5,11,2,3,9,1,NA,NA,7,NA,5),
  scoreB = c(1,2,7,6,1,11,3,3,2,1,12,NA,NA,4,5)
)

I would like to remove all observations belonging to the same ID, if there is any NAs at assessment == 1. For instance, the ID's 2 and 5 have NAs at assessment == 1, so they should be excluded.

I'm trying to sort this out using {dplyr} and {tidyr} functions, along group_by(), filter(), ...

I have a long-format dataset with 3 assessment points, e.g.:

dataset = data.frame(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  assessment = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
  scoreA = c(7,9,5,NA,5,11,2,3,9,1,NA,NA,7,NA,5),
  scoreB = c(1,2,7,6,1,11,3,3,2,1,12,NA,NA,4,5)
)

I would like to remove all observations belonging to the same ID, if there is any NAs at assessment == 1. For instance, the ID's 2 and 5 have NAs at assessment == 1, so they should be excluded.

I'm trying to sort this out using {dplyr} and {tidyr} functions, along group_by(), filter(), ...

Share Improve this question edited Mar 19 at 15:22 Darren Tsai 36.3k5 gold badges25 silver badges57 bronze badges asked Mar 18 at 22:20 DrNightDrNight 391 silver badge4 bronze badges 1
  • Hi, if any of the answers have solved your question, you can consider accepting the one that best meets your request by clicking the check mark. – Darren Tsai Commented Mar 20 at 2:46
Add a comment  | 

4 Answers 4

Reset to default 5

We could keep just those id's for which there are not any missing scoreA/scoreB at assessment 1:

library(dplyr)
dataset |>
  filter(!any(is.na(scoreA[assessment == 1]) | 
              is.na(scoreB[assessment == 1])), .by = id)

  id assessment scoreA scoreB
1  1          1      7      1
2  1          2      9      2
3  1          3      5      7
4  3          1      2      3
5  3          2      3      3
6  3          3      9      2
7  4          1      1      1
8  4          2     NA     12
9  4          3     NA     NA

Alternatively, we could filter to just assessment 1 rows without NA's, and join that to the original data, for the same result.

dataset |>
  filter(assessment == 1) |>
  tidyr::drop_na() |>
  # distinct(id) |> # only needed if there might be multiple assessment 1's for an id
  left_join(dataset)

You can use subset + na.omit like below

> subset(dataset, id %in% na.omit(subset(dataset, assessment == 1))$id)
   id assessment scoreA scoreB
1   1          1      7      1
2   1          2      9      2
3   1          3      5      7
7   3          1      2      3
8   3          2      3      3
9   3          3      9      2
10  4          1      1      1
11  4          2     NA     12
12  4          3     NA     NA

Or in base R you can use !complete.cases & dataset$assessment == 1 to get the ids you want to filter out

dataset[!dataset$id %in% dataset[!complete.cases(dataset) & dataset$assessment == 1,]$id, ]

   id assessment scoreA scoreB
1   1          1      7      1
2   1          2      9      2
3   1          3      5      7
7   3          1      2      3
8   3          2      3      3
9   3          3      9      2
10  4          1      1      1
11  4          2     NA     12
12  4          3     NA     NA

Here is a dplyr solution with if_any() or pick(). The advantage is that you can use <tidy-select> to apply the same logical function to multiple columns.

library(dplyr)

dataset %>%
  filter(!any(if_any(scoreA:scoreB, is.na) & assessment == 1),
         .by = id)

or

dataset %>%
  filter(!any(is.na(pick(scoreA:scoreB)[assessment == 1, ])),
         .by = id)
#   id assessment scoreA scoreB
# 1  1          1      7      1
# 2  1          2      9      2
# 3  1          3      5      7
# 4  3          1      2      3
# 5  3          2      3      3
# 6  3          3      9      2
# 7  4          1      1      1
# 8  4          2     NA     12
# 9  4          3     NA     NA
发布评论

评论列表(0)

  1. 暂无评论