I have a long-format dataset with 3 assessment points, e.g.:
dataset = data.frame(
id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
assessment = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
scoreA = c(7,9,5,NA,5,11,2,3,9,1,NA,NA,7,NA,5),
scoreB = c(1,2,7,6,1,11,3,3,2,1,12,NA,NA,4,5)
)
I would like to remove all observations belonging to the same ID
, if there is any NA
s at assessment == 1
. For instance, the ID
's 2
and 5
have NA
s at assessment == 1
, so they should be excluded.
I'm trying to sort this out using {dplyr}
and
{tidyr}
functions, along group_by()
, filter()
, ...
I have a long-format dataset with 3 assessment points, e.g.:
dataset = data.frame(
id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
assessment = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
scoreA = c(7,9,5,NA,5,11,2,3,9,1,NA,NA,7,NA,5),
scoreB = c(1,2,7,6,1,11,3,3,2,1,12,NA,NA,4,5)
)
I would like to remove all observations belonging to the same ID
, if there is any NA
s at assessment == 1
. For instance, the ID
's 2
and 5
have NA
s at assessment == 1
, so they should be excluded.
I'm trying to sort this out using {dplyr}
and
{tidyr}
functions, along group_by()
, filter()
, ...
- Hi, if any of the answers have solved your question, you can consider accepting the one that best meets your request by clicking the check mark. – Darren Tsai Commented Mar 20 at 2:46
4 Answers
Reset to default 5We could keep just those id's for which there are not any missing scoreA/scoreB at assessment 1:
library(dplyr)
dataset |>
filter(!any(is.na(scoreA[assessment == 1]) |
is.na(scoreB[assessment == 1])), .by = id)
id assessment scoreA scoreB
1 1 1 7 1
2 1 2 9 2
3 1 3 5 7
4 3 1 2 3
5 3 2 3 3
6 3 3 9 2
7 4 1 1 1
8 4 2 NA 12
9 4 3 NA NA
Alternatively, we could filter to just assessment 1 rows without NA's, and join that to the original data, for the same result.
dataset |>
filter(assessment == 1) |>
tidyr::drop_na() |>
# distinct(id) |> # only needed if there might be multiple assessment 1's for an id
left_join(dataset)
You can use subset
+ na.omit
like below
> subset(dataset, id %in% na.omit(subset(dataset, assessment == 1))$id)
id assessment scoreA scoreB
1 1 1 7 1
2 1 2 9 2
3 1 3 5 7
7 3 1 2 3
8 3 2 3 3
9 3 3 9 2
10 4 1 1 1
11 4 2 NA 12
12 4 3 NA NA
Or in base R you can use !complete.cases
& dataset$assessment == 1
to get the ids you want to filter out
dataset[!dataset$id %in% dataset[!complete.cases(dataset) & dataset$assessment == 1,]$id, ]
id assessment scoreA scoreB
1 1 1 7 1
2 1 2 9 2
3 1 3 5 7
7 3 1 2 3
8 3 2 3 3
9 3 3 9 2
10 4 1 1 1
11 4 2 NA 12
12 4 3 NA NA
Here is a dplyr
solution with if_any()
or pick()
. The advantage is that you can use <tidy-select> to apply the same logical function to multiple columns.
library(dplyr)
dataset %>%
filter(!any(if_any(scoreA:scoreB, is.na) & assessment == 1),
.by = id)
or
dataset %>%
filter(!any(is.na(pick(scoreA:scoreB)[assessment == 1, ])),
.by = id)
# id assessment scoreA scoreB
# 1 1 1 7 1
# 2 1 2 9 2
# 3 1 3 5 7
# 4 3 1 2 3
# 5 3 2 3 3
# 6 3 3 9 2
# 7 4 1 1 1
# 8 4 2 NA 12
# 9 4 3 NA NA