r - how to populate column based on multiple types of strings in other column

I have a data set of behavioral data. I want to assign all the different behaviors as "aggressive", "submissive", "affiliative", or leave blank in a column of the data frame.

There are multiple types of each of these behaviors. So for example "fin raise" and "fast approach" are both aggressive behaviors.

I tried this:

if (G14$Behavior == "slow approach" | "fin raise" | "fast approach" | "tail beat" | "ram" | "bite") {
    G14$`Behavioral category` <- "aggressive"
  } else if (G14$Behavior == "flee" | "avoid" | "tail quiver") {
  G14$`Behavioral category` <- "submissive"
} else if (G14$Behavior == "bump" | "join") {
  G14$`Behavioral category` <- "affiliative" 
} else {
  G14$`Behavioral category` <- ""
}

But got this error:

operations are possible only for numeric, logical or complex types

Is there any way to do this with character strings?

Asked by Kitt; edited by Ben Bolker.

Glad you found an answer! You may want to read "What is the difference between %in% and ==?" to better understand why %in% is a better fit here than ==. Good luck and happy coding! – jpsmith

It's easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. – MrFlick

4 Answers

The answer you provided works, but this would work slightly better:

case_when(Behavior %in% c("slow approach", "fin raise", "fast approach",
                          "tail beat", "ram", "bite") ~ "aggressive",
          Behavior %in% c("flee", "avoid", "tail quiver") ~ "submissive",
     ...)

(%in% is base-R, so it will work for people who don't want to use tidyverse; matching against strings is more precise and faster than matching against regular expressions)
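
To see why the original `if` attempt fails and how %in% avoids the problem without any packages, here is a minimal base-R sketch. The small G14 data frame and the Behavioral.category column name are assumptions made for illustration, not the asker's real data.

```r
# Hypothetical sample data standing in for the real G14
G14 <- data.frame(Behavior = c("fin raise", "flee", "bump", "rest"))

# `"a" | "b"` raises the error from the question because `|` is not defined
# for character operands, and `if` expects a single TRUE/FALSE anyway.
# %in% tests every element for set membership and returns a logical vector.
aggressive  <- c("slow approach", "fin raise", "fast approach",
                 "tail beat", "ram", "bite")
submissive  <- c("flee", "avoid", "tail quiver")
affiliative <- c("bump", "join")

G14$Behavioral.category <- ""   # default: blank for unlisted behaviors
G14$Behavioral.category[G14$Behavior %in% aggressive]  <- "aggressive"
G14$Behavioral.category[G14$Behavior %in% submissive]  <- "submissive"
G14$Behavioral.category[G14$Behavior %in% affiliative] <- "affiliative"

G14
```

Each assignment touches only the rows whose behavior is in the corresponding set, so the order of the three lines does not matter as long as the sets do not overlap.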

I was able to figure it out! For anyone who runs into the same problem: the dplyr and stringr packages provide the functions case_when and str_detect. It would look something like this:

library(dplyr)
library(stringr)

# Label each behavior by regex match; rows matching no pattern become NA
G14 <- G14 %>% mutate(
  Behavioral.category = case_when(
    str_detect(Behavior, "slow approach|fin raise|fast approach|bite") ~ "aggressive",
    str_detect(Behavior, "flee|avoid|tail quiver") ~ "submissive",
    str_detect(Behavior, "bump|join") ~ "affiliative"
  )
)

While %in% is perhaps the appropriate solution here, you may have been looking for grepl, which accepts patterns that include the '|' operator. I'd prefer NA for non-matches, but it's up to you to encode the remaining categories differently.

> within(G14, {
+   Behavior_cat <- NA
+   Behavior_cat[
+     grepl("slow approach|fin raise|fast approach|tail beat|ram|bite", Behavior)
+   ] <- "aggressive"
+   Behavior_cat[
+     grepl("flee|avoid|tail quiver", Behavior)
+   ] <- "submissive"
+   Behavior_cat[
+     grepl("bump|join", Behavior)
+   ] <- 'affiliative'
+ })
          Behavior Behavior_cat
1    slow approach   aggressive
2        fin raise   aggressive
3    fast approach   aggressive
4        tail beat   aggressive
5              ram   aggressive
6             bite   aggressive
7             flee   submissive
8            avoid   submissive
9      tail quiver   submissive
10            bump  affiliative
11            join  affiliative
12 random behavior         <NA>

Here's an alternative solution using stringi::stri_replace_all_regex:

> G14 |> 
+   transform(
+     Behavior_cat=stringi::stri_replace_all_regex(
+       Behavior,
+       list(c('slow approach|fin raise|fast approach|tail beat|ram|bite'),
+            c('flee|avoid|tail quiver'),
+            c('bump|join'), c('random behavior')),
+       list('aggressive', 'submissive', 'affiliative', NA_character_),
+       vectorize_all=FALSE)
+   )
          Behavior Behavior_cat
1    slow approach   aggressive
2        fin raise   aggressive
3    fast approach   aggressive
4        tail beat   aggressive
5              ram   aggressive
6             bite   aggressive
7             flee   submissive
8            avoid   submissive
9      tail quiver   submissive
10            bump  affiliative
11            join  affiliative
12 random behavior         <NA>

Note that these patterns also match parts of words. To match only whole words, include word-boundary metacharacters, or use ^ and $ to anchor the start and end of the pattern, as shown e.g. in this answer.
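
As a small illustration of the partial-match issue, the sketch below compares an unanchored pattern with an anchored one. The behaviors vector, including the made-up entry "program", is hypothetical and only serves to trigger a partial match.

```r
# Hypothetical values; "program" is included because it contains "ram"
behaviors <- c("ram", "program", "tail beat", "bite")

# Unanchored: an alternative may match anywhere inside the string
loose <- grepl("tail beat|ram|bite", behaviors)

# Anchored: ^( ... )$ requires the whole string to equal one alternative
strict <- grepl("^(tail beat|ram|bite)$", behaviors)

data.frame(behaviors, loose, strict)
```

The parentheses matter: without them, ^ would apply only to the first alternative and $ only to the last.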

Data:

> dput(G14)
structure(list(Behavior = c("slow approach", "fin raise", "fast approach", 
"tail beat", "ram", "bite", "flee", "avoid", "tail quiver", "bump", 
"join", "random behavior"), Behavior_cat = c("aggressive", "aggressive", 
"aggressive", "aggressive", "aggressive", "aggressive", "aggressive", 
"aggressive", "aggressive", "aggressive", "aggressive", "aggressive"
)), row.names = c(NA, -12L), class = "data.frame")

1) case_match We can use case_match from dplyr. Its first argument is the vector of codes; the remaining arguments are formulas with the possible codes on the left-hand side and the replacement on the right.

library(dplyr)

G14 %>% 
  mutate(Behavioral.category = case_match(Behavior,
    c("slow approach", "fin raise", "fast approach", "bite") ~ "aggressive",
    c("flee", "avoid", "tail quiver") ~ "submissive",
    c("bump", "join") ~ "affiliative")
  )

giving the following using the input in the Note at the end

       Behavior Behavioral.category
1 slow approach          aggressive
2     fin raise          aggressive
3 fast approach          aggressive
4          bite          aggressive
5          flee          submissive
6         avoid          submissive
7   tail quiver          submissive
8          bump         affiliative
9          join         affiliative

2) fct_collapse First create a list L whose names are the replacement codes and whose values are vectors of existing codes and then use that with fct_collapse.

library(dplyr)
library(forcats)

L <- list(
  aggressive = c("slow approach", "fin raise", "fast approach", "bite"),
  submissive = c("flee", "avoid", "tail quiver"),
  affiliative = c("bump", "join")
)

G14 %>% mutate(Behavior.category = fct_collapse(Behavior, !!!L))

3) left_join We can also use left_join with L defined above.

library(dplyr)

G14 %>%
  left_join(stack(L), join_by(Behavior == values)) %>% 
  rename(Behavior.Category = ind)

4) Base R Using match with L from above we can obtain a Base R approach.

stk <- stack(L)
G14 |> transform(Behavior.category = stk$ind[match(Behavior, stk$values)])
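
The question asks for a blank rather than NA for unlisted behaviors. A possible way to get that with the match approach is sketched below; the sample data frame with the extra "rest" row is an assumption for illustration, and L is repeated so the snippet runs on its own.

```r
# Same lookup list as above, repeated so this snippet is self-contained
L <- list(
  aggressive  = c("slow approach", "fin raise", "fast approach", "bite"),
  submissive  = c("flee", "avoid", "tail quiver"),
  affiliative = c("bump", "join")
)
stk <- stack(L)   # columns: values (behavior), ind (category, a factor)

# Hypothetical input with one behavior ("rest") that is not in L
G14 <- data.frame(Behavior = c("bite", "flee", "join", "rest"))

# match() returns NA for unlisted behaviors; convert the factor to
# character first so the NA entries can be replaced by an empty string
category <- as.character(stk$ind[match(G14$Behavior, stk$values)])
category[is.na(category)] <- ""
G14$Behavior.category <- category

G14
```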

Note

Input data used

G14 <- data.frame(Behavior = c("slow approach", "fin raise", "fast approach",
  "bite", "flee", "avoid", "tail quiver", "bump", "join"))
