I have a string of text and a vector of words:
String: "Auch ein blindes Huhn findet einmal ein Korn."
Vector: "auch", "ein"
I want to check how often each word in the vector is contained in the string and calculate the sum of the frequencies. For the example, the correct result would be 3.
I have come so far as to be able to check which words occur in the string and calculate the sum:
library(stringr)
deu <- c("\\bauch\\b", "\\bein\\b")
str_detect(tolower("Auch ein blindes Huhn findet einmal ein Korn."), deu)
[1] TRUE TRUE
sum(str_detect(tolower("Auch ein blindes Huhn findet einmal ein Korn."), deu))
[1] 2
Unfortunately, str_detect does not return the number of occurrences (1, 2), but only whether a word occurs in the string (TRUE, TRUE), so the sum of the output from str_detect is not equal to the total number of occurrences.
Is there a function in R similar to preg_match_all in PHP?
preg_match_all("/\bauch\b|\bein\b/i", "Auch ein blindes Huhn findet einmal ein Korn.", $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => Auch
[1] => ein
[2] => ein
)
)
echo preg_match_all("/\bauch\b|\bein\b/i", "Auch ein blindes Huhn findet einmal ein Korn.", $matches);
3
I would like to avoid loops.
I have looked at a lot of similar questions, but they either don't count the number of occurrences or do not use a vector of patterns to search. I may have overlooked a question that answers mine, but before you mark this as duplicate, please make sure that the "duplicate" actually asks the exact same thing. Thank you.
asked 2 days ago by Ben
4 Answers
You can use str_count like:
stringr::str_count(tolower("Auch ein blindes Huhn findet mal ein Korn"), paste0("\\b", tolower(c("ein","Huhn")), "\\b"))
[1] 2 1
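To get the single total the question asks for, the per-word counts can be summed. The following is a minimal sketch applied to the question's own string and words; the variable names (string, words, patterns, counts, total) are chosen here for illustration and are not from the answer.

```r
library(stringr)

string <- "Auch ein blindes Huhn findet einmal ein Korn."
words <- c("auch", "ein")

# Wrap each word in word boundaries so "ein" does not match inside "einmal"
patterns <- paste0("\\b", words, "\\b")

# str_count is vectorised over the patterns: one count per word
counts <- str_count(tolower(string), patterns)
names(counts) <- words
counts

# Total number of occurrences across all words
total <- sum(counts)
total
```

No explicit loop is needed, because str_count recycles the single string against each pattern.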
You could sprintf a pattern by adding \\b for word boundaries and use lengths on gregexpr.
> vp <- v |> sprintf(fmt='\\b%s\\b') |> setNames(v) |> print()
auch ein
"\\bauch\\b" "\\bein\\b"
> lapply(vp, gregexpr, text=tolower(string)) |> unlist(recursive=FALSE) |> lengths()
auch ein
1 2
The |> print() is just for simultaneously assigning and printing and can be removed.
Data:
string <- "Auch ein blindes Huhn findet einmal ein Korn."
v <- c("auch", "ein")
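One thing to be aware of: gregexpr returns -1 when a pattern has no match, so lengths reports 1 rather than 0 for a word that never occurs. A possible variant, closer to what preg_match_all does, is to combine the words into one alternation pattern and extract the matches with regmatches. This is only a sketch; the names pattern and hits are introduced here, and perl = TRUE is an assumption made to get PCRE-style \b handling.

```r
string <- "Auch ein blindes Huhn findet einmal ein Korn."
v <- c("auch", "ein")

# Build one alternation pattern: \bauch\b|\bein\b
pattern <- paste(sprintf("\\b%s\\b", v), collapse = "|")

# ignore.case = TRUE plays the role of the /i flag in the PHP example
m <- gregexpr(pattern, string, ignore.case = TRUE, perl = TRUE)

# regmatches returns the matched substrings (character(0) if there are none)
hits <- regmatches(string, m)[[1]]
hits

# Number of matches, comparable to the return value of preg_match_all
length(hits)
```

Because regmatches yields an empty vector when nothing matches, length(hits) is 0 in that case instead of 1.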
Given string and pattern like below
s <- "Auch ein blindes Huhn findet einmal ein Korn."
p <- c("auch", "ein")
you can try strsplit + %in%:
- Option 1 (to get the sum of occurrences)
> sum(gsub("\\W", "", strsplit(tolower(s), " ")[[1]]) %in% p)
[1] 3
- Option 2 (use table if you would like to see the summary of counts)
> table(gsub("\\W", "", strsplit(tolower(s), " ")[[1]]))[p]
auch ein
1 2
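If some of the words might not occur at all, a possible variant of Option 2 is to turn the tokens into a factor whose levels are the search words, so that absent words are reported as 0. This is a sketch only; "katze" is a hypothetical extra word added here for illustration, and tokens and counts are names introduced for this example.

```r
s <- "Auch ein blindes Huhn findet einmal ein Korn."
p <- c("auch", "ein", "katze")  # "katze" is hypothetical and does not occur in s

# Tokenise: lower-case, split on spaces, strip non-word characters such as "."
tokens <- gsub("\\W", "", strsplit(tolower(s), " ")[[1]])

# Restricting the factor levels to p keeps only the words of interest
# and gives a count of 0 for words that never occur
counts <- table(factor(tokens, levels = p))
counts

# Sum of occurrences across all search words
sum(counts)
```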
Character String Processing
If base R is too complex in its syntax, I would go with {stringi}
stringi::stri_count_regex(tolower(String), sprintf('\\b%s\\b', Vector)) |>
setNames(Vector) # optional
auch ein
1 2
Data
String = 'Auch ein blindes Huhn findet einmal ein Korn.'
Vector = c('auch', 'ein')
str_count(tolower("Auch ein blindes Huhn findet mal ein Korn"), paste0("\\b", c("ein","Huhn"), "\\b")). See this post, which is similar: stackoverflow/a/67195512/28479453 – Tim G, commented 2 days ago
for i in ["auch", "ein"]: print(i + ":", "Auch ein blindes Huhn findet einmal ein Korn.".lower().split().count(i)) – Friede, commented 2 days ago