I'm getting crazy with a (apparently) very simple problem.
I stored in a data frame Excel formulas as strings. Some of these strings are hyperlinks. From these hyperlink formulas, I want to extract only the displayed text and modify in place the cell of the data frame accordingly.
For columns whose first row contains a string with an hyperlink formula, it works for the whole column.
For columns whose first row doesn't contain a string with an hyperlink formula, it doesn't work for the whole column.
Here is a reprex:
DF <- data.frame(
x = c('HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_1.xlsx]1_2016_Nodes!A3", "\'Nodes\' tab")', 'NA()'),
y = c('NA()', 'HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_2.xlsx]2_2019_Nodes!A3", 1)'))
DF
DF[] <- lapply(DF, function(x)
ifelse(substr(x, 1, 9) == 'HYPERLINK', trimws(gsub('[\")]', '', unlist(strsplit(x, ','))[2])), gsub('"', '', x)))
DF
I'm getting crazy with a (apparently) very simple problem.
I stored in a data frame Excel formulas as strings. Some of these strings are hyperlinks. From these hyperlink formulas, I want to extract only the displayed text and modify in place the cell of the data frame accordingly.
For columns whose first row contains a string with an hyperlink formula, it works for the whole column.
For columns whose first row doesn't contain a string with an hyperlink formula, it doesn't work for the whole column.
Here is a reprex:
DF <- data.frame(
x = c('HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_1.xlsx]1_2016_Nodes!A3", "\'Nodes\' tab")', 'NA()'),
y = c('NA()', 'HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_2.xlsx]2_2019_Nodes!A3", 1)'))
DF
DF[] <- lapply(DF, function(x)
ifelse(substr(x, 1, 9) == 'HYPERLINK', trimws(gsub('[\")]', '', unlist(strsplit(x, ','))[2])), gsub('"', '', x)))
DF
Share
Improve this question
edited Mar 13 at 11:03
Olivier7121
asked Mar 12 at 23:29
Olivier7121Olivier7121
4251 gold badge3 silver badges14 bronze badges
2 Answers
Reset to default 4Maybe a single sub()
command can handle it.
lapply(DF, \(x) sub("HYPERLINK\\(.+, (.+)\\)", "\\1", x)) |>
as.data.frame()
# x y
# 1 "'Nodes' tab" NA()
# 2 NA() 1
You could use mutate(across(everything(), ...))
to choose all the variables and then the function could use the appropriate regular expression. In the code below, I have it find "HYPERLINK"
first and if it's not there, then return the data's original value. Otherwise, return what is between the comma and the last close parenthesis.
library(dplyr)
DF <- data.frame(x = c('HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_1.xlsx]1_2016_Nodes!A3", "\'Nodes\' tab")', 'NA()'), y = c('NA()', 'HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_2.xlsx]2_2019_Nodes!A3", 1)'))
DF %>%
mutate(across(everything(),
~ifelse(grepl("HYPERLINK", .x),
gsub(".*\\[.*\\].*, (.*)\\)$", "\\1", .x),
.x)))
#> x y
#> 1 "'Nodes' tab" NA()
#> 2 NA() 1
Created on 2025-03-12 with reprex v2.1.1.9000