最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - Extract displayed text from an hyperlink string stored in a data frame - Stack Overflow

programmeradmin0浏览0评论

I'm getting crazy with a (apparently) very simple problem.

I stored in a data frame Excel formulas as strings. Some of these strings are hyperlinks. From these hyperlink formulas, I want to extract only the displayed text and modify in place the cell of the data frame accordingly.

  • For columns whose first row contains a string with an hyperlink formula, it works for the whole column.

  • For columns whose first row doesn't contain a string with an hyperlink formula, it doesn't work for the whole column.

Here is a reprex:

DF <- data.frame(
  x = c('HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_1.xlsx]1_2016_Nodes!A3", "\'Nodes\' tab")', 'NA()'),
  y = c('NA()', 'HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_2.xlsx]2_2019_Nodes!A3", 1)'))
DF

DF[] <- lapply(DF, function(x)
  ifelse(substr(x, 1, 9) == 'HYPERLINK', trimws(gsub('[\")]', '', unlist(strsplit(x, ','))[2])), gsub('"', '', x)))
DF

I'm getting crazy with a (apparently) very simple problem.

I stored in a data frame Excel formulas as strings. Some of these strings are hyperlinks. From these hyperlink formulas, I want to extract only the displayed text and modify in place the cell of the data frame accordingly.

  • For columns whose first row contains a string with an hyperlink formula, it works for the whole column.

  • For columns whose first row doesn't contain a string with an hyperlink formula, it doesn't work for the whole column.

Here is a reprex:

DF <- data.frame(
  x = c('HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_1.xlsx]1_2016_Nodes!A3", "\'Nodes\' tab")', 'NA()'),
  y = c('NA()', 'HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_2.xlsx]2_2019_Nodes!A3", 1)'))
DF

DF[] <- lapply(DF, function(x)
  ifelse(substr(x, 1, 9) == 'HYPERLINK', trimws(gsub('[\")]', '', unlist(strsplit(x, ','))[2])), gsub('"', '', x)))
DF
Share Improve this question edited Mar 13 at 11:03 Olivier7121 asked Mar 12 at 23:29 Olivier7121Olivier7121 4251 gold badge3 silver badges14 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 4

Maybe a single sub() command can handle it.

lapply(DF, \(x) sub("HYPERLINK\\(.+, (.+)\\)", "\\1", x)) |>
  as.data.frame()

#               x    y
# 1 "'Nodes' tab" NA()
# 2          NA()    1

You could use mutate(across(everything(), ...)) to choose all the variables and then the function could use the appropriate regular expression. In the code below, I have it find "HYPERLINK" first and if it's not there, then return the data's original value. Otherwise, return what is between the comma and the last close parenthesis.

library(dplyr)
DF <- data.frame(x = c('HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_1.xlsx]1_2016_Nodes!A3", "\'Nodes\' tab")', 'NA()'), y = c('NA()', 'HYPERLINK("[Individual files/NetworkData4MassiveExport_Individual_2.xlsx]2_2019_Nodes!A3", 1)'))
DF %>% 
  mutate(across(everything(), 
                ~ifelse(grepl("HYPERLINK", .x), 
                        gsub(".*\\[.*\\].*, (.*)\\)$", "\\1", .x), 
                        .x)))
#>               x    y
#> 1 "'Nodes' tab" NA()
#> 2          NA()    1

Created on 2025-03-12 with reprex v2.1.1.9000

发布评论

评论列表(0)

  1. 暂无评论