最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - Generate a vector of strings from data.table values subject to certain conditions - Stack Overflow

programmeradmin1浏览0评论

I'm trying to get this vector from the rows of a data.table:

c("1, 'Alice', cast(NULL as Date), 10.5", "2, 'Bob', '2023-02-01', 20.3", 
           "3, 'Charlie', '2023-03-01', ''")

i.e, dates and character columns needs to be surrounded by '', everything else with a value not. NA should be '', except in the case of NA-dates when it should be a cast-statement.

data.table given by this code:

library(data.table)

# Exempel data.table
dt <- data.table(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  date = as.Date(c(NA, "2023-02-01", "2023-03-01")),
  value = c(10.5, 20.3, NA)
)

But I'm a bit stumped how I accomplish that without a for loop.

EDIT: added NA to the data.table and changed the vector

EDIT2: Added an edge case, if a date column is null - I would like to replace that null with cast(NULL as Date)

I'm trying to get this vector from the rows of a data.table:

c("1, 'Alice', cast(NULL as Date), 10.5", "2, 'Bob', '2023-02-01', 20.3", 
           "3, 'Charlie', '2023-03-01', ''")

i.e, dates and character columns needs to be surrounded by '', everything else with a value not. NA should be '', except in the case of NA-dates when it should be a cast-statement.

data.table given by this code:

library(data.table)

# Exempel data.table
dt <- data.table(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  date = as.Date(c(NA, "2023-02-01", "2023-03-01")),
  value = c(10.5, 20.3, NA)
)

But I'm a bit stumped how I accomplish that without a for loop.

EDIT: added NA to the data.table and changed the vector

EDIT2: Added an edge case, if a date column is null - I would like to replace that null with cast(NULL as Date)

Share edited Feb 11 at 7:21 ErrantBard asked Feb 10 at 14:04 ErrantBardErrantBard 1,5011 gold badge21 silver badges41 bronze badges 9
  • 1 Could you elaborate on the NA? Where are NAs? – jay.sf Commented Feb 10 at 14:31
  • @jay.sf Absolutely - wrote that just as I was loggin out and fot to change the data.table, have updated the data.table and modified the resulting vector I would like to get. – ErrantBard Commented Feb 11 at 6:02
  • 1 May I ask, what you actually want to achieve? I find not quoting numerics but quoting them if NA inconsistent. Moreover, since this would involve row-wise operations, it would considerably affect speed. – jay.sf Commented Feb 11 at 7:07
  • I'm trying to generate sql-statements for an insert clause. DBI:s native functions have to many problems for or setup. – ErrantBard Commented Feb 11 at 7:18
  • 1 I'd try to find out what that "something" is (check the db logs), test if I can modify the input data so that it works and provide feedback to the DBI devs. – ismirsehregal Commented Feb 11 at 10:49
 |  Show 4 more comments

3 Answers 3

Reset to default 3

Try paste0() in Reduce().

> Reduce(\(x, y) paste0(x, ', ', if (is.numeric(y)) y else sprintf('\'%s\'', y)), dt)
[1] "1, 'Alice', '2023-01-01', 10.5"   "2, 'Bob', '2023-02-01', 20.3"    
[3] "3, 'Charlie', '2023-03-01', 30.7"

To take NAs into account, we could incorporate a small NA-replaceing function, rn(). In the following I also set id[3] as NA:

> rn <- \(x) {
+   if (inherits(x, 'Date')) {
+     x <- sprintf("'%s'", as.character(x))
+     replace(x, grepl('NA', x), 'cast(NULL as Date)')
+   } else if (is.character(x) & !any(grepl("'", x))) {
+     sprintf("'%s'", x)
+   } else {
+     replace(x, is.na(x), '\' \'')
+   }
+ }
> Reduce(\(x, y) paste0(rn(x), ', ', rn(y)), dt)
[1] "1, 'Alice', '2023-01-01', 10.5"     "2, 'Bob', cast(NULL as Date), 20.3"
[3] "3, 'Charlie', '2023-03-01', ' '"  

Data:

> dput(dt)
structure(list(id = c(1, 2, 3), name = c("Alice", "Bob", "Charlie"
), date = structure(c(19358, NA, 19417), class = "Date"), value = c(10.5, 
20.3, NA)), row.names = c(NA, -3L), class = c("data.table", "data.frame"
))

You can try:

lapply(dt, \(x) { 
  na.x <- is.na(x)
  v <- replace(as.character(x), na.x, "")
  if  (is.numeric(x)) replace(v, na.x, "''")  else
    ifelse(na.x & inherits(x, "Date"), "cast(NULL as Date)", sQuote(v, FALSE)) 
}) |> 
  c(sep = ", ") |> 
  do.call(what = "paste")
[1] "1, 'Alice', cast(NULL as Date), 10.5" "2, 'Bob', '2023-02-01', 20.3"        
[3] "3, 'Charlie', '2023-03-01', ''"     

To do that, you can use apply with paste depending on the type. NA values will be transformed to '', which was specifically asked for in this question and it also handles NULL dates.

format_rows <- function(dt) {
  apply(dt, 1, function(row) {
    paste(mapply(function(x, t) {
      if (is.na(x)) return(if(t=="Date") "cast(NULL as Date)" else "''")
      if (t=="Date" || !suppressWarnings(!is.na(as.numeric(x)))) sprintf("'%s'", x) else x
    }, row, sapply(dt, class)), collapse=", ")
  })
}    
format_rows(dt) 
"1, 'Alice', cast(NULL as Date), 10.5" 
"2, 'Bob', '2023-02-01', 20.3"         
"3, 'Charlie', '2023-03-01', ''" 

I also wanted to do it even more concise and managed to do it in two rows.

dt[, names(dt) := lapply(.SD, \(x) 
  if(inherits(x, "Date")) ifelse(is.na(x), "cast(NULL as Date)", sprintf("'%s'", x)) 
  else if(is.character(x)) sprintf("'%s'", x) else x)]
dt[, paste(gsub("NA","''",.SD), collapse=", "), by=.I]$V1

"1, 'Alice', cast(NULL as Date), 10.5" 
"2, 'Bob', '2023-02-01', 20.3"         
"3, 'Charlie', '2023-03-01', ''"  
发布评论

评论列表(0)

  1. 暂无评论