I'm trying to get this vector from the rows of a data.table:
c("1, 'Alice', cast(NULL as Date), 10.5", "2, 'Bob', '2023-02-01', 20.3",
"3, 'Charlie', '2023-03-01', ''")
i.e, dates and character columns needs to be surrounded by '', everything else with a value not. NA should be '', except in the case of NA-dates when it should be a cast-statement.
data.table given by this code:
library(data.table)
# Exempel data.table
dt <- data.table(
id = 1:3,
name = c("Alice", "Bob", "Charlie"),
date = as.Date(c(NA, "2023-02-01", "2023-03-01")),
value = c(10.5, 20.3, NA)
)
But I'm a bit stumped how I accomplish that without a for loop.
EDIT: added NA to the data.table and changed the vector
EDIT2: Added an edge case, if a date column is null - I would like to replace that null with cast(NULL as Date)
I'm trying to get this vector from the rows of a data.table:
c("1, 'Alice', cast(NULL as Date), 10.5", "2, 'Bob', '2023-02-01', 20.3",
"3, 'Charlie', '2023-03-01', ''")
i.e, dates and character columns needs to be surrounded by '', everything else with a value not. NA should be '', except in the case of NA-dates when it should be a cast-statement.
data.table given by this code:
library(data.table)
# Exempel data.table
dt <- data.table(
id = 1:3,
name = c("Alice", "Bob", "Charlie"),
date = as.Date(c(NA, "2023-02-01", "2023-03-01")),
value = c(10.5, 20.3, NA)
)
But I'm a bit stumped how I accomplish that without a for loop.
EDIT: added NA to the data.table and changed the vector
EDIT2: Added an edge case, if a date column is null - I would like to replace that null with cast(NULL as Date)
Share edited Feb 11 at 7:21 ErrantBard asked Feb 10 at 14:04 ErrantBardErrantBard 1,5011 gold badge21 silver badges41 bronze badges 9- 1 Could you elaborate on the NA? Where are NAs? – jay.sf Commented Feb 10 at 14:31
- @jay.sf Absolutely - wrote that just as I was loggin out and fot to change the data.table, have updated the data.table and modified the resulting vector I would like to get. – ErrantBard Commented Feb 11 at 6:02
- 1 May I ask, what you actually want to achieve? I find not quoting numerics but quoting them if NA inconsistent. Moreover, since this would involve row-wise operations, it would considerably affect speed. – jay.sf Commented Feb 11 at 7:07
- I'm trying to generate sql-statements for an insert clause. DBI:s native functions have to many problems for or setup. – ErrantBard Commented Feb 11 at 7:18
- 1 I'd try to find out what that "something" is (check the db logs), test if I can modify the input data so that it works and provide feedback to the DBI devs. – ismirsehregal Commented Feb 11 at 10:49
3 Answers
Reset to default 3Try paste0()
in Reduce()
.
> Reduce(\(x, y) paste0(x, ', ', if (is.numeric(y)) y else sprintf('\'%s\'', y)), dt)
[1] "1, 'Alice', '2023-01-01', 10.5" "2, 'Bob', '2023-02-01', 20.3"
[3] "3, 'Charlie', '2023-03-01', 30.7"
To take NA
s into account, we could incorporate a small NA
-replace
ing function, rn()
. In the following I also set id[3]
as NA
:
> rn <- \(x) {
+ if (inherits(x, 'Date')) {
+ x <- sprintf("'%s'", as.character(x))
+ replace(x, grepl('NA', x), 'cast(NULL as Date)')
+ } else if (is.character(x) & !any(grepl("'", x))) {
+ sprintf("'%s'", x)
+ } else {
+ replace(x, is.na(x), '\' \'')
+ }
+ }
> Reduce(\(x, y) paste0(rn(x), ', ', rn(y)), dt)
[1] "1, 'Alice', '2023-01-01', 10.5" "2, 'Bob', cast(NULL as Date), 20.3"
[3] "3, 'Charlie', '2023-03-01', ' '"
Data:
> dput(dt)
structure(list(id = c(1, 2, 3), name = c("Alice", "Bob", "Charlie"
), date = structure(c(19358, NA, 19417), class = "Date"), value = c(10.5,
20.3, NA)), row.names = c(NA, -3L), class = c("data.table", "data.frame"
))
You can try:
lapply(dt, \(x) {
na.x <- is.na(x)
v <- replace(as.character(x), na.x, "")
if (is.numeric(x)) replace(v, na.x, "''") else
ifelse(na.x & inherits(x, "Date"), "cast(NULL as Date)", sQuote(v, FALSE))
}) |>
c(sep = ", ") |>
do.call(what = "paste")
[1] "1, 'Alice', cast(NULL as Date), 10.5" "2, 'Bob', '2023-02-01', 20.3"
[3] "3, 'Charlie', '2023-03-01', ''"
To do that, you can use apply with paste depending on the type. NA values will be transformed to ''
, which was specifically asked for in this question and it also handles NULL dates.
format_rows <- function(dt) {
apply(dt, 1, function(row) {
paste(mapply(function(x, t) {
if (is.na(x)) return(if(t=="Date") "cast(NULL as Date)" else "''")
if (t=="Date" || !suppressWarnings(!is.na(as.numeric(x)))) sprintf("'%s'", x) else x
}, row, sapply(dt, class)), collapse=", ")
})
}
format_rows(dt)
"1, 'Alice', cast(NULL as Date), 10.5"
"2, 'Bob', '2023-02-01', 20.3"
"3, 'Charlie', '2023-03-01', ''"
I also wanted to do it even more concise and managed to do it in two rows.
dt[, names(dt) := lapply(.SD, \(x)
if(inherits(x, "Date")) ifelse(is.na(x), "cast(NULL as Date)", sprintf("'%s'", x))
else if(is.character(x)) sprintf("'%s'", x) else x)]
dt[, paste(gsub("NA","''",.SD), collapse=", "), by=.I]$V1
"1, 'Alice', cast(NULL as Date), 10.5"
"2, 'Bob', '2023-02-01', 20.3"
"3, 'Charlie', '2023-03-01', ''"