I have tried to split a single column into three columns, but I failed. I have the following data set:

> dat
 name
 Jhon Austin B 100kg
 Mick Gray C 110kg
 Tom Jef A 30kg

First I tried to extract the last word using the following code:

library(tidyr)

   dt<-dat %>% separate(name, into = c('name', 'pack'), sep = -6, convert = TRUE)

I got the following result:

name           pack
Jhon Austin B  100kg
Mick Gray C    110kg
Tom Jef        A30kg

Here A was merged with 30kg, though the two should be in separate columns. My final result should look like this:

name         class   pack
Jhon Austin   B      100kg
Mick Gray     C      110kg
Tom Jef       A      30kg

I will be grateful if anyone helps me. Thanks in advance.

Asked 2 days ago by Rokib; edited 2 days ago by Edward.

Related: stackoverflow/questions/4350440/… and stackoverflow/questions/7069076/… You could split the first and last name into separate columns, then merge them back together for the full name. (Kelly Ireland, commented 2 days ago)

5 Answers

Option 1

You could try separate_wider_regex() from tidyr (version 1.3.0 or later):

dat %>%
    separate_wider_regex(
        name,
        patterns = c(name = ".*", " ", class = "\\w", " ", pack = "\\d+kg")
    )

Option 2

With base R, you can try sub() + read.table():

with(
    dat,
    setNames(
        read.table(
            text =
                sub("^(.*)\\s(\\w)\\s(\\d+.*)$", "\\1_\\2_\\3", name),
            sep = "_"
        ),
        c("name", "class", "pack")
    )
)

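Both options give the same three columns (Option 1 returns a tibble, shown here; Option 2 returns a plain data.frame):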

# A tibble: 3 × 3
  name        class  pack
  <chr>       <chr> <chr>
1 Jhon Austin B     100kg
2 Mick Gray   C     110kg
3 Tom Jef     A     30kg

Data:

dat <- data.frame(
    name = c(
        "Jhon Austin B 100kg",
        "Mick Gray C 110kg",
        "Tom Jef A 30kg"
    )
)
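As a variation on Option 2, here is a minimal sketch that skips the intermediate "_" separator, which would break if a name itself contained an underscore. It uses strcapture() from the utils package (attached by default) with the same three capture groups. The result name res is only illustrative, and the sketch assumes every row ends in a single-character class followed by a weight in kg.

```r
# Sample data, copied from the answer above
dat <- data.frame(
    name = c("Jhon Austin B 100kg", "Mick Gray C 110kg", "Tom Jef A 30kg"),
    stringsAsFactors = FALSE
)

# One capture group per output column; proto supplies the
# column names and types of the resulting data.frame
res <- strcapture(
    "^(.*)\\s(\\w)\\s(\\d+kg)$",
    dat$name,
    proto = data.frame(
        name = character(), class = character(), pack = character(),
        stringsAsFactors = FALSE
    ),
    perl = TRUE
)
res
```

Rows that do not match the pattern come back as NA in all three columns, which makes malformed input easy to spot.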

We could use the str_extract() function from the stringr package:

library(stringr)

dat$class <- str_extract(dat$name, "\\b[A-Z](?= \\d+\\w+$)")
dat$pack <- str_extract(dat$name, "\\b\\d+\\w+$")
dat$name <- str_extract(dat$name, "\\w+(?: \\w+)(?= [A-Z] \\d+\\w+$)")
dat

         name class  pack
1 Jhon Austin     B 100kg
2   Mick Gray     C 110kg
3     Tom Jef     A  30kg

To handle names with more or fewer than two parts, we could write a small string-reversing helper function rv, then strsplit at spaces and recombine appropriately.

> rv <- \(x) {
+   strsplit(x, '') |> lapply(rev) |> sapply(paste, collapse='')
+ }
> rv(dat$name) |> sapply(strsplit, ' ') |> 
+   lapply(\(x) c(paste(x[-(1:2)], collapse=' '), x[2:1])) |> 
+   lapply(rv) |> do.call(what='rbind') |> `rownames<-`(NULL) |> 
+   as.data.frame() |> type.convert(as.is=TRUE) |> setNames(c('name', 'class', 'pack'))
            name class  pack
1    Jhon Austin     B 100kg
2      Mick Gray     C 110kg
3        Tom Jef     A  30kg
4 John F Kennedy     A  30kg
5            Foo     B  70kg

Data:

> dput(dat)
structure(list(name = c("Jhon Austin B 100kg", "Mick Gray C 110kg", 
"Tom Jef A 30kg", "John F Kennedy A 30kg", "Foo B 70kg")), class = "data.frame", row.names = c(NA, 
-5L))

You could try splitting the forename and surname into separate columns, then pasting them back into one:

library(dplyr)  # for %>%
library(tidyr)  # for separate()
dt <- dat %>% separate(name, into = c('name', 'name2', 'class', 'pack'), sep = " ", convert = TRUE)
dt$name <- paste(dt$name, dt$name2)
# Get rid of name2
dt <- dt[, -2]
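Splitting on spaces into a fixed four columns assumes every name has exactly two parts. A minimal base R sketch of the same split-then-recombine idea that does not depend on the number of name parts is shown below. It assumes only that the last two space-separated words are always the class and the pack; the helper variable words is hypothetical.

```r
# Sample data with one-, two- and three-part names
dat <- data.frame(
    name = c("Jhon Austin B 100kg", "Mick Gray C 110kg", "Tom Jef A 30kg",
             "John F Kennedy A 30kg", "Foo B 70kg"),
    stringsAsFactors = FALSE
)

# Split every row into its space-separated words
words <- strsplit(dat$name, " ", fixed = TRUE)

dt <- data.frame(
    # everything except the last two words is the name
    name  = vapply(words, function(w) paste(head(w, -2), collapse = " "), character(1)),
    # second-to-last word is the class
    class = vapply(words, function(w) w[length(w) - 1], character(1)),
    # last word is the pack
    pack  = vapply(words, function(w) w[length(w)], character(1)),
    stringsAsFactors = FALSE
)
dt
```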

In base R, a single strsplit() followed by trimws() works, though I haven't found the best pattern (yet).

strsplit(dat$name, "(?=\\S+ \\S+$)", perl = TRUE) |>
  unlist() |>
  trimws() |>
  matrix(ncol = 3, byrow = TRUE) |>
  data.frame() |>
  setNames(c('name', 'class', 'pack'))

         name class  pack
1 Jhon Austin     B 100kg
2   Mick Gray     C 110kg
3     Tom Jef     A  30kg

This approach expects your data to be as well organised as given. Do we really want to carry "kg" in pack? Wouldn't it be better to drop it and make the variable numeric?
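If a numeric column is preferred, one possible sketch in base R is below. It assumes every pack value is a whole or decimal number followed by "kg"; the column name pack_kg is hypothetical, and the input data frame is typed in by hand to match the desired result.

```r
# Result of any of the approaches above, typed in by hand for this example
dt <- data.frame(
    name  = c("Jhon Austin", "Mick Gray", "Tom Jef"),
    class = c("B", "C", "A"),
    pack  = c("100kg", "110kg", "30kg"),
    stringsAsFactors = FALSE
)

# Strip the trailing unit and convert to numeric;
# values that do not end in "kg" would become NA with a warning
dt$pack_kg <- as.numeric(sub("kg$", "", dt$pack))
dt
```

Keeping the unit in the column name rather than in the values lets the column be summed or compared directly.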

split - Splitting one column to three columns for uneven characters in r - Stack Overflow