I have a list lst of objects, say [1,2,3,4,5,6,7,8,9], and I want to break it into a list() of sublists, divided at the appearance of every item in items, say [3,6,9], so that I get list(3:[1,2,3],6:[4,5,6],9:[7,8,9]).
In other words, I want a function that is similar to strsplit() but for lists instead of strings.
3 Answers
Assuming your data are both vectors, as implied in the question:
lst <- 1:9
items <- c(3, 6, 9)
Then you can achieve your desired result using split.
split(lst[seq(max(items))], rep(items, c(items[1], diff(items))))
#> $`3`
#> [1] 1 2 3
#>
#> $`6`
#> [1] 4 5 6
#>
#> $`9`
#> [1] 7 8 9
And to show this generalizes, let's use letters in our vector with some different indices:
lst <- letters
items <- c(8, 12, 24)
split(lst[seq(max(items))], rep(items, c(items[1], diff(items))))
#> $`8`
#> [1] "a" "b" "c" "d" "e" "f" "g" "h"
#>
#> $`12`
#> [1] "i" "j" "k" "l"
#>
#> $`24`
#> [1] "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
Created on 2025-02-14 with reprex v2.1.0
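The split() call above treats items as positions in lst, which coincides with the values only when lst is 1:9. If the delimiters are given as values instead, one possible variant looks them up with match() first. This is a minimal sketch, not the answerer's code: the helper name split_by_value is hypothetical, and it assumes every delimiter occurs in x, in the same order as in items.

```r
# Hypothetical helper: split x after each element whose *value* is in items.
split_by_value <- function(x, items) {
  pos <- match(items, x)  # position of the first occurrence of each delimiter
  # Assumption: every delimiter is present and the positions strictly increase.
  stopifnot(!anyNA(pos), !is.unsorted(pos, strictly = TRUE))
  # Integer group ids keep the groups in order of appearance; grouping by the
  # delimiter values themselves would let split() reorder them by factor level.
  grp <- rep(seq_along(pos), c(pos[1], diff(pos)))
  setNames(split(x[seq_len(max(pos))], grp), items)
}

res <- split_by_value(letters, c("h", "l", "x"))
print(res)
```

As in the original call, anything after the last delimiter is dropped.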
1) Let ok be a logical vector the same length as x indicating whether the corresponding element of x is in y. Then cumsum(ok)-ok+1 gives, for each element of x, the index of the element of y that closes its group, so index into y with it and split.
x is sorted in the question, but this code does not impose that restriction. The elements of y should appear in the same order in which they appear in x. Also, each element of y should appear at most once in x; otherwise, it does not make sense to call y a "key". See (2) later on to relax that.
This also works for x and y being character vectors.
x <- 1:9
y <- c(3, 6, 9)
ok <- x %in% y
split(x, y[ cumsum(ok) - ok + 1 ] )
giving
$`3`
[1] 1 2 3
$`6`
[1] 4 5 6
$`9`
[1] 7 8 9
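To see why cumsum(ok) - ok + 1 yields the right group index, it may help to print the intermediate vectors side by side. The sketch below reuses x and y from above; the names n_seen and idx are introduced here for illustration only.

```r
x <- 1:9
y <- c(3, 6, 9)

ok <- x %in% y          # TRUE where the element of x is a delimiter
n_seen <- cumsum(ok)    # delimiters seen so far, including the current element
idx <- n_seen - ok + 1  # subtracting ok keeps a delimiter in the group it closes

# One column per element of x, one row per intermediate vector.
print(rbind(x = x, ok = ok, n_seen = n_seen, idx = idx))

# Indexing y by idx labels every element with the delimiter that ends its group.
print(y[idx])
```

Without the "- ok" term, each delimiter would be counted one step early and land at the start of the following group.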
2) If the delimiters in y can be repeated in x, then y is no longer a "key" vector as described in the poster's comments below the question, but if we are interested in that situation anyway, replacing y[...] with just ... will handle it. In this scenario it no longer makes sense to use y as the output names, so we use "1", "2", "3", ... instead. This one also works even if x does not end in an element of y, in which case the portion after the last separator in x is included as an element in the output.
x <- c("z", "b", "a", "q", "f", "r", "a", "s", "a", "a")
y <- "a"
ok <- x %in% y
split(x, cumsum(ok) - ok + 1 )
giving
$`1`
[1] "z" "b" "a"
$`2`
[1] "q" "f" "r" "a"
$`3`
[1] "s" "a"
$`4`
[1] "a"
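Approach (2) can be wrapped in a small function, which also makes the trailing-portion case easy to try. This is only a sketch: the name split_after and the sample input are made up for illustration, and dropping the delimiters afterwards is an optional extra step for a result closer to what strsplit() returns.

```r
# Hypothetical wrapper around approach (2): cut x after every element in delims.
split_after <- function(x, delims) {
  ok <- x %in% delims
  unname(split(x, cumsum(ok) - ok + 1))  # the numeric names carry no meaning here
}

# x does not end in a delimiter, so the trailing "d", "e" forms its own piece.
pieces <- split_after(c("a", "b", "|", "c", "|", "d", "e"), "|")
print(pieces)

# Optionally remove the delimiters themselves from each piece.
no_delims <- lapply(pieces, function(p) p[!p %in% "|"])
print(no_delims)
```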
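You can try findInterval(). Add 1 to the desired breaks, because its intervals are closed on the left and open on the right, so a value equal to a break would otherwise fall into the next group.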
> split(1:9, findInterval(1:9, c(3, 6, 9) + 1))
$`0`
[1] 1 2 3
$`1`
[1] 4 5 6
$`2`
[1] 7 8 9
> breaks <- c(3, 6, 9)
> split(1:9, findInterval(1:9, breaks + 1)) |> setNames(breaks)
$`3`
[1] 1 2 3
$`6`
[1] 4 5 6
$`9`
[1] 7 8 9
> split(letters, findInterval(seq_along(letters), c(7, 14, 21) + 1))
$`0`
[1] "a" "b" "c" "d" "e" "f" "g"
$`1`
[1] "h" "i" "j" "k" "l" "m" "n"
$`2`
[1] "o" "p" "q" "r" "s" "t" "u"
$`3`
[1] "v" "w" "x" "y" "z"
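The + 1 shift relies on the values being integers. An alternative that avoids shifting the breaks is the left.open argument of findInterval(), which makes the intervals open on the left and closed on the right. The sketch below compares the two behaviours on the same data; the variable names are illustrative.

```r
breaks <- c(3, 6, 9)

# Default: intervals are [b_i, b_{i+1}), so 3 already counts as past the first break.
default_grp <- findInterval(1:9, breaks)
print(default_grp)

# left.open = TRUE: intervals are (b_i, b_{i+1}], so each break closes its own group.
grp <- findInterval(1:9, breaks, left.open = TRUE)
print(grp)

# Naming by breaks assumes every group is non-empty and nothing follows the last break.
res <- setNames(split(1:9, grp), breaks)
print(res)
```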
[1,2,3,4,5,6,7,8,9] is not a list in R. Can you give a reproducible example please? – Axeman Commented Feb 14 at 23:18