I have a dataset in R using data.table where each ID represents a client, and each client has multiple contracts with a start date PERIODSTART and an end date PERIODEND.
I need to detect overlapping periods for the same ID: if a client has two or more contracts that overlap, I want to extract those cases.
What I need is to calculate the exposure (time as a percentage of a year) for each contract period, grouped by ID and PRODUCTNUMBER. My problem is when dates overlap: I don't want to count the exposure several times.
data = data.table(
  ID = c(rep("customer_1000", 7), rep("customer_78", 4), rep("customer_1047", 4)),
  PERIODSTART = as.Date(c("1991-02-03", "1991-11-06", "1993-02-03", "1993-03-11", "1996-02-03", "1996-11-28", "1996-11-29",
                          "2021-02-17", "2021-06-10", "2021-12-03", "2021-06-28",
                          "2021-02-17", "2021-02-17", "2021-05-02", "2021-06-28")),
  PERIODEND = as.Date(c("1991-11-06", "1992-02-03", "1993-03-11", "1994-02-03", "1996-11-28", "1996-11-29", "1997-02-03",
                        "2021-09-30", "2021-11-09", "2021-12-20", "2021-10-14",
                        "2021-09-30", "2021-08-01", "2021-06-17", "2021-10-14")),
  PRODUCTNUMBER = c(rep("product_1", 7),
                    "product_74", "product_88", "product_76", "product_25",
                    "product_1", "product_2", "product_3", "product_4")
)
data[, year := year(PERIODSTART)]
The calculation I want:
Asked Mar 19 at 18:32 by nimliug; edited Mar 19 at 19:44.
1 Answer
Haven't coded {data.table} in a while. This rusty code might get you started:
library(data.table)
# Step 1: merge overlapping contract periods per ID
#         (a new group starts when a period begins after the previous one ends).
# Step 2: split each merged period at year boundaries.
# Step 3: compute the exposure as the fraction of a year covered.
data[order(PERIODSTART), .(start = min(PERIODSTART), stop = max(PERIODEND)),
     by = .(ID, group = cumsum(c(1, tail(PERIODSTART, -1) > head(PERIODEND, -1))))][
  , {
      a = year(start)
      b = year(stop)
      y = seq(a, b)
      .(
        start = fifelse(y == a, start, as.Date(paste0(y, '-01-01'))),
        stop  = fifelse(y == b, stop, as.Date(paste0(y, '-12-31'))),
        year  = y
      )
    },
  by = .(ID, group)][
  , .(ID, year, start, stop, expi = round(as.integer(stop - start) / 365.25, 2))]
The first chain, which collapses overlapping intervals into one per group, is a well-known pattern; you will find it in several places on SO.
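As a minimal toy sketch of that merging step (made-up intervals, not the question's data):

```r
library(data.table)

# Toy intervals: the first two overlap, the third is separate.
d = data.table(
  start = as.Date(c("2021-01-01", "2021-02-01", "2021-06-01")),
  end   = as.Date(c("2021-03-31", "2021-04-30", "2021-07-31"))
)

# A new group starts whenever a row's start falls after the previous row's end.
d = d[order(start)]
d[, group := cumsum(c(1, tail(start, -1) > head(end, -1)))]

# Collapse each group to a single merged interval.
merged = d[, .(start = min(start), stop = max(end)), by = group]
```

Here `merged` has two rows: one covering 2021-01-01 to 2021-04-30, one covering the separate June-July interval.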
ID year start stop expi
<char> <int> <Date> <Date> <num>
1: customer_1000 1991 1991-02-03 1991-12-31 0.91
2: customer_1000 1992 1992-01-01 1992-02-03 0.09
3: customer_1000 1993 1993-02-03 1993-12-31 0.91
4: customer_1000 1994 1994-01-01 1994-02-03 0.09
5: customer_1000 1996 1996-02-03 1996-12-31 0.91
6: customer_1000 1997 1997-01-01 1997-02-03 0.09
7: customer_78 2021 2021-02-17 2021-11-09 0.73
8: customer_1047 2021 2021-02-17 2021-10-14 0.65
9: customer_78 2021 2021-12-03 2021-12-20 0.05
You might want to aggregate expi by year and ID, ignoring start and stop, in an additional step. Note: you might also want to add an accurate leap-year routine.
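That aggregation step could look like the following sketch (`res` and `total` are my own names, standing in for the answer's final result):

```r
library(data.table)

# `res` reproduces the answer's final output (start/stop columns omitted).
res = data.table(
  ID   = c(rep("customer_1000", 6), "customer_78", "customer_1047", "customer_78"),
  year = c(1991L, 1992L, 1993L, 1994L, 1996L, 1997L, 2021L, 2021L, 2021L),
  expi = c(0.91, 0.09, 0.91, 0.09, 0.91, 0.09, 0.73, 0.65, 0.05)
)

# Sum exposure per client and year, dropping start/stop.
total = res[, .(expi = sum(expi)), by = .(ID, year)]
```

For example, customer_78's two 2021 rows (0.73 and 0.05) collapse into a single 0.78.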
Comment: foverlaps or IRanges will probably help. – MrFlick, Mar 19 at 18:38
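A hedged sketch of the foverlaps route the comment suggests, as a self-join on ID plus interval overlap (the toy data and the `rid` helper column are mine, not from the question):

```r
library(data.table)

dt = data.table(
  ID          = c("A", "A", "A", "B"),
  PERIODSTART = as.Date(c("2021-01-01", "2021-03-01", "2021-07-01", "2021-01-01")),
  PERIODEND   = as.Date(c("2021-04-30", "2021-05-31", "2021-08-31", "2021-02-28"))
)

dt[, rid := .I]  # row id so exact duplicate periods don't hide as self-matches
setkey(dt, ID, PERIODSTART, PERIODEND)  # last two key columns form the interval

# Self-join: rows match when IDs are equal and the intervals overlap.
ov = foverlaps(dt, dt)[rid != i.rid]
```

Only the first two contracts of ID "A" overlap, so `ov` contains that pair twice (once in each direction). Note that foverlaps treats intervals as closed, so back-to-back contracts sharing a boundary date (as in the question's customer_1000) also count as overlapping; you may want to shift the dates or filter those cases depending on the intended semantics.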