I have table which is formatted as such.
Name | Is this a first time client | Is this the last time he comes? |
---|---|---|
Client_1 | Yes | No |
Client_2 | No | Yes |
Client_1 | No | No |
Client_1 | No | Yes |
Client_3 | Yes | No |
I have table which is formatted as such.
Name | Is this a first time client | Is this the last time he comes? |
---|---|---|
Client_1 | Yes | No |
Client_2 | No | Yes |
Client_1 | No | No |
Client_1 | No | Yes |
Client_3 | Yes | No |
I need to keep only clients for which I have the entire history - where I see the first time they came, and the last time they came. For these clients, i need to keep all of their transactions. Here, I would need to keep all the transactions for client 1.
I want to remove the others - I don't have the first transaction of client 2, and the last transaction of client 3, so they do not interest me.
I can us Excel or R for this - but I have to say I have no idea how...
Share Improve this question asked Jan 30 at 8:36 MargauxMargaux 214 bronze badges 2- Any attempts you've made yourself so far? – Excellor Commented Jan 30 at 8:56
- I've tried with a Pivot Table. I had a basic Pivot table - It showed client, the count of yes for first time deposit, and the count of yes for last time deposit. If I made the sum of these two, clients whose sum was superior to 2 meant that these were the ones I needed to keep. But I got stuck after this ! – Margaux Commented Jan 30 at 9:02
4 Answers
Reset to default 3You could use a combination of LET
and FILTER
to do it:
=LET(
Table, A2:C8,
Clients, INDEX(Table, , 1),
FirstTime, INDEX(Table, , 2),
LastTime, INDEX(Table, , 3),
ValidClients, UNIQUE(FILTER(Clients, (COUNTIFS(Clients, Clients, FirstTime, "Yes") > 0) * (COUNTIFS(Clients, Clients, LastTime, "Yes") > 0))),
FILTER(Table, ISNUMBER(MATCH(Clients, ValidClients, 0)))
)
An base R alternative might be
by(X, ~Name, subset, any(`Is this a first time client?`=='Yes') &
any(`Is this the last time he comes?`=='Yes')) |>
do.call(what='rbind') |>
`rownames<-`(NULL) # cosmetics
Name Is this a first time client? Is this the last time he comes?
1 Client_1 Yes No
2 Client_1 No No
3 Client_1 No Yes
X = structure(
list(
Name = c("Client_1", "Client_2", "Client_1", "Client_1", "Client_3"),
`Is this a first time client?` = c("Yes", "No", "No", "No", "Yes"),
`Is this the last time he comes?` = c("No", "Yes", "No", "Yes", "No")
),
class = "data.frame",
row.names = c(NA, -5L)
)
You can substitute by
with tapply
if you like.
You can try this in R like below
library(dplyr)
df %>%
filter(
any(`Is this a first time client?` == "Yes"),
any(`Is this the last time he comes?` == "Yes"),
.by = Name
)
which gives
Name Is this a first time client? Is this the last time he comes?
1 Client_1 Yes No
2 Client_1 No No
3 Client_1 No Yes
Another option is using split
> (d <- split(df, ~Name))[sapply(d, \(x) all(colMeans(x[-1] == "Yes") > 0))]
$Client_1
Name Is this a first time client? Is this the last time he comes?
1 Client_1 Yes No
3 Client_1 No No
4 Client_1 No Yes
Data
df <- structure(list(Name = c("Client_1", "Client_2", "Client_1", "Client_1",
"Client_3"), `Is this a first time client?` = c("Yes", "No",
"No", "No", "Yes"), `Is this the last time he comes?` = c("No",
"Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA,
-5L))
I'm not sure if this is overkill, but here's my attempt:
=LET(_Data,A2:C6,
_Col1, CHOOSECOLS(_Data,1),
_Col2, CHOOSECOLS(_Data,2),
_Col3, CHOOSECOLS(_Data,3),
_Uni,UNIQUE(_Col1),
_Select,BYROW(_Uni,LAMBDA(a,IF(SUM((_Col1=a)*((_Col2="yes")+(_Col3="yes")))=2,a,""))),
_Drop,FILTER(_Select,_Select<>"","EMPTY"),
_Filter,FILTER(_Data,_Col1=_Drop),
_Filter)