最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - How to select info based on different rowscriterias? - Stack Overflow

programmeradmin3浏览0评论

I have table which is formatted as such.

Name Is this a first time client Is this the last time he comes?
Client_1 Yes No
Client_2 No Yes
Client_1 No No
Client_1 No Yes
Client_3 Yes No

I have table which is formatted as such.

Name Is this a first time client Is this the last time he comes?
Client_1 Yes No
Client_2 No Yes
Client_1 No No
Client_1 No Yes
Client_3 Yes No

I need to keep only clients for which I have the entire history - where I see the first time they came, and the last time they came. For these clients, i need to keep all of their transactions. Here, I would need to keep all the transactions for client 1.

I want to remove the others - I don't have the first transaction of client 2, and the last transaction of client 3, so they do not interest me.

I can us Excel or R for this - but I have to say I have no idea how...

Share Improve this question asked Jan 30 at 8:36 MargauxMargaux 214 bronze badges 2
  • Any attempts you've made yourself so far? – Excellor Commented Jan 30 at 8:56
  • I've tried with a Pivot Table. I had a basic Pivot table - It showed client, the count of yes for first time deposit, and the count of yes for last time deposit. If I made the sum of these two, clients whose sum was superior to 2 meant that these were the ones I needed to keep. But I got stuck after this ! – Margaux Commented Jan 30 at 9:02
Add a comment  | 

4 Answers 4

Reset to default 3

You could use a combination of LET and FILTER to do it:

=LET(
    Table, A2:C8,
    Clients, INDEX(Table, , 1),
    FirstTime, INDEX(Table, , 2),
    LastTime, INDEX(Table, , 3),
    ValidClients, UNIQUE(FILTER(Clients, (COUNTIFS(Clients, Clients, FirstTime, "Yes") > 0) * (COUNTIFS(Clients, Clients, LastTime, "Yes") > 0))),
    FILTER(Table, ISNUMBER(MATCH(Clients, ValidClients, 0)))
)

An base R alternative might be

by(X, ~Name, subset, any(`Is this a first time client?`=='Yes') & 
     any(`Is this the last time he comes?`=='Yes')) |> 
  do.call(what='rbind') |> 
  `rownames<-`(NULL) # cosmetics 
      Name Is this a first time client? Is this the last time he comes?
1 Client_1                          Yes                              No
2 Client_1                           No                              No
3 Client_1                           No                             Yes

X = structure(
  list(
    Name = c("Client_1", "Client_2", "Client_1", "Client_1", "Client_3"),
    `Is this a first time client?` = c("Yes", "No", "No", "No", "Yes"),
    `Is this the last time he comes?` = c("No", "Yes", "No", "Yes", "No")
  ),
  class = "data.frame",
  row.names = c(NA, -5L)
)

You can substitute by with tapply if you like.

You can try this in R like below

library(dplyr)
df %>%
  filter(
    any(`Is this a first time client?` == "Yes"),
    any(`Is this the last time he comes?` == "Yes"),
    .by = Name
  )

which gives

      Name Is this a first time client? Is this the last time he comes?
1 Client_1                          Yes                              No
2 Client_1                           No                              No
3 Client_1                           No                             Yes

Another option is using split

> (d <- split(df, ~Name))[sapply(d, \(x) all(colMeans(x[-1] == "Yes") > 0))]
$Client_1
      Name Is this a first time client? Is this the last time he comes?
1 Client_1                          Yes                              No
3 Client_1                           No                              No
4 Client_1                           No                             Yes

Data

df <- structure(list(Name = c("Client_1", "Client_2", "Client_1", "Client_1",
"Client_3"), `Is this a first time client?` = c("Yes", "No",
"No", "No", "Yes"), `Is this the last time he comes?` = c("No",
"Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA, 
-5L))

I'm not sure if this is overkill, but here's my attempt:

=LET(_Data,A2:C6,
_Col1, CHOOSECOLS(_Data,1),
_Col2, CHOOSECOLS(_Data,2),
_Col3, CHOOSECOLS(_Data,3),
_Uni,UNIQUE(_Col1),
_Select,BYROW(_Uni,LAMBDA(a,IF(SUM((_Col1=a)*((_Col2="yes")+(_Col3="yes")))=2,a,""))),
_Drop,FILTER(_Select,_Select<>"","EMPTY"),
_Filter,FILTER(_Data,_Col1=_Drop),
_Filter)

发布评论

评论列表(0)

  1. 暂无评论