最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - How to count rows between two IDs in postgreSQL without grouping by - Stack Overflow

programmeradmin1浏览0评论

I am working with a dataset that tracks folder movements within conversations. I have a folder_movement table that contains conversation_id, updated_at, and movement details.

I created a folder_cum table where I extracted conversation_history_id_fp (starting point) and conversation_history_id_endp (ending point) for each interaction (1 Interaction = 1 Entrie and 1 exit from CE triage). Now, I need to count how many folder movements exist between conversation_history_id_fp and conversation_history_id_endp, while maintaining order by conversation_id and updated_at.

The challenge is that I cannot group by conversation_id, since a single conversation can have multiple interactions with different folder movements. Instead, I need to count the rows that fall between the two extracted IDs in the folder_movement table, and this table must be ordered by conversation_id and updated_at.

WITH
  DATA AS (
    SELECT
      conversation_history_id,
      conversation_id,
      convo.created_at AT time zone 'cst' AS updated_at,
      convo.user_id,
      convo.conversation_folder_name AS folder_to,
      LAG(convo.conversation_folder_name, 1) OVER (
        ORDER BY
          convo.conversation_id, convo.created_at ASC
      ) folder_from,
      convo.status,
      LAG(convo.status, 1) OVER (
        ORDER BY
          convo.conversation_id,convo.created_at ASC
      ) previous_status,
      convo.snooze_until,
      convo.snooze_note
    
    FROM conversation_history convo
    
    ORDER BY
      conversation_id,updated_at DESC
      --limit 1000
  ),
  folder_movement as (
SELECT

  conversation_history_id,
  conversation_id,
  updated_at,
  folder_from,
  folder_to,
  user_id

FROM DATA

WHERE folder_to <> folder_from
  ),
 
folder_cum as (
SELECT

  conversation_history_id as conversation_history_id_fp, -- start point
  conversation_id,
  updated_at,
  folder_from,
  folder_to,
  user_id,

LEAD(conversation_id, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) conversation_id_check,
  
  LEAD(updated_at, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) close_at,
  
  LEAD(user_id, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) closed_by,
      
  LEAD(folder_from, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) last_interaction_folder,

  LEAD(conversation_history_id, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) conversation_history_id_endp


FROM folder_movement

WHERE (folder_from = '/CE triage/') or
(folder_to = '/CE triage/')
order by updated_at desc
),
interactions_number as ( -- help here please --
SELECT
    fp.conversation_history_id_fp,
    endp.conversation_history_id_endp,
    COUNT(folder_movement.conversation_history_id) as mov_count
  FROM folder_movement
  LEFT JOIN folder_cum fp on folder_movement.conversation_history_id = fp.conversation_history_id_fp
  LEFT JOIN folder_cum endp on folder_movement.conversation_history_id = endp.conversation_history_id_endp
  GROUP BY fp.conversation_history_id_fp, endp.conversation_history_id_endp
),

prueba as (
SELECT

  folder_cum.conversation_id,
  folder_from,
  updated_at as open_at,
  
  case
    when user_id = 5782129474346356288 then true --software id
    else false 
    end as bot, --True = chatbot movement 
  folder_to as first_interaction_folder,
  

 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null --If it is different,  --it means the interaction is not closed yet, so it should be NULL
    else mov_count end as mov_count,
 
 
 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null
    else close_at end as close_at,
 
 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null
    else closed_by end as closed_by,
  
 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null
    else last_interaction_folder end as last_interaction_folder

FROM folder_cum
LEFT JOIN interactions_number on folder_cum.conversation_history_id_fp = interactions_number.conversation_history_id_fp
where (folder_from = '/CE triage/') 
and (updated_at is not null)
)

select 

  conversation_id,
  folder_from,
  open_at,
  bot,
  first_interaction_folder,
 mov_count,
 close_at,
 bright_user.email as closed_by,
 last_interaction_folder
 
from prueba 
left join bright_user on bright_user.user_id = prueba.closed_by
--where conversation_id = '4442463178442263265' --test
ORDER BY open_at desc

Here’s a simplified version of my tables:

folder_movement (Extracted from DATA)

conversation_history_id conversation_id updated_at folder_from folder_to
01 A1 2024-01-03 10:49:00 Folder 2 CE Triage
02 A1 2024-01-03 10:42:00 Folder 1 Folder 2
03 A1 2024-01-03 10:41:00 CE Triage Folder 1
08 A1 2023-01-03 13:45:00 Folder 4 CE Triage
09 A1 2023-01-03 13:41:00 CE Triage Folder 4
04 A2 2024-01-02 12:47:00 Folder 1 CE Triage
05 A2 2024-01-02 12:45:00 Folder 3 Folder 1
06 A2 2024-01-02 12:42:00 CE Triage Folder 3
07 A3 2024-01-02 12:52:00 CE Triage Folder 3

I am working with a dataset that tracks folder movements within conversations. I have a folder_movement table that contains conversation_id, updated_at, and movement details.

I created a folder_cum table where I extracted conversation_history_id_fp (starting point) and conversation_history_id_endp (ending point) for each interaction (1 Interaction = 1 Entrie and 1 exit from CE triage). Now, I need to count how many folder movements exist between conversation_history_id_fp and conversation_history_id_endp, while maintaining order by conversation_id and updated_at.

The challenge is that I cannot group by conversation_id, since a single conversation can have multiple interactions with different folder movements. Instead, I need to count the rows that fall between the two extracted IDs in the folder_movement table, and this table must be ordered by conversation_id and updated_at.

WITH
  DATA AS (
    SELECT
      conversation_history_id,
      conversation_id,
      convo.created_at AT time zone 'cst' AS updated_at,
      convo.user_id,
      convo.conversation_folder_name AS folder_to,
      LAG(convo.conversation_folder_name, 1) OVER (
        ORDER BY
          convo.conversation_id, convo.created_at ASC
      ) folder_from,
      convo.status,
      LAG(convo.status, 1) OVER (
        ORDER BY
          convo.conversation_id,convo.created_at ASC
      ) previous_status,
      convo.snooze_until,
      convo.snooze_note
    
    FROM conversation_history convo
    
    ORDER BY
      conversation_id,updated_at DESC
      --limit 1000
  ),
  folder_movement as (
SELECT

  conversation_history_id,
  conversation_id,
  updated_at,
  folder_from,
  folder_to,
  user_id

FROM DATA

WHERE folder_to <> folder_from
  ),
 
folder_cum as (
SELECT

  conversation_history_id as conversation_history_id_fp, -- start point
  conversation_id,
  updated_at,
  folder_from,
  folder_to,
  user_id,

LEAD(conversation_id, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) conversation_id_check,
  
  LEAD(updated_at, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) close_at,
  
  LEAD(user_id, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) closed_by,
      
  LEAD(folder_from, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) last_interaction_folder,

  LEAD(conversation_history_id, 1) OVER (
        ORDER BY
          conversation_id, updated_at asc
      ) conversation_history_id_endp


FROM folder_movement

WHERE (folder_from = '/CE triage/') or
(folder_to = '/CE triage/')
order by updated_at desc
),
interactions_number as ( -- help here please --
SELECT
    fp.conversation_history_id_fp,
    endp.conversation_history_id_endp,
    COUNT(folder_movement.conversation_history_id) as mov_count
  FROM folder_movement
  LEFT JOIN folder_cum fp on folder_movement.conversation_history_id = fp.conversation_history_id_fp
  LEFT JOIN folder_cum endp on folder_movement.conversation_history_id = endp.conversation_history_id_endp
  GROUP BY fp.conversation_history_id_fp, endp.conversation_history_id_endp
),

prueba as (
SELECT

  folder_cum.conversation_id,
  folder_from,
  updated_at as open_at,
  
  case
    when user_id = 5782129474346356288 then true --software id
    else false 
    end as bot, --True = chatbot movement 
  folder_to as first_interaction_folder,
  

 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null --If it is different,  --it means the interaction is not closed yet, so it should be NULL
    else mov_count end as mov_count,
 
 
 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null
    else close_at end as close_at,
 
 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null
    else closed_by end as closed_by,
  
 CASE 
    WHEN folder_cum.conversation_id <> conversation_id_check then null
    else last_interaction_folder end as last_interaction_folder

FROM folder_cum
LEFT JOIN interactions_number on folder_cum.conversation_history_id_fp = interactions_number.conversation_history_id_fp
where (folder_from = '/CE triage/') 
and (updated_at is not null)
)

select 

  conversation_id,
  folder_from,
  open_at,
  bot,
  first_interaction_folder,
 mov_count,
 close_at,
 bright_user.email as closed_by,
 last_interaction_folder
 
from prueba 
left join bright_user on bright_user.user_id = prueba.closed_by
--where conversation_id = '4442463178442263265' --test
ORDER BY open_at desc

Here’s a simplified version of my tables:

folder_movement (Extracted from DATA)

conversation_history_id conversation_id updated_at folder_from folder_to
01 A1 2024-01-03 10:49:00 Folder 2 CE Triage
02 A1 2024-01-03 10:42:00 Folder 1 Folder 2
03 A1 2024-01-03 10:41:00 CE Triage Folder 1
08 A1 2023-01-03 13:45:00 Folder 4 CE Triage
09 A1 2023-01-03 13:41:00 CE Triage Folder 4
04 A2 2024-01-02 12:47:00 Folder 1 CE Triage
05 A2 2024-01-02 12:45:00 Folder 3 Folder 1
06 A2 2024-01-02 12:42:00 CE Triage Folder 3
07 A3 2024-01-02 12:52:00 CE Triage Folder 3

interactions_number (Expected)

conversation_history_id_endp conversation_history_id_fp conversation_id mov_count
03 01 A1 3
09 08 A1 2
06 04 A2 3
07 06 A3 2

(Conversation ID A3, since the interaction is not finalized (folder_to = 'CE Triage'), will share the conversation_history_id from the immediately preceding row. This does not matter because, later in the code, the prueba table filters out unfinished interactions. Check the comments in first case in prueba)

I am expecting to get a count of all folder movements that occurred between conversation_history_id_fp and conversation_history_id_endp for each interaction, while ensuring the order is preserved (conversation_id and updated_at)

Share Improve this question edited Feb 8 at 13:29 Guillaume Outters 1,1608 silver badges20 bronze badges asked Feb 6 at 22:53 TreeCatTreeCat 112 bronze badges New contributor TreeCat is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct. 1
  • In case conversation A1 gets 2 subsequent roadtrips (enters Triage, exits from Triage, reenters Triage, exits Triage, but without an A2 inbetween): do you expect 1 interaction, or 2? In other words: is it the roundtrip to Triage that defines an interaction, or the interruption of a conversation by another one? – Guillaume Outters Commented Feb 7 at 14:13
Add a comment  | 

1 Answer 1

Reset to default 0

Creating interactions_number

You did a majority of the hard work in your folder_cum table, because:

  • your criteria ("either from or to Triage") limited folder_cum to contain only 2 entries (the start and end of the interaction)
  • you used window functions so that the start of interaction now knows data from the end of interaction

So, now that the start of interaction has all necessary data, you can work with it only, which makes 1 row per (start of) conversation as long as you can choose the right one (to this effect we refilter on folder_from = '/CE triage/', which helps us distinguish the start from the end), and thus you can GROUP BY on its fields.

Then classically JOIN to folder_movement, and COUNT to get your per-interaction count:

select
    f.conversation_history_id_fp,
    f.conversation_history_id_endp,
    f.conversation_id,
    count(1) mov_count
from folder_cum f
left join folder_movement m on m.conversation_history_id between conversation_history_id_endp and conversation_history_id_fp
where f.folder_from = '/CE triage/'
group by 1, 2, 3

Correcting folder_cum

Beware though, that your window functions in folder_cum should use a PARTITION BY to get the desired effect.
By using ORDER BY conversation_id, updated_at only, you risk interleaved conversations to get the wrong data (A1 start A2 start A1 end A2 end: A2 start will get its LEAD in A1 end instead of A2 end; or A3 getting A1 data as you noticed).
In the other hand, by using PARTITION BY conversation_id ORDER BY updated_at, you ensure that subsequent entries will always be related to the same conversation, and that the ending, unclosed conversation will get NULL instead of unrelated data.

All in all

With those two changes, you get the expected result as seen in SQLFiddle.

You even get a correct A3 conversation, ending in NULL and having 1 interaction only.

And note that I user the shorthand WINDOW same_conv AS to avoid repeating the window definition for each one of its uses (and avoid copy-paste mistakes).

发布评论

评论列表(0)

  1. 暂无评论