最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Rolling count distinct in superset - Stack Overflow

programmeradmin4浏览0评论

I have a BIG NUMBER WITH TIMETREND chart in Superset which has Time grain = Quarter, create_date on temporal axes, and a metric COUNT_DISTINCT(id). The dataset is used by many other charts. I have to change the metric to do rolling count distinct to quarter instead of count distinct per quarter.

Lets say this is the source data:

create_date quarter id
2024-08-11 2024Q3 3
2024-09-15 2024Q3 1
2024-09-24 2024Q3 2
2024-10-10 2024Q4 3
2024-11-01 2024Q4 4
2024-12-11 2024Q4 3
2025-01-18 2025Q1 5
2025-01-20 2025Q1 6
2025-01-20 2025Q1 7

I have a BIG NUMBER WITH TIMETREND chart in Superset which has Time grain = Quarter, create_date on temporal axes, and a metric COUNT_DISTINCT(id). The dataset is used by many other charts. I have to change the metric to do rolling count distinct to quarter instead of count distinct per quarter.

Lets say this is the source data:

create_date quarter id
2024-08-11 2024Q3 3
2024-09-15 2024Q3 1
2024-09-24 2024Q3 2
2024-10-10 2024Q4 3
2024-11-01 2024Q4 4
2024-12-11 2024Q4 3
2025-01-18 2025Q1 5
2025-01-20 2025Q1 6
2025-01-20 2025Q1 7

Now I have this in superset:

quarter COUNT_DISTINCT(id)
2024Q3 3
2024Q4 2
2025Q1 3

And I want this to be shown:

quarter rolling COUNT_DISTINCT(id)
2024Q3 3
2024Q4 4
2025Q1 7

How can I achieve this in superset?

Thanx in advance!

Share Improve this question edited Jan 20 at 19:47 user1664961 asked Jan 20 at 19:40 user1664961user1664961 359 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 0

To the best of my knowledge having worked with Superset daily for the past year, none of the current features will allow this as it requires a window function which cannot be used in the visuals.

Nothing is stopping you from creating a virtual dataset in SQLLab and creating a column with a window function there.

Note: I think there is a minor error in your question as 3+2 is 5 not 4. So the result should be 3,5,8 instead of 3,4,7

    WITH grouped_ids AS 
  (SELECT COUNT(DISTINCT id) as ids, quarter FROM timeseries GROUP BY quarter)

  
SELECT quarter,
  sum(t.ids) OVER w AS total,
  count(*)    OVER w AS count
FROM   grouped_ids t
WINDOW w AS (
             ORDER BY quarter          
  )
ORDER  BY quarter

Here is a db fiddle with the implementation of the above code in PostGreSQL based off of this answer.

发布评论

评论列表(0)

  1. 暂无评论