最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

snowflake cloud data platform - Median with a sliding window - Stack Overflow

programmeradmin1浏览0评论

The goal is to use MEDIAN as a window function with a sliding window of a specific size.

SELECT *,
      MEDIAN(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM test_data
ORDER BY id;
Sliding window frame unsupported for function MEDIAN

For sample data:

CREATE OR REPLACE TEMP TABLE test_data
AS
SELECT index AS id, ABS(RANDOM(5))%100 AS n
FROM TABLE(FLATTEN(ARRAY_GENERATE_RANGE(1,10)));

The expected output for a sliding window of size 4 is:

ID N MEDIAN
0 74 74 74
1 28 51 (28+74)/2 = 51
2 0 28 28
3 18 23 (18+28)/2 = 23
4 44 23 (18+28)/2 = 23
5 57 31 (18+44)/2 = 31
6 31 37.5 (31+44)/2 = 37.5
7 27 37.5 (31+44)/2 = 37.5
8 96 44 (31+57)/2 = 44

The goal is to use MEDIAN as a window function with a sliding window of a specific size.

SELECT *,
      MEDIAN(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM test_data
ORDER BY id;
Sliding window frame unsupported for function MEDIAN

For sample data:

CREATE OR REPLACE TEMP TABLE test_data
AS
SELECT index AS id, ABS(RANDOM(5))%100 AS n
FROM TABLE(FLATTEN(ARRAY_GENERATE_RANGE(1,10)));

The expected output for a sliding window of size 4 is:

ID N MEDIAN
0 74 74 74
1 28 51 (28+74)/2 = 51
2 0 28 28
3 18 23 (18+28)/2 = 23
4 44 23 (18+28)/2 = 23
5 57 31 (18+44)/2 = 31
6 31 37.5 (31+44)/2 = 37.5
7 27 37.5 (31+44)/2 = 37.5
8 96 44 (31+57)/2 = 44

Seeking for SQL only approach.

Share Improve this question edited Mar 20 at 15:30 jonrsharpe 122k30 gold badges268 silver badges476 bronze badges asked Mar 20 at 15:28 Lukasz SzozdaLukasz Szozda 177k26 gold badges273 silver badges315 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

Median with a sliding window can be emulated by using ARRAY_AGG, ARRAY_SORT and choosing the "middle" value:

SELECT *, 
  ARRAY_SORT(ARRAY_AGG(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)) AS sorted_arr,
  CASE WHEN ARRAY_SIZE(sorted_arr) % 2 = 1 THEN sorted_arr[FLOOR(ARRAY_SIZE(sorted_arr)/2)]::FLOAT
       ELSE (sorted_arr[ARRAY_SIZE(sorted_arr)/2-1]::FLOAT + sorted_arr[ARRAY_SIZE(sorted_arr)/2]::FLOAT)/2
  END AS median
FROM test_data
ORDER BY id;

For clarity cumulative_arr and sorted_arr presented as separate steps:

SELECT *, 
  ARRAY_AGG(n) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS cumulative_arr,
  ARRAY_SORT(cumulative_arr) AS sorted_arr,
  CASE WHEN ARRAY_SIZE(sorted_arr)%2=1 THEN sorted_arr[FLOOR(ARRAY_SIZE(sorted_arr)/2)]::FLOAT
       ELSE (sorted_arr[ARRAY_SIZE(sorted_arr)/2-1]::FLOAT + sorted_arr[ARRAY_SIZE(sorted_arr)/2]::FLOAT)/2
    END AS median
FROM test_data
ORDER BY id;

Output:

发布评论

评论列表(0)

  1. 暂无评论