
Snowflake Time-Series Weighted Values - Stack Overflow


I have data from a PLC coming in ONLY ON CHANGE into a table with the format (TIMESTAMP, TAG, VALUE). I have a visualisation tool (Seeq) that queries this base table in Snowflake and shows the data on a time-series chart. If a user selects a long time range, the data needs to be aggregated (max 2000 points per time-series plot). I want this aggregation (an average) to be weighted by how long a tag held each value before it changed. For example, for a tag 'cheese', t=0 -> t=5 has a value of 100, then t=6 -> t=100 has a value of 500. If the user in Seeq selects this tag over a long window (e.g. spanning t=0 to t=100000), the data registered for this tag should be aggregated to (5*100 + 95*500)/100 = 480, plotted at t=50 (the midpoint). How do I write a query for this in Snowflake against this base table of (TIMESTAMP, TAG, VALUE)?
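Below is a minimal sketch of that duration-weighted average in Snowflake SQL, assuming a base table named plc_data with columns ts, tag, val (hypothetical names standing in for TIMESTAMP, TAG, VALUE). LEAD gives each change row its successor's timestamp, so the difference is how long the value was held:

    -- Sketch only: weight each value by how long it was held before the next change.
    with durations as (
        select
            tag,
            val,
            datediff('second', ts,
                     lead(ts) over (partition by tag order by ts)) as dur_s
        from plc_data
    )
    select
        tag,
        sum(val * dur_s) / sum(dur_s) as weighted_avg
    from durations
    where dur_s is not null  -- the latest change has no successor; in practice,
                             -- coalesce LEAD(ts) to the query window's end instead
    group by tag;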

I tried cross-joining to a time dimension table (built from a tag dimension table), then using a LEAD function to expand the raw data into one row per second and weight it accordingly. It was not very performant.
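For the max-2000-points requirement, one non-exploding variant (a sketch only, not a tested solution) keeps the LEAD-derived durations from the previous sketch and buckets the change rows with WIDTH_BUCKET instead of generating a row per second. Table and column names are the same placeholders as above, and the bounds use the question's t=0 to t=100000 window. One approximation to be aware of: an interval that spans a bucket boundary is credited entirely to the bucket it starts in.

    with durations as (
        select
            tag,
            val,
            date_part(epoch_second, ts) as t,  -- change time as epoch seconds
            datediff('second', ts,
                     lead(ts) over (partition by tag order by ts)) as dur_s
        from plc_data
    )
    select
        tag,
        width_bucket(t, 0, 100000, 2000) as bucket,  -- 2000 buckets over the window
        sum(val * dur_s) / sum(dur_s) as weighted_avg
    from durations
    where dur_s is not null
    group by 1, 2
    order by 1, 2;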


asked Mar 17 at 22:15 by Austin
  • something like NTILE (docs.snowflake.com/en/sql-reference/functions/ntile) might be a way to chunk the data, and then do some average/weighted operation. – Simeon Pilgrim Commented Mar 17 at 22:15

1 Answer


So I am not really sure what you are trying to do, but an explosive (row-per-second) way of doing something like what you describe can be done like so:

with d0 as (
    -- sample change log: tag, interval start, interval end, value held
    select * from values
     ('cheese', 0, 5, 100),
     ('cheese', 6, 100, 500)
     t(tag, _s, _e, val)
 ), d1 as (
     select 
        tag,
        value::number as rn,  -- one row per second from the flattened range
        val,
        ntile(10) over (partition by tag order by rn) as tile
     from d0, 
        table(flatten(array_generate_range(_s, _e + 1)))
)
select
    tag,
    tile,
    avg(rn) as mid,   -- midpoint of each tile's time range
    avg(val) as val   -- plain average per tile; rows are per-second,
                      -- so it is implicitly duration-weighted
from d1
group by 1, 2
order by 1, 2;

which gives:

TAG TILE MID VAL
cheese 1 5.000000 281.818182
cheese 2 15.500000 500.000000
cheese 3 25.500000 500.000000
cheese 4 35.500000 500.000000
cheese 5 45.500000 500.000000
cheese 6 55.500000 500.000000
cheese 7 65.500000 500.000000
cheese 8 75.500000 500.000000
cheese 9 85.500000 500.000000
cheese 10 95.500000 500.000000

Those rows do not really need expanding, though; the interpolation can be driven against a d0-like table directly, if that is how your data is sourced.
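To make that concrete, here is one sketch (assumptions flagged in the comments) of driving the weighted average straight off the d0 intervals: instead of flattening one row per second, each (start, end, value) interval is clipped against fixed-width buckets and weighted by the overlap. Note this uses equal-width time buckets, whereas NTILE above builds equal-row-count tiles, so the numbers will differ at the edges:

    with d0 as (
        select * from values
         ('cheese', 0, 5, 100),
         ('cheese', 6, 100, 500)
         t(tag, _s, _e, val)
    ), buckets as (
        select
            tag,
            value::number as b,                 -- bucket index 0..9
            min_s + b * w       as b_lo,        -- bucket lower edge (inclusive)
            min_s + (b + 1) * w as b_hi         -- bucket upper edge (exclusive)
        from (
            select tag, min(_s) as min_s,
                   (max(_e) + 1 - min(_s)) / 10 as w  -- 10 buckets, mirroring NTILE(10)
            from d0 group by tag
        ), table(flatten(array_generate_range(0, 10)))
    )
    select
        d0.tag,
        b.b + 1 as tile,
        sum(val * (least(_e + 1, b_hi) - greatest(_s, b_lo)))
          / sum(least(_e + 1, b_hi) - greatest(_s, b_lo)) as weighted_avg
    from d0
    join buckets b
      on b.tag = d0.tag
     and _s < b_hi        -- interval [_s, _e + 1) overlaps bucket [b_lo, b_hi)
     and _e + 1 > b_lo
    group by 1, 2
    order by 1, 2;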
