I have data from a PLC coming in ONLY ON CHANGE into a table with the format (TIMESTAMP, TAG, VALUE). I have a visualisation tool (Seeq) that queries this base table in Snowflake and shows the data on a time series chart. If a user selects a long time range, the data needs to be aggregated (max 2000 points per time series plot). I want this aggregation (an average) to be weighted by how long a tag held a value before it changed. For example, if tag = 'cheese' has a value of 100 from t=0 to t=5, then a value of 500 from t=6 to t=100, and the user in Seeq selects this tag over a long window (i.e. spanning t=0 to t=100000), the data registered for this tag has to be aggregated to (5*100 + 95*500)/100 = 480, plotted at t=50 (the midpoint). How do I write a query for this in Snowflake against this base table of (TIMESTAMP, TAG, VALUE)?
I tried cross joining a tag dimension table to a time dimension table, then using a LEAD function to spread the raw data out to one row per second and weighting it accordingly. It was not very performant in terms of speed.
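To make the weighting concrete, the example above works out to (5*100 + 95*500)/100 = 480 rather than the plain average of 300. A minimal sketch of the per-tag calculation I have in mind (READINGS, ts, tag, val are placeholder names for my base table, timestamps treated as integer seconds):

with changes as (
    -- each value holds from its own timestamp until the next change for that tag
    select
        tag,
        ts,
        val,
        coalesce(lead(ts) over (partition by tag order by ts),
                 100) as next_ts          -- 100 = end of the query window (placeholder)
    from readings
)
select
    tag,
    sum(val * (next_ts - ts)) / sum(next_ts - ts) as time_weighted_avg
from changes
group by tag;

What I don't know is how to extend this so the window is split into up to 2000 buckets without exploding the data to one row per second.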
Comment: something like NTILE (docs.snowflake/en/sql-reference/functions/ntile) might be a way to chunk the data, and then do some average/weighted operation. – Simeon Pilgrim, Mar 17 at 22:15
1 Answer
So I am not really sure what you are trying to do, but an explosive way of doing something like what you describe can be done like so:
with d0 as (
    -- change events as (tag, start, end, value) intervals
    select * from values
        ('cheese', 0, 5, 100),
        ('cheese', 6, 100, 500)
        t(tag, _s, _e, val)
), d1 as (
    -- explode each interval into one row per second, then split the rows
    -- for each tag into 10 equal-sized chunks
    select
        tag,
        value::number as rn,
        val,
        ntile(10) over (partition by tag order by rn) as tile
    from d0,
        table(flatten(array_generate_range(_s, _e + 1)))
)
select
    tag,
    tile,
    avg(rn) as mid,
    avg(val) as val
from d1
group by 1, 2
order by 1, 2;
which gives:
TAG    | TILE | MID       | VAL
-------|------|-----------|------------
cheese |    1 |  5.000000 | 281.818182
cheese |    2 | 15.500000 | 500.000000
cheese |    3 | 25.500000 | 500.000000
cheese |    4 | 35.500000 | 500.000000
cheese |    5 | 45.500000 | 500.000000
cheese |    6 | 55.500000 | 500.000000
cheese |    7 | 65.500000 | 500.000000
cheese |    8 | 75.500000 | 500.000000
cheese |    9 | 85.500000 | 500.000000
cheese |   10 | 95.500000 | 500.000000
Those rows do not really need expanding; the interpolation can be driven against a d0-like table of intervals, if that is how your data is sourced.
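As a rough sketch of that interval-based form (READINGS, ts, tag, val and the window values are placeholders, and it assumes the rows are already filtered to the query window): build the change rows as intervals with LEAD, build the bucket grid with a generator, and weight each value by how much of its interval overlaps each bucket.

with params as (
    select 0 as win_start, 100 as win_end, 10 as n_buckets   -- n_buckets would be ~2000 for Seeq
), intervals as (
    -- each change row holds its value until the next change (or the window end)
    select
        tag,
        ts as i_start,
        coalesce(lead(ts) over (partition by tag order by ts),
                 (select win_end from params)) as i_end,
        val
    from readings
), buckets as (
    select row_number() over (order by seq4()) - 1 as bucket_no
    from table(generator(rowcount => 10))                    -- must match n_buckets
), bucket_edges as (
    select
        b.bucket_no,
        p.win_start + (p.win_end - p.win_start) * b.bucket_no       / p.n_buckets as b_start,
        p.win_start + (p.win_end - p.win_start) * (b.bucket_no + 1) / p.n_buckets as b_end
    from buckets b
    cross join params p
)
-- weight each value by the length of its overlap with each bucket
select
    i.tag,
    e.bucket_no,
    (e.b_start + e.b_end) / 2 as mid,
    sum(i.val * (least(i.i_end, e.b_end) - greatest(i.i_start, e.b_start)))
      / sum(least(i.i_end, e.b_end) - greatest(i.i_start, e.b_start)) as time_weighted_avg
from intervals i
join bucket_edges e
  on i.i_start < e.b_end
 and i.i_end   > e.b_start
group by 1, 2, 3
order by 1, 2;

This avoids the one-row-per-second explosion entirely: the row count going into the aggregation is (number of change rows that overlap the window) x (buckets they straddle), which stays small for change-only data.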