I have a table recording trading data at a rate of several events per second (in the low hundreds per second). Over time I don't need that fine-grained data, so I have defined two materialized views and added TTL to the main table, to save storage and get faster queries over historical data. This is my schema:
CREATE TABLE 'trades' (
symbol SYMBOL CAPACITY 256 CACHE,
side SYMBOL CAPACITY 256 CACHE,
price DOUBLE,
amount DOUBLE,
timestamp TIMESTAMP
) timestamp(timestamp)
PARTITION BY DAY TTL 7d WAL;
CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_15m AS (
SELECT
timestamp, symbol,
first(price) AS open,
last(price) AS close,
min(price) as low,
max(price) as high,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m
) PARTITION BY WEEK TTL 6M;
CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_1h AS (
SELECT
timestamp, symbol,
first(open) AS open,
last(close) AS close,
min(low) as low,
max(high) as high,
sum(volume) AS volume
FROM OHLC_15m
SAMPLE BY 1h
) PARTITION BY MONTH;
The raw data is stored for 7 days only. Then I have a materialized view that stores data at 15 minutes sampling for 6 months. And I have another that stores data sampled at 1 hour intervals forever.
All good. But now I want to query all the data since a year ago, downsampling at 5 minutes intervals on the base table, and then at the downsampled resolution for the other two. I have this query
SELECT
timestamp, symbol,
first(price) AS open,
last(price) AS close,
min(price) as low,
max(price) as high,
sum(amount) AS volume
FROM trades
SAMPLE BY 5m
UNION
SELECT * FROM OHLC_15m
UNION
SELECT * FROM OHLC_1h
My problem is that since there is overlap across the materialized and the views, I am getting the data more recent than 1 week thrice, and the one more recent than 6 months twice. I could probably add some date calculations, but since the TTL is asynchronous, it could be error prone. Any better ideas?
I have a table recording trading data at a rate of several events per second (in the low hundreds per second). Over time I don't need that fine-grained data, so I have defined two materialized views and added TTL to the main table, to save storage and get faster queries over historical data. This is my schema:
CREATE TABLE 'trades' (
symbol SYMBOL CAPACITY 256 CACHE,
side SYMBOL CAPACITY 256 CACHE,
price DOUBLE,
amount DOUBLE,
timestamp TIMESTAMP
) timestamp(timestamp)
PARTITION BY DAY TTL 7d WAL;
CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_15m AS (
SELECT
timestamp, symbol,
first(price) AS open,
last(price) AS close,
min(price) as low,
max(price) as high,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m
) PARTITION BY WEEK TTL 6M;
CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_1h AS (
SELECT
timestamp, symbol,
first(open) AS open,
last(close) AS close,
min(low) as low,
max(high) as high,
sum(volume) AS volume
FROM OHLC_15m
SAMPLE BY 1h
) PARTITION BY MONTH;
The raw data is stored for 7 days only. Then I have a materialized view that stores data at 15 minutes sampling for 6 months. And I have another that stores data sampled at 1 hour intervals forever.
All good. But now I want to query all the data since a year ago, downsampling at 5 minutes intervals on the base table, and then at the downsampled resolution for the other two. I have this query
SELECT
timestamp, symbol,
first(price) AS open,
last(price) AS close,
min(price) as low,
max(price) as high,
sum(amount) AS volume
FROM trades
SAMPLE BY 5m
UNION
SELECT * FROM OHLC_15m
UNION
SELECT * FROM OHLC_1h
My problem is that since there is overlap across the materialized and the views, I am getting the data more recent than 1 week thrice, and the one more recent than 6 months twice. I could probably add some date calculations, but since the TTL is asynchronous, it could be error prone. Any better ideas?
Share Improve this question asked Mar 21 at 17:57 Javier RamirezJavier Ramirez 4,0851 gold badge27 silver badges36 bronze badges1 Answer
Reset to default 0In this case, we can add as a condition that the materialized views will show data only up to the oldest point stored by the view or table of immediate finer resolution. The last materialized view will include data only WHERE timestamp < (SELECT timestamp FROM OHLC_15m LIMIT 1)
and the OHLC_15m view will include data only WHERE timestamp < (SELECT timestamp FROM trades LIMIT 1)
, so no data will be accounted for more than once.
SELECT
timestamp, symbol,
first(price) AS open,
last(price) AS close,
min(price) as low,
max(price) as high,
sum(amount) AS volume
FROM trades
SAMPLE BY 1m
UNION
SELECT * FROM OHLC_15m
WHERE timestamp < (SELECT timestamp FROM trades LIMIT 1)
UNION
SELECT * FROM OHLC_1h
WHERE timestamp < (SELECT timestamp FROM OHLC_15m LIMIT 1) ;