最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

database - Query base table and derived materialized views without overlap - Stack Overflow

programmeradmin6浏览0评论

I have a table recording trading data at a rate of several events per second (in the low hundreds per second). Over time I don't need that fine-grained data, so I have defined two materialized views and added TTL to the main table, to save storage and get faster queries over historical data. This is my schema:



CREATE TABLE 'trades' ( 
    symbol SYMBOL CAPACITY 256 CACHE,
    side SYMBOL CAPACITY 256 CACHE,
    price DOUBLE,
    amount DOUBLE,
    timestamp TIMESTAMP
) timestamp(timestamp) 
PARTITION BY DAY TTL 7d WAL;

CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_15m AS (  
 SELECT
    timestamp, symbol, 
      first(price) AS open, 
            last(price) AS close, 
            min(price) as low,
      max(price) as high, 
            sum(amount) AS volume
 FROM trades
 SAMPLE BY 15m
 ) PARTITION BY WEEK TTL 6M;

 CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_1h AS (  
 SELECT
    timestamp, symbol, 
      first(open) AS open, 
            last(close) AS close, 
            min(low) as low,
      max(high) as high, 
            sum(volume) AS volume
 FROM OHLC_15m
 SAMPLE BY 1h
 ) PARTITION BY MONTH;

The raw data is stored for 7 days only. Then I have a materialized view that stores data at 15 minutes sampling for 6 months. And I have another that stores data sampled at 1 hour intervals forever.

All good. But now I want to query all the data since a year ago, downsampling at 5 minutes intervals on the base table, and then at the downsampled resolution for the other two. I have this query

SELECT
    timestamp, symbol, 
      first(price) AS open, 
            last(price) AS close, 
            min(price) as low,
      max(price) as high, 
            sum(amount) AS volume
 FROM trades 
 SAMPLE BY 5m
UNION
SELECT * FROM OHLC_15m
UNION
SELECT * FROM OHLC_1h

My problem is that since there is overlap across the materialized and the views, I am getting the data more recent than 1 week thrice, and the one more recent than 6 months twice. I could probably add some date calculations, but since the TTL is asynchronous, it could be error prone. Any better ideas?

I have a table recording trading data at a rate of several events per second (in the low hundreds per second). Over time I don't need that fine-grained data, so I have defined two materialized views and added TTL to the main table, to save storage and get faster queries over historical data. This is my schema:



CREATE TABLE 'trades' ( 
    symbol SYMBOL CAPACITY 256 CACHE,
    side SYMBOL CAPACITY 256 CACHE,
    price DOUBLE,
    amount DOUBLE,
    timestamp TIMESTAMP
) timestamp(timestamp) 
PARTITION BY DAY TTL 7d WAL;

CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_15m AS (  
 SELECT
    timestamp, symbol, 
      first(price) AS open, 
            last(price) AS close, 
            min(price) as low,
      max(price) as high, 
            sum(amount) AS volume
 FROM trades
 SAMPLE BY 15m
 ) PARTITION BY WEEK TTL 6M;

 CREATE MATERIALIZED VIEW IF NOT EXISTS OHLC_1h AS (  
 SELECT
    timestamp, symbol, 
      first(open) AS open, 
            last(close) AS close, 
            min(low) as low,
      max(high) as high, 
            sum(volume) AS volume
 FROM OHLC_15m
 SAMPLE BY 1h
 ) PARTITION BY MONTH;

The raw data is stored for 7 days only. Then I have a materialized view that stores data at 15 minutes sampling for 6 months. And I have another that stores data sampled at 1 hour intervals forever.

All good. But now I want to query all the data since a year ago, downsampling at 5 minutes intervals on the base table, and then at the downsampled resolution for the other two. I have this query

SELECT
    timestamp, symbol, 
      first(price) AS open, 
            last(price) AS close, 
            min(price) as low,
      max(price) as high, 
            sum(amount) AS volume
 FROM trades 
 SAMPLE BY 5m
UNION
SELECT * FROM OHLC_15m
UNION
SELECT * FROM OHLC_1h

My problem is that since there is overlap across the materialized and the views, I am getting the data more recent than 1 week thrice, and the one more recent than 6 months twice. I could probably add some date calculations, but since the TTL is asynchronous, it could be error prone. Any better ideas?

Share Improve this question asked Mar 21 at 17:57 Javier RamirezJavier Ramirez 4,0851 gold badge27 silver badges36 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 0

In this case, we can add as a condition that the materialized views will show data only up to the oldest point stored by the view or table of immediate finer resolution. The last materialized view will include data only WHERE timestamp < (SELECT timestamp FROM OHLC_15m LIMIT 1) and the OHLC_15m view will include data only WHERE timestamp < (SELECT timestamp FROM trades LIMIT 1) , so no data will be accounted for more than once.


SELECT
    timestamp, symbol, 
      first(price) AS open, 
            last(price) AS close, 
            min(price) as low,
      max(price) as high, 
            sum(amount) AS volume
 FROM trades 
 SAMPLE BY 1m
UNION
SELECT * FROM OHLC_15m
WHERE timestamp < (SELECT timestamp FROM trades LIMIT 1) 
UNION
SELECT * FROM OHLC_1h
WHERE timestamp <  (SELECT timestamp FROM OHLC_15m LIMIT 1) ;

发布评论

评论列表(0)

  1. 暂无评论