最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Pad timestamps in QuestDB to spread the entire range - Stack Overflow

programmeradmin3浏览0评论

When requesting logs from Jan 1 to Feb 1, but there is data only from Jan 10 and Jan 20, I still want to get empty data points spanning the whole range. Using FILL I managed to create blank data points between the existing timestamps, but I still need to fully pad the data. Is it possible purely with QuestDB, or do I need to manually add timestamps after?

SELECT event_type,
        COUNT(*) AS event_count,
        timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FILL (0)
ORDER BY interval_start, event_type;

When requesting logs from Jan 1 to Feb 1, but there is data only from Jan 10 and Jan 20, I still want to get empty data points spanning the whole range. Using FILL I managed to create blank data points between the existing timestamps, but I still need to fully pad the data. Is it possible purely with QuestDB, or do I need to manually add timestamps after?

SELECT event_type,
        COUNT(*) AS event_count,
        timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FILL (0)
ORDER BY interval_start, event_type;
Share Improve this question asked Feb 14 at 6:51 Nikolay DyankovNikolay Dyankov 7,23411 gold badges65 silver badges89 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 0

In recents versions of QuestDB we introduced the FROM/TO modifier to FILL https://questdb/docs/reference/sql/sample-by/#from-to

Unfortunately, that only works for unkeyed queries, meaning you could group only by timestamp and any aggregations. A potential way of solving your query would be doing a UNION with a FILL FROM/TO query per each event_type, as in

SELECT    
    'click' AS event_type, 
    COUNT(*) AS event_count,
    timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FROM '2024-01-01' TO '2024-02-01' FILL (0)
UNION ALL 
SELECT    
    'view' AS event_type, 
    COUNT(*) AS event_count,
    timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FROM '2024-01-01' TO '2024-02-01' FILL (0) 
ORDER BY interval_start, event_type;
;

An alternative is using the old trick of UNIONING the boundaries before and after your query and using FILL directly, as in

with t AS 
( (   
    select null as event_type, to_timestamp('2024-01-01', 'yyyy-MM-dd') as timestamp
UNION ALL
SELECT 
    event_type,    
    timestamp 
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
UNION ALL 
    select null as event_type, to_timestamp('2024-02-01', 'yyyy-MM-dd') as timestamp

) timestamp(timestamp)
), sampled AS (
SELECT  event_type,
    COUNT(*) AS event_count,
    timestamp AS interval_start
FROM t    
SAMPLE BY 1h  FILL (0)
)
SELECT * from sampled where event_type is not null  
ORDER BY interval_start, event_type;

This second option will probably be faster than the first one, but leaving both so you can choose whichever fits better.

发布评论

评论列表(0)

  1. 暂无评论