Pad timestamps in QuestDB to spread the entire range

When requesting logs from Jan 1 to Feb 1, but there is data only from Jan 10 and Jan 20, I still want to get empty data points spanning the whole range. Using FILL I managed to create blank data points between the existing timestamps, but I still need to fully pad the data. Is it possible purely with QuestDB, or do I need to manually add timestamps after?

SELECT event_type,
        COUNT(*) AS event_count,
        timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FILL (0)
ORDER BY interval_start, event_type;

SELECT event_type,
        COUNT(*) AS event_count,
        timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FILL (0)
ORDER BY interval_start, event_type;

Share Improve this question asked Feb 14 at 6:51 Nikolay Dyankov 7,23411 gold badges65 silver badges89 bronze badges

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

In recents versions of QuestDB we introduced the FROM/TO modifier to FILL https://questdb/docs/reference/sql/sample-by/#from-to

Unfortunately, that only works for unkeyed queries, meaning you could group only by timestamp and any aggregations. A potential way of solving your query would be doing a UNION with a FILL FROM/TO query per each event_type, as in

SELECT    
    'click' AS event_type, 
    COUNT(*) AS event_count,
    timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FROM '2024-01-01' TO '2024-02-01' FILL (0)
UNION ALL 
SELECT    
    'view' AS event_type, 
    COUNT(*) AS event_count,
    timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FROM '2024-01-01' TO '2024-02-01' FILL (0) 
ORDER BY interval_start, event_type;
;

An alternative is using the old trick of UNIONING the boundaries before and after your query and using FILL directly, as in

with t AS 
( (   
    select null as event_type, to_timestamp('2024-01-01', 'yyyy-MM-dd') as timestamp
UNION ALL
SELECT 
    event_type,    
    timestamp 
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
UNION ALL 
    select null as event_type, to_timestamp('2024-02-01', 'yyyy-MM-dd') as timestamp

) timestamp(timestamp)
), sampled AS (
SELECT  event_type,
    COUNT(*) AS event_count,
    timestamp AS interval_start
FROM t    
SAMPLE BY 1h  FILL (0)
)
SELECT * from sampled where event_type is not null  
ORDER BY interval_start, event_type;

This second option will probably be faster than the first one, but leaving both so you can choose whichever fits better.

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

Pad timestamps in QuestDB to spread the entire range - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)