When requesting logs from Jan 1 to Feb 1, but there is data only from Jan 10 and Jan 20, I still want to get empty data points spanning the whole range. Using FILL
I managed to create blank data points between the existing timestamps, but I still need to fully pad the data. Is it possible purely with QuestDB, or do I need to manually add timestamps after?
SELECT event_type,
COUNT(*) AS event_count,
timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FILL (0)
ORDER BY interval_start, event_type;
When requesting logs from Jan 1 to Feb 1, but there is data only from Jan 10 and Jan 20, I still want to get empty data points spanning the whole range. Using FILL
I managed to create blank data points between the existing timestamps, but I still need to fully pad the data. Is it possible purely with QuestDB, or do I need to manually add timestamps after?
SELECT event_type,
COUNT(*) AS event_count,
timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FILL (0)
ORDER BY interval_start, event_type;
Share
Improve this question
asked Feb 14 at 6:51
Nikolay DyankovNikolay Dyankov
7,23411 gold badges65 silver badges89 bronze badges
1 Answer
Reset to default 0In recents versions of QuestDB we introduced the FROM/TO modifier to FILL https://questdb/docs/reference/sql/sample-by/#from-to
Unfortunately, that only works for unkeyed queries, meaning you could group only by timestamp and any aggregations. A potential way of solving your query would be doing a UNION with a FILL FROM/TO query per each event_type, as in
SELECT
'click' AS event_type,
COUNT(*) AS event_count,
timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FROM '2024-01-01' TO '2024-02-01' FILL (0)
UNION ALL
SELECT
'view' AS event_type,
COUNT(*) AS event_count,
timestamp AS interval_start
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
SAMPLE BY ${sampleBy} FROM '2024-01-01' TO '2024-02-01' FILL (0)
ORDER BY interval_start, event_type;
;
An alternative is using the old trick of UNIONING the boundaries before and after your query and using FILL directly, as in
with t AS
( (
select null as event_type, to_timestamp('2024-01-01', 'yyyy-MM-dd') as timestamp
UNION ALL
SELECT
event_type,
timestamp
FROM logs
WHERE project_id = '${projectId}'
AND timestamp >= '${startDate}'
AND timestamp <= '${endDate}'
AND event_type IN (${events.map((event: string) => `'${event}'`).join(', ')})
UNION ALL
select null as event_type, to_timestamp('2024-02-01', 'yyyy-MM-dd') as timestamp
) timestamp(timestamp)
), sampled AS (
SELECT event_type,
COUNT(*) AS event_count,
timestamp AS interval_start
FROM t
SAMPLE BY 1h FILL (0)
)
SELECT * from sampled where event_type is not null
ORDER BY interval_start, event_type;
This second option will probably be faster than the first one, but leaving both so you can choose whichever fits better.