My problem is that I'm creating a table with a few joins, a few dimensions and a few metrics. I'm using only one date column in the final select, mm.date below. It's in the table where my main metric (mm.measure) resides. The servers table has another column, create_date, that I'm not using; it tells me when a server_component was created. As you can see, server_component '2' only shows up for the 1/4/2024 mm.date in the result, but it should show up for every date starting 1/1/2024. Basically, the query should generate additional date rows based on create_date from the servers table, which I'm not including in the select.
select
s.server_location
, s.server_component
, mm.date
, max(mm.measure)
, count(distinct s.server_component)
from
servers s
left join main_metric_table mm
on s.server_component = mm.server_component
group by
s.server_location
, s.server_component
, mm.date
;
Current result example:
Desired result example:
So the goal is not to generate multiple rows from min(mm.date), but rather to generate additional rows based on a different date column (create_date as the start date for each server_component). These new rows should have max(measure) as null (so as not to inflate the numbers), but should still have the count() column filled.
I was thinking of some sort of dummy calendar table, but the tables are big and I'm not sure that approach will work because of that.
Asked Nov 20, 2024 at 9:59 by Piotr Konopnicki; edited Nov 20, 2024 at 10:43 by Alex Poole.
Comment: Please edit your question to include a minimal reproducible example. You haven't shown the data in the two base tables, so it's not clear whether your current result is actually wrong (or why). And please show tables, data and results as formatted text, not images. You said "it should generate more rows in the date column based off of create_date from servers table", but you haven't said why, or how that non-referenced column is supposed to be relevant. (Alex Poole, Nov 20, 2024 at 10:43)
1 Answer
You appear to want to generate a calendar, CROSS JOIN it to the servers table and then aggregate:
WITH calendar (day) AS (
SELECT DATE '2024-01-01' + LEVEL - 1 AS dt
FROM DUAL
CONNECT BY LEVEL <= 4
)
select s.server_location
, s.server_component
, c.day
, max(mm.measure)
, COUNT(distinct s.server_component) OVER (PARTITION BY s.server_location)
AS server_component_count
from calendar c
CROSS JOIN servers s
LEFT OUTER JOIN main_metric_table mm
ON ( s.server_component = mm.server_component
AND c.day = mm.dt )
group by
s.server_location
, s.server_component
, c.day;
Note: server_component is included in the GROUP BY clause, so COUNT(DISTINCT server_component) can be at most 1 in each group (if there are different values, they will be in different groups). To get a count across multiple groups you need to use an analytic function.
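The difference between the two counts can be seen side by side in a small sketch. It assumes Oracle and uses a hypothetical inline data set named sample_servers (two components in one location), not the real servers table:

```sql
-- sample_servers is a hypothetical stand-in for the servers table.
WITH sample_servers (server_location, server_component) AS (
  SELECT 'EU', 1 FROM DUAL UNION ALL
  SELECT 'EU', 2 FROM DUAL
)
SELECT s.server_location,
       s.server_component,
       -- Aggregate: evaluated inside each (location, component) group.
       COUNT(DISTINCT s.server_component) AS count_in_group,
       -- Analytic: evaluated after grouping, across all groups of a location.
       COUNT(DISTINCT s.server_component)
         OVER (PARTITION BY s.server_location) AS count_in_location
FROM   sample_servers s
GROUP BY s.server_location, s.server_component;
```

With this data, count_in_group is 1 on both rows, while count_in_location is 2.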
For the sample data:
CREATE TABLE servers (server_location, server_component) AS
SELECT 'EU', 1 FROM DUAL UNION ALL
SELECT 'EU', 2 FROM DUAL;
CREATE TABLE main_metric_table (dt, measure, server_component) AS
SELECT DATE '2024-01-01', 0.0173, 1 FROM DUAL UNION ALL
SELECT DATE '2024-01-02', 0.5783, 1 FROM DUAL UNION ALL
SELECT DATE '2024-01-03', 0.1200, 1 FROM DUAL UNION ALL
SELECT DATE '2024-01-04', 0.3734, 1 FROM DUAL UNION ALL
SELECT DATE '2024-01-04', 0.9194, 2 FROM DUAL;
the query with the calendar CROSS JOIN outputs:
SERVER_LOCATION | SERVER_COMPONENT | DAY | MAX(MM.MEASURE) | SERVER_COMPONENT_COUNT |
---|---|---|---|---|
EU | 1 | 2024-01-01 00:00:00 | .0173 | 2 |
EU | 1 | 2024-01-02 00:00:00 | .5783 | 2 |
EU | 1 | 2024-01-03 00:00:00 | .12 | 2 |
EU | 1 | 2024-01-04 00:00:00 | .3734 | 2 |
EU | 2 | 2024-01-01 00:00:00 | null | 2 |
EU | 2 | 2024-01-02 00:00:00 | null | 2 |
EU | 2 | 2024-01-03 00:00:00 | null | 2 |
EU | 2 | 2024-01-04 00:00:00 | .9194 | 2 |
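If the calendar should start at each component's create_date rather than at a fixed date, one possible variant is sketched below. It rests on assumptions: servers has a DATE column create_date (mentioned in the question but not part of the sample data), the calendar ends at the latest main_metric_table.dt, dt values carry no time component, and the count is meant to cover the components that exist on each day.

```sql
-- Assumption: servers.create_date exists; it is not in the sample data.
WITH bounds (start_day, end_day) AS (
  SELECT (SELECT MIN(TRUNC(create_date)) FROM servers),
         (SELECT MAX(dt) FROM main_metric_table)  -- assumed end of the calendar
  FROM   DUAL
),
calendar (day) AS (
  -- One row per day from the earliest create_date to the latest metric date.
  SELECT start_day + LEVEL - 1
  FROM   bounds
  CONNECT BY LEVEL <= end_day - start_day + 1
)
SELECT s.server_location,
       s.server_component,
       c.day,
       MAX(mm.measure) AS max_measure,
       -- Assumed meaning of the count: components per location and day.
       COUNT(DISTINCT s.server_component)
         OVER (PARTITION BY s.server_location, c.day) AS server_component_count
FROM   servers s
       -- Each component only gets calendar days on or after its create_date.
       INNER JOIN calendar c
         ON c.day >= TRUNC(s.create_date)
       LEFT OUTER JOIN main_metric_table mm
         ON  s.server_component = mm.server_component
         AND c.day = mm.dt
GROUP BY s.server_location, s.server_component, c.day;
```

The calendar is built from a single row of bounds, so it has one row per day regardless of how large the base tables are; the cost comes from the join, which produces one row per component per day since its creation.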