join - How to assign a common count from a specific category across multiple categories without row explosion in SQL?

Problem statement: I am working on a reporting dashboard in Tableau where each record belongs to one status_category. However, I need to display a common aggregated count (processed_count) across all categories without causing row explosion.

For example, if 1,200 transactions were processed, I want that number to appear in every category (eg - "Pending," "In Progress," etc.), even for records that didn’t have a processed status.

Approaches attempted:

Extracted distinct status categories
Extracted distinct grouping dimensions (that are used as filters in my dashboard)
CROSS JOINed them to ensure every status appears for each unique combination of the grouping columns
LEFT JOINed the result back to the main dataset

Issue: this worked correctly, but caused massive row explosion

Original dataset: 600k rows
After applying this approach: 2.4M rows

This impacts performance in our database and visualization tool by making queries inefficient and introducing unnecessary data bloat.

Sample output:

Region	Department	Segment	Cycle_Month	Status_Category	Processed_Count	Record_Count
North	Finance	Large	Jan	Processed in Current Cycle	1,200	500
North	Finance	Large	Jan	Pending	1,200	300
North	Finance	Large	Jan	In Progress	1,200	400

For example, if 1,200 transactions were processed, I want that number to appear in every category (eg - "Pending," "In Progress," etc.), even for records that didn’t have a processed status.

Approaches attempted:

Extracted distinct status categories
Extracted distinct grouping dimensions (that are used as filters in my dashboard)
CROSS JOINed them to ensure every status appears for each unique combination of the grouping columns
LEFT JOINed the result back to the main dataset

Issue: this worked correctly, but caused massive row explosion

Original dataset: 600k rows
After applying this approach: 2.4M rows

This impacts performance in our database and visualization tool by making queries inefficient and introducing unnecessary data bloat.

Sample output:

Region	Department	Segment	Cycle_Month	Status_Category	Processed_Count	Record_Count
North	Finance	Large	Jan	Processed in Current Cycle	1,200	500
North	Finance	Large	Jan	Pending	1,200	300
North	Finance	Large	Jan	In Progress	1,200	400

Screenshot

Query:

-- Aggregate Data by Dimensions
WITH aggregated_data AS 
(
    SELECT
        pd.cycle_month,
        pd.region_id,
        pd.region_name,
        pd.department,
        pd.segment,
        pd.team_id,
        pd.team_name,
        pd.division_id,
        pd.division_name,
        pd.manager_id,
        pd.manager_name,
        pd.status_category,
        COUNT(DISTINCT pd.record_id) AS record_count
    FROM processed_data_table pd
    GROUP BY ALL
),

-- Extract Unique Status Categories
distinct_statuses AS (
    SELECT DISTINCT status_category
    FROM aggregated_data
),

-- Extract Unique Dimension Combinations (excluding status_category)
distinct_dimensions AS (
    SELECT DISTINCT
        cycle_month,
        region_id,
        region_name,
        department,
        segment,
        team_id,
        team_name,
        division_id,
        division_name,
        manager_id,
        manager_name
    FROM aggregated_data
),

-- Create a Full Status-Dimension Combination
status_dimension_mapping AS (
    SELECT
        d.*,  
        s.status_category
    FROM distinct_dimensions d
    CROSS JOIN distinct_statuses s
),

-- Merge Actual Record Counts While Retaining All Statuses
data_with_defaults AS (
    SELECT
        sdm.*,
        COALESCE(ad.record_count, 0) AS record_count
    FROM status_dimension_mapping sdm
    LEFT JOIN aggregated_data ad
        ON sdm.cycle_month = ad.cycle_month
        AND sdm.region_id = ad.region_id
        AND sdm.department = ad.department
        AND sdm.segment = ad.segment
        AND sdm.team_id = ad.team_id
        AND sdm.division_id = ad.division_id
        AND sdm.manager_id = ad.manager_id
        AND sdm.status_category = ad.status_category
),

-- Assign Processed Count Across All Statuses
output AS (
    SELECT
        *,
        MAX(CASE WHEN status_category = 'Processed in Current Cycle' THEN record_count END)
            OVER (PARTITION BY cycle_month, region_id, department, segment,
                             team_id, division_id, manager_id
                             )
            AS processed_count
    FROM data_with_defaults
)
SELECT * FROM output;

Share Improve this question edited Mar 15 at 8:29 marc_s 756k184 gold badges1.4k silver badges1.5k bronze badges asked Mar 15 at 0:22 blue thunder 32 bronze badges

Please share sample input, expected sample output.minimal-reproducible-example – samhita Commented Mar 15 at 9:46

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

This is based on assumption due to limited information about sample input data and columns.

Based on my understanding you need a view showing count of each status for a group of dimension,alongside count of processed status for the same group of dimensions.

If the understanding correct, you can just find count of each status category and find count of processed category like below .

SELECT
    pd.cycle_month,
    pd.region_id,
    pd.region_name,
    pd.department,
    pd.segment,
    pd.team_id,
    pd.team_name,
    pd.division_id,
    pd.division_name,
    pd.manager_id,
    pd.manager_name,
    pd.status_category,
    COUNT(DISTINCT pd.record_id) AS status_count,
    MAX(CASE WHEN pd.status_category = 'Processed' THEN COUNT(DISTINCT pd.record_id) END)
        OVER (PARTITION BY pd.cycle_month, pd.region_id, pd.department, pd.segment, 
                             pd.team_id, pd.division_id, pd.manager_id) AS processed_count
FROM
    processed_data_table pd
GROUP BY
    pd.cycle_month, pd.region_id, pd.region_name, pd.department, pd.segment,
    pd.team_id, pd.team_name, pd.division_id, pd.division_name,
    pd.manager_id, pd.manager_name, pd.status_category;

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

join - How to assign a common count from a specific category across multiple categories without row explosion in SQL? - Stack Ov

1 Answer 1

与本文相关的文章

评论列表(0)