oracle database - Sum columns by ID in SQL

I am having trouble wrapping my head around the following problem. A sample dataset (truncated):

ID	Test	Instance	Weight	Outcome
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	B	1	0.5	2
1	B	1	0.5	2
2	A	1	1	1
2	A	2	1	2
2	B	1	1	2
3	C	1	1	2
3	C	1	1	2
3	C	1	1	2
3	C	1	1	2
3	C	1	1	2
4	A	1	0.5	3
4	B	1	0.5	4
4	B	1	0.5	4
5	A	1	0.5	2
5	B	1	0.5	2
6	A	1	0.25	2
6	A	1	0.25	2
6	A	2	0.25	2
6	A	3	0.25	1
6	B	1	0.25	2

I am having trouble wrapping my head around the following problem. A sample dataset (truncated):

ID	Test	Instance	Weight	Outcome
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	A	1	0.5	2
1	B	1	0.5	2
1	B	1	0.5	2
2	A	1	1	1
2	A	2	1	2
2	B	1	1	2
3	C	1	1	2
3	C	1	1	2
3	C	1	1	2
3	C	1	1	2
3	C	1	1	2
4	A	1	0.5	3
4	B	1	0.5	4
4	B	1	0.5	4
5	A	1	0.5	2
5	B	1	0.5	2
6	A	1	0.25	2
6	A	1	0.25	2
6	A	2	0.25	2
6	A	3	0.25	1
6	B	1	0.25	2

My code:

SELECT 
    ID,
    TEST,
    INSTANCE,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '1' THEN ID END))) AS Outcome1,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '2' THEN ID END))) AS Outcome2,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '3' THEN ID END))) AS Outcome3,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '4' THEN ID END))) AS Outcome4
FROM 
    DATA
WHERE 
    TEST IN ('A', 'B', 'C')
GROUP BY 
    ID, TEST, WEIGHT
ORDER BY 
    ID;

which results in the following output:

ID	Test	Instance	Outcome1	Outcome2	Outcome3	Outcome4
1	A	1	0	0.5	0	0
1	B	1	0	0.5	0	0
2	A	1	1	0	0	0
2	A	2	0	1	0	0
2	B	1	0	1	0	0
3	C	1	0	1	0	0
4	A	1	0	0	0.5	0
4	B	1	0	0	0	0.5
5	A	1	0	0.5	0	0
5	B	1	0	0.5	0	0
6	A	1	0	0.25	0	0
6	A	2	0	0.25	0	0
6	A	3	0.25	0	0	0
6	B	1	0	0.25	0	0
7	A	1	0.5	0	0	0
7	B	1	0.5	0	0	0
8	B	1	1	0	0	0
9	A	1	0	1	0	0
9	B	1	0.5	0	0	0
9	B	2	0.5	0	0	0
10	A	1	0	0.25	0	0
10	A	2	0	0.25	0	0
11	A	1	0	1	0	0
12	B	1	0	1	0	0
13	A	1	0.5	0	0	0
13	B	1	0	0	0	0.5
14	B	1	0	0.5	0	0
15	A	1	1	0	0	0
15	B	1	1	0	0	0
15	C	1	0	0	1	0

However, I would like to have one row for each ID that collapses everything like the following:

ID	Outcome1	Outcome2	Outcome3	Outcome4
1	0	1	0	0
2	1	2	0	0
3	0	1	0	0
4	0	0	0.5	0.5
5	0	1	0	0
6	0.25	0.75	0	0
7	1	0	0	0
8	1	0	0	0
9	1	1	0	0
10	0	0.5	0	0
11	0	1	0	0
12	0	1	0	0
13	0	0	0	0.5
14	0	0.5	0	0
15	2	0	1	0

I have tried to pull this without the "Test" variable while keeping it in the group by, but that reduces each ID to just one outcome so no ID has a number higher than 1 (e.g., ID 2 Outcome 2 = 1 when it should be 2, I assume because it was only looking for each unique outcome. so if they achieved outcome 1 on tests A and B, it still only counts outcome 1 once). I tried removing the "unique" identifier as well, but that muddied things too much as some IDs have 50 records for Test A and I only care about each outcome once. (e.g., if they got Outcome 1 25 times and Outcome 2 25 times on Test A, I still want it to display Outcome 1 = 1 x weight, Outcome 2 = 1 x weight). I tried to do this in Excel using pivot tables but unfortunately my dataset is actually around 1 million rows so the export didn't go well. I would love to find a way to do within SQL.

To summarize: I would like to collapse my table so that each ID has one row with all data summarized while accounting for unique test outcomes.

Thank you!

Share Improve this question edited Mar 19 at 15:09 asked Mar 18 at 21:19 Bonnie Brett 133 bronze badges

I don't think your expected results table reflect what you're saying in your text. Which one is right? In cases such as this, you build your query piece by piece. The first piece could be a select of the unique row numbers. You could have a part that is a select of unique tests results for each unique row number. – S.Comeau Commented Mar 18 at 21:38
2 Please tag the RDBMS you are using (Toad is a UI and as such unlikely to be relevant to the question). – Dale K Commented Mar 19 at 3:39
2 A minimal reproducible example is supposed to specify both sample table data and the expected result. – jarlh Commented Mar 19 at 9:54
Thank you S.Comeau - you are correct that my expected results didn't match but it has been fixed. If my question was not clear, I am trying to get the second table instead of the first without losing any information. – Bonnie Brett Commented Mar 19 at 13:39
I included sample data and another column that I think is necessary for showing why I am using UNIQUE. Thank you for the reminder. – Bonnie Brett Commented Mar 19 at 15:10

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

Any modern RDBMS (>= 2010?) will provide you Common Table Expressions that will help you inject the results of your first query in a second query dedicated to the grouping:

with t as
(

-- This is your query, unmodified except for the last ; removed:
SELECT 
    ID,
    TEST,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '1' THEN ID END))) AS Outcome1,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '2' THEN ID END))) AS Outcome2,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '3' THEN ID END))) AS Outcome3,
    WEIGHT * (COUNT(UNIQUE(CASE WHEN OUTCOME = '4' THEN ID END))) AS Outcome4
FROM 
    DATA
WHERE 
    TEST IN ('A', 'B', 'C')
GROUP BY 
    ID, TEST, WEIGHT
ORDER BY 
    ID

)
select
    ID,
    sum(Outcome1) Outcome1,
    sum(Outcome2) Outcome2,
    sum(Outcome3) Outcome3,
    sum(Outcome4) Outcome4
from t
group by ID
order by ID;

This will help you build your solution incrementally:
you can comment the last select and place a select * from t to verify the results of the intermediate query, before unleashing the definitive version of the final one,
thus secure every "good" query into its own alias (even isolating interesting attempts in subqueries that you don't want to use in the final result, but keep for posterity),
and chain the steps to get your final report.

In essence, they are like temporary tables, that vanish outside of the WITH.

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

oracle database - Sum columns by ID in SQL - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)