sql - Difference in query and error even though distinct values

It is Bigquery and below are the sample tables and contents

dde-demo-d001.sap_crm.transactions_bkup

case_guid	transaction_header_guid
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-8D9FF1D25CAB
005056935CD81EEF92DF522476D53CAB	00505693-801A-1EEF-92DF-6A7A34E0FCAB
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-4D67C93F5CAB
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-6665227E9CAB
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-6DA0046D7CAB

It is Bigquery and below are the sample tables and contents

dde-demo-d001.sap_crm.transactions_bkup

case_guid	transaction_header_guid
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-8D9FF1D25CAB
005056935CD81EEF92DF522476D53CAB	00505693-801A-1EEF-92DF-6A7A34E0FCAB
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-4D67C93F5CAB
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-6665227E9CAB
005056935CD81EEF92DF522476D53CAB	00505693-5CD8-1EEF-92DF-6DA0046D7CAB

dde-demo-d001.sap_crm.documents_bkup

transaction_header_guid	case_guid
00505693-5CD8-1EEF-92DF-6665227E9CAB	null
00505693-5CD8-1EEF-92DF-8D9FF1D25CAB	null
00505693-5CD8-1EEF-92DF-6DA0046D7CAB	null
00505693-801A-1EEF-92DF-6A7A34E0FCAB	null
00505693-5CD8-1EEF-92DF-4D67C93F5CAB	null

query with inner join

UPDATE `dde-demo-d001.sap_crm.documents_bkup`
    SET case_guid = ICT.case_guid
    FROM `dde-demo-d001.sap_crm.documents_bkup` DFR
    INNER JOIN `dde-demo-d001.sap_crm.transactions_bkup` ICT
    ON DFR.transaction_header_guid = ICT.transaction_header_guid
    WHERE DFR.case_guid IS NULL;

query with subquery

UPDATE `dde-demo-d001.sap_crm.documents_bkup` DFR
    SET case_guid = ICT.case_guid
    FROM (SELECT transaction_header_guid,case_guid FROM `dde-demo-d001.sap_crm.transactions_bkup`) ICT
    WHERE (DFR.case_guid IS NULL) and (DFR.transaction_header_guid = ICT.transaction_header_guid);

query with inner join fails

UPDATE/MERGE must match at most one source row for each target row

query with subquery passed updating 5 rows.

I don't understand why inner join fails as the column on which the join is happening has distinct values?

Also the subquery also same logic and how it passes?

Share Improve this question edited Feb 8 at 3:40 Dale K 27.2k15 gold badges56 silver badges82 bronze badges asked Feb 7 at 11:16 Jaango Jayaraj 1451 gold badge1 silver badge10 bronze badges

with the sample data you provided, both inner join and subquery works fine.In the transactions table you can check whether duplicates exist for transaction_header_guid by using something like select transaction_header_guid,count(*) from transactions_bkup group by transaction_header_guid having count(*) >1 – samhita Commented Feb 7 at 14:59
That is so strange. I have the schema as string with mode nullable in both the fields and in both the tables. There are no duplicates which is what confusing me. I ran the query – Jaango Jayaraj Commented Feb 7 at 16:22
@samhita please check my answer and check once. I updated with the diagrams on what is going on. there are two joins happening with the inner join query – Jaango Jayaraj Commented yesterday

Add a comment |

2 Answers 2

Sorted by: Reset to default 0

The reason for the problem UPDATE/MERGE must match at most one source row for each target row is that your INNER JOIN in the first query tries to update rows in documents_bkup when there are several matching rows in transactions_bkup depending on transaction_header_guid. Because an INNER JOIN returns all matching rows from both tables, the update statement becomes unclear which case_guid to use if a transaction_header_guid from documents_bkup appears more than once in transactions_bkup with various case_guid values. Due this ambiguity, bigquery throws an error. Refer to this documentation for more information.

Using aggregate functions in the JOIN clause to choose a single case_guid for each transaction_header_guid may be necessary to use the INNER JOIN successfully (e.g., MIN(case_guid), MAX(case_guid)) And also refer this stack link1, link2.

Using the inner join there are two joins. During the second join we don't specify a column to join on. It combines all rows for each row. Behind the scenes update is a join.

So specify the column clause for the second join:

UPDATE `dde-demo-d001.sap_crm.documents_bkup_copy` trg
    SET case_guid = ICT.case_guid
    FROM `dde-demo-d001.sap_crm.documents_bkup_copy` DFR
    INNER JOIN `dde-demo-d001.sap_crm.transactions_bkup` ICT
    ON DFR.transaction_header_guid = ICT.transaction_header_guid
    WHERE DFR.case_guid IS NULL
    AND trg.transaction_header_guid=ICT.transaction_header_guid;

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

sql - Difference in query and error even though distinct values - Stack Overflow

2 Answers 2

与本文相关的文章

评论列表(0)