最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - How to match these 2 tables? - Stack Overflow

programmeradmin3浏览0评论

Here is the first table A in Redshift:

team     origin_id     targte_id
a        1             11
b        2             22
b        NULL          33
c        5             55
c        NULL          66

The origin_id and target_id could be NULL, but they couldn't both NULL in one row.

And the second table B:

team     origin_id     targte_id     content
a        1             11            aaa
a        1             11            bbb
b        NULL          22            xxx
c        5             NULL          zzz

I want to join these 2 tables and match the value of content to the table A. The match rule is: A.team_id=B.team_id; if the origin_id is not null, then A.origin_id=B.origin_id; if the origin_id is null, then A.target_id=B.target_id. And the row count of the final result should be the same as the table A.

The final result set I wanted is:

team     origin_id     targte_id     content
a        1             11            aaa
b        2             22            xxx
b        NULL          33            NULL
c        5             55            zzz
c        NULL          66            NULL

(For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.)

What's the correct SQL to achieve this in Redshift? I try left join but the result looks like a Cartesian Product of these 2 tables.

Here is the first table A in Redshift:

team     origin_id     targte_id
a        1             11
b        2             22
b        NULL          33
c        5             55
c        NULL          66

The origin_id and target_id could be NULL, but they couldn't both NULL in one row.

And the second table B:

team     origin_id     targte_id     content
a        1             11            aaa
a        1             11            bbb
b        NULL          22            xxx
c        5             NULL          zzz

I want to join these 2 tables and match the value of content to the table A. The match rule is: A.team_id=B.team_id; if the origin_id is not null, then A.origin_id=B.origin_id; if the origin_id is null, then A.target_id=B.target_id. And the row count of the final result should be the same as the table A.

The final result set I wanted is:

team     origin_id     targte_id     content
a        1             11            aaa
b        2             22            xxx
b        NULL          33            NULL
c        5             55            zzz
c        NULL          66            NULL

(For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.)

What's the correct SQL to achieve this in Redshift? I try left join but the result looks like a Cartesian Product of these 2 tables.

Share Improve this question edited Jan 20 at 8:56 John Rotenstein 269k28 gold badges443 silver badges530 bronze badges Recognized by AWS Collective asked Jan 20 at 6:18 RinzeRinze 8621 gold badge10 silver badges25 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 2

Left join is correct. However, in your example where

For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.

There is no way to "just choose any one", a JOIN operation will match both rows. It is up to you to de-dupe the result set afterwards.

For example, you can group by A.team_id, A.origin_id, A.target_id:

SELECT
  team_id,
  origin_id,
  target_id,
  MIN(content) any_content
FROM (
  SELECT
    A.team_id,
    A.origin_id,
    A.target_id,
    B.content
  FROM
    A LEFT JOIN B
  ON
    A.team_id = B.team_id AND
    COALESCE(A.origin_id, 'is_null') = COALESCE(B.origin_id, 'is_null') AND
    COALESCE(A.target_id, 'is_null') = COALESCE(B.target_id, 'is_null')
)
GROUP BY
  team_id,
  origin_id,
  target_id

The query joins Table A and Table B using a LEFT JOIN with conditional matching: origin_id is matched if not NULL, otherwise targte_id is matched. The ROW_NUMBER function ensures only one match is selected per row in Table A when multiple matches exist. If no match is found, the LEFT JOIN ensures NULL is retained for content. This guarantees the final result has the same number of rows as Table A, with a single matching content (or NULL) for each row.

below is a simple code you can try

    WITH RankedMatches AS (
    SELECT 
        A.team,
        A.origin_id,
        A.targte_id,
        B.content,
        ROW_NUMBER() OVER (
            PARTITION BY A.team, A.origin_id, A.targte_id
            ORDER BY B.content -- Arbitrary order; adjust as needed
        ) AS rn
    FROM 
        TableA A
    LEFT JOIN 
        TableB B
    ON 
        A.team = B.team
        AND (
            (A.origin_id IS NOT NULL AND A.origin_id = B.origin_id) OR
            (A.origin_id IS NULL AND A.targte_id = B.targte_id)
        )
)
SELECT
    team,
    origin_id,
    targte_id,
    CASE WHEN rn = 1 THEN content ELSE NULL END AS content
FROM 
    RankedMatches
WHERE 
    rn = 1 -- Ensures only one row per match
ORDER BY 
    team, origin_id, targte_id;
发布评论

评论列表(0)

  1. 暂无评论