最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - Find rows in Table A where there are rows in table B that match criteria 1 but of those, none of them match criteria 2 - S

programmeradmin2浏览0评论

This is a pattern question.

Table A has primaryId and secondaryId. Table B has primaryId and secondaryId

I need to find rows in table A where there are rows in table B that match primaryId, but of those matches, none of them match on secondaryId. If there are no matches on primaryId, I need to exclude those rows from my result. This is what I came up with:

CREATE TABLE #TableA (
    primaryId INT,
    secondaryId INT
)

INSERT INTO #TableA (
    primaryId,
    secondaryId
)
VALUES
-- excluded because there are no matching primary ids
(1, 2),
-- included because there are matches on primary id, but not on secondary id
(2, 1),
-- excluded because there is a match on primary and on secondary id
(2, 2)

CREATE TABLE #TableB (
    primaryId INT,
    secondaryId INT
)

INSERT INTO #TableB (
    primaryId,
    secondaryId
)
VALUES
(2, 2),
(2, 3)

SELECT
    *
FROM #TableA a
WHERE 1=1
    AND EXISTS (
        SELECT
            1
        FROM #TableB b
        WHERE 1=1
            AND a.primaryId = b.primaryId
    )
    AND NOT EXISTS (
        SELECT
            1
        FROM #TableB b
        WHERE 1=1
            AND a.primaryId = b.primaryId
            AND a.secondaryId = b.secondaryId
    )

Output

primaryId secondaryId
2         1

Is there a better way to do this? Note my actual dataset involves millions of records

This is a pattern question.

Table A has primaryId and secondaryId. Table B has primaryId and secondaryId

I need to find rows in table A where there are rows in table B that match primaryId, but of those matches, none of them match on secondaryId. If there are no matches on primaryId, I need to exclude those rows from my result. This is what I came up with:

CREATE TABLE #TableA (
    primaryId INT,
    secondaryId INT
)

INSERT INTO #TableA (
    primaryId,
    secondaryId
)
VALUES
-- excluded because there are no matching primary ids
(1, 2),
-- included because there are matches on primary id, but not on secondary id
(2, 1),
-- excluded because there is a match on primary and on secondary id
(2, 2)

CREATE TABLE #TableB (
    primaryId INT,
    secondaryId INT
)

INSERT INTO #TableB (
    primaryId,
    secondaryId
)
VALUES
(2, 2),
(2, 3)

SELECT
    *
FROM #TableA a
WHERE 1=1
    AND EXISTS (
        SELECT
            1
        FROM #TableB b
        WHERE 1=1
            AND a.primaryId = b.primaryId
    )
    AND NOT EXISTS (
        SELECT
            1
        FROM #TableB b
        WHERE 1=1
            AND a.primaryId = b.primaryId
            AND a.secondaryId = b.secondaryId
    )

Output

primaryId secondaryId
2         1

Is there a better way to do this? Note my actual dataset involves millions of records

Share Improve this question edited Mar 10 at 18:48 Trenchfox1917 asked Mar 10 at 18:13 Trenchfox1917Trenchfox1917 213 bronze badges 4
  • 3 Please share sample input and output data. minimal-reproducible-example – samhita Commented Mar 10 at 18:17
  • Have you tested? Have you run an explain? – Andrew Commented Mar 10 at 20:45
  • 2 What makes a solution 'better'? Your solution is clear to read, and, if those columns are indexed, possibly the fastest. – tinazmu Commented Mar 10 at 22:10
  • With an appropriate index in place (TableB(primaryId, secondaryId), both EXISTS() conditions could be executed as index seeks, which is about as efficient as it gets. Your code as written also reads nearly exactly as your requirements reads. That is a good thing. There might be some other trick to accomplish the same effect with an aggregated subquery, but I do not think that would match the performance, and the readability (and maintainability) would suffer. – T N Commented Mar 11 at 1:57
Add a comment  | 

2 Answers 2

Reset to default 0

I find your overall approach correct and depending on indexing it might be the fastest.

As an alternative I might do something like, maybe something more compact and with a slightly different approach, using a CASE expression.

SELECT *
FROM
(SELECT A.*, CASE WHEN a.secondaryId = B.secondaryId THEN 1 ELSE 0 END as MatchesSecondCondition
FROM TableA as A
INNER JOIN TableB as B
    ON A.primaryId = B.primaryId) tmp
WHERE MatchesSecondCondition = 0

One of the options could be joining tables twice - first inner join on primaryId to filter just common rows and second left join on secondaryId too filtered with Where secondaryId is null.
You'll need distinct or group by to get ridd of multiple rows...

Select     Distinct a.*
From       #TableA a
Inner Join #TableB b ON(b.primaryId = a.primaryId )
Left Join  #TableB b2 ON(b2.primaryId = a.primaryId And 
                         b2.secondaryId = a.secondaryId)
Where      b2.secondaryId Is Null
primaryId secondaryId
2 1

fiddle

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论