This is a pattern question.
Table A has primaryId and secondaryId. Table B has primaryId and secondaryId
I need to find rows in table A where there are rows in table B that match primaryId, but of those matches, none of them match on secondaryId. If there are no matches on primaryId, I need to exclude those rows from my result. This is what I came up with:
CREATE TABLE #TableA (
primaryId INT,
secondaryId INT
)
INSERT INTO #TableA (
primaryId,
secondaryId
)
VALUES
-- excluded because there are no matching primary ids
(1, 2),
-- included because there are matches on primary id, but not on secondary id
(2, 1),
-- excluded because there is a match on primary and on secondary id
(2, 2)
CREATE TABLE #TableB (
primaryId INT,
secondaryId INT
)
INSERT INTO #TableB (
primaryId,
secondaryId
)
VALUES
(2, 2),
(2, 3)
SELECT
*
FROM #TableA a
WHERE 1=1
AND EXISTS (
SELECT
1
FROM #TableB b
WHERE 1=1
AND a.primaryId = b.primaryId
)
AND NOT EXISTS (
SELECT
1
FROM #TableB b
WHERE 1=1
AND a.primaryId = b.primaryId
AND a.secondaryId = b.secondaryId
)
Output
primaryId secondaryId
2 1
Is there a better way to do this? Note my actual dataset involves millions of records
This is a pattern question.
Table A has primaryId and secondaryId. Table B has primaryId and secondaryId
I need to find rows in table A where there are rows in table B that match primaryId, but of those matches, none of them match on secondaryId. If there are no matches on primaryId, I need to exclude those rows from my result. This is what I came up with:
CREATE TABLE #TableA (
primaryId INT,
secondaryId INT
)
INSERT INTO #TableA (
primaryId,
secondaryId
)
VALUES
-- excluded because there are no matching primary ids
(1, 2),
-- included because there are matches on primary id, but not on secondary id
(2, 1),
-- excluded because there is a match on primary and on secondary id
(2, 2)
CREATE TABLE #TableB (
primaryId INT,
secondaryId INT
)
INSERT INTO #TableB (
primaryId,
secondaryId
)
VALUES
(2, 2),
(2, 3)
SELECT
*
FROM #TableA a
WHERE 1=1
AND EXISTS (
SELECT
1
FROM #TableB b
WHERE 1=1
AND a.primaryId = b.primaryId
)
AND NOT EXISTS (
SELECT
1
FROM #TableB b
WHERE 1=1
AND a.primaryId = b.primaryId
AND a.secondaryId = b.secondaryId
)
Output
primaryId secondaryId
2 1
Is there a better way to do this? Note my actual dataset involves millions of records
Share Improve this question edited Mar 10 at 18:48 Trenchfox1917 asked Mar 10 at 18:13 Trenchfox1917Trenchfox1917 213 bronze badges 4 |2 Answers
Reset to default 0I find your overall approach correct and depending on indexing it might be the fastest.
As an alternative I might do something like, maybe something more compact and with a slightly different approach, using a CASE expression.
SELECT *
FROM
(SELECT A.*, CASE WHEN a.secondaryId = B.secondaryId THEN 1 ELSE 0 END as MatchesSecondCondition
FROM TableA as A
INNER JOIN TableB as B
ON A.primaryId = B.primaryId) tmp
WHERE MatchesSecondCondition = 0
One of the options could be joining tables twice - first inner join on primaryId to filter just common rows and second left join on secondaryId too filtered with Where secondaryId is null.
You'll need distinct or group by to get ridd of multiple rows...
Select Distinct a.*
From #TableA a
Inner Join #TableB b ON(b.primaryId = a.primaryId )
Left Join #TableB b2 ON(b2.primaryId = a.primaryId And
b2.secondaryId = a.secondaryId)
Where b2.secondaryId Is Null
primaryId | secondaryId |
---|---|
2 | 1 |
fiddle
TableB(primaryId, secondaryId
), bothEXISTS()
conditions could be executed as index seeks, which is about as efficient as it gets. Your code as written also reads nearly exactly as your requirements reads. That is a good thing. There might be some other trick to accomplish the same effect with an aggregated subquery, but I do not think that would match the performance, and the readability (and maintainability) would suffer. – T N Commented Mar 11 at 1:57