sql - Compare entire rows of data based on a single matching column value

I have the following Postgres SQL table called sval: _db<>fiddle

create table sval(id,closing_price,opening_price,ISIN,closing_fx,market_cap)
as values(104, 55.3, 44, 'KKJJ102',0,0)
        ,(432, 99  , 77, 'JJII333',0,0)
        ,(444, 44  , 33, 'KKJJ102',0,0)
        ,(888, 33  , 41, 'JJEOD23',0,0)
        ,(422, 99  , 77, 'JJII333',0,0)
        ,(222, 33  , 41, 'JJEOD23',0,0);

What I am trying to do is query for matching ISINs, and then comparing the entire rows for the rows that have a matching ISIN.

So in the example above it would be comparing ids:

104 with 444,
432 with 422,
888 with 222.

And I want to return id's of the rows where the ISIN is matching, but the other column values are different. In the example above rows 104 and 444 would be returned, since their ISIN's match but the other values do not match.

However, rows 432 matches with 422 but their rows values are exactly the same. Same with 888 and 222.

It is guaranteed that there is ALWAYS exactly two ISINs that match.

I have the following Postgres SQL table called sval: _db<>fiddle

create table sval(id,closing_price,opening_price,ISIN,closing_fx,market_cap)
as values(104, 55.3, 44, 'KKJJ102',0,0)
        ,(432, 99  , 77, 'JJII333',0,0)
        ,(444, 44  , 33, 'KKJJ102',0,0)
        ,(888, 33  , 41, 'JJEOD23',0,0)
        ,(422, 99  , 77, 'JJII333',0,0)
        ,(222, 33  , 41, 'JJEOD23',0,0);

What I am trying to do is query for matching ISINs, and then comparing the entire rows for the rows that have a matching ISIN.

So in the example above it would be comparing ids:

104 with 444,
432 with 422,
888 with 222.

However, rows 432 matches with 422 but their rows values are exactly the same. Same with 888 and 222.

It is guaranteed that there is ALWAYS exactly two ISINs that match.

Share Improve this question edited Mar 12 at 10:24 Zegarek 27.4k5 gold badges24 silver badges30 bronze badges asked Mar 12 at 10:12 Patrick Chong 955 bronze badges

Self-join the table on ISIN, use to_jsonb(sval) to convert the whole row, then -'id' to get rid of the identifier and compare that. – Zegarek Commented Mar 12 at 10:17

Add a comment |

3 Answers 3

Sorted by: Reset to default 3

Self-join the table using(ISIN).
to_jsonb(sval) to convert the whole row.
-'id' to get rid of the identifier.
<> to keep rows that differ on any other column than the id you just got rid of.
Take only distinct ones, using least() and greatest() to get ordered pairs.

Nice thing about this is that you don't need to know or list the column names, which sort of emulates select * exclude(col) type of syntax.
_{demo at db<>fiddle}

select distinct 
       least(a.id, b.id)    as id1
     , greatest(a.id, b.id) as id2
from sval as a 
join sval as b using(isin)
where (to_jsonb(a)-'id')<>(to_jsonb(b)-'id');

id1	id2
104	444

You can also trade distinct for a different join condition that spits out sorted pairs:

select a.id, b.id
from sval as a 
join sval as b 
  on a.isin=b.isin
 and a.id<b.id --this makes sure lower one's on the left
 and (to_jsonb(a)-'id')<>(to_jsonb(b)-'id');

If you prefer to get them vertically instead of paired up:

select a.id, a.isin
from sval as a 
join sval as b using(isin)
where(to_jsonb(a)-'id')<>(to_jsonb(b)-'id');

Here are alternatives that also give you a diff:

select a.id, b.id, ldiff, rdiff
from sval as a 
join sval as b 
  on a.isin=b.isin
 and a.id<b.id
 and (to_jsonb(a)-'id')<>(to_jsonb(b)-'id')
cross join lateral
  (select jsonb_object_agg(k,v1) as ldiff
        , jsonb_object_agg(k,v2) as rdiff
    from jsonb_each(to_jsonb(a))a(k,v1)
    join jsonb_each(to_jsonb(b))b(k,v2)using(k)
    where v1<>v2
      and k<>'id') as d;

id	id	ldiff	rdiff
104	444	{"closing_price": 55.3, "opening_price": 44}	{"closing_price": 44, "opening_price": 33}
777	999	{"closing_price": 77, "opening_price": 77}	{"closing_price": 99, "opening_price": 99}

Vertical:

select a.id, a.isin, diff
from sval as a 
join sval as b using(isin)
cross join lateral
  (select jsonb_object_agg(k,v1) as diff
    from jsonb_each(to_jsonb(a))a(k,v1)
    join jsonb_each(to_jsonb(b))b(k,v2)using(k)
    where v1<>v2
      and k<>'id') as d
where(to_jsonb(a)-'id')<>(to_jsonb(b)-'id');

id	isin	diff
104	KKJJ102	{"closing_price": 55.3, "opening_price": 44}
444	KKJJ102	{"closing_price": 44, "opening_price": 33}
777	XXXXXXX	{"closing_price": 77, "opening_price": 77}
999	XXXXXXX	{"closing_price": 99, "opening_price": 99}

To achieve this in PostgreSQL, you can use a self-join to compare rows with the same ISIN and check if any of the other column values differ. Given that each ISIN always has exactly two corresponding rows, you can use an anti-join approach or an EXISTS clause to filter out pairs where all columns (except id and ISIN) match.

Here’s a query that returns the id values of rows where the ISIN matches, but at least one of the other column values is different:

You can use EXCEPT

WITH
d AS (
  SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
  , ROW_NUMBER() OVER (PARTITION BY ISIN ORDER BY id) n
  FROM sval 
)
SELECT sval.*
FROM sval
JOIN 
(
  SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
  FROM d
  WHERE n=1

  EXCEPT

  SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
  FROM d
  WHERE n=2

) x ON x.ISIN = sval.ISIN

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

sql - Compare entire rows of data based on a single matching column value - Stack Overflow

3 Answers 3

与本文相关的文章

评论列表(0)