最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - Compare entire rows of data based on a single matching column value - Stack Overflow

programmeradmin3浏览0评论

I have the following Postgres SQL table called sval: db<>fiddle

create table sval(id,closing_price,opening_price,ISIN,closing_fx,market_cap)
as values(104, 55.3, 44, 'KKJJ102',0,0)
        ,(432, 99  , 77, 'JJII333',0,0)
        ,(444, 44  , 33, 'KKJJ102',0,0)
        ,(888, 33  , 41, 'JJEOD23',0,0)
        ,(422, 99  , 77, 'JJII333',0,0)
        ,(222, 33  , 41, 'JJEOD23',0,0);

What I am trying to do is query for matching ISINs, and then comparing the entire rows for the rows that have a matching ISIN.

So in the example above it would be comparing ids:

  • 104 with 444,
  • 432 with 422,
  • 888 with 222.

And I want to return id's of the rows where the ISIN is matching, but the other column values are different. In the example above rows 104 and 444 would be returned, since their ISIN's match but the other values do not match.

However, rows 432 matches with 422 but their rows values are exactly the same. Same with 888 and 222.

It is guaranteed that there is ALWAYS exactly two ISINs that match.

I have the following Postgres SQL table called sval: db<>fiddle

create table sval(id,closing_price,opening_price,ISIN,closing_fx,market_cap)
as values(104, 55.3, 44, 'KKJJ102',0,0)
        ,(432, 99  , 77, 'JJII333',0,0)
        ,(444, 44  , 33, 'KKJJ102',0,0)
        ,(888, 33  , 41, 'JJEOD23',0,0)
        ,(422, 99  , 77, 'JJII333',0,0)
        ,(222, 33  , 41, 'JJEOD23',0,0);

What I am trying to do is query for matching ISINs, and then comparing the entire rows for the rows that have a matching ISIN.

So in the example above it would be comparing ids:

  • 104 with 444,
  • 432 with 422,
  • 888 with 222.

And I want to return id's of the rows where the ISIN is matching, but the other column values are different. In the example above rows 104 and 444 would be returned, since their ISIN's match but the other values do not match.

However, rows 432 matches with 422 but their rows values are exactly the same. Same with 888 and 222.

It is guaranteed that there is ALWAYS exactly two ISINs that match.

Share Improve this question edited Mar 12 at 10:24 Zegarek 27.4k5 gold badges24 silver badges30 bronze badges asked Mar 12 at 10:12 Patrick ChongPatrick Chong 955 bronze badges 1
  • Self-join the table on ISIN, use to_jsonb(sval) to convert the whole row, then -'id' to get rid of the identifier and compare that. – Zegarek Commented Mar 12 at 10:17
Add a comment  | 

3 Answers 3

Reset to default 3
  1. Self-join the table using(ISIN).
  2. to_jsonb(sval) to convert the whole row.
  3. -'id' to get rid of the identifier.
  4. <> to keep rows that differ on any other column than the id you just got rid of.
  5. Take only distinct ones, using least() and greatest() to get ordered pairs.

Nice thing about this is that you don't need to know or list the column names, which sort of emulates select * exclude(col) type of syntax.
demo at db<>fiddle

select distinct 
       least(a.id, b.id)    as id1
     , greatest(a.id, b.id) as id2
from sval as a 
join sval as b using(isin)
where (to_jsonb(a)-'id')<>(to_jsonb(b)-'id');
id1 id2
104 444

You can also trade distinct for a different join condition that spits out sorted pairs:

select a.id, b.id
from sval as a 
join sval as b 
  on a.isin=b.isin
 and a.id<b.id --this makes sure lower one's on the left
 and (to_jsonb(a)-'id')<>(to_jsonb(b)-'id');

If you prefer to get them vertically instead of paired up:

select a.id, a.isin
from sval as a 
join sval as b using(isin)
where(to_jsonb(a)-'id')<>(to_jsonb(b)-'id');

Here are alternatives that also give you a diff:

select a.id, b.id, ldiff, rdiff
from sval as a 
join sval as b 
  on a.isin=b.isin
 and a.id<b.id
 and (to_jsonb(a)-'id')<>(to_jsonb(b)-'id')
cross join lateral
  (select jsonb_object_agg(k,v1) as ldiff
        , jsonb_object_agg(k,v2) as rdiff
    from jsonb_each(to_jsonb(a))a(k,v1)
    join jsonb_each(to_jsonb(b))b(k,v2)using(k)
    where v1<>v2
      and k<>'id') as d;
id id ldiff rdiff
104 444 {"closing_price": 55.3, "opening_price": 44} {"closing_price": 44, "opening_price": 33}
777 999 {"closing_price": 77, "opening_price": 77} {"closing_price": 99, "opening_price": 99}

Vertical:

select a.id, a.isin, diff
from sval as a 
join sval as b using(isin)
cross join lateral
  (select jsonb_object_agg(k,v1) as diff
    from jsonb_each(to_jsonb(a))a(k,v1)
    join jsonb_each(to_jsonb(b))b(k,v2)using(k)
    where v1<>v2
      and k<>'id') as d
where(to_jsonb(a)-'id')<>(to_jsonb(b)-'id');
id isin diff
104 KKJJ102 {"closing_price": 55.3, "opening_price": 44}
444 KKJJ102 {"closing_price": 44, "opening_price": 33}
777 XXXXXXX {"closing_price": 77, "opening_price": 77}
999 XXXXXXX {"closing_price": 99, "opening_price": 99}

To achieve this in PostgreSQL, you can use a self-join to compare rows with the same ISIN and check if any of the other column values differ. Given that each ISIN always has exactly two corresponding rows, you can use an anti-join approach or an EXISTS clause to filter out pairs where all columns (except id and ISIN) match.

Here’s a query that returns the id values of rows where the ISIN matches, but at least one of the other column values is different:

You can use EXCEPT

WITH
d AS (
  SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
  , ROW_NUMBER() OVER (PARTITION BY ISIN ORDER BY id) n
  FROM sval 
)
SELECT sval.*
FROM sval
JOIN 
(
  SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
  FROM d
  WHERE n=1

  EXCEPT

  SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
  FROM d
  WHERE n=2

) x ON x.ISIN = sval.ISIN
发布评论

评论列表(0)

  1. 暂无评论