I have the following Postgres SQL table called sval
: db<>fiddle
create table sval(id,closing_price,opening_price,ISIN,closing_fx,market_cap)
as values(104, 55.3, 44, 'KKJJ102',0,0)
,(432, 99 , 77, 'JJII333',0,0)
,(444, 44 , 33, 'KKJJ102',0,0)
,(888, 33 , 41, 'JJEOD23',0,0)
,(422, 99 , 77, 'JJII333',0,0)
,(222, 33 , 41, 'JJEOD23',0,0);
What I am trying to do is query for matching ISINs, and then comparing the entire rows for the rows that have a matching ISIN.
So in the example above it would be comparing ids:
104
with444
,432
with422
,888
with222
.
And I want to return id
's of the rows where the ISIN is matching, but the other column values are different. In the example above rows 104
and 444
would be returned, since their ISIN's match but the other values do not match.
However, rows 432
matches with 422
but their rows values are exactly the same. Same with 888 and 222.
It is guaranteed that there is ALWAYS exactly two ISINs that match.
I have the following Postgres SQL table called sval
: db<>fiddle
create table sval(id,closing_price,opening_price,ISIN,closing_fx,market_cap)
as values(104, 55.3, 44, 'KKJJ102',0,0)
,(432, 99 , 77, 'JJII333',0,0)
,(444, 44 , 33, 'KKJJ102',0,0)
,(888, 33 , 41, 'JJEOD23',0,0)
,(422, 99 , 77, 'JJII333',0,0)
,(222, 33 , 41, 'JJEOD23',0,0);
What I am trying to do is query for matching ISINs, and then comparing the entire rows for the rows that have a matching ISIN.
So in the example above it would be comparing ids:
104
with444
,432
with422
,888
with222
.
And I want to return id
's of the rows where the ISIN is matching, but the other column values are different. In the example above rows 104
and 444
would be returned, since their ISIN's match but the other values do not match.
However, rows 432
matches with 422
but their rows values are exactly the same. Same with 888 and 222.
It is guaranteed that there is ALWAYS exactly two ISINs that match.
Share Improve this question edited Mar 12 at 10:24 Zegarek 27.4k5 gold badges24 silver badges30 bronze badges asked Mar 12 at 10:12 Patrick ChongPatrick Chong 955 bronze badges 1 |3 Answers
Reset to default 3- Self-join the table
using(ISIN)
. to_jsonb(sval)
to convert the whole row.-'id'
to get rid of the identifier.<>
to keep rows that differ on any other column than theid
you just got rid of.- Take only
distinct
ones, usingleast()
andgreatest()
to get ordered pairs.
Nice thing about this is that you don't need to know or list the column names, which sort of emulates select * exclude(col)
type of syntax.
demo at db<>fiddle
select distinct
least(a.id, b.id) as id1
, greatest(a.id, b.id) as id2
from sval as a
join sval as b using(isin)
where (to_jsonb(a)-'id')<>(to_jsonb(b)-'id');
id1 | id2 |
---|---|
104 | 444 |
You can also trade distinct
for a different join condition that spits out sorted pairs:
select a.id, b.id
from sval as a
join sval as b
on a.isin=b.isin
and a.id<b.id --this makes sure lower one's on the left
and (to_jsonb(a)-'id')<>(to_jsonb(b)-'id');
If you prefer to get them vertically instead of paired up:
select a.id, a.isin
from sval as a
join sval as b using(isin)
where(to_jsonb(a)-'id')<>(to_jsonb(b)-'id');
Here are alternatives that also give you a diff:
select a.id, b.id, ldiff, rdiff
from sval as a
join sval as b
on a.isin=b.isin
and a.id<b.id
and (to_jsonb(a)-'id')<>(to_jsonb(b)-'id')
cross join lateral
(select jsonb_object_agg(k,v1) as ldiff
, jsonb_object_agg(k,v2) as rdiff
from jsonb_each(to_jsonb(a))a(k,v1)
join jsonb_each(to_jsonb(b))b(k,v2)using(k)
where v1<>v2
and k<>'id') as d;
id | id | ldiff | rdiff |
---|---|---|---|
104 | 444 | {"closing_price": 55.3, "opening_price": 44} | {"closing_price": 44, "opening_price": 33} |
777 | 999 | {"closing_price": 77, "opening_price": 77} | {"closing_price": 99, "opening_price": 99} |
Vertical:
select a.id, a.isin, diff
from sval as a
join sval as b using(isin)
cross join lateral
(select jsonb_object_agg(k,v1) as diff
from jsonb_each(to_jsonb(a))a(k,v1)
join jsonb_each(to_jsonb(b))b(k,v2)using(k)
where v1<>v2
and k<>'id') as d
where(to_jsonb(a)-'id')<>(to_jsonb(b)-'id');
id | isin | diff |
---|---|---|
104 | KKJJ102 | {"closing_price": 55.3, "opening_price": 44} |
444 | KKJJ102 | {"closing_price": 44, "opening_price": 33} |
777 | XXXXXXX | {"closing_price": 77, "opening_price": 77} |
999 | XXXXXXX | {"closing_price": 99, "opening_price": 99} |
To achieve this in PostgreSQL, you can use a self-join to compare rows with the same ISIN and check if any of the other column values differ. Given that each ISIN always has exactly two corresponding rows, you can use an anti-join approach or an EXISTS clause to filter out pairs where all columns (except id and ISIN) match.
Here’s a query that returns the id values of rows where the ISIN
matches, but at least one of the other column values is different:
You can use EXCEPT
WITH
d AS (
SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
, ROW_NUMBER() OVER (PARTITION BY ISIN ORDER BY id) n
FROM sval
)
SELECT sval.*
FROM sval
JOIN
(
SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
FROM d
WHERE n=1
EXCEPT
SELECT closing_price, opening_price, ISIN, closing_fx, market_cap
FROM d
WHERE n=2
) x ON x.ISIN = sval.ISIN
ISIN
, useto_jsonb(sval)
to convert the whole row, then-'id'
to get rid of the identifier and compare that. – Zegarek Commented Mar 12 at 10:17