
sql - Requirement to change country names with 2 digit ISO code in a Column Value with Multiple Country Names - Stack Overflow


We have a requirement to convert a column value that holds multiple semicolon-separated country names, replacing each name with its two-letter ISO code.

For example:

CountryList 
Denmark;Germany;United Kingdom

Should be turned into an output of:

DK;DE;UK

We have achieved this using CONNECT BY (normalization), a left outer join with the country_code table, and then XMLAGG on the output. But we are looking for a simpler solution that achieves the same result.

Thank you in advance.

Asked Mar 28 at 10:17 by Sanjay Dubey; edited Mar 28 at 10:22 by marc_s.
  • Tough luck. Never, ever store data as semicolon-separated items. It will only cause you lots of trouble. – jarlh, Mar 28 at 10:28
  • How can you get any simpler than what you're already doing? The only suggestion I could make is to use LISTAGG rather than XMLAGG. – Paul W, Mar 28 at 12:26
  • Pad the list with semicolons on the front and back, then loop over all the countries applying string replacement (with the semicolons acting as brackets). This keeps the original order. No joins or recursion necessary. You might want to avoid repeated updates so as not to generate lots of logging. – shawnt00, Mar 30 at 3:17
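
The pad-and-replace idea from the last comment can be sketched outside the database. This is a minimal Python illustration, not Oracle code: the function name `replace_names_with_codes` and the lookup dictionary are assumptions made for the example, and the codes are simply the ones from the expected output in the question.

```python
# Hypothetical lookup standing in for the country_code table (name -> code).
CODES = {"Denmark": "DK", "Germany": "DE", "United Kingdom": "UK"}

def replace_names_with_codes(country_list, codes=CODES):
    # Pad both ends so every entry, including the first and last,
    # is bracketed by semicolons.
    padded = ";" + country_list + ";"
    for name, code in codes.items():
        needle = ";" + name + ";"
        replacement = ";" + code + ";"
        if needle == replacement:
            continue  # nothing to change, and it would loop forever below
        # Adjacent duplicates share a semicolon, so a single replace pass
        # can miss the second one; repeat until no match is left.
        while needle in padded:
            padded = padded.replace(needle, replacement)
    # Drop the padding again.
    return padded[1:-1]

print(replace_names_with_codes("Denmark;Germany;United Kingdom"))
```

Names that are not in the lookup stay in the list unchanged, and the original order is preserved because nothing is split or re-sorted.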

3 Answers


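It can be done without CONNECT BY (or a recursive CTE) by joining the list of country names to the country codes table with the Instr() function. This depends on the consistency of your data: every name in the list must match a country_name in the codes table exactly. Note that the codes come back ordered by country name, not by their position in the list.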

WITH  --   S a m p l e    D a t a :
  country_list (country_names_list) AS
    ( Select 'Denmark;France;Germany;Spain;United Kingdom' 
      From Dual
    ),
--  S a m p l e    C o d e s    T a b l e :
  country_codes (country_code, country_name) AS
      ( Select 'DK',    'Denmark' From Dual UNION ALL 
        Select 'DE',    'Germany' From Dual UNION ALL 
        Select 'ES',    'Spain' From Dual UNION ALL 
        Select 'FR',    'France' From Dual UNION ALL 
        Select 'HR',    'Croatia' From Dual UNION ALL 
        Select 'UK',    'United Kingdom' From Dual
      )
--    S Q L : 
Select   LISTAGG(DISTINCT country_code, ';') WITHIN GROUP (Order By country_name) Over() as country_codes_list
From    ( Select      cc.country_code, cc.country_name, cl.country_names_list
          From        country_list cl
          Inner Join  country_codes cc ON( Instr(';' || cl.country_names_list || ';', ';' || cc.country_name || ';') > 0 )
       )
FETCH FIRST ROW ONLY

R e s u l t :

COUNTRY_CODES_LIST
DK;FR;DE;ES;UK
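
To make the Instr() join condition easier to follow, here is a minimal Python sketch of the same padded-substring test. It is an illustration under assumptions, not the query itself: the function name `codes_by_padded_match` is made up, and the rows are copied from the sample codes table above.

```python
# Sample codes table from the query above, as (country_code, country_name) rows.
COUNTRY_CODES = [
    ("DK", "Denmark"), ("DE", "Germany"), ("ES", "Spain"),
    ("FR", "France"), ("HR", "Croatia"), ("UK", "United Kingdom"),
]

def codes_by_padded_match(names_list, code_rows=COUNTRY_CODES):
    # Same idea as Instr(';' || list || ';', ';' || name || ';') > 0
    padded = ";" + names_list + ";"
    matched = [(name, code) for code, name in code_rows
               if ";" + name + ";" in padded]
    # Mirrors ORDER BY country_name: the output follows the names
    # alphabetically, not their position in the list.
    matched.sort()
    # Mirrors DISTINCT: keep each code once.
    return ";".join(dict.fromkeys(code for _, code in matched))

print(codes_by_padded_match("Denmark;France;Germany;Spain;United Kingdom"))
```

The semicolon padding is what stops a short name from matching inside a longer one: with rows for Niger (NE) and Nigeria (NG), the list 'Nigeria' matches only NG. The sample list happens to be in alphabetical order already, which is why the result looks position-preserving.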

You most probably did something like the query below, which I think is the better way to do it. Here I just used LISTAGG() instead of XMLAGG.

WITH  -- country_list and country_codes are the sample CTEs defined in the first query
  country_names (ID, country_name) AS
    ( Select  LEVEL as rn, 
              SubStr( ';' || cl.country_names_list || ';', 
                      Instr(';' || cl.country_names_list || ';', ';', 1, LEVEL) + 1, 
                      Instr(';' || cl.country_names_list || ';', ';', 1, LEVEL + 1) 
                    - Instr(';' || cl.country_names_list || ';', ';', 1, LEVEL) - 1
                    ) as country_name
      From   country_list cl
      Connect By LEVEL <= Length(cl.country_names_list) - Length(Replace(cl.country_names_list, ';', '')) + 1
    )
Select      LISTAGG(cc.country_code, ';') WITHIN GROUP (Order By cn.ID) Over() as country_codes_list
From        country_names cn
Inner Join  country_codes cc ON(cc.country_name = cn.country_name)
FETCH FIRST ROW ONLY

R e s u l t :

COUNTRY_CODES_LIST
DK;FR;DE;ES;UK


I suggest using JSON_TABLE and LISTAGG.

First, convert the country list to JSON format:

Denmark;Germany;United Kingdom
to
{"countries":["Denmark","Germany","United Kingdom"]}

Then expand the array with JSON_TABLE.

Finally, join the country_codes table and aggregate back with LISTAGG.

See example:

CREATE TABLE country_codes (country_name,A2,A3,ISO_Num) AS (
          SELECT 'Denmark'       ,'DK','DNK',208 from dual
UNION ALL SELECT 'Germany'       ,'DE','DEU',276 from dual
UNION ALL SELECT 'United Kingdom','GB','GBR',826 from dual
);
COUNTRY_NAME    A2  A3   ISO_NUM
Denmark         DK  DNK  208
Germany         DE  DEU  276
United Kingdom  GB  GBR  826

Test data (table test)

ID COUNTRY_LIST
1 Denmark;Germany;United Kingdom

Query

select t.*, j.* 
from test t
cross join lateral(
   select listagg(cc.a2,';')within group (order by rn) country_codes
   from json_table('{"countries":["' || replace(country_list,';','","') || '"]}'
           ,'$.countries[*]'
            columns (rn for ordinality,
            cn varchar2(50) path '$' ) 
    ) jt
   inner join country_codes cc on cc.country_name=jt.cn
 )j

Output is

ID  COUNTRY_LIST                    country_codes
1   Denmark;Germany;United Kingdom  DK;DE;GB

For reference, before aggregation we have:

ID  COUNTRY_LIST                    RN  CN              COUNTRY_NAME    A2  A3   ISO_NUM
1   Denmark;Germany;United Kingdom  1   Denmark         Denmark         DK  DNK  208
1   Denmark;Germany;United Kingdom  2   Germany         Germany         DE  DEU  276
1   Denmark;Germany;United Kingdom  3   United Kingdom  United Kingdom  GB  GBR  826
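
The three steps (rewrite the list as JSON, expand the array with an ordinal, join and re-aggregate in order) can also be traced in a small Python sketch. It is only an illustration of the logic: the names `to_json_document`, `codes_via_json` and `A2_BY_NAME` are assumptions, with the dictionary standing in for the country_codes table.

```python
import json

# Hypothetical stand-in for the country_codes table (country_name -> A2).
A2_BY_NAME = {"Denmark": "DK", "Germany": "DE", "United Kingdom": "GB"}

def to_json_document(country_list):
    # Same string surgery as the query: each ';' becomes '","' and the
    # result is wrapped in the {"countries":[...]} envelope.
    return '{"countries":["' + country_list.replace(";", '","') + '"]}'

def codes_via_json(country_list, a2_by_name=A2_BY_NAME):
    names = json.loads(to_json_document(country_list))["countries"]
    # enumerate(..., start=1) plays the role of "rn FOR ORDINALITY";
    # the membership filter plays the role of the inner join.
    pairs = [(rn, a2_by_name[cn])
             for rn, cn in enumerate(names, start=1) if cn in a2_by_name]
    # Re-aggregate in list order, like LISTAGG ... ORDER BY rn.
    return ";".join(code for _, code in sorted(pairs))

print(to_json_document("Denmark;Germany;United Kingdom"))
print(codes_via_json("Denmark;Germany;United Kingdom"))
```

One thing the sketch makes visible: because the JSON is built by plain string replacement, a country name containing a double quote or a backslash would produce an invalid document, so the approach assumes the names are free of such characters.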


Mine is similar to the one above. It verticalises the semicolon-delimited list, then uses a lookup table (taken from https://www.cia.gov/the-world-factbook/references/country-data-codes/ ) to convert each country name to its abbreviation.

The ISO standard still assigns 'GB' to the United Kingdom, but the file at the link above has a column named internet with the value '.uk' for the United Kingdom, so I took that one.

In the example below, I just create a three-row table with some relevant data, as a cutout from the CSV data of that file.

I verticalise by cross joining with a series of integers, i(i), slightly longer than necessary, searching for the i-th run of consecutive characters that contains no semicolon, and keeping only the rows where that REGEXP_SUBSTR() is not null.

Finally, I use LISTAGG() to re-horizontalise.


CREATE TABLE ctry_dat_codes (nam,genc,iso_3166,stanag,internet) AS (
          SELECT 'Denmark'       ,'DNK','DK|DNK|208','DNK','.dk' FROM dual
UNION ALL SELECT 'Germany'       ,'DEU','DE|DEU|276','DEU','.de' FROM dual
UNION ALL SELECT 'United Kingdom','GBR','GB|GBR|826','GBR','.uk' FROM dual
);

WITH
indata(ctrylist) AS (
  SELECT 'Denmark;Germany;United Kingdom' FROM dual
)
,
i(i) AS (
            SELECT 1 FROM dual
  UNION ALL SELECT 2 FROM dual
  UNION ALL SELECT 3 FROM dual
  UNION ALL SELECT 4 FROM dual
)
,
vertical_orig AS (
  SELECT
    i
  , REGEXP_SUBSTR(ctrylist,'[^;]+',1,i) AS ctry
  FROM indata CROSS JOIN i
  WHERE REGEXP_SUBSTR(ctrylist,'[^;]+',1,i) IS NOT NULL
)
,
vertical_chg AS (
  SELECT
    i
  , UPPER(SUBSTR(internet,2)) AS cc_internet
  FROM vertical_orig
  JOIN ctry_dat_codes ON ctry = nam
)
SELECT  
  LISTAGG(cc_internet,'; ') WITHIN GROUP (ORDER BY i) AS ctrylist
FROM vertical_chg;
CTRYLIST
DK; DE; UK
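
For readers who want to trace the verticalise and re-horizontalise steps by hand, here is a minimal Python sketch of the same logic. The names `nth_token`, `convert` and `INTERNET_BY_NAME` are assumptions for the illustration; the dictionary is an in-memory copy of the nam and internet columns of the three-row table.

```python
import re

# Hypothetical in-memory copy of ctry_dat_codes: nam -> internet.
INTERNET_BY_NAME = {"Denmark": ".dk", "Germany": ".de", "United Kingdom": ".uk"}

def nth_token(ctrylist, i):
    # Mimics REGEXP_SUBSTR(ctrylist, '[^;]+', 1, i): the i-th run of
    # non-semicolon characters, or None when there is no such run.
    tokens = re.findall(r"[^;]+", ctrylist)
    return tokens[i - 1] if 1 <= i <= len(tokens) else None

def convert(ctrylist, max_items=4):
    # Cross join with the integer series 1..max_items, then drop the NULLs.
    vertical = [(i, nth_token(ctrylist, i)) for i in range(1, max_items + 1)]
    vertical = [(i, ctry) for i, ctry in vertical if ctry is not None]
    # Join to the lookup and apply UPPER(SUBSTR(internet, 2)).
    changed = [(i, INTERNET_BY_NAME[ctry][1:].upper())
               for i, ctry in vertical if ctry in INTERNET_BY_NAME]
    # Re-horizontalise in the original order, like LISTAGG ... ORDER BY i.
    return "; ".join(code for _, code in sorted(changed))

print(convert("Denmark;Germany;United Kingdom"))
```

Two properties of the approach show up here. The integer series has to be at least as long as the longest list, otherwise trailing entries are silently dropped. And because the pattern matches only non-empty runs, an empty entry such as the one in 'Denmark;;Germany' is skipped rather than counted as a position.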
