最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - Clean up a `CHAR` column and cast it to a `NUMBER` column - Stack Overflow

programmeradmin2浏览0评论

I need to cast a CHAR column to a NUMBER column and ADD it back to the table but:

  • The decimal separator is is either a dot or a coma. No thousands separator is used so e.g. both 1,000 and 1.000 are 1, not 1000.
  • and some values need to be edited before casting.

I need a "pure" SQL solution (i.e. no PL/SQL) because I can only access the database through R using the {DBI} and {odbc} packages.

Reproducible example

I am not the database administrator, so I cannot change the table definition to use VARCHAR2.

CREATE TABLE mytab (
 txt CHAR(6)
);

INSERT INTO mytab VALUES ('1');
INSERT INTO mytab VALUES ('0.1'); -- dot separator
INSERT INTO mytab VALUES ('0,1'); -- comma separator
INSERT INTO mytab VALUES ('two'); -- must be change to 2

As expected with my session parameter, I cannot cast 0.1 as is (nor two!):

SELECT value
FROM nls_session_parameters
WHERE parameter = 'NLS_NUMERIC_CHARACTERS';
-- returns ', '

SELECT VALIDATE_CONVERSION(txt AS NUMBER) AS is_num
FROM mytab;
-- returns 1 0 1 0

Approach 1

SELECTing with NLS_NUMERIC_CHARACTERS set as appropriate works (with RTRIM() as per @PaulW suggestion):

SELECT
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(RTRIM(txt), '9D9', 'NLS_NUMERIC_CHARACTERS=''.,''')
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END AS num
FROM mytab;
-- returns 1,0 0,1 0,1 2,0

However, it fails with ORA-01722: invalid number if I ADD the column back to the table:

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(RTRIM(txt), '9D9', 'NLS_NUMERIC_CHARACTERS=''.,''')
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num FROM mytab;
-- fails with ORA-01722

ALTER TABLE mytab DROP COLUMN num;

Approach 2

Similarly, SELECTing with REPLACE(..., '.', ',') were appropriate works:

SELECT
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(REPLACE(txt, '.', ','))
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END AS num
FROM mytab;
-- returns 1,0 0,1 0,1 2,0

However, it fails again with ORA-01722 if I ADD the column back to the table:

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(REPLACE(txt, '.', ','))
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num FROM mytab;
-- fails with ORA-01722

ALTER TABLE mytab DROP COLUMN num;

Approach 3

I also tried to do it in two steps with a temp column but it fails with ORA-54012: virtual column is referenced in a column expression:

ALTER TABLE mytab
ADD temp AS (
  REPLACE(txt, '.', ',')
);

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN temp = 'two'
    THEN 2
    ELSE TO_NUMBER(temp)
  END
);
-- fails with ORA-54012

Approach 4

Similarly, UPDATEing with REPLACE(..., '.', ',') before ADDing the column fails again with ORA-01722:

UPDATE mytab
SET txt = REPLACE(txt, '.', ',');

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num FROM mytab;
-- fails with ORA-01722

ALTER TABLE mytab DROP COLUMN num;

Questions

  • What is the best approach to cast the CHAR column to a NUMBER column and ADD it to the table? (in pure SQL and without change to the table definition)
  • Why does approach 1 and 2 work when SELECTing but not when ADDing the column back to the table?
  • Why does approach 1, 2, and 4 fail? Trying to understand where the issue comes from puzzles me even more:
-- approach 1
ALTER TABLE mytab
ADD num1 AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(RTRIM(txt) DEFAULT NULL ON CONVERSION ERROR, '9D9', 'NLS_NUMERIC_CHARACTERS=''.,''')
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt DEFAULT NULL ON CONVERSION ERROR)
  END
);

SELECT num1 FROM mytab;
-- returns 1 0,1 NULL 2

-- approach 2
ALTER TABLE mytab
ADD num2 AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(REPLACE(txt, '.', ',') DEFAULT NULL ON CONVERSION ERROR)
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt DEFAULT NULL ON CONVERSION ERROR)
  END
);

SELECT num2 FROM mytab;
-- returns 1 NULL NULL 2

-- approach 4
UPDATE mytab
SET txt = REPLACE(txt, '.', ',');

ALTER TABLE mytab
ADD num4 AS (
  CASE
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num4 FROM mytab;
-- 1 NULL NULL 2

I need to cast a CHAR column to a NUMBER column and ADD it back to the table but:

  • The decimal separator is is either a dot or a coma. No thousands separator is used so e.g. both 1,000 and 1.000 are 1, not 1000.
  • and some values need to be edited before casting.

I need a "pure" SQL solution (i.e. no PL/SQL) because I can only access the database through R using the {DBI} and {odbc} packages.

Reproducible example

I am not the database administrator, so I cannot change the table definition to use VARCHAR2.

CREATE TABLE mytab (
 txt CHAR(6)
);

INSERT INTO mytab VALUES ('1');
INSERT INTO mytab VALUES ('0.1'); -- dot separator
INSERT INTO mytab VALUES ('0,1'); -- comma separator
INSERT INTO mytab VALUES ('two'); -- must be change to 2

As expected with my session parameter, I cannot cast 0.1 as is (nor two!):

SELECT value
FROM nls_session_parameters
WHERE parameter = 'NLS_NUMERIC_CHARACTERS';
-- returns ', '

SELECT VALIDATE_CONVERSION(txt AS NUMBER) AS is_num
FROM mytab;
-- returns 1 0 1 0

Approach 1

SELECTing with NLS_NUMERIC_CHARACTERS set as appropriate works (with RTRIM() as per @PaulW suggestion):

SELECT
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(RTRIM(txt), '9D9', 'NLS_NUMERIC_CHARACTERS=''.,''')
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END AS num
FROM mytab;
-- returns 1,0 0,1 0,1 2,0

However, it fails with ORA-01722: invalid number if I ADD the column back to the table:

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(RTRIM(txt), '9D9', 'NLS_NUMERIC_CHARACTERS=''.,''')
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num FROM mytab;
-- fails with ORA-01722

ALTER TABLE mytab DROP COLUMN num;

Approach 2

Similarly, SELECTing with REPLACE(..., '.', ',') were appropriate works:

SELECT
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(REPLACE(txt, '.', ','))
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END AS num
FROM mytab;
-- returns 1,0 0,1 0,1 2,0

However, it fails again with ORA-01722 if I ADD the column back to the table:

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(REPLACE(txt, '.', ','))
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num FROM mytab;
-- fails with ORA-01722

ALTER TABLE mytab DROP COLUMN num;

Approach 3

I also tried to do it in two steps with a temp column but it fails with ORA-54012: virtual column is referenced in a column expression:

ALTER TABLE mytab
ADD temp AS (
  REPLACE(txt, '.', ',')
);

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN temp = 'two'
    THEN 2
    ELSE TO_NUMBER(temp)
  END
);
-- fails with ORA-54012

Approach 4

Similarly, UPDATEing with REPLACE(..., '.', ',') before ADDing the column fails again with ORA-01722:

UPDATE mytab
SET txt = REPLACE(txt, '.', ',');

ALTER TABLE mytab
ADD num AS (
  CASE
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num FROM mytab;
-- fails with ORA-01722

ALTER TABLE mytab DROP COLUMN num;

Questions

  • What is the best approach to cast the CHAR column to a NUMBER column and ADD it to the table? (in pure SQL and without change to the table definition)
  • Why does approach 1 and 2 work when SELECTing but not when ADDing the column back to the table?
  • Why does approach 1, 2, and 4 fail? Trying to understand where the issue comes from puzzles me even more:
-- approach 1
ALTER TABLE mytab
ADD num1 AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(RTRIM(txt) DEFAULT NULL ON CONVERSION ERROR, '9D9', 'NLS_NUMERIC_CHARACTERS=''.,''')
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt DEFAULT NULL ON CONVERSION ERROR)
  END
);

SELECT num1 FROM mytab;
-- returns 1 0,1 NULL 2

-- approach 2
ALTER TABLE mytab
ADD num2 AS (
  CASE
    WHEN txt = '0.1'
    THEN TO_NUMBER(REPLACE(txt, '.', ',') DEFAULT NULL ON CONVERSION ERROR)
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt DEFAULT NULL ON CONVERSION ERROR)
  END
);

SELECT num2 FROM mytab;
-- returns 1 NULL NULL 2

-- approach 4
UPDATE mytab
SET txt = REPLACE(txt, '.', ',');

ALTER TABLE mytab
ADD num4 AS (
  CASE
    WHEN txt = 'two'
    THEN 2
    ELSE TO_NUMBER(txt)
  END
);

SELECT num4 FROM mytab;
-- 1 NULL NULL 2
Share Improve this question edited Mar 22 at 13:42 Thomas asked Mar 19 at 23:28 ThomasThomas 5092 silver badges13 bronze badges 7
  • 1 Updated my answer – Paul W Commented Mar 20 at 1:24
  • @PaulW Thanks! I edited my question to better demonstrate my use case (some values need to be edited before casting). Also I am not the database administrator, so I cannot change the table definition. How can I combine all this? (i.e. edit values and cast to number) – Thomas Commented Mar 20 at 8:35
  • Yikes, you've added "two" to your list of possible values. That changes things. If your data is that unclean, I suggest you write a function in PL/SQL that goes through whatever complexities you might have to deal with. Make sure you have an exception handler to return NULL if an exception is raised, and then call that function from SQL (or virtual col definition). That'll be easier to manage than trying to do too much in pure SQL and control execution order. – Paul W Commented Mar 20 at 12:39
  • If your data is that bad, then how will you know whether 1,000 is 1000 or 1? – Thorsten Kettner Commented Mar 21 at 8:27
  • @ThorstenKettner No thousands separator is used. The separator is only used for the decimal separator. – Thomas Commented Mar 21 at 9:31
 |  Show 2 more comments

1 Answer 1

Reset to default 3

Your example is using the CHAR datatype instead of the more sensible VARCHAR2 datatype. CHAR will right-pad the inserted values with spaces. You probably need to RTRIM() the column before attempting any numeric conversion. My database expects the . as a decimal point, but not , - so in my case, I translate any , into a . and with the RTRIM it should be good to go.

SELECT TO_NUMBER(REPLACE(RTRIM(txt),',','.') DEFAULT NULL ON CONVERSION ERROR) 
  FROM mytab;

1
0.1
0.1

ALTER TABLE mytab
ADD num AS (
  TO_NUMBER(REPLACE(RTRIM(txt),',','.') DEFAULT NULL ON CONVERSION ERROR)
);

SELECT num FROM mytab;

1
0.1
0.1

Also, CASE (and DECODE) may evaluate the THEN clause before testing the WHEN, so it is not safe to have a conversion in a CASE branch that can throw an exception on rows in the table, even if those rows wouldn't end up using the resolved value of that CASE branch. SQL isn't a programming language - Oracle is free to rewrite and reorder how it implements the declarative request that SQL represents. So it's better to combine the conditions into a single expression using nested functions as shown above, so that the conversion isn't attempted until all the possible problems are cleaned up no matter which row it's working on (as it has to work from the inner function outward).

But I think your main problem is the use of CHAR and failure to trim those spaces.

发布评论

评论列表(0)

  1. 暂无评论