最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - Oracle timestamps for ETL delta loading - Stack Overflow

programmeradmin0浏览0评论

I'm running an ETL job against an Oracle database. We are passing epoch milliseconds, and the timestamp column in the source is US Eastern and naive.

My query is simple in theory:

SELECT *
FROM table
WHERE afterEpoch <= TIME_STAMP
AND <= beforeEpoch

But in reality, it has turned into this:

FROM table
WHERE {{ afterEpoch }} <= (
            CAST(
              FROM_TZ(
                CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York'
              ) AT TIME ZONE 'UTC' AS DATE
            ) - TO_DATE('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
          ) * 86400
      AND (
            CAST(
              FROM_TZ(
                CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York'
              ) AT TIME ZONE 'UTC' AS DATE
            ) - TO_DATE('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
          ) * 86400 <= {{ beforeEpoch }}

The problem here is that the CAST (x AS DATE) is truncating and causing a 5 hour gap (time zones am i right). The obvious next step would be to change that to a TIMESTAMP, but now I get a precision error: "the leading precision of the interval is too small".

This strikes me as a wild problem for one of the leading operational databases.

Meanwhile, this actually works:

TIME_STAMP + INTERVAL '5' HOUR  - TO_DATE('1970-01-01', 'YYYY-MM-DD')

Below is a simple query tracking all the transforms:

WITH base as (
    SELECT to_timestamp('2025-03-02 21:44:07','YYYY-MM-DD HH24:MI:SS') AS TIME_STAMP
    FROM dual)
SELECT
    TIME_STAMP AS original_time_stamp
    ,CAST(TIME_STAMP AS TIMESTAMP) AS cast_timestamp
    ,FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AS east_timezone
    ,(FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AT TIME ZONE 'UTC') AS utc_timezone
    ,CAST(FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AT TIME ZONE 'UTC' AS TIMESTAMP) AS utc_naive_timezone0
    ,(CAST(FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AT TIME ZONE 'UTC' AS TIMESTAMP) - TO_DATE('1970-01-01', 'YYYY-MM-DD')) * 86400 AS new_epoch_seconds
FROM
   base
WHERE
    ROWNUM = 1;

I was actually able to get something to work by just removing 5 hours from the timestamp, but I don't think that will fare well when we hit DST down the road.

I would love to hear some similar stories and some solutions to this approach.

EDIT: when I subtract two timestamps I get this -000020150 02:44:07.000000000 which looks like the the number of days, hours, etc...

I'm running an ETL job against an Oracle database. We are passing epoch milliseconds, and the timestamp column in the source is US Eastern and naive.

My query is simple in theory:

SELECT *
FROM table
WHERE afterEpoch <= TIME_STAMP
AND <= beforeEpoch

But in reality, it has turned into this:

FROM table
WHERE {{ afterEpoch }} <= (
            CAST(
              FROM_TZ(
                CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York'
              ) AT TIME ZONE 'UTC' AS DATE
            ) - TO_DATE('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
          ) * 86400
      AND (
            CAST(
              FROM_TZ(
                CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York'
              ) AT TIME ZONE 'UTC' AS DATE
            ) - TO_DATE('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
          ) * 86400 <= {{ beforeEpoch }}

The problem here is that the CAST (x AS DATE) is truncating and causing a 5 hour gap (time zones am i right). The obvious next step would be to change that to a TIMESTAMP, but now I get a precision error: "the leading precision of the interval is too small".

This strikes me as a wild problem for one of the leading operational databases.

Meanwhile, this actually works:

TIME_STAMP + INTERVAL '5' HOUR  - TO_DATE('1970-01-01', 'YYYY-MM-DD')

Below is a simple query tracking all the transforms:

WITH base as (
    SELECT to_timestamp('2025-03-02 21:44:07','YYYY-MM-DD HH24:MI:SS') AS TIME_STAMP
    FROM dual)
SELECT
    TIME_STAMP AS original_time_stamp
    ,CAST(TIME_STAMP AS TIMESTAMP) AS cast_timestamp
    ,FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AS east_timezone
    ,(FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AT TIME ZONE 'UTC') AS utc_timezone
    ,CAST(FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AT TIME ZONE 'UTC' AS TIMESTAMP) AS utc_naive_timezone0
    ,(CAST(FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AT TIME ZONE 'UTC' AS TIMESTAMP) - TO_DATE('1970-01-01', 'YYYY-MM-DD')) * 86400 AS new_epoch_seconds
FROM
   base
WHERE
    ROWNUM = 1;

I was actually able to get something to work by just removing 5 hours from the timestamp, but I don't think that will fare well when we hit DST down the road.

I would love to hear some similar stories and some solutions to this approach.

EDIT: when I subtract two timestamps I get this -000020150 02:44:07.000000000 which looks like the the number of days, hours, etc...

Share Improve this question edited Mar 4 at 2:00 Robert Riley asked Mar 4 at 1:31 Robert RileyRobert Riley 4061 gold badge14 silver badges38 bronze badges 1
  • Respectively, if you mix and match epochs in milliseconds with timestamps that include time zones, you are always going to up against some obstacles, because (at a minimum) you're going to have to convert to UTC, and use EXTRACT to get to milliseconds. The reason date/timestamp datatypes were invented was to avoid the need for epoch style date handling – Connor McDonald Commented Mar 4 at 5:01
Add a comment  | 

2 Answers 2

Reset to default 2

Don't convert the timestamp to an epoch, convert the epoch to a timestamp:

SELECT *
FROM   table_name
WHERE  TIME_STAMP
       BETWEEN CAST(
                 ( TIMESTAMP '1970-01-01 00:00:00.000 UTC'
                   + beforeEpoch * INTERVAL '0.001' SECOND 
                 ) AT TIME ZONE 'America/New_York'
                 AS TIMESTAMP
               )
       AND     CAST(
                 ( TIMESTAMP '1970-01-01 00:00:00.000 UTC'
                   + afterEpoch * INTERVAL '0.001' SECOND
                 ) AT TIME ZONE 'America/New_York'
                 AS TIMESTAMP
               )

Which, for the sample data:

CREATE TABLE table_name( time_stamp ) AS
SELECT TIMESTAMP '2025-03-04 06:50:00' FROM DUAL UNION ALL
SELECT TIMESTAMP '2025-03-04 07:00:00' FROM DUAL UNION ALL
SELECT TIMESTAMP '2025-03-04 07:10:00' FROM DUAL;

Then if beforeEpoch and afterEpoch are 1741089300000 and 1741089900000 (2025-03-04 06:55:00 America/New_York and 2025-03-04 07:05:00 America/New_York, respectively) then the output is:

TIME_STAMP
2025-03-04 07:00:00.000000

fiddle

My answer is not what I like, but it appears to work.

with base as (
    SELECT to_timestamp('2025-03-02 21:44:07','YYYY-MM-DD HH24:MI:SS') AS TIME_STAMP
    FROM dual
),
timestamps AS (
    SELECT
        (CAST(FROM_TZ(CAST(TIME_STAMP AS TIMESTAMP), 'America/New_York') AT TIME ZONE 'UTC' AS TIMESTAMP) - TO_TIMESTAMP('1970-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')) as converted_ts
    FROM base),
delta_ts AS (
SELECT   extract ( day from converted_ts ) * 86400 +
         extract ( hour from converted_ts ) * 3600 +
         extract ( minute from converted_ts ) * 60 +
         extract ( second from converted_ts )
FROM timestamps
)
SELECT t.*
FROM delta_ts t

My overall solution was to create a CTE from the original table with the PK and the time_stamp and then rejoin after all the conversions. While I may be able to condense this, this looks like the most readable solution for the next engineer to have to deal with this haha.

发布评论

评论列表(0)

  1. 暂无评论