Spark 3.5+ has a built-in function unix_micros(),
which returns the number of microseconds since 1970-01-01 00:00:00 UTC.
I am working on a codebase that needs this, but it uses Spark 3.4, so the function is not available there.
So I wrote my own implementation:
def unix_micros(column: Column): Column = {
  val columnName = column.toString
  unix_timestamp(column) * 1000000 + expr(s"CAST(date_format($columnName, 'SSSSSS') AS BIGINT)")
}
It works just fine on my machine and on some clusters. But on other clusters it returns 000123 instead of 123456 for the microsecond part, and similarly 000999 instead of 999678.
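For illustration, here is roughly how I call it (the SparkSession setup, the column name "ts", and the timestamp value below are just an example, not the real data):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Build a timestamp column with a known microsecond fraction
val df = Seq("2024-01-01 00:00:00.123456").toDF("ts_str")
  .select(to_timestamp($"ts_str").as("ts"))

df.select(unix_micros($"ts").as("micros")).show(false)
// expected: the last six digits of "micros" are 123456
// on the problematic clusters: the last six digits come out as 000123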
Since everything else matches, it means that
expr(s"CAST(date_format($columnName, 'SSSSSS') AS BIGINT)")
is behaving differently on those clusters.
Any idea what I am missing here?
1 Answer
Even though unix_micros was not available as a function in Spark 3.4, the underlying expression has been available since 3.1.0.
You can implement it like this:
def unix_micros(e: Column): Column = withExpr {
  UnixMicros(e.expr)
}
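Note that withExpr is a private helper inside Spark's own functions object, so it cannot be called from application code. Outside of Spark's sources, one way to get the same effect is to wrap the catalyst expression in a Column directly; a minimal sketch, assuming you are willing to depend on Spark-internal classes:

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.UnixMicros

// UnixMicros is the catalyst expression behind the Spark 3.5 function;
// wrapping it in a Column mirrors what withExpr does inside functions.scala.
def unix_micros(e: Column): Column = new Column(UnixMicros(e.expr))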
See also: https://github.com/apache/spark/pull/41463