最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

json - Can't Query AWS Athena Presto Table Because of Dash Character in Column name - Stack Overflow

programmeradmin2浏览0评论

I have a file in S3 with the following contents:

{"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foo-bar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

The Glue table definition is:

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

I get null values for the "name" key when I query Athena like so:

select "foo-bar".name from planets_ion_2 

If I remove the dash from the top level key in the S3 json, from "foo-bar" to "foobar", it works fine: {"foobar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foobar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foobar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

So I can't seem to escape the dash. I've tried a lot of things without success:

select `foo-bar`.name from planets_ion_2 
select `foo-bar.name` from planets_ion_2 

Any time I try to escape with backtick, I get the error Queries of this type are not supported

I'm not sure if this a Athena or Presto issue. But thanks in advance for any help.

I have a file in S3 with the following contents:

{"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foo-bar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

The Glue table definition is:

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

I get null values for the "name" key when I query Athena like so:

select "foo-bar".name from planets_ion_2 

If I remove the dash from the top level key in the S3 json, from "foo-bar" to "foobar", it works fine: {"foobar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foobar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foobar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

So I can't seem to escape the dash. I've tried a lot of things without success:

select `foo-bar`.name from planets_ion_2 
select `foo-bar.name` from planets_ion_2 

Any time I try to escape with backtick, I get the error Queries of this type are not supported

I'm not sure if this a Athena or Presto issue. But thanks in advance for any help.

Share Improve this question edited Mar 4 at 17:31 Samer A. asked Mar 3 at 17:01 Samer A.Samer A. 658 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

You need to add the following after ROW FORMAT SERDE

WITH SERDEPROPERTIES ("ion.foo-bar.path_extractor" = "('foo-bar')")

With the dash in the name it can't find the path to the data. Use the property path_extractor to create an alias for it with single quotes around the key name since it has a dash in it.

So the create table statement would look something like:

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES ("ion.foo-bar.path_extractor" = "('foo-bar')")
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

You should then be able to query name using

 select "foo-bar".name from from planets_ion_2

Side note - Amazon Ion Hive SerDe is used to query Amazon Ion format data, but can be used to query non-Amazon Ion JSON datasets.

See this note in the link

Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets. Unlike other JSON SerDe libraries, the Amazon Ion SerDe does not expect each row of data to be on a single line. This feature is useful if you want to query JSON datasets that are in "pretty print" format or otherwise break up the fields in a row with newline characters.

If you explore the other JSON SerDe libraries you would still need to account for the dash in the key name.

As an example using OpenX JSON SerDe you would have to provide the property like

WITH SERDEPROPERTIES ("mapping.foo-bar" = "foo-bar")

to "remap" that path to account for the dash.

    CREATE EXTERNAL TABLE `planets_ion_2`(
      `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  '.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ("mapping.foo-bar" = "foo-bar")
    LOCATION
      's3://<some_bucket>/AthenaDataStore/Planets2'
发布评论

评论列表(0)

  1. 暂无评论