Pyspark - Convert array type to string type for ARRAY<MAP<STRING, STRING>> type - Stack Overflow

I have a dataframe where one of the columns is an array type, and I want to convert it to a string. I tried concat_ws(","), but the conversion fails because the column is of type ARRAY<MAP<STRING, STRING>>.

Dataframe

from pyspark.sql.functions import concat_ws, col

dataDictionary = [('value1', [{'key': 'Fruit', 'value': 'Apple'}, {'key': 'Colour', 'value': 'White'}]),
                  ('value2', [{'key': 'Fruit', 'value': 'Mango'}, {'key': 'Bird', 'value': 'Eagle'}, {'key': 'Colour', 'value': 'Black'}])]

df = spark.createDataFrame(data=dataDictionary)
df.withColumn("_2", concat_ws(",", col("_2")))  # fails: _2 is ARRAY<MAP<STRING, STRING>>, not ARRAY<STRING>
df.printSchema()

Schema

root
 |-- _1: string (nullable = true)
 |-- _2: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)

Error

AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "concat_ws(,, _2)" 
due to data type mismatch: Parameter 2 requires the ("ARRAY<STRING>" or "STRING") type, 
however "_2" has the type "ARRAY<MAP<STRING, STRING>>".;
'Project [_1#208, concat_ws(,, _2#209) AS _2#212]
+- LogicalRDD [_1#208, _2#209], false

Any suggestions would be appreciated.

asked Feb 14 at 13:58 by Jim Macaulay

1 Answer


If I understand correctly, you need to extract the keys and/or values from the map first. Perhaps something like:

from pyspark.sql.functions import concat_ws, col, expr

dataDictionary = [('value1', [{'key': 'Fruit', 'value': 'Apple'}, {'key': 'Colour', 'value': 'White'}]),
                  ('value2', [{'key': 'Fruit', 'value': 'Mango'}, {'key': 'Bird', 'value': 'Eagle'}, {'key': 'Colour', 'value': 'Black'}])]

df = spark.createDataFrame(dataDictionary)

df.printSchema()
df.show(truncate=False)

# Turn each map into a "key:value" string, then join the resulting ARRAY<STRING> with commas
df = df.withColumn("_2", concat_ws(",", expr("transform(_2, x -> concat_ws(':', x.key, x.value))")))
df.printSchema()
df.show(truncate=False)
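
If you just need a single string representation of the whole column rather than key:value pairs, another option is to serialize the array of maps to JSON with to_json. A minimal sketch, not part of the original answer, assuming df still holds the original ARRAY<MAP<STRING, STRING>> column:

from pyspark.sql.functions import to_json, col

# Assumes df is the original DataFrame, i.e. _2 has not yet been transformed
df_json = df.withColumn("_2", to_json(col("_2")))
df_json.printSchema()          # _2 is now a string column
df_json.show(truncate=False)   # e.g. [{"key":"Fruit","value":"Apple"},{"key":"Colour","value":"White"}]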
