I have a DataFrame with a column of array type that I want to convert to string type. I'm trying to convert it using concat_ws(","), but it is not getting converted because the column's type is ARRAY<MAP<STRING, STRING>>.
Dataframe
from pyspark.sql.functions import concat_ws, col

dataDictionary = [('value1', [{'key': 'Fruit', 'value': 'Apple'}, {'key': 'Colour', 'value': 'White'}]),
                  ('value2', [{'key': 'Fruit', 'value': 'Mango'}, {'key': 'Bird', 'value': 'Eagle'}, {'key': 'Colour', 'value': 'Black'}])]
df = spark.createDataFrame(data=dataDictionary)
df.withColumn("_2", concat_ws(",", col("_2")))  # this line raises the AnalysisException shown below
df.printSchema()
Schema
root
|-- _1: string (nullable = true)
|-- _2: array (nullable = true)
| |-- element: map (containsNull = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
Error
AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "concat_ws(,, _2)"
due to data type mismatch: Parameter 2 requires the ("ARRAY<STRING>" or "STRING") type,
however "_2" has the type "ARRAY<MAP<STRING, STRING>>".;
'Project [_1#208, concat_ws(,, _2#209) AS _2#212]
+- LogicalRDD [_1#208, _2#209], false
Any suggestions would be appreciated.
1 Answer
If I understand correctly, you need to extract the keys and/or values from the map first. Perhaps something like:
from pyspark.sql.functions import concat_ws, expr

dataDictionary = [('value1', [{'key': 'Fruit', 'value': 'Apple'}, {'key': 'Colour', 'value': 'White'}]),
                  ('value2', [{'key': 'Fruit', 'value': 'Mango'}, {'key': 'Bird', 'value': 'Eagle'}, {'key': 'Colour', 'value': 'Black'}])]
df = spark.createDataFrame(dataDictionary)
df.printSchema()
df.show(truncate=False)

# transform() turns each MAP<STRING, STRING> element into a "key:value" string,
# giving an ARRAY<STRING> that concat_ws can then join with commas.
df = df.withColumn("_2", concat_ws(",", expr("transform(_2, x -> concat_ws(':', x.key, x.value))")))
df.printSchema()
df.show(truncate=False)
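If you just want the whole column collapsed into a single string rather than building "key:value" pairs, another option is to serialize the array of maps to JSON. This is a minimal sketch assuming the same df as above (to_json accepts ARRAY<MAP<STRING, STRING>> directly; the df_json name is only for illustration):

from pyspark.sql.functions import to_json, col

# to_json serializes the ARRAY<MAP<STRING, STRING>> column into a JSON string,
# e.g. '[{"key":"Fruit","value":"Apple"},{"key":"Colour","value":"White"}]'
df_json = df.withColumn("_2", to_json(col("_2")))
df_json.printSchema()

The resulting _2 column is a plain string, so it can be written out or concatenated like any other string column.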