I'm converting a pandas DataFrame to a Spark DataFrame, but it fails with:
Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>
I can infer the schema and convert the types, but I have an array-typed column and I don't want to infer its type. Is there a way to cast only one column (Id) to double and leave the other columns untouched? The schema is:
|-- Id: string (nullable = true)
|-- Field: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- key: string (nullable = true)
| | |-- value: string (nullable = true)
asked Mar 9 at 8:22
Jim Macaulay
2 Answers
Defining the type as ArrayType(MapType(StringType(), StringType())) resolved the issue:
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, MapType

schema = StructType([
    StructField('Id', StringType(), True),
    StructField('Field', ArrayType(MapType(StringType(), StringType())), True)
])
Is there a way to infer a particular column (Id) alone as double and leave the other columns untouched?

Use DoubleType for the Id field if you want a double; the remaining fields stay the same as above:
from pyspark.sql.types import StructType, StructField, DoubleType, StringType, ArrayType, MapType

schema = StructType([
    StructField('Id', DoubleType(), True),
    StructField('Field', ArrayType(MapType(StringType(), StringType())), True)
])
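If the merge error comes from Id holding a mix of string and numeric values, an alternative that avoids hand-writing DoubleType is to cast Id on the pandas side before conversion: float64 maps to Spark's DoubleType, and the other columns are left untouched. A minimal sketch with hypothetical sample data mirroring the question's layout:

```python
import pandas as pd

# hypothetical data shaped like the question's schema
pdf = pd.DataFrame({
    "Id": ["1", "2.5", "3"],
    "Field": [[{"key": "k", "value": "v"}], [], [{"key": "k2", "value": "v2"}]],
})

# Cast only Id; pandas float64 becomes DoubleType when Spark converts the frame
pdf["Id"] = pd.to_numeric(pdf["Id"])
print(pdf.dtypes["Id"])  # float64
```

After this cast, spark.createDataFrame(pdf) (with an active SparkSession) infers Id as double without affecting Field.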