I'm converting a pandas DataFrame to a Spark DataFrame, but it fails with:
Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>
I can infer the schema and convert the types, but I have an array-typed column and I don't want to infer its type. Is there a way to cast only one column (Id) to double and leave the other columns untouched? The schema is:
|-- Id: string (nullable = true)
|-- Field: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- key: string (nullable = true)
| | |-- value: string (nullable = true)
asked Mar 9 at 8:22
Jim Macaulay
2 Answers
Defining the type as ArrayType(MapType(StringType(), StringType())) resolved the issue:
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, MapType

schema = StructType([
    StructField('Id', StringType(), True),
    StructField('Field', ArrayType(MapType(StringType(), StringType())), True)
])
Is there a way to infer a particular column (Id) alone as double and leave the other columns untouched?

Use DoubleType for the Id field if you want a double; the remaining fields stay the same as above:
from pyspark.sql.types import StructType, StructField, DoubleType, StringType, ArrayType, MapType

schema = StructType([
    StructField('Id', DoubleType(), True),
    StructField('Field', ArrayType(MapType(StringType(), StringType())), True)
])
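If the merge error comes from Id holding a mix of string and numeric values, an alternative that avoids hand-writing DoubleType is to cast Id on the pandas side before conversion: float64 maps to Spark's DoubleType, and the other columns are left untouched. A minimal sketch with hypothetical sample data mirroring the question's layout:

```python
import pandas as pd

# hypothetical data shaped like the question's schema
pdf = pd.DataFrame({
    "Id": ["1", "2.5", "3"],
    "Field": [[{"key": "k", "value": "v"}], [], [{"key": "k2", "value": "v2"}]],
})

# Cast only Id; pandas float64 becomes DoubleType when Spark converts the frame
pdf["Id"] = pd.to_numeric(pdf["Id"])
print(pdf.dtypes["Id"])  # float64
```

After this cast, spark.createDataFrame(pdf) (with an active SparkSession) infers Id as double without affecting Field.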