java - Spark giving error when writing a limited length Column of type Varbinary for Synapse database

I am writing output to an Azure Synapse table that contains a varbinary(8000) column. When I write with Spark, it fails with UNSUPPORTED_DATATYPE, because I am trying to limit the length from the default MAX to 8000 (MAX is not supported by Azure Synapse). Below is the error:

org.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)"

ds.write()
            .format(FORMAT_JDBC)
            .option("url", JDBC_URL)
            .option("user", USER_VALUE)
            .option("password", PASSWORD_VALUE)
            .option(DRIVER_CLASS_NAME, "com.microsoft.sqlserver.jdbc.SQLServerDriver")
            .option(DB_TABLE, "\"ImageStore\"")
            .option("createTableColumnTypes","ImageName varchar(8000),ImageData VARBINARY(8000)")
            .mode(SaveMode.Overwrite)
            .save();

Note: I don't need to append to a pre-existing table; I need to create a new table. Also, I don't want to create a heap table in Synapse, but the default CCI table.

Asked Mar 27 at 12:26 (edited Mar 27 at 12:27) by Nitish Sharma

learn.microsoft/en-us/sql/t-sql/data-types/… Can you check the above link if it helps you? – Dileep Raj Narayan Thumula Commented Mar 27 at 13:00
learn.microsoft/en-us/sql/t-sql/functions/… Can you check the above? – Dileep Raj Narayan Thumula Commented Mar 27 at 13:01
No, these links do not help, as we need to define the schema using Spark SQL and not cast/convert data here. – Nitish Sharma Commented Mar 27 at 14:03
Can you use BINARY(8000) instead? – Dileep Raj Narayan Thumula Commented Mar 27 at 14:05
What is a Synapse table? Databricks does not have a varbinary data type, as far as I know, just binary. – Andrew Commented Mar 27 at 15:12

1 Answer

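The error occurs because Spark parses the createTableColumnTypes option with its own SQL parser, which accepts only valid Spark SQL data types. VARBINARY is not one of them (Spark has only BINARY), so the option is rejected before anything is sent to Synapse, and Spark cannot declare this column type directly. However, you can create the table first using SQL and then write data into it using Spark.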

ERROR: org.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)"

Since you are writing to a Synapse table and want to limit the column to VARBINARY(8000), I tried the following in the dedicated SQL pool:

CREATE TABLE dbo.ImageStore2 (
    ImageName VARCHAR(8000),
    ImageData VARBINARY(8000)
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);
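
The question asks for the default CCI table rather than a heap. Below is a minimal sketch of pre-creating such a table from Java over plain JDBC, assuming your dedicated SQL pool accepts VARBINARY(8000) in a clustered columnstore table. The class SynapseTableDdl and its methods are hypothetical helpers (not part of Spark or the SQL Server JDBC driver), and the table name dbo.ImageStore is only illustrative:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Spark or the JDBC driver.
final class SynapseTableDdl {

    // Builds a CREATE TABLE statement for a dedicated SQL pool table stored
    // as a clustered columnstore index (CCI) instead of a heap.
    static String buildCreateTable(String table, Map<String, String> columns) {
        String cols = columns.entrySet().stream()
                .map(e -> "    " + e.getKey() + " " + e.getValue())
                .collect(Collectors.joining(",\n"));
        return "CREATE TABLE " + table + " (\n" + cols + "\n)\n"
                + "WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)";
    }

    // Column names and types taken from the question; insertion order is kept.
    static Map<String, String> imageStoreColumns() {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("ImageName", "VARCHAR(8000)");
        columns.put("ImageData", "VARBINARY(8000)");
        return columns;
    }

    // Executes the DDL over an already opened JDBC connection
    // (for example one obtained from DriverManager with the same URL Spark uses).
    static void createTable(Connection conn, String ddl) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(ddl);
        }
    }

    public static void main(String[] args) {
        // Only prints the statement; call createTable(...) to run it.
        System.out.println(buildCreateTable("dbo.ImageStore", imageStoreColumns()));
    }
}
```

Once the table exists, the Spark write from the question would have to avoid dropping and recreating it, for example by using SaveMode.Append, or SaveMode.Overwrite together with the JDBC option truncate set to "true", and by leaving out createTableColumnTypes.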

You are creating the table from Spark, and you mentioned:

"I don't need to append to a pre-existing table; I need to create a new table."

Write to Azure Synapse Dedicated SQL Pool (from the connector documentation referenced below):
Ingest large volumes of data into both internal and external tables.
Supports the following DataFrame save modes:

Append
ErrorIfExists
Ignore
Overwrite

Reference: Azure Synapse Dedicated SQL Pool Connector for Apache Spark
