I am writing output to a Azure Synapse table where the table contains a varbinary(8000) column. When writing using spark it gives error that UNSUPORTED_DATATYPE as I am trying to limit length from default MAX to 8000 as MAX is not supported by Azure Synapse. Below is the error
.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)"
ds.write()
.format(FORMAT_JDBC)
.option("url", JDBC_URL)
.option("user", USER_VALUE)
.option("password", PASSWORD_VALUE)
.option(DRIVER_CLASS_NAME, "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.option(DB_TABLE, "\"ImageStore\"")
.option("createTableColumnTypes","ImageName varchar(8000),ImageData VARBINARY(8000)")
.mode(SaveMode.Overwrite)
.save();
Note : I don't need append to an pre-existing table and need to create a new Table. Also I don't want to create a Table of Heap type in Synapse but the default CCI table.
I am writing output to a Azure Synapse table where the table contains a varbinary(8000) column. When writing using spark it gives error that UNSUPORTED_DATATYPE as I am trying to limit length from default MAX to 8000 as MAX is not supported by Azure Synapse. Below is the error
.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)"
ds.write()
.format(FORMAT_JDBC)
.option("url", JDBC_URL)
.option("user", USER_VALUE)
.option("password", PASSWORD_VALUE)
.option(DRIVER_CLASS_NAME, "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.option(DB_TABLE, "\"ImageStore\"")
.option("createTableColumnTypes","ImageName varchar(8000),ImageData VARBINARY(8000)")
.mode(SaveMode.Overwrite)
.save();
Note : I don't need append to an pre-existing table and need to create a new Table. Also I don't want to create a Table of Heap type in Synapse but the default CCI table.
Share Improve this question edited Mar 27 at 12:27 Nitish Sharma asked Mar 27 at 12:26 Nitish SharmaNitish Sharma 111 bronze badge 6 | Show 1 more comment1 Answer
Reset to default 0Regarding the ERROR when using the spark it is because spark does not directly define table schema. However, you can create the table first using SQL and then write data into it using Spark.
ERROR: .apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)
As you mentioned that you are using the Synapse table and you want to limit the Varbinary(8000)
I have tried the below in the dedicated sql pool
CREATE TABLE dbo.ImageStore2 (
ImageName VARCHAR(8000),
ImageData VARBINARY(8000)
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);
Results:
As you are trying to create the table from spark and you mentioned that you
Note : I don't need append to an pre-existing table and need to create a new Table.
Write to Azure Synapse Dedicated SQL Pool:
Ingest large volumes of data into both Internal and External tables.
Supports the following DataFrame save modes:
- Append
- ErrorIfExists
- Ignore
- Overwrite
Reference: Azure Synapse Dedicated SQL Pool Connector for Apache Spark
varbinary
data type, far as I know, justbinary
. – Andrew Commented Mar 27 at 15:12