最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

java - Spark giving error when writing a limited length Column of type Varbinary for Synapse database - Stack Overflow

programmeradmin7浏览0评论

I am writing output to a Azure Synapse table where the table contains a varbinary(8000) column. When writing using spark it gives error that UNSUPORTED_DATATYPE as I am trying to limit length from default MAX to 8000 as MAX is not supported by Azure Synapse. Below is the error

.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)"

ds.write()
            .format(FORMAT_JDBC)
            .option("url", JDBC_URL)
            .option("user", USER_VALUE)
            .option("password", PASSWORD_VALUE)
            .option(DRIVER_CLASS_NAME, "com.microsoft.sqlserver.jdbc.SQLServerDriver")
            .option(DB_TABLE, "\"ImageStore\"")
            .option("createTableColumnTypes","ImageName varchar(8000),ImageData VARBINARY(8000)")
            .mode(SaveMode.Overwrite)
            .save();

Note : I don't need append to an pre-existing table and need to create a new Table. Also I don't want to create a Table of Heap type in Synapse but the default CCI table.

I am writing output to a Azure Synapse table where the table contains a varbinary(8000) column. When writing using spark it gives error that UNSUPORTED_DATATYPE as I am trying to limit length from default MAX to 8000 as MAX is not supported by Azure Synapse. Below is the error

.apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)"

ds.write()
            .format(FORMAT_JDBC)
            .option("url", JDBC_URL)
            .option("user", USER_VALUE)
            .option("password", PASSWORD_VALUE)
            .option(DRIVER_CLASS_NAME, "com.microsoft.sqlserver.jdbc.SQLServerDriver")
            .option(DB_TABLE, "\"ImageStore\"")
            .option("createTableColumnTypes","ImageName varchar(8000),ImageData VARBINARY(8000)")
            .mode(SaveMode.Overwrite)
            .save();

Note : I don't need append to an pre-existing table and need to create a new Table. Also I don't want to create a Table of Heap type in Synapse but the default CCI table.

Share Improve this question edited Mar 27 at 12:27 Nitish Sharma asked Mar 27 at 12:26 Nitish SharmaNitish Sharma 111 bronze badge 6
  • learn.microsoft/en-us/sql/t-sql/data-types/… Can you check the above link if it helps you? – Dileep Raj Narayan Thumula Commented Mar 27 at 13:00
  • learn.microsoft/en-us/sql/t-sql/functions/… Can you check the above? – Dileep Raj Narayan Thumula Commented Mar 27 at 13:01
  • No these links do not help as we need to define schema using spark sql and not cast/convert data here. – Nitish Sharma Commented Mar 27 at 14:03
  • can you use BINARY(8000) Instead – Dileep Raj Narayan Thumula Commented Mar 27 at 14:05
  • What is a a Synapse table? Databricks does not have a varbinary data type, far as I know, just binary. – Andrew Commented Mar 27 at 15:12
 |  Show 1 more comment

1 Answer 1

Reset to default 0

Regarding the ERROR when using the spark it is because spark does not directly define table schema. However, you can create the table first using SQL and then write data into it using Spark.

ERROR: .apache.spark.sql.catalyst.parser.ParseException: [UNSUPPORTED_DATATYPE] Unsupported data type "VARBINARY(8000)

As you mentioned that you are using the Synapse table and you want to limit the Varbinary(8000)

I have tried the below in the dedicated sql pool

CREATE TABLE dbo.ImageStore2 (
    ImageName VARCHAR(8000),
    ImageData VARBINARY(8000)
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

Results:

As you are trying to create the table from spark and you mentioned that you

Note : I don't need append to an pre-existing table and need to create a new Table.

Write to Azure Synapse Dedicated SQL Pool:
Ingest large volumes of data into both Internal and External tables.
Supports the following DataFrame save modes:

  • Append
  • ErrorIfExists
  • Ignore
  • Overwrite

Reference: Azure Synapse Dedicated SQL Pool Connector for Apache Spark

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论