
Error in writing pandas DataFrame to Delta Table using schema with non-nullable fields


I'm using deltalake version 0.17.0. Here are the steps we follow (a consolidated sketch appears after the list):

  1. Read the DeltaTable from the existing S3 location: dt = DeltaTable("s3://mylocation/")
  2. Convert it to a PyArrow table: arrow_table = dt.to_pyarrow_table()
  3. Filter the PyArrow table and select the specific columns of interest.
  4. Convert the PyArrow table to a pandas DataFrame: df = arrow_table.to_pandas()
  5. Write the pandas DataFrame back to the existing, new Delta table. The table is empty at this point and has a schema defined with non-nullable fields.
  6. write_deltalake("s3://test_sample_process/", df, mode="overwrite"). We also tried it with schema_mode="overwrite".
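
For reference, a minimal sketch of the workflow above; the filter predicate and the selected column list are placeholders, not our actual code:

    import pyarrow.compute as pc
    from deltalake import DeltaTable, write_deltalake

    # 1. Read the existing Delta table from S3
    dt = DeltaTable("s3://mylocation/")

    # 2. Convert it to a PyArrow table
    arrow_table = dt.to_pyarrow_table()

    # 3. Filter rows and keep the columns of interest (predicate is hypothetical)
    arrow_table = arrow_table.filter(pc.equal(arrow_table["namespace"], "my_namespace"))
    arrow_table = arrow_table.select(
        ["namespace", "ki_record_name", "wk_center", "kt_config",
         "kt_parameters", "mi_updated_at", "mi_updated_by"]
    )

    # 4. Convert the PyArrow table to a pandas DataFrame
    df = arrow_table.to_pandas()

    # 5./6. Write the DataFrame to the empty target table whose schema
    #       declares several non-nullable fields; this raises the error below
    write_deltalake("s3://test_sample_process/", df, mode="overwrite")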

The error we get is:

    raise ValueError(
ValueError: Schema of data does not match table schema

    Data schema:
    namespace: string
    ki_record_name: string
    wk_center: string
    kt_config: string
    kt_parameters: string
    mi_updated_at: timestamp[us, tz=UTC]
    mi_updated_by: string

    Table Schema:
    namespace: string
    ki_record_name: string
    wk_center: string not null
    kt_config: string
    kt_parameters: string
    mi_updated_at: timestamp[us, tz=UTC] not null
      -- field metadata --
      comment: '"The time this record was updated"'
    mi_updated_by: string not null
      -- field metadata --
      comment: '"The process that updated this record"'

I verified that the DataFrame we are trying to write does NOT contain any null values; it has only 2 rows, so visual inspection was easy. I also posted the same question on the Delta table GitHub repository but did not receive any helpful suggestions. The Delta table uses the PyArrow engine by default in the current version, and the recommendation there was to migrate off it. We could try that, but this should work in the current version, which supports the PyArrow engine. The same code works when we drop the schema; at that point, the Delta table is created with all fields nullable. I want to enforce non-nullable fields and cannot understand why this is failing.
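
For what it's worth, here is a sketch of what I suspect and of a workaround I am considering (unverified): the schema inferred from a pandas DataFrame marks every field as nullable, so it can never match a table schema that contains not-null fields, even when the data itself has no nulls. Casting the DataFrame back to a PyArrow table using the target table's own schema before writing might avoid the mismatch:

    import pyarrow as pa
    from deltalake import DeltaTable, write_deltalake

    # Target table whose schema declares non-nullable fields
    target = DeltaTable("s3://test_sample_process/")
    target_schema = target.schema().to_pyarrow()

    # The schema inferred from pandas is all-nullable, so it differs from the
    # table schema even though the data contains no nulls
    inferred_schema = pa.Table.from_pandas(df, preserve_index=False).schema
    print(inferred_schema.equals(target_schema))  # expected: False

    # Possible workaround: cast the data to the table's schema before writing;
    # from_pandas raises if a non-nullable field actually contains nulls
    arrow_data = pa.Table.from_pandas(df, schema=target_schema, preserve_index=False)
    write_deltalake("s3://test_sample_process/", arrow_data, mode="overwrite")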
