python - Writing map types with pyiceberg

I'm not sure whether this is a bug or I'm just not structuring the data correctly; I couldn't find any examples of writing maps.

Given a table with a simple schema containing a map field:

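from pyiceberg.schema import Schema
from pyiceberg.types import StringType, MapType, NestedField

# The map's key and value are nested fields with their own IDs (1001 and 1002)
map_type = MapType(key_id=1001, key_type=StringType(), value_id=1002, value_type=StringType())
schema = Schema(NestedField(field_id=1, name='my_map', field_type=map_type))
# `catalog` is an already-loaded pyiceberg catalog; the create_table arguments are elided
table = catalog.create_table(..., schema=schema)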

table
map.test(
 1: my_map: optional map<string, string>
),
partition by: [],
sort order: [],
snapshot: null

I first construct an Arrow table using the schema converted with schema.as_arrow():

import pyarrow as pa

data = {'my_map': [{'symbol': 'BTC'}]}
pa_table = pa.Table.from_pydict(data, schema=schema.as_arrow())

pyarrow.Table
my_map: map<large_string, large_string>
 child 0, entries: struct<key: large_string not null, value: large_string not null> not null
     child 0, key: large_string not null
     child 1, value: large_string not null
----
my_map: [[keys:["symbol"]values:["BTC"]]]
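The Arrow table above already shows the expected keys/values layout. For comparison, Arrow map columns also accept each row as a list of (key, value) tuples instead of a dict, which makes the individual entries explicit. A minimal sketch using a hypothetical helper, dict_rows_to_map_entries (not part of pyiceberg or pyarrow), that does that conversion in plain Python:

```python
from typing import Dict, List, Optional, Tuple


# Hypothetical helper (not a pyiceberg/pyarrow API): turns each dict row into
# the list-of-(key, value)-tuples form that Arrow map columns also accept.
# A None row stays None, i.e. a null map.
def dict_rows_to_map_entries(
    rows: List[Optional[Dict[str, str]]],
) -> List[Optional[List[Tuple[str, str]]]]:
    return [None if row is None else list(row.items()) for row in rows]


data = {'my_map': [{'symbol': 'BTC'}]}
entries = {'my_map': dict_rows_to_map_entries(data['my_map'])}
print(entries)  # {'my_map': [[('symbol', 'BTC')]]}

# With pyarrow installed, this should build the same table as the dict-based version:
# pa_table = pa.Table.from_pydict(entries, schema=schema.as_arrow())
```

Because the dict-based table already printed the expected keys and values, this reshaping by itself is unlikely to change the validation result. It only shows an equivalent way to express the same map column.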

When writing, though, the schema validation complains that I haven't provided the key and value fields:

>>> table.append(pa_table)

┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Table field                             ┃ Dataframe field                         ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ✅ │ 1: my_map: optional map<string, string> │ 1: my_map: optional map<string, string> │
│ ❌ │ 2: key: required string                 │ Missing                                 │
│ ❌ │ 3: value: required string               │ Missing                                 │
└────┴─────────────────────────────────────────┴─────────────────────────────────────────┘
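The IDs in this output (2 for key, 3 for value) differ from the key_id=1001 and value_id=1002 passed to MapType, which suggests the created table gave the nested fields new IDs. A simplified model of a check that matches fields purely by ID, written in plain Python with hypothetical dictionaries standing in for both schemas, reproduces the same two "Missing" rows. It assumes the Arrow schema still carries the locally defined IDs, and it is an illustration, not pyiceberg's actual implementation:

```python
from typing import Dict, List

# Hypothetical stand-ins for the two schemas, flattened to {field_id: description}.
# Table side: the IDs reported in the error output.
table_fields: Dict[int, str] = {
    1: "my_map: optional map<string, string>",
    2: "key: required string",
    3: "value: required string",
}
# Dataframe side: ASSUMES the Arrow schema still carries the IDs from the
# locally built schema (key_id=1001, value_id=1002).
dataframe_fields: Dict[int, str] = {
    1: "my_map: optional map<string, string>",
    1001: "key: required string",
    1002: "value: required string",
}


def compare_by_id(table: Dict[int, str], frame: Dict[int, str]) -> List[str]:
    """Return one row per table field, matching dataframe fields by ID only."""
    rows = []
    for field_id, desc in table.items():
        other = frame.get(field_id)
        if other is None:
            # No dataframe field with this ID, regardless of its name
            rows.append(f"❌ {field_id}: {desc} | Missing")
        else:
            mark = "✅" if other == desc else "❌"
            rows.append(f"{mark} {field_id}: {desc} | {field_id}: {other}")
    return rows


for row in compare_by_id(table_fields, dataframe_fields):
    print(row)
```

If that assumption holds, building the Arrow table from the created table's own schema (table.schema().as_arrow()) rather than the local schema would keep the IDs aligned; treat this as a hypothesis to check, not a confirmed fix.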

python - Writing map types with pyiceberg - Stack Overflow