I'm not sure whether this is a bug or I'm just not structuring the data correctly. I couldn't find any examples of writing maps.
Given a table with a simple schema containing a map field:
from pyiceberg.schema import Schema
from pyiceberg.types import StringType, MapType, NestedField
map_type = MapType(key_id=1001, key_type=StringType(), value_id=1002, value_type=StringType())
schema = Schema(NestedField(field_id=1, name='my_map', field_type=map_type))
table = catalog.create_table(..., schema=schema)
>>> table
map.test(
  1: my_map: optional map<string, string>
),
partition by: [],
sort order: [],
snapshot: null
I first construct an Arrow table with the converted schema:
import pyarrow as pa
data = {'my_map': [{'symbol': 'BTC'}]}
pa_table = pa.Table.from_pydict(data, schema=schema.as_arrow())
pyarrow.Table
my_map: map<large_string, large_string>
  child 0, entries: struct<key: large_string not null, value: large_string not null> not null
      child 0, key: large_string not null
      child 1, value: large_string not null
----
my_map: [[keys:["symbol"]values:["BTC"]]]
When writing, though, the schema validation complains that I haven't provided `key` and `value` fields:
>>> table.append(pa_table)
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Table field                             ┃ Dataframe field                         ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ✅ │ 1: my_map: optional map<string, string> │ 1: my_map: optional map<string, string> │
│ ❌ │ 2: key: required string                 │ Missing                                 │
│ ❌ │ 3: value: required string               │ Missing                                 │
└────┴─────────────────────────────────────────┴─────────────────────────────────────────┘