My workflow generates an `xr.Dataset` with dims `(6, 36, 2, 13, 699, 1920)` in `float32`.
I can process and write the output array chunk by chunk, but only if the zarr store already exists, with:

```python
ds.to_zarr('data.zarr', region=region)
```
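For context, a minimal sketch of that step (the dimension name, the slicing along the first axis, and `compute_chunk` are illustrative, not my actual code):

```python
# Hypothetical per-chunk loop: each iteration computes one slab and
# writes it into the corresponding region of the existing zarr store.
for i in range(6):
    region = {'dim_0': slice(i, i + 1)}   # illustrative dimension name
    chunk = compute_chunk(i)              # hypothetical function returning an xr.Dataset
    chunk.to_zarr('data.zarr', region=region)
```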
Does anyone have an idea how to initialize a zarr store that is larger than available memory?
My libraries are:

- zarr-python: 2.18.4
- xarray: 2025.1.2
1 Answer
I was able to do it with `dask.array`.
```python
import dask.array as da
import numpy as np
import xarray as xr

coords = ...  # coordinate arrays, one per dimension
dims = ...    # dimension names matching the array shape

var_name = 'value'
chunks = (1, 13, 36, 128, 128)
encoding = {var_name: {'chunks': chunks}}
store = 'test.zarr'

# Build a lazy dask array: no chunk is materialized in memory yet.
daskarray = da.empty(
    (6, 13, 36, 699, 1920),
    chunks=chunks,
    dtype='float32',
)
daskarray[:] = np.nan  # lazy fill; evaluated chunk by chunk at write time

# Wrap in xarray and stream the array to the zarr store chunk by chunk,
# so peak memory stays bounded by the chunk size.
xr.DataArray(
    daskarray,
    coords=coords,
    dims=dims,
).to_dataset(name=var_name).to_zarr(store, mode='w', encoding=encoding)
```
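(`da.full((6, 13, 36, 699, 1920), np.nan, chunks=chunks, dtype='float32')` is an equivalent one-liner for the empty-plus-fill step.)

Note that this still writes every NaN chunk to disk. If you only need to initialize the store layout so that later `region=` writes succeed, xarray's `to_zarr` also accepts `compute=False`, which writes the metadata and defers the array data. A sketch reusing the names from the snippet above:

```python
# Write only the store metadata; the NaN payload is returned as a
# dask delayed object and is never computed unless you ask for it.
ds = xr.DataArray(
    daskarray,
    coords=coords,
    dims=dims,
).to_dataset(name=var_name)
ds.to_zarr(store, mode='w', encoding=encoding, compute=False)
```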