My workflow generates an `xr.Dataset` with dims `(6, 36, 2, 13, 699, 1920)` in `float32`.
I can process and write the output array chunk by chunk, but only if the zarr store already exists, with:

```python
ds.to_zarr('data.zarr', region=region)
```
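For context, a minimal sketch of that step (the dimension name, the slicing along the first axis, and `compute_chunk` are illustrative, not my actual code):

```python
# Hypothetical per-chunk loop: each iteration computes one slab and
# writes it into the corresponding region of the existing zarr store.
for i in range(6):
    region = {'dim_0': slice(i, i + 1)}   # illustrative dimension name
    chunk = compute_chunk(i)              # hypothetical function returning an xr.Dataset
    chunk.to_zarr('data.zarr', region=region)
```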
Does anyone have an idea how to initialize a zarr store that is larger than available memory?
My libraries are:

- zarr-python: 2.18.4
- xarray: 2025.1.2
1 Answer
I was able to do it with `dask.array`.
```python
import dask.array as da
import numpy as np
import xarray as xr

coords = ...  # coordinate arrays, one per dimension
dims = ...    # dimension names matching the array shape

var_name = 'value'
chunks = (1, 13, 36, 128, 128)
encoding = {var_name: {'chunks': chunks}}
store = 'test.zarr'

# Build a lazy dask array: no chunk is materialized in memory yet.
daskarray = da.empty(
    (6, 13, 36, 699, 1920),
    chunks=chunks,
    dtype='float32',
)
daskarray[:] = np.nan  # lazy fill; evaluated chunk by chunk at write time

# Wrap in xarray and stream the array to the zarr store chunk by chunk,
# so peak memory stays bounded by the chunk size.
xr.DataArray(
    daskarray,
    coords=coords,
    dims=dims,
).to_dataset(name=var_name).to_zarr(store, mode='w', encoding=encoding)
```
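(`da.full((6, 13, 36, 699, 1920), np.nan, chunks=chunks, dtype='float32')` is an equivalent one-liner for the empty-plus-fill step.)

Note that this still writes every NaN chunk to disk. If you only need to initialize the store layout so that later `region=` writes succeed, xarray's `to_zarr` also accepts `compute=False`, which writes the metadata and defers the array data. A sketch reusing the names from the snippet above:

```python
# Write only the store metadata; the NaN payload is returned as a
# dask delayed object and is never computed unless you ask for it.
ds = xr.DataArray(
    daskarray,
    coords=coords,
    dims=dims,
).to_dataset(name=var_name)
ds.to_zarr(store, mode='w', encoding=encoding, compute=False)
```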