
python - How can I initialize a Zarr file that is larger than available memory? - Stack Overflow


My workflow generates an xr.Dataset with dimensions (6, 36, 2, 13, 699, 1920) in float32.

I can process and write the output array chunk by chunk, but only if the zarr file already exists, with:

ds.to_zarr('data.zarr', region=region)

Does anyone have an idea how to initialize a zarr file that is larger than available memory?

My libraries are:

zarr-python: '2.18.4'
xarray: '2025.1.2'

asked Mar 18 at 17:00 by AMA, edited Mar 18 at 22:13 by MrDeveloper

1 Answer


I was able to do it with `dask.array`:

import dask.array as da
import numpy as np
import xarray as xr

coords = ...
dims = ...
var_name = 'value'
chunks = (1, 13, 36, 128, 128)
encoding = {var_name: {'chunks': chunks}}
store = 'test.zarr'

# Lazy array: no memory is allocated for the data itself.
daskarray = da.empty(
    (6, 13, 36, 699, 1920),
    chunks=chunks,
    dtype='float32',
)
daskarray[:] = np.nan  # lazy fill, evaluated chunk by chunk on write

xr.DataArray(
    daskarray,
    coords=coords,
    dims=dims,
).to_dataset(name=var_name).to_zarr(store, mode='w', encoding=encoding)