I'm having trouble figuring out how to access key metadata when using terra, though I can see it's available in the original netcdf file.
I have a dataset downloaded from the Climate Data Store that includes monthly data for a series of different ensemble members. These are identified by member number, and I need to be able to associate each number with the appropriate model. (I apologize for the lack of reproducible example - I don't see a way to recreate this problem without importing netcdf data; however, I do think the problem of identifying/accessing non-standard attributes is a general one.)
In ncdf4, I am able to get a list of the attributes, and identify that one of them has the model names I need:
library(ncdf4)
nc <- nc_open('filepath.nc')
attributes(nc$var)
$names
[1] "lat_bnds" "lon_bnds" "time_bnds" "member_id"
[5] "gcm_institution" "gcm_model" "gcm_variant" "t"
[9] "crs" "height2m"
head(ncvar_get(nc, "gcm_model"))
[1] "ACCESS-CM2" "ACCESS-ESM1-5" "AWI-CM-1-1-MR" "BCC-CSM2-MR" "CAMS-CSM1-0"
[6] "CESM2-WACCM"
However, I can't figure out how to access this list with terra, despite searching through the documentation. meta()
does provide a long list that includes the attribute I need, and meta(., layers=TRUE)
suggests that this is associated with an attribute called 'coordinates', but I don't know how to more directly call this to get the list of model names (or, better, associate those with the layer names). Is this possible in terra, or do I need to rely on ncdf4 (or maybe the newer ncdfCF ) for identifying the deeper attribute information?
rterra <- rast('filepath.nc')
head(names(rterra))
[1] "t_member=1_1" "t_member=1_2" "t_member=1_3" "t_member=1_4" "t_member=1_5"
[6] "t_member=1_6"
meta(rterra)
...
[48,] "member_id height2m time lat lon gcm_institution gcm_model gcm_variant"
...
meta(rterra, layers=TRUE)
"coordinates" "member_id height2m time lat lon gcm_institution gcm_model gcm_variant"
EDIT: Data is available here: This is the original file, which is quite large (4GB). Here is a much smaller subset of just the first two layers, but this does not seem to retain all the original metadata relevant to this question.
I'm having trouble figuring out how to access key metadata when using terra, though I can see it's available in the original netcdf file.
I have a dataset downloaded from the Climate Data Store that includes monthly data for a series of different ensemble members. These are identified by member number, and I need to be able to associate each number with the appropriate model. (I apologize for the lack of reproducible example - I don't see a way to recreate this problem without importing netcdf data; however, I do think the problem of identifying/accessing non-standard attributes is a general one.)
In ncdf4, I am able to get a list of the attributes, and identify that one of them has the model names I need:
library(ncdf4)
nc <- nc_open('filepath.nc')
attributes(nc$var)
$names
[1] "lat_bnds" "lon_bnds" "time_bnds" "member_id"
[5] "gcm_institution" "gcm_model" "gcm_variant" "t"
[9] "crs" "height2m"
head(ncvar_get(nc, "gcm_model"))
[1] "ACCESS-CM2" "ACCESS-ESM1-5" "AWI-CM-1-1-MR" "BCC-CSM2-MR" "CAMS-CSM1-0"
[6] "CESM2-WACCM"
However, I can't figure out how to access this list with terra, despite searching through the documentation. meta()
does provide a long list that includes the attribute I need, and meta(., layers=TRUE)
suggests that this is associated with an attribute called 'coordinates', but I don't know how to more directly call this to get the list of model names (or, better, associate those with the layer names). Is this possible in terra, or do I need to rely on ncdf4 (or maybe the newer ncdfCF ) for identifying the deeper attribute information?
rterra <- rast('filepath.nc')
head(names(rterra))
[1] "t_member=1_1" "t_member=1_2" "t_member=1_3" "t_member=1_4" "t_member=1_5"
[6] "t_member=1_6"
meta(rterra)
...
[48,] "member_id height2m time lat lon gcm_institution gcm_model gcm_variant"
...
meta(rterra, layers=TRUE)
"coordinates" "member_id height2m time lat lon gcm_institution gcm_model gcm_variant"
EDIT: Data is available here: This is the original file, which is quite large (4GB). Here is a much smaller subset of just the first two layers, but this does not seem to retain all the original metadata relevant to this question.
Share Improve this question edited Mar 27 at 20:07 Jaken asked Mar 27 at 2:48 JakenJaken 5552 silver badges10 bronze badges 4- Can you provide an example file? – Robert Hijmans Commented Mar 27 at 2:57
- @RobertHijmans - I can upload a sample file; however, the original file is too large. When I try to take a subset and save as a netcdf (using either subset(rterra, 1:2) or rterra[[1:2] with writeCDF), this appears to change/rewrite the attributes, which are no longer accessible in the new file through either terra::meta() or ncdf4::attributes(). Is there a better way to pull a sample that preserves the relevant info? – Jaken Commented Mar 27 at 4:16
- Can you provide a link to the file? E.g, Gdrive, Box – Robert Hijmans Commented Mar 27 at 14:55
- @RobertHijmans - I edited to add a link to the full file (4GB) and a smaller sample that may not have retained the full metadata in the saving process. – Jaken Commented Mar 27 at 20:08
1 Answer
Reset to default 2The coordinates
attributes tells you that the data variable "t" (in CF-speak, this is what terra
turns into a SpatRaster
) is 8-dimensional. Before you try to visualize that in your mind, there are only 4 "true" axes (in the sense that the actual array of the data variable is 4D) "lon", "lat", "time" and "member", with the 5th axis "height2m" being a scalar coordinate variable (more CF-speak). The 3 remaining "dimensions" are in fact labels attached to the member
axis. These labels are netCDF variables like all the other "dimensions" but GDAL apparently does not read them and terra
can thus not see them.
Otherwise you are out-of-luck because reading the netCDF file (the one I have over here with 34 members) with terra
produces a SpatRaster
as a 3D array, merging the member
and time
axes into one. Why that is I do not know, because terra
also supports 4D data sets as a SpatRasterDataset
. This could be due to way that GDAL reads the netCDF file.
With the development version of ncdfCF
you can access the data by "gcm_model". The example below extracts a year of data but you could just as easily extract all data for a single GCM model or for some geographic subset, also in combination.
# install.packages("devtools")
devtools::install_github("pvanlaake/ncdfCF")
# Open the netCDF file, see what is inside
fn <- "~/path_to/t_CMIP6_ssp245_mon_201501-210012.nc"
ds <- open_ncdf(fn)
names(ds)
#> [1] "t"
# Grab variable "t" and, as an example, extract the data for this year
t <- ds[["t"]]
tdata <- t$subset(list(time = c("2025-01-01", "2026-01-01")))
# Which axes have labels
tdata$axis_labels
#> [1] "member"
# Grab that axis and list names of the sets of labels
m <- tdata$axes[["member"]]
m$label_set_names
#> [1] "member_id" "gcm_institution" "gcm_model" "gcm_variant"
# Make the desired label set active
m$active_label_set <- "gcm_model"
# Now you can create a SpatRasterDataset, which will have proper names set...
library(terra)
#> terra 1.8.29
(r <- tdata$terra())
#> class : SpatRasterDataset
#> subdatasets : 34
#> dimensions : 180, 360 (nrow, ncol)
#> nlyr : 12, 12, 12, 12, 12, 12, 12, 12, 12
#> resolution : 1, 1 (x, y)
#> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +ellps=WGS84 +no_defs
#> source(s) : memory
#> names : ACCESS-CM2, ACCESS-ESM1-5, AWI-CM-1-1-MR, BCC-CSM2-MR, CAMS-CSM1-0, CESM2-WACCM, ...
# ... as wekll as on the SpatRasters it contains
r["EC-Earth3-Veg"]
#> class : SpatRaster
#> dimensions : 180, 360, 12 (nrow, ncol, nlyr)
#> resolution : 1, 1 (x, y)
#> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +ellps=WGS84 +no_defs
#> source(s) : memory
#> names : 2025-01-01, 2025-02-01, 2025-03-01, 2025-04-01, 2025-05-01, 2025-06-01, ...
#> min values : -43.19824, -40.50879, -47.00879, -51.64355, -58.46680, -52.64648, ...
#> max values : 38.09766, 37.27539, 35.53711, 35.34082, 37.71777, 39.25391, ...
# So you can refer to the names in plotting and other processing
plot(r["EC-Earth3-Veg"]["2025-05-01"])