I am looking to employ terra
to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R
with the raster
package. The workflow involves spatial analyses of predicted dissimilarities, established from a generalized dissimilarity model (GDM). The primary data are GDM transformed predictor rasters (outputs of GDM model fitting), assembled in a single multi-layer raster object. Spatial analyses involve using values from the transformed rasters to predict dissimilarities among pairs of grid cells and applying those outputs for answering various spatial questions (e.g., protected area representativeness). The project raster is large (requires 102 gigs to process; via mem_info
).
My specific question: what terra
-based code is suitable for applying GDM predictions to assess spatial patterns of similarity and uniqueness (i.e., mean similarity between each grid cell and a random sample of locations across the entire study region). For outputs, I am looking for 1) a raster of predicted similarity and 2) a raster of predicted uniqueness.
This question is quite comparable to Employing `terra::` to avoid std::bad_alloc error when extracting values from large SpatRaster stack. I have sought to use relevant components of that helpful answer here, but haven't successfully developed a workable solution.
I have created a reprex and show the initial code chunk I'm working with. That code draws heavily on the stackoverflow answer referenced above; I show where I'm stalled.
Create multilayer sample raster
library(terra)
r <- rast(ncol=100, nrow=100, xmin=-150, xmax=-80, ymin=20, ymax=60,
nlyr=5, vals=runif(10000*5))
r[[1:3]][10:20] <- NA
r[100:120] <- NA
Initial (partial) attempt at new workflow.
Note, here a sample of values from the full raster is employed because calculating pair-wise dis/similarities among all grid cells is not advisable.
# specify percentage of raster for sampling
n.sub <- ncell(r) * 0.5
# Return matrix of sampled values
x <- na.omit(spatSample(r, n.sub, "regular", as.df=FALSE))
# Compute dissimilarity between grid-cell pairs
sub.dissimilarity <- matrix(0, n.sub, n.sub)
# Flag - if there are na values in the sample, I need to revise `n.sub` in the previous line to match the number of rows after `na.omit`
# Use an arbitrary value to set up loop
gdmRastMod <- list(intercept = 2)
for(i in 1:(n.sub-1)) {
for(j in (i+1):n.sub) {
ecol.dist <- sum(abs(x[i, ] - x[j, ]))
sub.dissimilarity[j,i] <- 1 - exp(-1 * (gdmRastMod$intercept + ecol.dist))
}
}
# Establish distance matrix with `as.dist` from base R
sub.diss.m <- as.dist(sub.dissimilarity)
# I may use `vegan::vegdist()` here and explore other dis/similarity coefficients
I surmise a two functions are needed for calculating: 1) similarity and 2) uniqueness (similarity to the region as a whole) and applying those predictions across the study extent.
UPDATE: following @robert-hijmans helpful solution, I was able to produce a raster of predicted uniqueness across my study extent. The code scaled very well with my data environment. I get a Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'deepcopy': subscript out of bounds
when I run the first code chunk (re: similarity across 10 cells) with my project data. I get a multi-layer raster with sample data.
The uniqueness raster may be adequate for my purpose. I wanted to calculate differences in predicted dissimilarity across various landscape units (e.g., protected areas, ecoregions, climate regions, etc) and use them to summarize differences among the those units. I can foresee a tabular summary of that type of comparative assessment.
The other published (not terra
-based) examples of these comparisons, that I've seen, use plotted rasters to summarize such spatial assessments; they're typically applied to index protected area representativeness. Here, the similarity of each grid cell (in the study extent) to locations in the land unit(s) of interest is calculated, as well as the mean similarity of each cell to the whole region. In this respect, the inherent uniqueness of each grid cell is accounted for. The method provides a continuous (mapped and quantitative) estimate of representativeness in various land units (e.g., protected areas). (i.e., another different question). That's a different (albeit related) workflow.
I am looking to employ terra
to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R
with the raster
package. The workflow involves spatial analyses of predicted dissimilarities, established from a generalized dissimilarity model (GDM). The primary data are GDM transformed predictor rasters (outputs of GDM model fitting), assembled in a single multi-layer raster object. Spatial analyses involve using values from the transformed rasters to predict dissimilarities among pairs of grid cells and applying those outputs for answering various spatial questions (e.g., protected area representativeness). The project raster is large (requires 102 gigs to process; via mem_info
).
My specific question: what terra
-based code is suitable for applying GDM predictions to assess spatial patterns of similarity and uniqueness (i.e., mean similarity between each grid cell and a random sample of locations across the entire study region). For outputs, I am looking for 1) a raster of predicted similarity and 2) a raster of predicted uniqueness.
This question is quite comparable to Employing `terra::` to avoid std::bad_alloc error when extracting values from large SpatRaster stack. I have sought to use relevant components of that helpful answer here, but haven't successfully developed a workable solution.
I have created a reprex and show the initial code chunk I'm working with. That code draws heavily on the stackoverflow answer referenced above; I show where I'm stalled.
Create multilayer sample raster
library(terra)
r <- rast(ncol=100, nrow=100, xmin=-150, xmax=-80, ymin=20, ymax=60,
nlyr=5, vals=runif(10000*5))
r[[1:3]][10:20] <- NA
r[100:120] <- NA
Initial (partial) attempt at new workflow.
Note, here a sample of values from the full raster is employed because calculating pair-wise dis/similarities among all grid cells is not advisable.
# specify percentage of raster for sampling
n.sub <- ncell(r) * 0.5
# Return matrix of sampled values
x <- na.omit(spatSample(r, n.sub, "regular", as.df=FALSE))
# Compute dissimilarity between grid-cell pairs
sub.dissimilarity <- matrix(0, n.sub, n.sub)
# Flag - if there are na values in the sample, I need to revise `n.sub` in the previous line to match the number of rows after `na.omit`
# Use an arbitrary value to set up loop
gdmRastMod <- list(intercept = 2)
for(i in 1:(n.sub-1)) {
for(j in (i+1):n.sub) {
ecol.dist <- sum(abs(x[i, ] - x[j, ]))
sub.dissimilarity[j,i] <- 1 - exp(-1 * (gdmRastMod$intercept + ecol.dist))
}
}
# Establish distance matrix with `as.dist` from base R
sub.diss.m <- as.dist(sub.dissimilarity)
# I may use `vegan::vegdist()` here and explore other dis/similarity coefficients
I surmise a two functions are needed for calculating: 1) similarity and 2) uniqueness (similarity to the region as a whole) and applying those predictions across the study extent.
UPDATE: following @robert-hijmans helpful solution, I was able to produce a raster of predicted uniqueness across my study extent. The code scaled very well with my data environment. I get a Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'deepcopy': subscript out of bounds
when I run the first code chunk (re: similarity across 10 cells) with my project data. I get a multi-layer raster with sample data.
The uniqueness raster may be adequate for my purpose. I wanted to calculate differences in predicted dissimilarity across various landscape units (e.g., protected areas, ecoregions, climate regions, etc) and use them to summarize differences among the those units. I can foresee a tabular summary of that type of comparative assessment.
The other published (not terra
-based) examples of these comparisons, that I've seen, use plotted rasters to summarize such spatial assessments; they're typically applied to index protected area representativeness. Here, the similarity of each grid cell (in the study extent) to locations in the land unit(s) of interest is calculated, as well as the mean similarity of each cell to the whole region. In this respect, the inherent uniqueness of each grid cell is accounted for. The method provides a continuous (mapped and quantitative) estimate of representativeness in various land units (e.g., protected areas). (i.e., another different question). That's a different (albeit related) workflow.
1 Answer
Reset to default 0You ask for "pairwise-similarities". If you mean pairs of cells, those are in 1-sub.diss.m
, for the sample. Do you actually want a raster for that (with ncell(r) layers)?
For the first 10 cells:
dfun <- function(cell) {
if (is.na(r[cell][1])) return(NULL)
edist <- sum(r - unlist(r[cell]))
out <- 1 - exp(-1 * (gdmRastMod$intercept + edist))
# with large datasets
# out <- writeRaster(out, paste0(tempfile(), ".tif"))
out
}
d <- lapply(1:10, dfun)
names(d) <- 1:10
dr <- rast(d[!sapply(d, is.null)])
dr
#class : SpatRaster
#dimensions : 100, 100, 9 (nrow, ncol, nlyr)
#resolution : 0.7, 0.4 (x, y)
#extent : -150, -80, 20, 60 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
#source(s) : memory
#names : 1, 2, 3, 4, 5, 6, ...
#min values : -0.03850853, 0.1265724, -0.8860728, 0.4260013, -0.5692148, -0.5711362, ...
#max values : 0.98196854, 0.9848348, 0.9672524, 0.9900337, 0.9727540, 0.9727206, ...
You also ask for "similarity to the region as a whole", but you do not define what you mean by that (what measure to use). But you could do something like this:
m <- global(r, "mean", na.rm=T)
edist <- sum(r - unlist(m))
dissim <- 1 - exp(-1 * (gdmRastMod$intercept + edist))
dissim
#class : SpatRaster
#dimensions : 100, 100, 1 (nrow, ncol, nlyr)
#resolution : 0.7, 0.4 (x, y)
#extent : -150, -80, 20, 60 (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
#source(s) : memory
#name : sum
#min value : 0.02861172
#max value : 0.98313393
rasterize()
was carried over from the old code; I checked for aterra
alternative and saw that the same function name was used. My question now includes the front-end of the code you provided earlier; I've added comments to show my level of understanding. Still learning as I go. Thank you kindly – Sean Basquill Commented Feb 17 at 20:38