Using terra in R to calculate & map similarity and uniqueness across cells of a large GDM-based raster

I am looking to employ terra to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R with the raster package. The workflow involves spatial analyses of predicted dissimilarities, established from a generalized dissimilarity model (GDM). The primary data are GDM transformed predictor rasters (outputs of GDM model fitting), assembled in a single multi-layer raster object. Spatial analyses involve using values from the transformed rasters to predict dissimilarities among pairs of grid cells and applying those outputs for answering various spatial questions (e.g., protected area representativeness). The project raster is large (requires 102 gigs to process; via mem_info).

My specific question: what terra-based code is suitable for applying GDM predictions to assess spatial patterns of similarity and uniqueness (i.e., mean similarity between each grid cell and a random sample of locations across the entire study region). For outputs, I am looking for 1) a raster of predicted similarity and 2) a raster of predicted uniqueness.

This question is quite comparable to Employing `terra::` to avoid std::bad_alloc error when extracting values from large SpatRaster stack. I have sought to use relevant components of that helpful answer here, but haven't successfully developed a workable solution.

I have created a reprex and show the initial code chunk I'm working with. That code draws heavily on the stackoverflow answer referenced above; I show where I'm stalled.

Create multilayer sample raster

library(terra)
r <- rast(ncol=100, nrow=100, xmin=-150, xmax=-80, ymin=20, ymax=60,
          nlyr=5, vals=runif(10000*5))
r[[1:3]][10:20] <- NA
r[100:120] <- NA

Initial (partial) attempt at new workflow.

Note, here a sample of values from the full raster is employed because calculating pair-wise dis/similarities among all grid cells is not advisable.

# specify percentage of raster for sampling
n.sub <- ncell(r) * 0.5

# Return matrix of sampled values  
x <- na.omit(spatSample(r, n.sub, "regular", as.df=FALSE)) 

# Compute dissimilarity between grid-cell pairs
sub.dissimilarity <- matrix(0, n.sub, n.sub)

# Flag - if there are na values in the sample, I need to revise `n.sub` in the previous line to match the number of rows after `na.omit`

# Use an arbitrary value to set up loop
gdmRastMod <- list(intercept = 2)

for(i in 1:(n.sub-1)) {
    for(j in (i+1):n.sub) {
        ecol.dist <- sum(abs(x[i, ] - x[j, ]))
        sub.dissimilarity[j,i] <- 1 - exp(-1 * (gdmRastMod$intercept + ecol.dist))
    }
} 

# Establish distance matrix with `as.dist` from base R
sub.diss.m <- as.dist(sub.dissimilarity)

# I may use `vegan::vegdist()` here and explore other dis/similarity coefficients

I surmise a two functions are needed for calculating: 1) similarity and 2) uniqueness (similarity to the region as a whole) and applying those predictions across the study extent.

UPDATE: following @robert-hijmans helpful solution, I was able to produce a raster of predicted uniqueness across my study extent. The code scaled very well with my data environment. I get a Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'deepcopy': subscript out of bounds when I run the first code chunk (re: similarity across 10 cells) with my project data. I get a multi-layer raster with sample data.

The uniqueness raster may be adequate for my purpose. I wanted to calculate differences in predicted dissimilarity across various landscape units (e.g., protected areas, ecoregions, climate regions, etc) and use them to summarize differences among the those units. I can foresee a tabular summary of that type of comparative assessment.

The other published (not terra-based) examples of these comparisons, that I've seen, use plotted rasters to summarize such spatial assessments; they're typically applied to index protected area representativeness. Here, the similarity of each grid cell (in the study extent) to locations in the land unit(s) of interest is calculated, as well as the mean similarity of each cell to the whole region. In this respect, the inherent uniqueness of each grid cell is accounted for. The method provides a continuous (mapped and quantitative) estimate of representativeness in various land units (e.g., protected areas). (i.e., another different question). That's a different (albeit related) workflow.

I have created a reprex and show the initial code chunk I'm working with. That code draws heavily on the stackoverflow answer referenced above; I show where I'm stalled.

Create multilayer sample raster

library(terra)
r <- rast(ncol=100, nrow=100, xmin=-150, xmax=-80, ymin=20, ymax=60,
          nlyr=5, vals=runif(10000*5))
r[[1:3]][10:20] <- NA
r[100:120] <- NA

Initial (partial) attempt at new workflow.

Note, here a sample of values from the full raster is employed because calculating pair-wise dis/similarities among all grid cells is not advisable.

# specify percentage of raster for sampling
n.sub <- ncell(r) * 0.5

# Return matrix of sampled values  
x <- na.omit(spatSample(r, n.sub, "regular", as.df=FALSE)) 

# Compute dissimilarity between grid-cell pairs
sub.dissimilarity <- matrix(0, n.sub, n.sub)

# Flag - if there are na values in the sample, I need to revise `n.sub` in the previous line to match the number of rows after `na.omit`

# Use an arbitrary value to set up loop
gdmRastMod <- list(intercept = 2)

for(i in 1:(n.sub-1)) {
    for(j in (i+1):n.sub) {
        ecol.dist <- sum(abs(x[i, ] - x[j, ]))
        sub.dissimilarity[j,i] <- 1 - exp(-1 * (gdmRastMod$intercept + ecol.dist))
    }
} 

# Establish distance matrix with `as.dist` from base R
sub.diss.m <- as.dist(sub.dissimilarity)

# I may use `vegan::vegdist()` here and explore other dis/similarity coefficients

I surmise a two functions are needed for calculating: 1) similarity and 2) uniqueness (similarity to the region as a whole) and applying those predictions across the study extent.

Share Improve this question edited yesterday asked Feb 17 at 0:29 Sean Basquill 12 bronze badges

Please edit your code so that you provide a single minimal example that reproduces your issue. I already pointed out that you are using a poorly implemented algorithm that does not scale to a large data set (stackoverflow/questions/79343223/…). I gave you a very elaborate answer showing how you can reimplement it; but you seem to ignore that here. – Robert Hijmans Commented Feb 17 at 3:21
This is only my 3rd question on stackoverflow, so I'm not accustomed to all conventions. I will look to incorporate the front end of the solution you provided previously. That answer was very helpful and I ran numerous clustering trials, culminating in a successful outcome, over the past couple weeks. Thank you – Sean Basquill Commented Feb 17 at 13:50
"rasterize" does not work in your code --- but it is an odd thing to do anyhow. You are still using the loop over Trans.env.table. Why? That is a bad idea and I have shown how to avoid that in your previous question. – Robert Hijmans Commented Feb 17 at 18:28
rasterize() was carried over from the old code; I checked for a terra alternative and saw that the same function name was used. My question now includes the front-end of the code you provided earlier; I've added comments to show my level of understanding. Still learning as I go. Thank you kindly – Sean Basquill Commented Feb 17 at 20:38
It is currently not clear what your question is, can you clarify? And can you remove the "legacy code" as it is not clear why that is relevant anymore. It is fundamentally flawed, the problem is not about replacing raster functions with terra functions. That is why I redesigned it for you in the previous question. – Robert Hijmans Commented Feb 18 at 1:30

| Show 2 more comments

1 Answer 1

Sorted by: Reset to default 0

You ask for "pairwise-similarities". If you mean pairs of cells, those are in 1-sub.diss.m, for the sample. Do you actually want a raster for that (with ncell(r) layers)?

For the first 10 cells:

dfun <- function(cell) {
  if (is.na(r[cell][1])) return(NULL)
  edist <- sum(r - unlist(r[cell]))
  out <- 1 - exp(-1 * (gdmRastMod$intercept + edist))
  # with large datasets 
  # out <- writeRaster(out, paste0(tempfile(), ".tif"))
  out
}
d <- lapply(1:10, dfun)
names(d) <- 1:10

dr <- rast(d[!sapply(d, is.null)])
dr
#class       : SpatRaster 
#dimensions  : 100, 100, 9  (nrow, ncol, nlyr)
#resolution  : 0.7, 0.4  (x, y)
#extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source(s)   : memory
#names       :           1,         2,          3,         4,          5,          6, ... 
#min values  : -0.03850853, 0.1265724, -0.8860728, 0.4260013, -0.5692148, -0.5711362, ... 
#max values  :  0.98196854, 0.9848348,  0.9672524, 0.9900337,  0.9727540,  0.9727206, ...

You also ask for "similarity to the region as a whole", but you do not define what you mean by that (what measure to use). But you could do something like this:

m <- global(r, "mean", na.rm=T)
edist <- sum(r - unlist(m))
dissim <- 1 - exp(-1 * (gdmRastMod$intercept + edist))

dissim
#class       : SpatRaster 
#dimensions  : 100, 100, 1  (nrow, ncol, nlyr)
#resolution  : 0.7, 0.4  (x, y)
#extent      : -150, -80, 20, 60  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
#source(s)   : memory
#name        :        sum 
#min value   : 0.02861172 
#max value   : 0.98313393

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

Using terra in R to calculate & map similarity and uniqueness across cells of a large GDM-based raster - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)