最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - lidR lascatalog processing engine transformation outputs differ between sequential and parallel processing - Stack Overflow

programmeradmin1浏览0评论

I'm using the lidR package (version 4.1.2) to process ~2000 *.laz point cloud files from airborne laser scanning. One step includes transforming the coordinate system from Gauß-Krüger (EPSG 31468) to UTM 32 (EPSG 25832). To do this I want to use the de_adv_BETA2007 transformation grid instead of the default option to get best possible results. So I use st:transform() of the sf package (version 1.0-19) with the corresponding specific proj pipeline argument. It does work fine and I can see a little (~0.5 m) shift, which confirms, that the transformation is working better. Also, comparing the results with ALS data from the bavarian surveying agency confirms a better geolocation.

So, naturally, after testing with a small subset of the data I want to transform all datasets. To save time, I wanted to use the lascatalog processing engine implemented into lidR together with the future package to parallelize the processing and save time. It turns out, that whenever I apply the parallelization by running plan(multisession, workers = 6), st_transform is not using the BETA2007 method but the default one, introducing the 0.5m shift. Whenever plan(multisession, workers = 6) is not run, the processing is done sequentially and the results are correct.

Here is the full code I've used:

library(lidR)
library(sf)
library(mapview)
library(future)

#' check set up of sf package:
sf_proj_search_paths()
sf_extSoftVersion()
sf_proj_network(enable = T)

#' get transformation options available with grid:
options <- sf_proj_pipelines(source_crs = "EPSG:31468", target_crs = "EPSG:25832")

#' select correct pipeline:
pipeline_BETA2007 <- options[1,]$definition

#' set up parallel processing with 6 cores:
plan(multisession, workers = 6)

#' read Lascatalog:
ctg <- readALSLAScatalog("../Data/GK")

#' apply epsg code, as it is not specified in the laz headers:
st_crs(ctg) <- 31468

#' check LAScatalog vailidity:
ctg
las_check(ctg)

#' plot LAScatalog:
plot(ctg, mapview = TRUE)

#' define output location and structure of catalog:
opt_output_files(ctg) <- ".../Data/UTM_parallel/{ORIGINALFILENAME}_UTM"
opt_laz_compression(ctg) <- TRUE
opt_chunk_buffer(ctg) <- 0
opt_chunk_size(ctg) <- 0

#' function to reproject las data:
reproject_catalog = function(las)
{
  las_trans = sf::st_transform(las, crs = 25832, pipeline = pipeline_BETA2007)
  return(las_trans)
}

#' apply function to catalog:
reprojected_ctg = catalog_map(ctg, reproject_catalog)

Of course the only code that was altered when running it without parallelization was to simply comment out the plan(multisession, workers = 6) part and adjusting the output file location. The data (selection of 4 tiles) including inputs and outputs can be downloaded here:

Here is a little more information on my processing environment:

> R.version
               _                                
platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          4.1                              
year           2024                             
month          06                               
day            14                               
svn rev        86737                            
language       R                                
version.string R version 4.4.1 (2024-06-14 ucrt)
nickname       Race for Your Life

> sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.12.2"        "3.9.3"        "9.4.1"         "true"         "true"        "9.4.1" 
          

I've tried different things but none were successful. At no time a warning or error was generated. I will just use bpapply in combination with parallel as an workaround to do the processing but it should be possible to do it via lidR. If I'm overlooking something obvious, I'm sorry.

I'm using the lidR package (version 4.1.2) to process ~2000 *.laz point cloud files from airborne laser scanning. One step includes transforming the coordinate system from Gauß-Krüger (EPSG 31468) to UTM 32 (EPSG 25832). To do this I want to use the de_adv_BETA2007 transformation grid instead of the default option to get best possible results. So I use st:transform() of the sf package (version 1.0-19) with the corresponding specific proj pipeline argument. It does work fine and I can see a little (~0.5 m) shift, which confirms, that the transformation is working better. Also, comparing the results with ALS data from the bavarian surveying agency confirms a better geolocation.

So, naturally, after testing with a small subset of the data I want to transform all datasets. To save time, I wanted to use the lascatalog processing engine implemented into lidR together with the future package to parallelize the processing and save time. It turns out, that whenever I apply the parallelization by running plan(multisession, workers = 6), st_transform is not using the BETA2007 method but the default one, introducing the 0.5m shift. Whenever plan(multisession, workers = 6) is not run, the processing is done sequentially and the results are correct.

Here is the full code I've used:

library(lidR)
library(sf)
library(mapview)
library(future)

#' check set up of sf package:
sf_proj_search_paths()
sf_extSoftVersion()
sf_proj_network(enable = T)

#' get transformation options available with grid:
options <- sf_proj_pipelines(source_crs = "EPSG:31468", target_crs = "EPSG:25832")

#' select correct pipeline:
pipeline_BETA2007 <- options[1,]$definition

#' set up parallel processing with 6 cores:
plan(multisession, workers = 6)

#' read Lascatalog:
ctg <- readALSLAScatalog("../Data/GK")

#' apply epsg code, as it is not specified in the laz headers:
st_crs(ctg) <- 31468

#' check LAScatalog vailidity:
ctg
las_check(ctg)

#' plot LAScatalog:
plot(ctg, mapview = TRUE)

#' define output location and structure of catalog:
opt_output_files(ctg) <- ".../Data/UTM_parallel/{ORIGINALFILENAME}_UTM"
opt_laz_compression(ctg) <- TRUE
opt_chunk_buffer(ctg) <- 0
opt_chunk_size(ctg) <- 0

#' function to reproject las data:
reproject_catalog = function(las)
{
  las_trans = sf::st_transform(las, crs = 25832, pipeline = pipeline_BETA2007)
  return(las_trans)
}

#' apply function to catalog:
reprojected_ctg = catalog_map(ctg, reproject_catalog)

Of course the only code that was altered when running it without parallelization was to simply comment out the plan(multisession, workers = 6) part and adjusting the output file location. The data (selection of 4 tiles) including inputs and outputs can be downloaded here: https://workupload.com/file/N9dHMZ9TcC7

Here is a little more information on my processing environment:

> R.version
               _                                
platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          4.1                              
year           2024                             
month          06                               
day            14                               
svn rev        86737                            
language       R                                
version.string R version 4.4.1 (2024-06-14 ucrt)
nickname       Race for Your Life

> sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.12.2"        "3.9.3"        "9.4.1"         "true"         "true"        "9.4.1" 
          

I've tried different things but none were successful. At no time a warning or error was generated. I will just use bpapply in combination with parallel as an workaround to do the processing but it should be possible to do it via lidR. If I'm overlooking something obvious, I'm sorry.

Share Improve this question edited Feb 6 at 10:21 JayDXQ asked Feb 5 at 14:12 JayDXQJayDXQ 113 bronze badges 2
  • sf does not have st_transform for LAS objects from the lidR package. The lidR inherit its own function st_transform and because of some inheritance rules it must be call thought sf::st_transform. The lidR version of st_transform is simpler than the sf one. It does not have a pipeline argument, so your code does not work either in sequential or parallel. – JRR Commented Feb 6 at 2:52
  • Thank you for your answer. So there is no way to transform a LAS object using a specific pipeline directly using the lidR package? I thought it would work as I got no warnings or errors and could observe an actual geometric difference in the output when introducing the pipeline argument. So I guess I must convert the data to a simple feature of type POINT first, then do the transformation using the pipeline and st_transform and then convert it back to a LAS object? – JayDXQ Commented Feb 6 at 10:18
Add a comment  | 

1 Answer 1

Reset to default 0

sf does not have st_transform for LAS objects from the lidR package. The lidRversion inherits its own functionst_transformand because of some inheritance rules it must be call thoughtsf::st_transform. The lidR version of st_transformis simpler than thesf` one. It does not have a pipeline argument, so your code does not work either in sequential or parallel.

For your code to work, I should update lidR. You cannot simply convert to POINT, It is a little more comlex than that. Below is a modified version of the lidR code where I'm using the ellipsis ... to propagate sf options. With this version, your pipeline argument should be passed to the actual sf::st_transform version.

As you can see lidR uses MULTIPOINT for efficiency and there is a lot of code to maintain header and quantization valid with respect to LAS specifications

st_transform.LAS <- function(x, crs, ...)
{
  if (is.na(st_crs(x)))
    stop("No transformation possible from NA reference system")
  
  p <- list(...)
  
  # Transform the point coordinates
  sfpts <- sf::st_multipoint(as.matrix(sf::st_coordinates(x)))
  sfpts <- sf::st_geometry(sfpts)
  sfpts <- sf::st_set_crs(sfpts, st_crs(x))
  sfpts <- sf::st_transform(sfpts, crs, ...)
  geom  <- sf::st_geometry(sfpts)
  X <- geom[[1]][,1]
  Y <- geom[[1]][,2]
  Z <- geom[[1]][,3]
  
  # Update the offsets in the header
  offsetx <- floor(min(X))
  offsety <- floor(min(Y))
  offsetz <- floor(min(Z))
  if (!is.null(p$xoffset)) offsetx <- p$xoffset
  if (!is.null(p$yoffset)) offsety <- p$yoffset
  if (!is.null(p$zoffset)) offsetz <- p$zoffset
  x@header@PHB[["X offset"]] <- offsetx
  x@header@PHB[["Y offset"]] <- offsety
  x@header@PHB[["Z offset"]] <- offsetz
  
  # Update the scale factors
  scalex <- x[["X scale factor"]]
  scaley <- x[["Y scale factor"]]
  scalez <- x[["Z scale factor"]]
  if (!is.null(p$scale)) scalex <- scaley <- scalez <- p$scale
  x@header@PHB[["X scale factor"]] <- scalex
  x@header@PHB[["Y scale factor"]] <- scaley
  x@header@PHB[["Z scale factor"]] <- scalez
  
  # Quantize the coordinates
  quantize(X, scalex, offsetx)
  quantize(Y, scaley, offsety)
  quantize(Z, scalez, offsetz)
  
  # Update the LAS object
  x@data[["X"]] <- X
  x@data[["Y"]] <- Y
  x@data[["Z"]] <- Z
  
  x <- las_update(x)
  st_crs(x) <- crs
  
  return(x)
}

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论