I'm using the lidR
package (version 4.1.2) to process ~2000 *.laz point cloud files from airborne laser scanning. One step includes transforming the coordinate system from Gauß-Krüger (EPSG 31468) to UTM 32 (EPSG 25832). To do this I want to use the de_adv_BETA2007 transformation grid instead of the default option to get best possible results. So I use st:transform()
of the sf
package (version 1.0-19) with the corresponding specific proj pipeline
argument. It does work fine and I can see a little (~0.5 m) shift, which confirms, that the transformation is working better. Also, comparing the results with ALS data from the bavarian surveying agency confirms a better geolocation.
So, naturally, after testing with a small subset of the data I want to transform all datasets. To save time, I wanted to use the lascatalog
processing engine implemented into lidR together with the future
package to parallelize the processing and save time. It turns out, that whenever I apply the parallelization by running plan(multisession, workers = 6)
, st_transform is not using the BETA2007 method but the default one, introducing the 0.5m shift. Whenever plan(multisession, workers = 6)
is not run, the processing is done sequentially and the results are correct.
Here is the full code I've used:
library(lidR)
library(sf)
library(mapview)
library(future)
#' check set up of sf package:
sf_proj_search_paths()
sf_extSoftVersion()
sf_proj_network(enable = T)
#' get transformation options available with grid:
options <- sf_proj_pipelines(source_crs = "EPSG:31468", target_crs = "EPSG:25832")
#' select correct pipeline:
pipeline_BETA2007 <- options[1,]$definition
#' set up parallel processing with 6 cores:
plan(multisession, workers = 6)
#' read Lascatalog:
ctg <- readALSLAScatalog("../Data/GK")
#' apply epsg code, as it is not specified in the laz headers:
st_crs(ctg) <- 31468
#' check LAScatalog vailidity:
ctg
las_check(ctg)
#' plot LAScatalog:
plot(ctg, mapview = TRUE)
#' define output location and structure of catalog:
opt_output_files(ctg) <- ".../Data/UTM_parallel/{ORIGINALFILENAME}_UTM"
opt_laz_compression(ctg) <- TRUE
opt_chunk_buffer(ctg) <- 0
opt_chunk_size(ctg) <- 0
#' function to reproject las data:
reproject_catalog = function(las)
{
las_trans = sf::st_transform(las, crs = 25832, pipeline = pipeline_BETA2007)
return(las_trans)
}
#' apply function to catalog:
reprojected_ctg = catalog_map(ctg, reproject_catalog)
Of course the only code that was altered when running it without parallelization was to simply comment out the plan(multisession, workers = 6) part and adjusting the output file location. The data (selection of 4 tiles) including inputs and outputs can be downloaded here:
Here is a little more information on my processing environment:
> R.version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
crt ucrt
system x86_64, mingw32
status
major 4
minor 4.1
year 2024
month 06
day 14
svn rev 86737
language R
version.string R version 4.4.1 (2024-06-14 ucrt)
nickname Race for Your Life
> sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ
"3.12.2" "3.9.3" "9.4.1" "true" "true" "9.4.1"
I've tried different things but none were successful. At no time a warning or error was generated. I will just use bpapply
in combination with parallel
as an workaround to do the processing but it should be possible to do it via lidR
. If I'm overlooking something obvious, I'm sorry.
I'm using the lidR
package (version 4.1.2) to process ~2000 *.laz point cloud files from airborne laser scanning. One step includes transforming the coordinate system from Gauß-Krüger (EPSG 31468) to UTM 32 (EPSG 25832). To do this I want to use the de_adv_BETA2007 transformation grid instead of the default option to get best possible results. So I use st:transform()
of the sf
package (version 1.0-19) with the corresponding specific proj pipeline
argument. It does work fine and I can see a little (~0.5 m) shift, which confirms, that the transformation is working better. Also, comparing the results with ALS data from the bavarian surveying agency confirms a better geolocation.
So, naturally, after testing with a small subset of the data I want to transform all datasets. To save time, I wanted to use the lascatalog
processing engine implemented into lidR together with the future
package to parallelize the processing and save time. It turns out, that whenever I apply the parallelization by running plan(multisession, workers = 6)
, st_transform is not using the BETA2007 method but the default one, introducing the 0.5m shift. Whenever plan(multisession, workers = 6)
is not run, the processing is done sequentially and the results are correct.
Here is the full code I've used:
library(lidR)
library(sf)
library(mapview)
library(future)
#' check set up of sf package:
sf_proj_search_paths()
sf_extSoftVersion()
sf_proj_network(enable = T)
#' get transformation options available with grid:
options <- sf_proj_pipelines(source_crs = "EPSG:31468", target_crs = "EPSG:25832")
#' select correct pipeline:
pipeline_BETA2007 <- options[1,]$definition
#' set up parallel processing with 6 cores:
plan(multisession, workers = 6)
#' read Lascatalog:
ctg <- readALSLAScatalog("../Data/GK")
#' apply epsg code, as it is not specified in the laz headers:
st_crs(ctg) <- 31468
#' check LAScatalog vailidity:
ctg
las_check(ctg)
#' plot LAScatalog:
plot(ctg, mapview = TRUE)
#' define output location and structure of catalog:
opt_output_files(ctg) <- ".../Data/UTM_parallel/{ORIGINALFILENAME}_UTM"
opt_laz_compression(ctg) <- TRUE
opt_chunk_buffer(ctg) <- 0
opt_chunk_size(ctg) <- 0
#' function to reproject las data:
reproject_catalog = function(las)
{
las_trans = sf::st_transform(las, crs = 25832, pipeline = pipeline_BETA2007)
return(las_trans)
}
#' apply function to catalog:
reprojected_ctg = catalog_map(ctg, reproject_catalog)
Of course the only code that was altered when running it without parallelization was to simply comment out the plan(multisession, workers = 6) part and adjusting the output file location. The data (selection of 4 tiles) including inputs and outputs can be downloaded here: https://workupload.com/file/N9dHMZ9TcC7
Here is a little more information on my processing environment:
> R.version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
crt ucrt
system x86_64, mingw32
status
major 4
minor 4.1
year 2024
month 06
day 14
svn rev 86737
language R
version.string R version 4.4.1 (2024-06-14 ucrt)
nickname Race for Your Life
> sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ
"3.12.2" "3.9.3" "9.4.1" "true" "true" "9.4.1"
I've tried different things but none were successful. At no time a warning or error was generated. I will just use bpapply
in combination with parallel
as an workaround to do the processing but it should be possible to do it via lidR
. If I'm overlooking something obvious, I'm sorry.
1 Answer
Reset to default 0sf
does not have st_transform
for LAS
objects from the lidR package. The
lidRversion inherits its own function
st_transformand because of some inheritance rules it must be call thought
sf::st_transform. The lidR version of
st_transformis simpler than the
sf` one. It does not have a pipeline argument, so your code does not work either in sequential or parallel.
For your code to work, I should update lidR
. You cannot simply convert to POINT
, It is a little more comlex than that. Below is a modified version of the lidR
code where I'm using the ellipsis ...
to propagate sf
options. With this version, your pipeline
argument should be passed to the actual sf::st_transform
version.
As you can see lidR uses MULTIPOINT
for efficiency and there is a lot of code to maintain header and quantization valid with respect to LAS specifications
st_transform.LAS <- function(x, crs, ...)
{
if (is.na(st_crs(x)))
stop("No transformation possible from NA reference system")
p <- list(...)
# Transform the point coordinates
sfpts <- sf::st_multipoint(as.matrix(sf::st_coordinates(x)))
sfpts <- sf::st_geometry(sfpts)
sfpts <- sf::st_set_crs(sfpts, st_crs(x))
sfpts <- sf::st_transform(sfpts, crs, ...)
geom <- sf::st_geometry(sfpts)
X <- geom[[1]][,1]
Y <- geom[[1]][,2]
Z <- geom[[1]][,3]
# Update the offsets in the header
offsetx <- floor(min(X))
offsety <- floor(min(Y))
offsetz <- floor(min(Z))
if (!is.null(p$xoffset)) offsetx <- p$xoffset
if (!is.null(p$yoffset)) offsety <- p$yoffset
if (!is.null(p$zoffset)) offsetz <- p$zoffset
x@header@PHB[["X offset"]] <- offsetx
x@header@PHB[["Y offset"]] <- offsety
x@header@PHB[["Z offset"]] <- offsetz
# Update the scale factors
scalex <- x[["X scale factor"]]
scaley <- x[["Y scale factor"]]
scalez <- x[["Z scale factor"]]
if (!is.null(p$scale)) scalex <- scaley <- scalez <- p$scale
x@header@PHB[["X scale factor"]] <- scalex
x@header@PHB[["Y scale factor"]] <- scaley
x@header@PHB[["Z scale factor"]] <- scalez
# Quantize the coordinates
quantize(X, scalex, offsetx)
quantize(Y, scaley, offsety)
quantize(Z, scalez, offsetz)
# Update the LAS object
x@data[["X"]] <- X
x@data[["Y"]] <- Y
x@data[["Z"]] <- Z
x <- las_update(x)
st_crs(x) <- crs
return(x)
}
st_transform
for LAS objects from thelidR
package. The lidR inherit its own functionst_transform
and because of some inheritance rules it must be call thoughtsf::st_transform
. ThelidR
version ofst_transform
is simpler than thesf
one. It does not have apipeline
argument, so your code does not work either in sequential or parallel. – JRR Commented Feb 6 at 2:52lidR
package? I thought it would work as I got no warnings or errors and could observe an actual geometric difference in the output when introducing thepipeline
argument. So I guess I must convert the data to asimple feature
of typePOINT
first, then do the transformation using thepipeline
andst_transform
and then convert it back to aLAS
object? – JayDXQ Commented Feb 6 at 10:18