Here's the code I'm using:
library(readxl)
setwd(file.path(dirname("~"), "/Shared Documents/Programs/Data and Reporting/Data Quality Reports/Org Level Data"))
lst = list.files(pattern="*.xlsx")
df = data.frame()
for(table in lst){
dataFromExcel <- read_excel(table, sheet = "Project Summary")
df <- rbind(df,dataFromExcel)
}
write.csv(df, "_Project Level data.csv")
I basically know nothing about R, and simply mashed together code from a couple sites, editing what little I understood.
Here's the scenario: I have a bunch of Excel files that I download and put into a folder called "Org Level Data". I run this script and it creates a new file with all the data in each file's "Project Summary" sheet. However, it errors out if one of those files does not contain a sheet called "Project Summary", which will be quite a few files. I can get around this by removing those files from the folders, but I'd really like this script to just skip those files and ignore them, if possible.
I saw something about read_excel_safely but I cannot figure out how to insert that into my code, since I understand very little about the "read_excel" and "rbind" sections.
I tried adding , na = "---")
to the read_excel() function, thinking that it would handle that not existing, but it'd didn't add or take away anything from the script running.
I've also looked into trying to make lst
check for that "Project Summary" sheet when being created, but I don't know how to do that.
Here's the code I'm using:
library(readxl)
setwd(file.path(dirname("~"), "/Shared Documents/Programs/Data and Reporting/Data Quality Reports/Org Level Data"))
lst = list.files(pattern="*.xlsx")
df = data.frame()
for(table in lst){
dataFromExcel <- read_excel(table, sheet = "Project Summary")
df <- rbind(df,dataFromExcel)
}
write.csv(df, "_Project Level data.csv")
I basically know nothing about R, and simply mashed together code from a couple sites, editing what little I understood.
Here's the scenario: I have a bunch of Excel files that I download and put into a folder called "Org Level Data". I run this script and it creates a new file with all the data in each file's "Project Summary" sheet. However, it errors out if one of those files does not contain a sheet called "Project Summary", which will be quite a few files. I can get around this by removing those files from the folders, but I'd really like this script to just skip those files and ignore them, if possible.
I saw something about read_excel_safely but I cannot figure out how to insert that into my code, since I understand very little about the "read_excel" and "rbind" sections.
I tried adding , na = "---")
to the read_excel() function, thinking that it would handle that not existing, but it'd didn't add or take away anything from the script running.
I've also looked into trying to make lst
check for that "Project Summary" sheet when being created, but I don't know how to do that.
1 Answer
Reset to default 1You can use an if
with readxl::excel_sheets
to check whether a file contains a sheet with the required name and if not skip that file in your for loop by passing on to the next
file, e.g. in the code below the second file is skipped.
Using some fake example data and excel files:
library(readxl)
path <- tempdir()
### Create example files
writexl::write_xlsx(
list(
"Project Summary" = mtcars
),
file.path(path, "mtcars.xlsx")
)
writexl::write_xlsx(
list(
"Sheet" = iris
),
file.path(path, "iris.xlsx")
)
###
setwd(path)
lst <- list.files(pattern = "*.xlsx")
df <- data.frame()
for (table in lst) {
if (!"Project Summary" %in% excel_sheets(table)) next
dataFromExcel <- read_excel(table, sheet = "Project Summary")
df <- rbind(df, dataFromExcel)
}
#write.csv(df, "_Project Level data.csv")
df
#> # A tibble: 32 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows