r - How can I skip Excel Files if they do not contain a specific Sheet?

Here's the code I'm using:

library(readxl) 
setwd(file.path(dirname("~"), "/Shared Documents/Programs/Data and Reporting/Data Quality Reports/Org Level Data"))
lst = list.files(pattern="*.xlsx")
df = data.frame()
for(table in lst){
  dataFromExcel <- read_excel(table, sheet = "Project Summary")
  df <- rbind(df,dataFromExcel)
}
write.csv(df, "_Project Level data.csv")

I basically know nothing about R, and simply mashed together code from a couple sites, editing what little I understood.

Here's the scenario: I have a bunch of Excel files that I download and put into a folder called "Org Level Data". I run this script and it creates a new file with all the data in each file's "Project Summary" sheet. However, it errors out if one of those files does not contain a sheet called "Project Summary", which will be quite a few files. I can get around this by removing those files from the folders, but I'd really like this script to just skip those files and ignore them, if possible.

I saw something about read_excel_safely but I cannot figure out how to insert that into my code, since I understand very little about the "read_excel" and "rbind" sections.

I tried adding , na = "---") to the read_excel() function, thinking that it would handle that not existing, but it'd didn't add or take away anything from the script running.

I've also looked into trying to make lst check for that "Project Summary" sheet when being created, but I don't know how to do that.

Here's the code I'm using:

library(readxl) 
setwd(file.path(dirname("~"), "/Shared Documents/Programs/Data and Reporting/Data Quality Reports/Org Level Data"))
lst = list.files(pattern="*.xlsx")
df = data.frame()
for(table in lst){
  dataFromExcel <- read_excel(table, sheet = "Project Summary")
  df <- rbind(df,dataFromExcel)
}
write.csv(df, "_Project Level data.csv")

I basically know nothing about R, and simply mashed together code from a couple sites, editing what little I understood.

I saw something about read_excel_safely but I cannot figure out how to insert that into my code, since I understand very little about the "read_excel" and "rbind" sections.

I tried adding , na = "---") to the read_excel() function, thinking that it would handle that not existing, but it'd didn't add or take away anything from the script running.

I've also looked into trying to make lst check for that "Project Summary" sheet when being created, but I don't know how to do that.

Share Improve this question edited Feb 6 at 17:35 margusl 17.4k3 gold badges19 silver badges28 bronze badges asked Feb 6 at 17:26 AdorableRaspberry953 11 silver badge1 bronze badge New contributor AdorableRaspberry953 is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.

Add a comment |

1 Answer 1

Sorted by: Reset to default 1

You can use an if with readxl::excel_sheets to check whether a file contains a sheet with the required name and if not skip that file in your for loop by passing on to the next file, e.g. in the code below the second file is skipped.

Using some fake example data and excel files:

library(readxl)

path <- tempdir()

### Create example files
writexl::write_xlsx(
  list(
    "Project Summary" = mtcars
  ),
  file.path(path, "mtcars.xlsx")
)

writexl::write_xlsx(
  list(
    "Sheet" = iris
  ),
  file.path(path, "iris.xlsx")
)
###

setwd(path)

lst <- list.files(pattern = "*.xlsx")
df <- data.frame()
for (table in lst) {
  if (!"Project Summary" %in% excel_sheets(table)) next
  dataFromExcel <- read_excel(table, sheet = "Project Summary")
  df <- rbind(df, dataFromExcel)
}
#write.csv(df, "_Project Level data.csv")

df
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

r - How can I skip Excel Files if they do not contain a specific Sheet? - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)