I am using the pdftools package to read in a pdf file in R.
I've used the following code, which works for the one PDF file I have (although it is designed to read in multiple files from my working directory if needed):
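library(pdftools)  # provides pdf_text()
library(tm)        # provides Corpus(), URISource(), readPDF
# All PDF files in the working directory
files <- list.files(pattern = "pdf$")
# One character vector per file, one element per page
mypdf <- lapply(files, pdf_text)
corp <- Corpus(URISource(files),
               readerControl = list(reader = readPDF))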
How can I select specific pages from my one PDF (or from a specific PDF) within my files/working directory?
E.g. I only want to read in pages 10:16.
I tried applying brackets [10:16] after pdf_text(), but this didn't work.
1 Answer
You are saving your PDFs to a list, which is why the brackets aren't working as expected. pdf_text() returns a character vector with one element per page, so mypdf[10:16] would work if mypdf held the text of a single PDF. Since you have multiple PDFs in a list, you need to use lapply to subset each element:
lapply(mypdf, \(x) x[10:16])
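One caveat with plain x[10:16] is that a PDF with fewer than 16 pages will give NA for the missing pages. Below is a minimal sketch of a way around that, and of picking one specific PDF out of the list. The helper name select_pages and the fake page data are my own illustration, not part of pdftools; the stand-in list just mimics the shape that lapply(files, pdf_text) produces, so the snippet runs without any PDF files.

```r
# Hypothetical helper: keep only the requested pages from one PDF's text.
# `pages` is a character vector with one element per page, which is the
# shape pdftools::pdf_text() returns for a single file.
select_pages <- function(pages, wanted) {
  # Drop page numbers beyond the end of the document so that short PDFs
  # do not produce NA entries.
  wanted <- wanted[wanted >= 1 & wanted <= length(pages)]
  pages[wanted]
}

# Stand-in for lapply(files, pdf_text): two fake "PDFs" of 20 and 12 pages.
mypdf <- list(
  a.pdf = paste("a page", 1:20),
  b.pdf = paste("b page", 1:12)
)

# Pages 10:16 from every PDF in the list.
subset_all <- lapply(mypdf, select_pages, wanted = 10:16)

# Pages 10:16 from one specific PDF: [[ ]] extracts the character vector.
subset_one <- select_pages(mypdf[["a.pdf"]], 10:16)
```

With real files you would replace the stand-in with mypdf <- lapply(files, pdf_text), and if you want to pick a PDF by file name rather than by position, set names(mypdf) <- files first. For a single file, mypdf[[1]][10:16] does the same thing as the last line.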