r - Selecting pdf pages for multiple PDFs saved in a list - Stack Overflow

I am using the pdftools package to read in a pdf file in R.

I've used the following code which works for the 1 pdf file I have (although it is designed to read in multiple files from my wd if needed):

    library(pdftools)  # provides pdf_text()
    library(tm)        # provides Corpus(), URISource(), readPDF

    files <- list.files(pattern = "pdf$")
    mypdf <- lapply(files, pdf_text)
    corp <- Corpus(URISource(files),
                   readerControl = list(reader = readPDF))

How can I select specific pages from my one PDF (or from a specific PDF) among the files in my working directory? For example, I only want to read in pages 10:16.

I tried applying brackets [10:16] after pdf_text(), but this didn't work.

Asked Mar 3 at 13:51 by user29870836; edited Mar 3 at 14:47 by jpsmith.

1 Answer

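pdf_text() returns a character vector with one element per page, so for a single PDF, pdf_text(file)[10:16] would work. You are saving your PDFs to a list, though, so mypdf[10:16] selects list elements (whole PDFs) rather than pages, which is why the brackets aren't working as expected. To subset the pages of every PDF in the list, you need to use lapply (the \(x) shorthand requires R 4.1 or later; on older versions write function(x) instead):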

    lapply(mypdf, \(x) x[10:16])
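
One thing to watch with the lapply approach: if a PDF in the list has fewer than 16 pages, x[10:16] pads the result with NA. Here is a minimal sketch of one way to guard against that, and to pick pages from one specific PDF only. The list below is made-up stand-in data shaped like the output of lapply(files, pdf_text), and select_pages is a hypothetical helper name, not part of pdftools:

```r
# Stand-in for lapply(files, pdf_text): one character vector per PDF,
# one element per page. File names and page text are invented.
mypdf <- list(
  "report.pdf" = paste("report page", 1:20),
  "memo.pdf"   = paste("memo page", 1:12)
)

# Hypothetical helper: keep only the requested pages that actually exist,
# so PDFs that are too short do not produce NA entries.
select_pages <- function(pages_text, pages) {
  pages_text[intersect(pages, seq_along(pages_text))]
}

# Same page range from every PDF in the list.
all_subsets <- lapply(mypdf, select_pages, pages = 10:16)

# Page range from one specific PDF, picked by name
# (picking by position, e.g. mypdf[[1]], works too).
report_subset <- select_pages(mypdf[["report.pdf"]], 10:16)

# Number of pages kept per PDF.
lengths(all_subsets)
```

With real data the list returned by lapply is unnamed, so you would either index by position or set names(mypdf) <- files first. In the stand-in data, the 12-page memo keeps only pages 10 to 12 instead of returning four NA entries.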