javascript - Split and Copy specific Pages from a PDF file - Stack Overflow

I am working on a program that splits a PDF file containing numerous sections that share the same layout but have different content.

I need all the different sections in a single PDF file. Each section is preceded by specific words or structures that repeat before a new one starts, and that is where I want to split.

I am using pdf-lib, but it seems the library cannot modify the original text and cannot copy PDF data up to the point where specific data is read. I managed to extract the plain text with pdf-parse, but I want the original layout of the PDF to persist.

Is there any way to copy specific parts of pages in a PDF, in their original layout, to a new PDF file? The sections vary in length from half a page to multiple pages.

If that is not possible, can I extract all the layout, design, and formatting information from the original PDF so that the copies can be styled the same way?

For example, the same text needs to be bold, underlined, or italic, and I also need the bullet points. All of this information should be identical.

I am currently trying to do this in JavaScript in Visual Studio Code.

Thanks in advance!

asked Feb 7 at 8:49 by Adri2210; edited Feb 7 at 13:20 by K J
  • Is this achievable by someone with not a lot of cmd knowledge on a company laptop without special privileges? Or would it be faster to extract the whole text as basic text data (which I already did) and try to replicate the layout in javascript code? – Adri2210 Commented Feb 7 at 13:24
  • How do I trim/redact the unwanted parts of a page? Is there a tool for this? ChatGPT suggested pdf.js or tesseract.js, but I do not want to use too many different tools, as somebody without my knowledge should be able to do this later on by themselves. The solution also needs to run locally on the laptop without any APIs or external components. – Adri2210 Commented Feb 7 at 13:43
  • Thank you for the detailed explanation! How do I close this question without a solution now? – Adri2210 Commented Feb 7 at 14:09
  • You could "delete" this question and, once you are using a different methodology, raise a fresh question when you get stuck with the coding. – K J Commented Feb 7 at 14:53

1 Answer


With pdf-lib in JavaScript, you can copy whole pages, layout intact, into a new document:

const fs = require('fs/promises');
const { PDFDocument } = require('pdf-lib');

// Copy the given page ranges from a source PDF into a new PDF.
// pageRanges: array of [start, end] pairs, zero-based and inclusive.
async function splitPDF(sourceBytes, pageRanges) {
  // Load source PDF
  const sourcePdf = await PDFDocument.load(sourceBytes);

  // Create new PDF
  const newPdf = await PDFDocument.create();

  // Copy pages range by range
  for (const [start, end] of pageRanges) {
    const indices = Array.from({ length: end - start + 1 }, (_, i) => start + i);
    const pages = await newPdf.copyPages(sourcePdf, indices);
    pages.forEach(page => newPdf.addPage(page));
  }

  // Serialize new PDF to bytes
  return newPdf.save();
}

// Usage example (Node.js, runs locally without a browser):
async function main() {
  // Read source PDF from disk
  const sourceBytes = await fs.readFile('input.pdf');

  // Copy pages 0-2 and 4-6 (zero-based) into one new PDF
  const newPdfBytes = await splitPDF(sourceBytes, [[0, 2], [4, 6]]);

  // Write the result next to the input
  await fs.writeFile('split.pdf', newPdfBytes);
}

main().catch(error => {
  console.error('Error:', error);
  process.exitCode = 1;
});
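
The question's idea of splitting wherever a marker phrase repeats can be reduced to page ranges for `splitPDF`. What follows is only a minimal sketch under two assumptions: you already have the text of each page as an array of strings (how you extract it is outside this sketch), and every section starts at the top of a page. `findSectionRanges` and the `Section` marker are hypothetical names for illustration, not part of pdf-lib. Sections that begin mid-page cannot be separated this way, because whole pages are copied.

```javascript
// Hypothetical helper: turn per-page text into zero-based, inclusive
// [start, end] page ranges, one per section.
// Assumption: a section starts on every page whose text matches `marker`.
// Pass a regex without the `g` flag, since `test` is stateful with it.
function findSectionRanges(pageTexts, marker) {
  const starts = [];
  pageTexts.forEach((text, index) => {
    if (marker.test(text)) starts.push(index);
  });
  // Each section runs until the page before the next section starts,
  // or until the last page of the document.
  return starts.map((start, i) => {
    const end = i + 1 < starts.length ? starts[i + 1] - 1 : pageTexts.length - 1;
    return [start, end];
  });
}

// Made-up page texts; "Section " is an assumed marker phrase.
const pageTexts = ['Section A intro', 'more of A', 'Section B', 'Section C', 'more of C'];
const ranges = findSectionRanges(pageTexts, /^Section /);
console.log(JSON.stringify(ranges)); // prints [[0,1],[2,2],[3,4]]
```

The resulting ranges can be passed to `splitPDF` together, or one at a time (`splitPDF(sourceBytes, [range])`) if each section should end up in its own file. Pages before the first marker are not included in any range.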