javascript - Split and Copy specific Pages from a PDF file

I am working on a program that splits a PDF file containing numerous sections that share the same layout but, of course, have different content.

I need each section in its own PDF file. Specific words or structures repeat at the start of every section, and that is where I want to split.

I am using pdf-lib, but the library does not seem able to modify the original text, and it cannot copy PDF data up to the point where specific content appears. I managed to extract the plain text with pdf-parse, but I want the original layout of the PDF to persist.

Is there any way to copy specific parts of pages in a PDF, in their original layout, to a new PDF file? The sections vary in length from only half a page to multiple pages.

If that is not possible, can I extract all the information about the layout, design and formatting of the original PDF so that the copies can be designed to match it?

For example, the same text needs to be bold, underlined or italic, and I also need the bullet points. All of this information should be identical.

I am currently trying to do this in JavaScript in Visual Studio Code.

Thanks in advance!

asked Feb 7 at 8:49 by Adri2210; edited Feb 7 at 13:20 by K J

Is this achievable by someone without much command-line knowledge, on a company laptop without special privileges? Or would it be faster to extract the whole text as plain text (which I already did) and try to replicate the layout in JavaScript code? – Adri2210 Commented Feb 7 at 13:24
How do I trim or redact the unwanted parts of a page? Is there a tool for this? ChatGPT suggested pdf.js or tesseract.js, but I do not want to use too many different tools, because somebody without my knowledge should be able to do this later on by themselves. The solution also needs to run locally on the laptop without any APIs or external components. – Adri2210 Commented Feb 7 at 13:43
Thank you for the detailed explanation! How do I close this question without a solution now? – Adri2210 Commented Feb 7 at 14:09
You could "delete" this question and, once you are using a different methodology, raise a fresh question when you get stuck with the coding. – K J Commented Feb 7 at 14:53


1 Answer

With pdf-lib in JavaScript you can copy whole pages, addressed by zero-based index, into a new document:

const { PDFDocument } = require('pdf-lib');

async function splitPDF(sourceBytes, pageRanges) {
  try {
    // Load source PDF
    const sourcePdf = await PDFDocument.load(sourceBytes);
    
    // Create new PDF
    const newPdf = await PDFDocument.create();
    
    // Copy each [start, end] range (zero-based, inclusive) into the new PDF
    for (const range of pageRanges) {
      const [start, end] = range;
      const pages = await newPdf.copyPages(sourcePdf, 
        Array.from({ length: end - start + 1 }, (_, i) => start + i)
      );
      pages.forEach(page => newPdf.addPage(page));
    }
    
    // Save new PDF
    return await newPdf.save();
  } catch (error) {
    console.error('Error:', error);
    throw error;
  }
}

// Usage example (browser, with pdf-lib bundled for the page):
async function main() {
  // Read source PDF
  const sourceBytes = await fetch('input.pdf').then(res => res.arrayBuffer());

  // Copy pages 0-2 and 4-6 (zero-based, inclusive) into one new PDF
  const newPdfBytes = await splitPDF(sourceBytes, [[0, 2], [4, 6]]);
  
  // Download or display new PDF
  const blob = new Blob([newPdfBytes], { type: 'application/pdf' });
  const url = URL.createObjectURL(blob);
  
  // Create download link
  const link = document.createElement('a');
  link.href = url;
  link.download = 'split.pdf';
  link.click();
}

main();
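
Since the question asks for one file per section and for something that runs locally, here is a rough Node.js sketch of how the same copyPages approach might be adapted. It assumes you already have the plain text of each page in an array (for example from a text-extraction step) and that every section starts on a page containing some marker string. The names sectionRanges, writeSections, pageTexts, marker and outputPrefix are made up for this illustration and are not part of pdf-lib.

```javascript
const fs = require('fs/promises');

// Hypothetical helper: pageTexts[i] is the plain text of page i (zero-based).
// A new section is assumed to begin on every page whose text contains `marker`.
// Returns one [start, end] pair (zero-based, inclusive) per section.
function sectionRanges(pageTexts, marker) {
  const starts = [];
  pageTexts.forEach((text, i) => {
    if (text.includes(marker)) starts.push(i);
  });
  // Pages before the first marker do not belong to any section.
  return starts.map((start, k) => {
    const next = k + 1 < starts.length ? starts[k + 1] : pageTexts.length;
    return [start, next - 1];
  });
}

// Hypothetical helper: writes one PDF per range, named
// `${outputPrefix}-1.pdf`, `${outputPrefix}-2.pdf`, and so on.
// pdf-lib is required lazily so sectionRanges works without it installed.
async function writeSections(inputPath, ranges, outputPrefix) {
  const { PDFDocument } = require('pdf-lib');
  const sourcePdf = await PDFDocument.load(await fs.readFile(inputPath));

  for (const [n, [start, end]] of ranges.entries()) {
    const sectionPdf = await PDFDocument.create();
    const indices = Array.from({ length: end - start + 1 }, (_, i) => start + i);
    const pages = await sectionPdf.copyPages(sourcePdf, indices);
    pages.forEach(page => sectionPdf.addPage(page));
    await fs.writeFile(`${outputPrefix}-${n + 1}.pdf`, await sectionPdf.save());
  }
}
```

You would call it roughly as writeSections('input.pdf', sectionRanges(pageTexts, 'Your marker'), 'section'). Because copyPages copies whole pages, a section that starts in the middle of a page brings the tail of the previous section along with it, and the previous section's file stops one page earlier. The layout of each copied page is preserved, but content is not split within a page.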
