javascript - How to determine natural page size of a PDF using PDF.js

I am using pdf.js in a discovery setting to determine the height and width in pixels of a number of PDF documents.

In the following code snippet, I am reading a buffer of an 8.5 x 11 inch Word document printed to PDF. The dimensions I get back are the expected size divided by 4.16666...

I found that if I pass a scale of 4.166666666666667 I get very close to the actual size of the document, usually within a few millionths of a pixel.


function process(images) {
    // All images in the array share the same path
    let pdfdoc = images[0].ImageFilePath

    fs.readFile(pdfdoc, (err, imageBuffer) => {
        // If we failed to read the PDF, mark each page for manual review
        // and stop before handing an undefined buffer to pdf.js.
        if (err) {
            console.error(err)
            images.forEach(img => {
                postMessage({height: -1, width: -1, ImageFilePath: img.ImageFilePath, DocId: img.DocId, PageId: img.PageId})
            })
            return
        }

        let u = PDFJSLib.getDocument(imageBuffer)
        images.forEach(img => {
            u.promise.then(pdf => {
                pdf.getPage(img.PageNumber).then(data => {
                    console.log(data.getViewport(1).width)
                    console.log(data.getViewport(1).height)
                })
            });
        })
    })
}

The output I am expecting is the natural width and height to be logged to the console. I need to understand what scale I should be passing in, and what factors determine that scale value. Can I safely pass in 4.166666666666667 and know I'm getting the natural height and width of the page each time?

Other questions I've found relating to this usually have to do with passing the PDF to a viewer -- which I am not doing. Again, my goal is to simply discover the natural height and width of a given PDF page.

Thanks!


asked Jun 3, 2019 at 14:56 by Tanner

Further information: the viewBox returned by data.getViewport() is [0, 0, 612, 792], no matter the scale passed. – Tanner Commented Jun 3, 2019 at 14:59
Hope this thread helps you: github.com/mozilla/pdf.js/issues/9408 – weegee Commented Jun 3, 2019 at 15:00
@window.document - this is one of the links I saw, but rereading it just now helped me figure out the answer. The reason I'm getting 612/792 is that it's using a 72 DPI scale. 300 divided by 72 = 4.166666666666667, so I think my course of action will be to multiply the height/width returned by 300, then divide by 72. – Tanner Commented Jun 3, 2019 at 15:04
What is 300 here? Can you please explain? – Rajan Rawal Commented Jan 28, 2020 at 12:06

2 Answers

On further review of this issue, I determined that the page sizes returned at scale 1 are in PDF points, which amounts to assuming 72 DPI. The 300 is the DPI I want the size expressed in. Dividing the values (612, 792) by 72 gives the size in inches (8.5, 11), and multiplying by 300 gives my expected numbers: 2550 and 3300.

// viewBox holds the page box corners in points (1/72 inch); rescale to 300 DPI
let dimensions = data.getViewport(1).viewBox.map(n => n / 72 * 300)
// [ 0, 0, 2550, 3300 ]
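
The arithmetic behind that answer can be sketched as a pair of small helpers. This is only an illustration: the names pointsToPixels, scaleForDpi, and POINTS_PER_INCH are made up here and are not part of pdf.js, and it assumes the page uses the default PDF unit of 72 points per inch.

```javascript
// PDF user space is measured in points: 72 points per inch by default.
const POINTS_PER_INCH = 72;

// Convert a length in PDF points to pixels at the given DPI.
// Multiplying before dividing keeps the result exact for common page sizes.
function pointsToPixels(points, dpi) {
  return (points * dpi) / POINTS_PER_INCH;
}

// The scale factor that maps points to pixels at the given DPI.
// For 300 DPI this is the 4.1666... factor from the question.
function scaleForDpi(dpi) {
  return dpi / POINTS_PER_INCH;
}

// US Letter at scale 1, as reported in the question: 612 x 792 points.
const widthPx = pointsToPixels(612, 300);  // 2550
const heightPx = pointsToPixels(792, 300); // 3300
console.log(widthPx, heightPx, scaleForDpi(300));
```

So 4.166666666666667 is not a magic number; it is only right when the target resolution is 300 DPI. For a different target, such as 96 DPI for CSS pixels, the factor changes accordingly.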

Get the PDF page width and height in inches from getViewport:

let viewport = data.getViewport({ scale: 1 });
let inchHeight = viewport.height / 72;
let inchWidth = viewport.width / 72;

Convert to pixels at 96 DPI:

let pixelHeight = inchHeight * 96;
let pixelWidth = inchWidth * 96;

To check the document's page size in inches, open the document in Adobe and go to File -> Properties (or press Ctrl+D).

The variable "data" used above is the page object, obtained as follows:

let pageIndex = 1
// Convert the PDF blob to a base64 data URL, then load it with pdf.js
blobToBase64(PDFStream).then((val) => {
  pdfjsLib.getDocument(val as string).promise.then((document) => {
    document.getPage(pageIndex).then(function (page: any) {
      let data = page
    });
  });
});

// Read a Blob and resolve with its contents as a base64 data URL
function blobToBase64(blob: any): Promise<string | ArrayBuffer | null> {
  return new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result);
    reader.readAsDataURL(blob);
  });
}
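
Combining the two answers, a minimal sketch for collecting every page's size might look like the following. The function name collectPageSizes and the fakePdf object are hypothetical; fakePdf is only a stand-in so the sketch runs without pdf.js. It assumes the document object exposes numPages and getPage(n), and that getViewport accepts an options object as in the snippet above. Older pdf.js releases took the scale as a plain number, as in the question.

```javascript
// Collect the size of every page in inches and in pixels at a target DPI.
// `pdf` is expected to behave like a loaded pdf.js document:
// a numPages count and a getPage(n) method returning a promise.
async function collectPageSizes(pdf, dpi = 96) {
  const sizes = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);      // page numbers start at 1
    const viewport = page.getViewport({ scale: 1 }); // width/height in points
    sizes.push({
      pageNumber,
      widthIn: viewport.width / 72,
      heightIn: viewport.height / 72,
      // Round because pixel dimensions are normally whole numbers.
      widthPx: Math.round((viewport.width * dpi) / 72),
      heightPx: Math.round((viewport.height * dpi) / 72),
    });
  }
  return sizes;
}

// Hypothetical stand-in for a loaded document (one US Letter page).
const fakePdf = {
  numPages: 1,
  getPage: async () => ({
    getViewport: ({ scale }) => ({ width: 612 * scale, height: 792 * scale }),
  }),
};

collectPageSizes(fakePdf, 300).then((sizes) => console.log(sizes));
```

With a real file, the object passed in would be the result of awaiting pdfjsLib.getDocument(...).promise, as in the answer above.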
