I am using pdf.js in a discovery setting to determine the height and width in pixels of a number of PDF documents.
In the following code snippet, I am reading a buffer of an 8.5 x 11 inch Word document that was printed to PDF. The width and height I get back are the expected size divided by 4.16666...
I found that if I pass a scale of 4.166666666666667, I get very close to the actual size of the document, usually within a few millionths of a pixel.
    const fs = require('fs')
    // Assumed import; the post does not show how PDFJSLib is loaded.
    const PDFJSLib = require('pdfjs-dist')

    function process(images) {
        // All images in the array have the same path.
        const pdfdoc = images[0].ImageFilePath
        fs.readFile(pdfdoc, (err, imageBuffer) => {
            // If we failed to read the PDF, mark each page for manual review.
            if (err) {
                console.error(err)
                images.forEach(img => {
                    postMessage({height: -1, width: -1, ImageFilePath: img.ImageFilePath, DocId: img.DocId, PageId: img.PageId})
                })
                return
            }
            // Parse the document once, then look up each requested page.
            PDFJSLib.getDocument(imageBuffer).promise.then(pdf => {
                images.forEach(img => {
                    pdf.getPage(img.PageNumber).then(data => {
                        const viewport = data.getViewport(1)
                        console.log(viewport.width)
                        console.log(viewport.height)
                    })
                })
            })
        })
    }
The output I am expecting is the natural width and height to be logged to the console. I need to understand what scale I should be passing in, and what factors determine that scale value. Can I safely pass in 4.166666666666667 and know I'm getting the natural height and width of the page each time?
Other questions I've found relating to this usually have to do with passing the PDF to a viewer -- which I am not doing. Again, my goal is to simply discover the natural height and width of a given PDF page.
Thanks!
Asked Jun 3, 2019 at 14:56 by Tanner
- Further information: the viewBox returned by data.getViewport() is [0, 0, 612, 792], no matter the scale passed. – Tanner Commented Jun 3, 2019 at 14:59
- Hope this thread helps you: github./mozilla/pdf.js/issues/9408 – weegee Commented Jun 3, 2019 at 15:00
- @window.document - this is one of the links I saw, but rereading it just now helped me figure out the answer. The reason I'm getting 612/792 is that it assumes a 72 DPI scale. 300 divided by 72 = 4.166666666666667, so I think my course of action will be to multiply the height/width returned by 300, then divide by 72. – Tanner Commented Jun 3, 2019 at 15:04
- What is 300 here? Can you please explain? – Rajan Rawal Commented Jan 28, 2020 at 12:06
2 Answers
On further review of this issue, I determined that the page sizes pdf.js reports assume 72 DPI. Dividing the values (612, 792) by 72 and then multiplying by 300, the DPI I am targeting, gives my expected numbers: 2550 and 3300.
let dimensions = data.getViewport(1).viewBox.map(n => n / 72 * 300)
//[ 0, 0, 2550, 3300 ]
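The same arithmetic can be written as a small helper, which also shows where the 4.1666... scale comes from. This is only a sketch: `scaleForDpi` and `viewBoxToPixels` are hypothetical names rather than part of pdf.js, it assumes the default PDF unit of 1/72 inch, and rounding to whole pixels is my own choice.

```javascript
// Default PDF user space: 72 units (points) per inch.
const PDF_POINTS_PER_INCH = 72;

// Hypothetical helper: the scale factor that maps points to pixels at a given DPI.
function scaleForDpi(dpi) {
  return dpi / PDF_POINTS_PER_INCH;
}

// Hypothetical helper: convert a viewBox [x0, y0, x1, y1] in points
// to a pixel width and height at the target DPI.
function viewBoxToPixels(viewBox, dpi) {
  const [x0, y0, x1, y1] = viewBox;
  const scale = scaleForDpi(dpi);
  return {
    // Subtract the origin in case the box does not start at (0, 0).
    // Rounding to whole pixels is an assumption, not pdf.js behavior.
    width: Math.round((x1 - x0) * scale),
    height: Math.round((y1 - y0) * scale),
  };
}

console.log(scaleForDpi(300)); // roughly 4.1667
console.log(viewBoxToPixels([0, 0, 612, 792], 300)); // { width: 2550, height: 3300 }
```

So a scale of 4.1666... is only "natural" if 300 DPI is the resolution you want; for any other target, the scale would be that DPI divided by 72.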
Get the PDF width and height in inches from getViewport:
let viewport = data.getViewport({ scale: 1 });
let inchHeight = viewport.height / 72;
let inchWidth = viewport.width / 72;
Convert to pixels (at 96 DPI):
let pixelHeight = inchHeight * 96;
let pixelWidth = inchWidth * 96;
To check the page size in inches, open the document in Adobe and go to File -> Properties, or press Ctrl+D.
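The check against the size Adobe reports could be sketched as below. `viewportToInches` and `matchesReportedSize` are hypothetical helpers, the `{ width, height }` object stands in for a pdf.js viewport at scale 1, and the 0.01 inch tolerance is an arbitrary assumption.

```javascript
const POINTS_PER_INCH = 72;

// Hypothetical helper: convert a scale-1 viewport (in points) to inches.
function viewportToInches(viewport) {
  return {
    width: viewport.width / POINTS_PER_INCH,
    height: viewport.height / POINTS_PER_INCH,
  };
}

// Hypothetical helper: compare the computed size with the size shown in
// the document properties, allowing a small tolerance (assumed 0.01 in).
function matchesReportedSize(viewport, reportedInches, tolerance = 0.01) {
  const actual = viewportToInches(viewport);
  return (
    Math.abs(actual.width - reportedInches.width) <= tolerance &&
    Math.abs(actual.height - reportedInches.height) <= tolerance
  );
}

// Stand-in for data.getViewport({ scale: 1 }) on a US Letter page.
const letterViewport = { width: 612, height: 792 };
console.log(viewportToInches(letterViewport)); // { width: 8.5, height: 11 }
console.log(matchesReportedSize(letterViewport, { width: 8.5, height: 11 })); // true
```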
The variable "data" used above comes from:
let pageIndex = 1;
// Convert the PDF blob to a base64 data URL, then load it with pdf.js.
blobToBase64(PDFStream).then((val) => {
    pdfjsLib.getDocument(val as string).promise.then((document) => {
        document.getPage(pageIndex).then((page: any) => {
            let data = page;
        });
    });
});
// Read a Blob and resolve with its contents as a data URL.
function blobToBase64(blob: Blob): Promise<string | ArrayBuffer | null> {
    return new Promise((resolve) => {
        const reader = new FileReader();
        reader.onloadend = () => resolve(reader.result);
        reader.readAsDataURL(blob);
    });
}