In my application, a user can upload a PDF which other users can later view. For my use case, I need to ensure that the PDFs are not locked or encrypted and can be viewed by any other user.
To do this, I am asking users to upload unlocked PDFs and would like to throw an error if the PDF is locked, before I try to upload it to S3.
I haven't found a consensus on the best way to do this in the browser. Do I try to read the buffer and throw an error if I am unable to? Or is there another performant and efficient way of detecting this?
asked Jun 23, 2017 at 23:48 by geoboy
- Detecting encryption should be as little as searching for the string /Encrypt in the file. – Thomas Commented Jun 24, 2017 at 0:47
- So, in the file buffer (or after reading file as text), just search for '/Encrypt'? Anything else to watch out for? – geoboy Commented Jun 24, 2017 at 4:09
- I think that "Anything else to watch out for?" question is eventually going to require you to understand the format better--or at least the parts of it that can contain encryption/protection flags. Consider a PDF document that includes in its visible content the text "/Encrypt". You might get a false match on it unless you were parsing with the full context of the file format, e.g. "/Encrypt" must appear at beginning of file, inside of a certain section. en.wikipedia/wiki/Portable_Document_Format – Erik Hermansen Commented Jun 24, 2017 at 21:36
3 Answers
You can try the solution below:
// `file` is the File object selected by the user
const reader = new FileReader();
reader.onload = function () {
  const blob = new Blob([reader.result], { type: 'application/pdf' });
  blob.text().then((text) => {
    // Loose check: true if "Encrypt" appears anywhere in the file
    console.log("isEncrypted", text.includes("Encrypt"));
    // Stricter check: look for "/Encrypt" only inside the last << ... >> dictionary
    console.log("isEncrypted", text.substring(text.lastIndexOf("<<"), text.lastIndexOf(">>")).includes("/Encrypt"));
    console.log(file.name);
  });
};
reader.readAsArrayBuffer(file);
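To reduce the false matches raised in the comments, one option is to require that `/Encrypt` be followed by something that looks like its value: either an indirect reference such as `4 0 R` or an inline dictionary. The sketch below is only a heuristic under that assumption, not a full PDF parser. `looksEncrypted` and `bytesToBinaryString` are hypothetical names, and the input is the `ArrayBuffer` (or `Uint8Array`) you already get from the `FileReader`.

```javascript
// Convert raw bytes to a one-char-per-byte string, in chunks to avoid
// hitting the argument limit of Function.prototype.apply on large files.
function bytesToBinaryString(bytes) {
  let out = '';
  const CHUNK = 0x8000;
  for (let i = 0; i < bytes.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return out;
}

// Heuristic (an assumption, not a spec-complete check): treat the file as
// encrypted if "/Encrypt" is followed by an indirect reference ("4 0 R")
// or by the start of an inline dictionary ("<<").
function looksEncrypted(buffer) {
  const bytes = buffer instanceof Uint8Array ? buffer : new Uint8Array(buffer);
  const text = bytesToBinaryString(bytes);
  return /\/Encrypt\s*(?:\d+\s+\d+\s+R\b|<<)/.test(text);
}
```

If it returns true, you can reject the file before starting the S3 upload. A library-based check is more reliable if you need certainty.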
It's better for the user experience, bandwidth, and performance to detect the status on the client side. You can have a file input element on your page, and trap the onChange event.
<input type="file" id="pdfFile" size="50" onchange="processFile(event)" />
Inside of the onChange-handling function, you can get at the file bytes and load into a buffer. For code and more details, see reading file contents on the client side in javascript in various browsers.
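The handler described above could look roughly like this. It is a sketch rather than the linked article's code: it assumes a browser that supports `Blob.arrayBuffer()` (older browsers would need `FileReader`), reuses the `processFile` name from the markup, and adds a `%PDF-` header check purely as an illustrative early rejection.

```javascript
// Hypothetical change handler: resolves to the file's bytes as a Uint8Array,
// or null if no file was selected.
async function processFile(event) {
  const file = event.target.files[0];
  if (!file) return null; // the user cancelled the file dialog

  // Read the whole file into memory; nothing is uploaded at this point.
  const buffer = await file.arrayBuffer();
  const bytes = new Uint8Array(buffer);

  // Assumption: a PDF normally begins with the "%PDF-" header.
  const header = String.fromCharCode(...bytes.subarray(0, 5));
  if (header !== '%PDF-') {
    throw new Error('Selected file does not look like a PDF');
  }
  return bytes;
}
```

The returned bytes can then be handed to whichever encryption check you choose before the upload starts.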
You'll need to do some PDF parsing to learn the locked/encrypted status, but I imagine there are JS libraries that do it. Even if you have very large PDFs to parse, it should generally be faster than uploading the PDF to the server first, since upload time grows with file size.
Cases I could see for uploading the file instead of client-side parsing:
- you are targeting lower-end mobile devices and expect PDFs larger than 100 MB
- you will be running on browsers with JavaScript restrictions
- you always want to upload the file to your server even if the PDF is protected, and you've worked out that the user experience is better
What you can do is use pdfjs to open the PDF file and try to get the number of pages. When the file is password protected you get a PasswordException.
Have a look at this fiddle: https://jsfiddle/fe6jLgr5/15/
document.getElementById("pdfFile").addEventListener("change", function (event) {
  const file = event.target.files[0];
  if (!file) return; // no file selected
  const reader = new FileReader();
  reader.onload = function (e) {
    const docInitParams = {
      data: e.target.result,
      password: ''
    };
    pdfjsLib.getDocument(docInitParams).promise
      .then((pdfDocument) => {
        // Reading the page count works only if the file is not password protected.
        const numPages = pdfDocument.numPages;
        console.log('Doc not password protected, pages:', numPages);
      })
      // For a protected file, err is the PasswordException.
      .catch((err) => console.log(err));
  };
  reader.readAsArrayBuffer(file);
}, false);
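To use this as a gate before the S3 upload, the same check can be wrapped in a function that resolves to a boolean. This is a minimal sketch, assuming the rejection carries `name === 'PasswordException'`, in line with the exception described above. `needsPassword` is a hypothetical name, and the library object is passed in as `pdfjs` (for example the global `pdfjsLib` from the snippet above). It only tells you whether the document can be opened without a password.

```javascript
// Resolves to true if the PDF cannot be opened without a password,
// false if it opens normally. Any other failure is rethrown.
// `pdfjs` is the PDF.js library object, passed in by the caller.
async function needsPassword(data, pdfjs) {
  try {
    const pdfDocument = await pdfjs.getDocument({ data, password: '' }).promise;
    // Reading numPages confirms that the document was actually parsed.
    void pdfDocument.numPages;
    return false;
  } catch (err) {
    if (err && err.name === 'PasswordException') {
      return true;
    }
    throw err; // e.g. a corrupt or non-PDF file
  }
}
```

In the change handler you could then write something like `if (await needsPassword(e.target.result, pdfjsLib)) { /* show an error and skip the upload */ }`.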