How do I convert a UTF-8 string to a Latin1-encoded string using JavaScript?
Here is what I am trying to do:
- I get a file and split it into chunks by reading it as an ArrayBuffer
- then I parse the ArrayBuffer as a string and pass it to CryptoJS for hash computation using the following code:
cryptosha256 = CryptoJS.algo.SHA256.create();
cryptosha256.update(text);
hash = cryptosha256.finalize();
It all works well for a text file. I get problems when I use the code to hash non-text files (images, .wmv files). I saw in another blog that the CryptoJS author requires the bytes to be passed in Latin1 format instead of UTF-8, and that's where I am stuck.
How can I generate the bytes (or a string) in Latin1 format from an ArrayBuffer in JavaScript?
$('#btnHash').click(function () {
    var fr = new FileReader(),
        file = document.getElementById("fileName").files[0];
    fr.onload = function (e) {
        calcHash(e.target.result, file);
    };
    fr.readAsArrayBuffer(file);
});
function calcHash(dataArray, file) {
    cryptosha256 = CryptoJS.algo.SHA256.create();
    text = CryptoJS.enc.Latin1.parse(dataArray);
    cryptosha256.update(text);
    hash = cryptosha256.finalize();
}
edited Nov 25, 2015 at 14:51 by Artjom B.
asked Nov 25, 2015 at 11:00 by learnedOne
- 'Bytes' are not in Latin1 or any other format. And for binary files like (most) images and sounds, character encoding doesn't really apply. If you convert text from one encoding to another, you just have text in another encoding (possibly with the loss of some characters). If you convert a binary file to another text encoding, you will most likely end up with a corrupt file. – GolezTrol, Nov 25, 2015 at 11:01
- I'm pretty sure that CryptoJS does directly take an ArrayBuffer. No need to care about text encodings. – Bergi, Nov 25, 2015 at 11:02
- Thanks, GolezTrol. Here is what the CryptoJS author writes: "When you pass a string to a hasher, it's converted to bytes using UTF-8. That's to ensure foreign characters are not clipped. Since you're working with binary data, you'll want to convert the string to bytes using Latin1." sha256.update(CryptoJS.enc.Latin1.parse(evt.target.result)); – learnedOne, Nov 25, 2015 at 11:03
- The link for the above statement: code.google.com/p/crypto-js/issues/… – learnedOne, Nov 25, 2015 at 11:04
- When I tried the CryptoJS method sha256.update(CryptoJS.enc.Latin1.parse(evt.target.result));, it returned 'undefined' as the hash value :( – learnedOne, Nov 25, 2015 at 11:05
1 Answer
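CryptoJS doesn't understand what an ArrayBuffer is, and if you push binary data through a text encoding like UTF-8 you will inevitably lose some bytes: not every byte sequence is valid UTF-8, so invalid sequences get replaced when the data is decoded as text. Latin1 does map every byte value to one character, but CryptoJS.enc.Latin1.parse expects a string holding one character per byte, not an ArrayBuffer, so passing the ArrayBuffer to it directly (as in the question) does not hash the file's contents.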
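A quick way to see the difference in plain JavaScript (no CryptoJS; the helper names bytesToBinaryString, binaryStringToBytes and utf8RoundTrip are hypothetical, and TextDecoder/TextEncoder are assumed to be available, as in modern browsers and Node): decoding binary data as UTF-8 and re-encoding it does not give the original bytes back, whereas a string built with exactly one character per byte does. binaryStringToBytes keeps the low 8 bits of each character code, which is assumed to mirror what a Latin1 parse does with each character.

```javascript
// Build a "binary string": one UTF-16 code unit per byte, each in the range 0-255.
function bytesToBinaryString(bytes) {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// Recover the bytes by keeping the low 8 bits of each character code
// (assumed to mirror what CryptoJS.enc.Latin1.parse does per character).
function binaryStringToBytes(s) {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i) & 0xff;
  }
  return out;
}

// Decode the bytes as UTF-8 text, then encode that text back to UTF-8 bytes.
function utf8RoundTrip(bytes) {
  // Invalid UTF-8 sequences are replaced with U+FFFD while decoding.
  const text = new TextDecoder("utf-8").decode(bytes);
  return new TextEncoder().encode(text);
}

// The first four bytes of a typical JPEG (JFIF) file.
const jpegStart = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);
console.log("UTF-8 round trip:", Array.from(utf8RoundTrip(jpegStart)));
console.log("binary string round trip:",
  Array.from(binaryStringToBytes(bytesToBinaryString(jpegStart))));
```

The UTF-8 route comes back as U+FFFD replacement characters (bytes EF BF BD) instead of FF D8 FF E0, while the one-character-per-byte string round-trips exactly. Building such a string character by character is fine for small inputs; converting the ArrayBuffer straight to a WordArray avoids the intermediate string altogether.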
You will have to convert the ArrayBuffer to CryptoJS's internal WordArray, which holds the bytes as an array of words (32-bit integers). We can view the ArrayBuffer as an array of unsigned 8-bit integers and pack every four of them into one word to build the WordArray (see arrayBufferToWordArray).
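The packing can be checked on concrete bytes. The sketch below is an alternative that reads the words with DataView instead of bit shifts (arrayBufferToWords is a hypothetical name, and it returns a plain { words, sigBytes } object rather than a CryptoJS WordArray). For the five bytes DE AD BE EF 01 it should yield two big-endian words, 0xDEADBEEF and 0x01000000 (the last one zero-padded), with sigBytes = 5 marking how many bytes are real. The words are signed 32-bit integers, so they are printed with >>> 0 to show them as unsigned hex.

```javascript
// Alternative sketch: read big-endian 32-bit words with DataView.
// Hypothetical helper; returns { words, sigBytes } rather than a CryptoJS WordArray.
function arrayBufferToWords(ab) {
  const sigBytes = ab.byteLength;
  // Copy into a zero-padded buffer whose length is a multiple of 4,
  // so the last (partial) word can be read without going out of bounds.
  const padded = new Uint8Array(Math.ceil(sigBytes / 4) * 4);
  padded.set(new Uint8Array(ab));
  const view = new DataView(padded.buffer);
  const words = [];
  for (let offset = 0; offset < padded.length; offset += 4) {
    // getInt32 reads big-endian unless littleEndian = true is passed.
    words.push(view.getInt32(offset));
  }
  return { words, sigBytes };
}

const sample = new Uint8Array([0xde, 0xad, 0xbe, 0xef, 0x01]);
const packed = arrayBufferToWords(sample.buffer);
console.log(packed.words.map((w) => (w >>> 0).toString(16)), packed.sigBytes);
```

To use the result with CryptoJS you would pass both fields to CryptoJS.lib.WordArray.create(words, sigBytes), the same call the answer makes.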
The following code shows a full example:
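function arrayBufferToWordArray(ab) {
    var i8a = new Uint8Array(ab);
    var a = [];
    // Pack each group of 4 bytes into one big-endian 32-bit word.
    // Past the end of the array, i8a[i + k] is undefined, which the bitwise operators treat as 0.
    for (var i = 0; i < i8a.length; i += 4) {
        a.push(i8a[i] << 24 | i8a[i + 1] << 16 | i8a[i + 2] << 8 | i8a[i + 3]);
    }
    // The second argument (sigBytes) tells CryptoJS how many bytes are significant.
    return CryptoJS.lib.WordArray.create(a, i8a.length);
}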
function handleFileSelect(evt) {
    var files = evt.target.files; // FileList object
    // Loop through the FileList and hash each selected file.
    for (var i = 0, f; f = files[i]; i++) {
        var reader = new FileReader();
        // Closure to capture the file information.
        // onload fires only when the read succeeds; failures go to onerror below.
        reader.onload = (function(theFile) {
            return function(e) {
                var arrayBuffer = e.target.result;
                var hash = CryptoJS.SHA256(arrayBufferToWordArray(arrayBuffer));
                var elem = document.getElementById("hashValue");
                // Assigning the WordArray calls its toString(), which yields hex by default.
                elem.value = hash;
            };
        })(f);
        reader.onerror = function(e) {
            console.error(e);
        };
        // Read the file's raw bytes into an ArrayBuffer.
        reader.readAsArrayBuffer(f);
    }
}
document.getElementById('upload').addEventListener('change', handleFileSelect, false);
<script src="https://cdn.rawgit.com/CryptoStore/crypto-js/3.1.2/build/rollups/sha256.js"></script>
<form method="post" enctype="multipart/form-data">
Select image to upload:
<input type="file" name="upload" id="upload">
<input type="text" name="hashValue" id="hashValue">
</form>
You can extend this code with the techniques in my other answer in order to hash files of arbitrary size without freezing the browser.
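A minimal sketch of chunked hashing, assuming a modern browser (or Node 18+) where Blob.prototype.slice and Blob.prototype.arrayBuffer() are available. hashBlobInChunks and its toInput parameter are hypothetical names, and this is not necessarily the technique from the other answer. The hasher only needs update() and finalize(), which is the shape of CryptoJS.algo.SHA256.create() used in the question; toInput converts each chunk's ArrayBuffer into whatever update() accepts (arrayBufferToWordArray for CryptoJS).

```javascript
// Feed a Blob/File to an incremental hasher one slice at a time.
// hasher:    object with update(chunk) and finalize(), e.g. CryptoJS.algo.SHA256.create()
// toInput:   converts each chunk's ArrayBuffer into what hasher.update() accepts
// chunkSize: bytes per slice (a multiple of 4 keeps each chunk word-aligned)
async function hashBlobInChunks(blob, hasher, toInput, chunkSize = 1024 * 1024) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() clamps the end to blob.size, so the last chunk may be shorter.
    const chunk = blob.slice(offset, offset + chunkSize);
    const buffer = await chunk.arrayBuffer();
    hasher.update(toInput(buffer));
  }
  return hasher.finalize();
}

// Browser usage (assumes CryptoJS and arrayBufferToWordArray are loaded):
// hashBlobInChunks(file, CryptoJS.algo.SHA256.create(), arrayBufferToWordArray)
//   .then((hash) => { document.getElementById("hashValue").value = hash.toString(); });
```

Awaiting each slice gives the browser a chance to process events between chunks, although each hasher.update() call still runs synchronously; a smaller chunkSize trades some throughput for responsiveness.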