I am using FileReader.readAsArrayBuffer(file) and converting the result into a Uint8Array.
If the text file input contains a pound sterling sign (£), then this single character results in two byte codes, one for Â and one for £. I understand that this is because £ is in the extended-ASCII set.
Is there a way to prevent this extra character? If not, will it always be an Â? If so, I can strip them out.
Asked Mar 7 at 13:24 by hobbes_child
1 Answer
You didn't provide your JS code, but this looks like a mismatch between the character encoding of the text file and how your JavaScript interprets its bytes. If you are ultimately reading the file as text, that is probably the cause. Here is a small playground for reference; hopefully it helps you solve your problem.
function checkFile(file) {
  const fileReader = new FileReader();
  fileReader.onload = function (event) {
    const uint8Array = new Uint8Array(event.target.result);
    // Use TextDecoder to convert the Uint8Array into a string.
    // fatal: true makes decode() throw on bytes that are not valid UTF-8.
    const textDecoder = new TextDecoder('utf-8', { fatal: true });
    try {
      const result = textDecoder.decode(uint8Array);
      console.log(result); // This should show £ without the stray Â.
    } catch (error) {
      console.error('Decoding failed:', error);
    }
  };
  fileReader.readAsArrayBuffer(file);
}
// Take the change event as a parameter instead of relying on the global `event`.
function uploadFile(event) {
  const file = event.target.files[0];
  if (file) {
    checkFile(file);
  }
}
<input type="file" onchange="uploadFile(event)" />
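On the question of whether the extra character will always be an Â: probably not, if the file is UTF-8. The following is a minimal sketch, not taken from the asker's code, that prints the UTF-8 bytes of a few characters. The helper name utf8Hex is made up for illustration. It suggests that the lead byte depends on the character, so stripping every Â would miss other cases and could also delete a genuine Â.

```javascript
// Sketch: inspect the UTF-8 bytes of a few characters.
// utf8Hex is a hypothetical helper name, not part of any library.
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

function utf8Hex(text) {
  // Format each byte as two lowercase hex digits, separated by spaces.
  return Array.from(encoder.encode(text), (b) =>
    b.toString(16).padStart(2, '0')
  ).join(' ');
}

console.log(utf8Hex('A')); // 41        (plain ASCII, one byte)
console.log(utf8Hex('£')); // c2 a3     (0xC2 shows up as Â when read per byte)
console.log(utf8Hex('é')); // c3 a9     (0xC3 shows up as Ã instead)
console.log(utf8Hex('€')); // e2 82 ac  (three bytes, no 0xC2 at all)
```

Decoding the whole buffer with TextDecoder, as in the playground above, avoids having to special-case any of these bytes.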
Â£ is what you get when a pound sign that was encoded in UTF-8 gets interpreted as ASCII/ISO-8859-x. Was the file you are reading saved with a BOM indicating its encoding, or without? – C3roe Commented Mar 7 at 13:31
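The effect described in this comment can be reproduced without a file. This is a small sketch under the assumption that the file is UTF-8 and that the bytes were turned into characters one at a time; the asker's actual conversion code is not shown, so that part is a guess. The byte values are the UTF-8 encoding of £, optionally preceded by the UTF-8 BOM (EF BB BF).

```javascript
// Bytes of a UTF-8 file that contains only the pound sign.
const bytes = new Uint8Array([0xc2, 0xa3]);

// Assumed cause: each byte is treated as its own character code.
const perByte = String.fromCharCode(...bytes);
console.log(perByte); // "Â£" (two characters)

// Decoding the two bytes together as UTF-8 yields a single character.
const decoded = new TextDecoder('utf-8').decode(bytes);
console.log(decoded); // "£"

// The same content saved with a UTF-8 BOM in front.
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0xc2, 0xa3]);

// TextDecoder drops a leading BOM by default (ignoreBOM is false).
const decodedWithBom = new TextDecoder('utf-8').decode(withBom);
console.log(decodedWithBom); // "£"
```

So with TextDecoder the presence or absence of a BOM should not change the decoded text, as long as the file really is UTF-8.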