I want to simply read user-input files as text.
I can rely on modern browsers, so I use FileReader for that (which works like a charm).
reader.readAsText(myfile, encoding);
I know that encoding defaults to UTF-8.
But as my users will upload files from various sources (Windows, Mac, Linux) and various browsers, I ask the user to provide the encoding via a select box.
For example, for a Western European Windows text file I expect the user to choose windows-1252.
I was not able to find a list of supported encodings for FileReader (I assume this depends at least on the browser).
I am not asking to auto-determine the encoding, I just want to fill my select box in a way like:
<select id="encoding">
<option value="windows-1252">Windows (Western Latin)</option>
<option value="utf-8">UTF-8</option>
<option value="...">...</option>
</select>
So my questions are:
- Where do I get a list of supported encodings to fill the option values?
- How do I determine the exact spelling of those values (is it 'utf8' or 'UTF-8' or...), and do they depend on the OS / browser?
- Does readAsText(myfile, unsupportedEncoding) throw any error which I can catch if encoding is not supported?
I'd prefer not to use any major 3rd party libraries for that.
Bonus Question:
Is there a simple way to get meaningful translations of the values, e.g. cp10029 means Mac (Central European)?
asked Nov 24, 2016 at 15:29 by LBA
- A cursory search of the googles didn't reveal much. Maybe this will help? stackoverflow.com/questions/37884928/… – Dan Wilson Commented Nov 24, 2016 at 15:59
- Thanks, I googled a lot, that's why I am asking here :-( I checked your recommendation, but IMHO it refers to input that is not real text, whereas in my case all files are "real text", only in different encodings. – LBA Commented Nov 24, 2016 at 16:16
- The supported code-pages can be found here. I would recommend taking a second look at the link provided by Dan, as this is a good way to go about it. This approach also lets you detect a BOM and offers features for guessing the encoding in advance. – user1693593 Commented Nov 26, 2016 at 5:43
2 Answers
- Encoding standards: https://github.com/whatwg/encoding/ (as JSON: https://github.com/whatwg/encoding/blob/master/encodings.json; use the values from the "labels" fields).
- The encoding parameter is not case sensitive.
- No, readAsText(myfile, unsupportedEncoding) does not throw an error. The function simply falls back to the default encoding (UTF-8).
window.onload = function() {
  // Check File API support
  if (window.File && window.FileList && window.FileReader) {
    var filesInput = document.getElementById("files");
    filesInput.addEventListener("change", function(event) {
      var files = event.target.files; // FileList object
      var output = document.getElementById("result");
      for (var i = 0; i < files.length; i++) {
        var file = files[i];
        // Only plain text
        if (!file.type.match('plain')) continue;
        var picReader = new FileReader();
        picReader.addEventListener("load", function(event) {
          var textFile = event.target;
          var div = document.createElement("div");
          div.innerText = textFile.result;
          output.insertBefore(div, null);
        });
        // Read the text file
        picReader.readAsText(file, "cP1251");
      }
    });
  } else {
    console.log("Your browser does not support File API");
  }
}
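Since readAsText does not report an unsupported label, one possible workaround is to check the label yourself before reading. The sketch below assumes the browser's TextDecoder resolves labels the same way FileReader does (both are meant to follow the WHATWG Encoding Standard, but I have not verified this in every browser). The helper names canonicalEncoding and readTextFile are made up for illustration.

```javascript
// Returns the canonical encoding name for a label,
// or null if the runtime does not recognize the label.
function canonicalEncoding(label) {
  try {
    // The TextDecoder constructor throws (a RangeError per the
    // Encoding Standard) when the label is unknown.
    return new TextDecoder(label).encoding;
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: only start reading when the label is known.
function readTextFile(file, label, onText) {
  var encoding = canonicalEncoding(label);
  if (encoding === null) {
    throw new Error("Unsupported encoding label: " + label);
  }
  var reader = new FileReader();
  reader.addEventListener("load", function() {
    onText(reader.result);
  });
  reader.readAsText(file, encoding);
}
```

The same check could be used to filter the option values once at page load, so the select box only offers encodings the current browser accepts.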
To get translations of the values you can use the JSON file (https://github.com/whatwg/encoding/blob/master/encodings.json), specifically the "heading" and "name" fields.
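As a rough sketch of how that JSON could feed the select box: the snippet below assumes encodings.json is an array of groups, each with a "heading" and an "encodings" list whose entries carry "name" and "labels". The sampleEncodings excerpt is hand-copied and shortened, and toOptions and fillSelect are hypothetical helper names. Note that "heading" appears to describe a whole group rather than a single encoding, so friendlier per-encoding texts would probably still have to be written by hand.

```javascript
// Shortened, hand-copied excerpt in the assumed shape of encodings.json.
var sampleEncodings = [
  {
    heading: "The Encoding",
    encodings: [
      { name: "UTF-8", labels: ["unicode-1-1-utf-8", "utf-8", "utf8"] }
    ]
  },
  {
    heading: "Legacy single-byte encodings",
    encodings: [
      { name: "windows-1251", labels: ["cp1251", "windows-1251", "x-cp1251"] },
      { name: "windows-1252", labels: ["ascii", "cp1252", "latin1", "windows-1252"] }
    ]
  }
];

// Flatten the groups into one option descriptor per encoding.
function toOptions(groups) {
  var options = [];
  groups.forEach(function(group) {
    group.encodings.forEach(function(enc) {
      options.push({
        value: enc.name.toLowerCase(), // value handed to readAsText
        text: enc.name,                // visible text in the select box
        group: group.heading           // group title from the JSON
      });
    });
  });
  return options;
}

// Browser-only: fill a <select> with one <optgroup> per heading.
function fillSelect(select, groups) {
  groups.forEach(function(group) {
    var optgroup = document.createElement("optgroup");
    optgroup.label = group.heading;
    group.encodings.forEach(function(enc) {
      optgroup.appendChild(new Option(enc.name, enc.name.toLowerCase()));
    });
    select.appendChild(optgroup);
  });
}
```

In a page this might be called as fillSelect(document.getElementById("encoding"), sampleEncodings), with the full JSON file in place of the excerpt.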
Names and labels
The table below lists all encodings and their labels user agents must support. User agents must not support any other encodings or labels.
# UTF-8
"unicode-1-1-utf-8"
"unicode11utf8"
"unicode20utf8"
"utf-8"
"utf8"
"x-unicode20utf8"
# IBM866
"866"
"cp866"
"csibm866"
"ibm866"
# ISO-8859-2
"csisolatin2"
"iso-8859-2"
"iso-ir-101"
"iso8859-2"
"iso88592"
"iso_8859-2"
"iso_8859-2:1987"
"l2"
"latin2"
# ISO-8859-3
"csisolatin3"
"iso-8859-3"
"iso-ir-109"
"iso8859-3"
"iso88593"
"iso_8859-3"
"iso_8859-3:1988"
"l3"
"latin3"
# ISO-8859-4
"csisolatin4"
"iso-8859-4"
"iso-ir-110"
"iso8859-4"
"iso88594"
"iso_8859-4"
"iso_8859-4:1988"
"l4"
"latin4"
# ISO-8859-5
"csisolatincyrillic"
"cyrillic"
"iso-8859-5"
"iso-ir-144"
"iso8859-5"
"iso88595"
"iso_8859-5"
"iso_8859-5:1988"
# ISO-8859-6
"arabic"
"asmo-708"
"csiso88596e"
"csiso88596i"
"csisolatinarabic"
"ecma-114"
"iso-8859-6"
"iso-8859-6-e"
"iso-8859-6-i"
"iso-ir-127"
"iso8859-6"
"iso88596"
"iso_8859-6"
"iso_8859-6:1987"
# ISO-8859-7
"csisolatingreek"
"ecma-118"
"elot_928"
"greek"
"greek8"
"iso-8859-7"
"iso-ir-126"
"iso8859-7"
"iso88597"
"iso_8859-7"
"iso_8859-7:1987"
"sun_eu_greek"
# ISO-8859-8
"csiso88598e"
"csisolatinhebrew"
"hebrew"
"iso-8859-8"
"iso-8859-8-e"
"iso-ir-138"
"iso8859-8"
"iso88598"
"iso_8859-8"
"iso_8859-8:1988"
"visual"
# ISO-8859-8-I
"csiso88598i"
"iso-8859-8-i"
"logical"
# ISO-8859-10
"csisolatin6"
"iso-8859-10"
"iso-ir-157"
"iso8859-10"
"iso885910"
"l6"
"latin6"
# ISO-8859-13
"iso-8859-13"
"iso8859-13"
"iso885913"
# ISO-8859-14
"iso-8859-14"
"iso8859-14"
"iso885914"
# ISO-8859-15
"csisolatin9"
"iso-8859-15"
"iso8859-15"
"iso885915"
"iso_8859-15"
"l9"
# ISO-8859-16
"iso-8859-16"
# KOI8-R
"cskoi8r"
"koi"
"koi8"
"koi8-r"
"koi8_r"
# KOI8-U
"koi8-ru"
"koi8-u"
# macintosh
"csmacintosh"
"mac"
"macintosh"
"x-mac-roman"
# windows-874
"dos-874"
"iso-8859-11"
"iso8859-11"
"iso885911"
"tis-620"
"windows-874"
# windows-1250
"cp1250"
"windows-1250"
"x-cp1250"
# windows-1251
"cp1251"
"windows-1251"
"x-cp1251"
# windows-1252
"ansi_x3.4-1968"
"ascii"
"cp1252"
"cp819"
"csisolatin1"
"ibm819"
"iso-8859-1"
"iso-ir-100"
"iso8859-1"
"iso88591"
"iso_8859-1"
"iso_8859-1:1987"
"l1"
"latin1"
"us-ascii"
"windows-1252"
"x-cp1252"
# windows-1253
"cp1253"
"windows-1253"
"x-cp1253"
# windows-1254
"cp1254"
"csisolatin5"
"iso-8859-9"
"iso-ir-148"
"iso8859-9"
"iso88599"
"iso_8859-9"
"iso_8859-9:1989"
"l5"
"latin5"
"windows-1254"
"x-cp1254"
# windows-1255
"cp1255"
"windows-1255"
"x-cp1255"
# windows-1256
"cp1256"
"windows-1256"
"x-cp1256"
# windows-1257
"cp1257"
"windows-1257"
"x-cp1257"
# windows-1258
"cp1258"
"windows-1258"
"x-cp1258"
# x-mac-cyrillic
"x-mac-cyrillic"
"x-mac-ukrainian"
For more encodings, see: https://encoding.spec.whatwg.org/#names-and-labels