javascript - FileReader - which encodings are supported? - Stack Overflow

I want to simply read user-input files as text.

I can rely on modern browsers, so I use FileReader for that (which works like a charm).

reader.readAsText(myfile, encoding);

I know that encoding defaults to UTF-8.

But as my users will upload files from various sources (Windows, Mac, Linux) and various browsers, I ask the user to provide the encoding via a select box.

So, e.g., for a Western European Windows text file I expect the user to choose windows-1252.

I was not able to find a list of supported encodings for FileReader (I assume it depends at least partly on the browser).

I am not asking how to auto-detect the encoding; I just want to fill my select box like this:

<select id="encoding">
   <option value="windows-1252">Windows (Western Latin)</option>
   <option value="utf-8">UTF-8</option>
   <option value="...">...</option>
</select>

So my questions are:

  1. Where do I get a list of supported encodings to fill the option values?
  2. How do I determine the exact spelling of those values (is it 'utf8' or 'UTF-8' or ...), and do they depend on the OS / browser?
  3. Does readAsText(myfile, unsupportedEncoding) throw an error that I can catch if the encoding is not supported?

I'd prefer not to use any major 3rd party libraries for that.

Bonus Question:

Is there a simple way to get meaningful translations of the values, e.g. that cp10029 means Mac (Central European)?

Asked Nov 24, 2016 at 15:29 by LBA
  • A cursory search of the googles didn't reveal much. Maybe this will help? stackoverflow.com/questions/37884928/… – Dan Wilson, commented Nov 24, 2016 at 15:59
  • Thanks, I googled a lot, that's why I am asking here :-( I checked your recommendation, but IMHO it refers to input that isn't really text, whereas in my case all files are "real text" input, just in different encodings. – LBA, commented Nov 24, 2016 at 16:16
  • The supported code-pages can be found here. I would recommend taking a second look at the link provided by Dan, as this is a good way to go about it. This approach also lets you detect a BOM and other features that help you guess the encoding in advance. – user1693593, commented Nov 26, 2016 at 5:43

2 Answers

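  1. Encoding standards: https://github.com/whatwg/encoding/ (also available in JSON format: https://github.com/whatwg/encoding/blob/master/encodings.json; use the values from the "labels" fields).

  2. The encoding parameter is not case-sensitive, so "utf8", "UTF-8" and "utf-8" all work. The labels come from the Encoding Standard, not from the operating system.

  3. No, readAsText(myfile, unsupportedEncoding) does not throw any error. The function simply ignores the unknown label and uses the default encoding (UTF-8).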
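
Since readAsText() fails silently, a select box value can be checked up front with the built-in TextDecoder from the Encoding API. It resolves labels through the same Encoding Standard table, and its constructor throws a RangeError for a label it does not know. A minimal sketch, assuming a browser (or Node.js build) with TextDecoder support; the helper name canonicalEncoding is hypothetical:

```javascript
// Returns the canonical (lowercase) encoding name for a label,
// or null if TextDecoder does not recognize the label.
// Labels are matched case-insensitively, as with FileReader.readAsText().
function canonicalEncoding(label) {
  try {
    return new TextDecoder(label).encoding;
  } catch (err) {
    if (err instanceof RangeError) {
      return null; // unknown or unsupported label
    }
    throw err; // anything else is unexpected
  }
}

console.log(canonicalEncoding("cP1251"));  // "windows-1251"
console.log(canonicalEncoding("UTF8"));    // "utf-8"
console.log(canonicalEncoding("cp10029")); // null
```

Storing the returned canonical name rather than the raw label also means aliases such as "cp1251" and "x-cp1251" end up as the same value.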

    // Assumes markup along the lines of:
    //   <input type="file" id="files" multiple>
    //   <div id="result"></div>
    window.onload = function() {
    
        // Check File API support
        if (window.File && window.FileList && window.FileReader) {
            var filesInput = document.getElementById("files");
            var output = document.getElementById("result");
    
            filesInput.addEventListener("change", function(event) {
                var files = event.target.files; // FileList object
    
                for (var i = 0; i < files.length; i++) {
                    var file = files[i];
    
                    // Only files whose MIME type contains "plain" (e.g. text/plain)
                    if (!file.type.match('plain')) continue;
    
                    var reader = new FileReader();
    
                    reader.addEventListener("load", function(event) {
                        // event.target is the FileReader that fired "load"
                        var div = document.createElement("div");
                        div.innerText = event.target.result; // innerText keeps line breaks visible
                        output.appendChild(div);
                    });
    
                    // Read the text file. The label is case-insensitive,
                    // so "cP1251" selects windows-1251.
                    reader.readAsText(file, "cP1251");
                }
            });
        }
        else {
            console.log("Your browser does not support File API");
        }
    }
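
If you would rather get an error than a silent UTF-8 fallback, one alternative to FileReader (a swapped-in technique, not what the answer above uses) is to read the bytes with Blob.prototype.arrayBuffer(), which File inherits, and decode them with TextDecoder. A sketch; readBlobAsText is a hypothetical helper name:

```javascript
// Reads a Blob/File as text in the given encoding.
// Unlike FileReader.readAsText(), the TextDecoder constructor throws a
// RangeError for an unknown label, so the returned promise rejects
// instead of silently falling back to UTF-8.
async function readBlobAsText(blob, label) {
  const decoder = new TextDecoder(label); // throws RangeError if unknown
  const buffer = await blob.arrayBuffer();
  return decoder.decode(buffer);
}
```

In a change handler this could be called as readBlobAsText(file, document.getElementById("encoding").value), with the rejection handler reporting the bad label. One behavioral difference to keep in mind: readAsText() lets a byte order mark in the file override the requested encoding, whereas TextDecoder only strips a BOM that matches its own encoding.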
    

To get readable names for the values, you can use the same JSON file (https://github.com/whatwg/encoding/blob/master/encodings.json) and its "name" (canonical encoding name) and "heading" (encoding category) fields.
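
For the bonus question, a sketch that turns a parsed copy of encodings.json into option data for the question's <select id="encoding">. It assumes the file is an array of groups, each with a "heading" and an "encodings" array of { name, labels } objects; buildEncodingOptions and fillSelect are hypothetical names:

```javascript
// Turns parsed encodings.json data into option data for a <select>.
// Assumed shape: [{ heading: "...", encodings: [{ name: "...", labels: [...] }] }]
function buildEncodingOptions(groups) {
  const options = [];
  for (const group of groups) {
    for (const enc of group.encodings) {
      // "replacement" is not a usable text encoding (it turns the whole
      // input into U+FFFD), so it is skipped.
      if (enc.name === "replacement") continue;
      options.push({ value: enc.name, text: `${enc.name} (${group.heading})` });
    }
  }
  return options;
}

// Browser-only helper: appends the options to an existing <select> element.
function fillSelect(select, options) {
  for (const { value, text } of options) {
    select.add(new Option(text, value));
  }
}
```

Because labels are matched case-insensitively, the canonical name (e.g. "UTF-8") can be passed straight to readAsText(). The heading is only a category, so a regional description such as "Mac (Central European)" would still need a lookup table of your own.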

Names and labels (excerpt from the Encoding Standard covering UTF-8 and the legacy single-byte encodings)
The table below lists all encodings and their labels user agents must support. User agents must not support any other encodings or labels.

# UTF-8
"unicode-1-1-utf-8"
"unicode11utf8"
"unicode20utf8"
"utf-8"
"utf8"
"x-unicode20utf8"

# IBM866
"866"
"cp866"
"csibm866"
"ibm866"

# ISO-8859-2
"csisolatin2"
"iso-8859-2"
"iso-ir-101"
"iso8859-2"
"iso88592"
"iso_8859-2"
"iso_8859-2:1987"
"l2"
"latin2"

# ISO-8859-3
"csisolatin3"
"iso-8859-3"
"iso-ir-109"
"iso8859-3"
"iso88593"
"iso_8859-3"
"iso_8859-3:1988"
"l3"
"latin3"

# ISO-8859-4
"csisolatin4"
"iso-8859-4"
"iso-ir-110"
"iso8859-4"
"iso88594"
"iso_8859-4"
"iso_8859-4:1988"
"l4"
"latin4"

# ISO-8859-5
"csisolatincyrillic"
"cyrillic"
"iso-8859-5"
"iso-ir-144"
"iso8859-5"
"iso88595"
"iso_8859-5"
"iso_8859-5:1988"

# ISO-8859-6
"arabic"
"asmo-708"
"csiso88596e"
"csiso88596i"
"csisolatinarabic"
"ecma-114"
"iso-8859-6"
"iso-8859-6-e"
"iso-8859-6-i"
"iso-ir-127"
"iso8859-6"
"iso88596"
"iso_8859-6"
"iso_8859-6:1987"

# ISO-8859-7
"csisolatingreek"
"ecma-118"
"elot_928"
"greek"
"greek8"
"iso-8859-7"
"iso-ir-126"
"iso8859-7"
"iso88597"
"iso_8859-7"
"iso_8859-7:1987"
"sun_eu_greek"

# ISO-8859-8
"csiso88598e"
"csisolatinhebrew"
"hebrew"
"iso-8859-8"
"iso-8859-8-e"
"iso-ir-138"
"iso8859-8"
"iso88598"
"iso_8859-8"
"iso_8859-8:1988"
"visual"

# ISO-8859-8-I
"csiso88598i"
"iso-8859-8-i"
"logical"

# ISO-8859-10
"csisolatin6"
"iso-8859-10"
"iso-ir-157"
"iso8859-10"
"iso885910"
"l6"
"latin6"

# ISO-8859-13
"iso-8859-13"
"iso8859-13"
"iso885913"

# ISO-8859-14
"iso-8859-14"
"iso8859-14"
"iso885914"

# ISO-8859-15
"csisolatin9"
"iso-8859-15"
"iso8859-15"
"iso885915"
"iso_8859-15"
"l9"

# ISO-8859-16
"iso-8859-16"

# KOI8-R
"cskoi8r"
"koi"
"koi8"
"koi8-r"
"koi8_r"

# KOI8-U
"koi8-ru"
"koi8-u"

# macintosh
"csmacintosh"
"mac"
"macintosh"
"x-mac-roman"

# windows-874
"dos-874"
"iso-8859-11"
"iso8859-11"
"iso885911"
"tis-620"
"windows-874"

# windows-1250
"cp1250"
"windows-1250"
"x-cp1250"

# windows-1251
"cp1251"
"windows-1251"
"x-cp1251"

# windows-1252
"ansi_x3.4-1968"
"ascii"
"cp1252"
"cp819"
"csisolatin1"
"ibm819"
"iso-8859-1"
"iso-ir-100"
"iso8859-1"
"iso88591"
"iso_8859-1"
"iso_8859-1:1987"
"l1"
"latin1"
"us-ascii"
"windows-1252"
"x-cp1252"

# windows-1253
"cp1253"
"windows-1253"
"x-cp1253"

# windows-1254
"cp1254"
"csisolatin5"
"iso-8859-9"
"iso-ir-148"
"iso8859-9"
"iso88599"
"iso_8859-9"
"iso_8859-9:1989"
"l5"
"latin5"
"windows-1254"
"x-cp1254"

# windows-1255
"cp1255"
"windows-1255"
"x-cp1255"

# windows-1256
"cp1256"
"windows-1256"
"x-cp1256"

# windows-1257
"cp1257"
"windows-1257"
"x-cp1257"

# windows-1258
"cp1258"
"windows-1258"
"x-cp1258"

# x-mac-cyrillic
"x-mac-cyrillic"
"x-mac-ukrainian"

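For the remaining encodings (the legacy multi-byte encodings such as GBK, Big5, EUC-JP, Shift_JIS and EUC-KR, plus UTF-16BE and UTF-16LE), see https://encoding.spec.whatwg.org/#names-and-labels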
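
The list above also shows that many labels are aliases: "ascii", "us-ascii", "latin1" and "iso-8859-1" all select windows-1252, so offering both ISO-8859-1 and windows-1252 in a select box gives the user the same decoder twice. A sketch that groups candidate labels by the encoding TextDecoder resolves them to (the helper name groupLabelsByEncoding is hypothetical; unknown labels are collected under the key null):

```javascript
// Groups labels by the canonical encoding they resolve to.
// Labels that TextDecoder rejects are collected under the key null.
function groupLabelsByEncoding(labels) {
  const groups = new Map();
  for (const label of labels) {
    let name = null;
    try {
      name = new TextDecoder(label).encoding;
    } catch (err) {
      if (!(err instanceof RangeError)) throw err;
    }
    if (!groups.has(name)) groups.set(name, []);
    groups.get(name).push(label);
  }
  return groups;
}

const groups = groupLabelsByEncoding(["iso-8859-1", "ascii", "windows-1252", "latin2", "cp10029"]);
console.log(groups.get("windows-1252")); // ["iso-8859-1", "ascii", "windows-1252"]
```

Keeping one option per map key (and dropping the null group) yields a select box with no duplicate decoders.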
