
Why won't window.btoa work on – ” characters in JavaScript? - Stack Overflow


So I'm converting a string to BASE64 as shown in the code below...

var str = "Hello World";
var enc = window.btoa(str);

This yields SGVsbG8gV29ybGQ=. However, if I add the characters – and ”, as in the code shown below, the conversion doesn't happen. What is the reason behind this? Thank you so much.

var str = "Hello – World”";
var enc = window.btoa(str);

Asked Aug 13, 2020 at 3:40 by chris_techno25 (edited Aug 13, 2020 at 3:47)
  • Hi! I edited the post. Thank you – chris_techno25 Commented Aug 13, 2020 at 3:47
  • developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary – Mike 'Pomax' Kamermans Commented Aug 13, 2020 at 3:49

6 Answers

btoa is an exotic function in that it requires a "binary string": a value of the String datatype in which every "letter" represents a byte rather than a letter. As such, you can't have any "letters" with Unicode code points above 0xFF (char code 255), which rules out your en dash (–) and "fancy" closing quote (”).
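The restriction can be checked directly. Below is a minimal sketch, assuming an environment where btoa is a global (browsers and recent Node.js versions); the helper name findNonByteChars is hypothetical and not part of any API.

```javascript
// Hypothetical helper: list the characters that btoa would reject,
// i.e. those whose code point is above 0xFF.
function findNonByteChars(str) {
  const offenders = [];
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp > 0xff) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      offenders.push({ char: ch, codePoint: "U+" + hex });
    }
  }
  return offenders;
}

const offenders = findNonByteChars("Hello – World”");
console.log(offenders); // the en dash (U+2013) and the closing quote (U+201D)

// btoa throws on such input instead of returning a string.
let threw = false;
try {
  btoa("Hello – World”");
} catch (err) {
  threw = true;
}
console.log(threw); // true
```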

You'll either have to URI-encode the data first, which makes it safe:

> var str = `Hello – World`;
> window.btoa(encodeURIComponent(str));
"SGVsbG8lMjAlRTIlODAlOTMlMjBXb3JsZA=="

And then remember to decode it again when you unpack it yourself:

> var base64= "SGVsbG8lMjAlRTIlODAlOTMlMjBXb3JsZA==";
> decodeURIComponent(window.atob(base64));
"Hello – World"

Or rely on targets that automatically apply URI decoding like href attributes (a, link, etc).

However, if your target doesn't (your own code, or src attributes on img, script, etc.) then you'll need to turn your string into a new string that conforms to single-byte packing. MDN's Base64 page calls this out explicitly, with their solution being:

function base64(data) {
  const bytes = new TextEncoder().encode(data);
  const binString = String.fromCodePoint(...bytes);
  return btoa(binString);
}

with the equivalent decoder:

function decode64(base64) {
  const binString = atob(base64);
  const bytes = Uint8Array.from(binString, (m) => m.codePointAt(0));
  return new TextDecoder().decode(bytes);
}

You'll need decode64 if you want to unpack things in your own code, but the base64 function yields a converted string that will work when put into a data URL (e.g. data:text/javascript;base64,${base64text}).
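One caveat with spreading the bytes into a single String.fromCodePoint call: for very large inputs the argument list may exceed what the engine allows, which typically surfaces as a RangeError. A sketch that converts in chunks instead is shown here; the name base64Chunked and the default chunk size of 0x8000 are illustrative choices, not taken from MDN.

```javascript
// Hypothetical variant of base64() that avoids spreading a huge array
// into a single function call.
function base64Chunked(data, chunkSize = 0x8000) {
  const bytes = new TextEncoder().encode(data);
  let binString = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray() returns a view on the same buffer, so nothing is copied.
    binString += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  // The whole binary string is encoded at once, so chunking never
  // affects the Base64 output.
  return btoa(binString);
}

console.log(base64Chunked("Hello – World”")); // "SGVsbG8g4oCTIFdvcmxk4oCd"
```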

The problem is that the characters lie outside the Latin-1 range.

For this you can use unescape (now deprecated):

var str = "Hello – World”";
var enc = btoa(unescape(encodeURIComponent(str)));

alert(enc);

And to decode:

var encStr = "SGVsbG8g4oCTIFdvcmxk4oCd";
var dec = decodeURIComponent(escape(window.atob(encStr)));

alert(dec);
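Since unescape and escape are deprecated, the same byte-wise conversion can be sketched without them by rewriting the percent escapes directly. The function names utf8ToBase64 and base64ToUtf8 are hypothetical.

```javascript
// Encode: percent-encode as UTF-8, then turn each %XX escape into the
// single character with that byte value, giving a btoa-safe binary string.
function utf8ToBase64(str) {
  const binString = encodeURIComponent(str).replace(
    /%([0-9A-F]{2})/g,
    (match, hex) => String.fromCharCode(parseInt(hex, 16))
  );
  return btoa(binString);
}

// Decode: turn every byte back into a %XX escape and let
// decodeURIComponent interpret the sequence as UTF-8.
function base64ToUtf8(base64) {
  const escaped = Array.from(
    atob(base64),
    (ch) => "%" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  ).join("");
  return decodeURIComponent(escaped);
}

console.log(utf8ToBase64("Hello – World”")); // "SGVsbG8g4oCTIFdvcmxk4oCd"
console.log(base64ToUtf8("SGVsbG8g4oCTIFdvcmxk4oCd")); // "Hello – World”"
```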

The most bulletproof way is to work on binary data directly.

For this, you can encode your string to a Uint8Array holding the UTF-8 version of your string.

Then a FileReader instance will be able to give you the Base64 quite easily.

var str = "Hello – World”";
var buf = new TextEncoder().encode( str );
var reader = new FileReader();
reader.onload = evt => { console.log( reader.result.split(',')[1] ); };
reader.readAsDataURL( new Blob([buf]) );

And since the Blob() constructor automagically encodes strings to UTF-8, we could even get rid of the TextEncoder object:

var str = "Hello – World”";
var reader = new FileReader();
reader.onload = evt => { console.log( reader.result.split(',')[1] ); };
reader.readAsDataURL( new Blob([str]) );

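I struggled with this one too, so I wrote two functions.

function TtB64(txt) {
  // Base64-encodes the decimal UTF-8 byte values, separated by spaces.
  return btoa(new TextEncoder().encode(txt).join(' '));
}

function TfB64(txt) {
  // Reverses TtB64: splits on spaces and decodes the bytes as UTF-8.
  return new TextDecoder().decode(new Uint8Array(atob(txt).split(' ').map(x => parseInt(x, 10))));
}

The first one converts any text to Base64, and the second one converts it from Base64 back to text. Note that the output is the Base64 of a space-separated list of decimal byte values, not of the UTF-8 bytes themselves, so it is only meant to be decoded by TfB64.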

This ultimately owes to a deficiency in the JavaScript type system.

JavaScript strings are strings of 16-bit code units, which are customarily interpreted as UTF-16. The Base64 encoding is a method of transforming an 8-bit byte stream into a string of digits, by taking each three bytes and mapping them into four digits, each covering 6 bits: 3 × 8 = 4 × 6. As we see, this is crucially dependent on the bit width of each symbol.
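The 3 × 8 = 4 × 6 mapping can be sketched by hand for a single group of three bytes. The function name encodeTriple is hypothetical; the alphabet is the standard Base64 one.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Map exactly three 8-bit bytes onto four 6-bit Base64 digits.
function encodeTriple(b0, b1, b2) {
  const bits = (b0 << 16) | (b1 << 8) | b2; // 24 bits in one integer
  return (
    ALPHABET[(bits >> 18) & 0x3f] +
    ALPHABET[(bits >> 12) & 0x3f] +
    ALPHABET[(bits >> 6) & 0x3f] +
    ALPHABET[bits & 0x3f]
  );
}

// "Hel" is the byte sequence 0x48 0x65 0x6C.
console.log(encodeTriple(0x48, 0x65, 0x6c)); // "SGVs"
```

A code unit such as 0x2013 does not fit into any of the three 8-bit slots, which is why the input has to be reduced to bytes first.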

At the time the btoa function was defined, JavaScript had no type for 8-bit byte streams, so the API was defined to take the ordinary 16-bit string type as input, with the restriction that each code unit was supposed to be confined to the range [U+0000, U+00FF]; when encoded into ISO-8859-1, such a string would reproduce the intended byte stream exactly.

(Newer code should probably use Uint8Array.fromBase64 and Uint8Array.prototype.toBase64 instead, when those become available.)
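A feature-detecting sketch of that advice, assuming the methods may or may not exist in the current runtime, might look like the following. The wrapper names bytesToBase64 and base64ToBytes are hypothetical; the fallback paths use btoa and atob.

```javascript
// Prefer the native method where it exists; otherwise fall back to btoa.
function bytesToBase64(bytes) {
  if (typeof bytes.toBase64 === "function") {
    return bytes.toBase64();
  }
  let binString = "";
  for (const byte of bytes) {
    binString += String.fromCharCode(byte);
  }
  return btoa(binString);
}

// Prefer the native static method where it exists; otherwise use atob.
function base64ToBytes(base64) {
  if (typeof Uint8Array.fromBase64 === "function") {
    return Uint8Array.fromBase64(base64);
  }
  return Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
}

const encoded = bytesToBase64(new TextEncoder().encode("Hello – World”"));
console.log(encoded); // "SGVsbG8g4oCTIFdvcmxk4oCd"
console.log(new TextDecoder().decode(base64ToBytes(encoded))); // "Hello – World”"
```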

The character – is U+2013, while ” is U+201D; neither of those characters fits into the above-mentioned range, so the function rejects them.

If you want to convert Unicode text into Base64, you need to pick a character encoding and convert it into a byte string first, and encode that. Asking for a Base64 encoding of a Unicode string itself is meaningless.
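To illustrate why the encoding has to be chosen first, here is a small sketch that encodes the same character as UTF-8 and as UTF-16LE and gets two different Base64 results. The helper names are hypothetical, and UTF-16LE is picked only as a contrasting example.

```javascript
// Base64-encode a sequence of byte values via a binary string.
function toBase64ViaBinaryString(bytes) {
  return btoa(String.fromCharCode(...bytes));
}

// UTF-8 bytes, via the standard TextEncoder.
function utf8Bytes(str) {
  return new TextEncoder().encode(str);
}

// UTF-16LE bytes, built by hand: low byte first, then high byte.
function utf16leBytes(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

console.log(toBase64ViaBinaryString(utf8Bytes("–")));    // "4oCT"
console.log(toBase64ViaBinaryString(utf16leBytes("–"))); // "EyA="
```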

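Why so complicated?

btoa(String.fromCharCode(...new TextEncoder().encode("Hello – World”")))

will do. (Calling join('') on the encoded bytes instead would concatenate their decimal values without a separator, and that cannot be decoded back unambiguously.)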
