
javascript - Node.js buffer encoding issue - Stack Overflow


I'm having trouble understanding character encoding in node.js. I'm transmitting data and for some reason the encoding causes certain characters to be replaced with other ones. What I'm doing is base 64 encoding at the client side and decoding it in node.js.

To simplify, I narrowed it down to this piece of code which fails:

new Buffer("1w==", 'base64').toString('utf8');

1w== is the base 64 encoding of the × character. When I pass this string with the 'base64' argument to a buffer and then call .toString('utf8'), I expect to get the same character back, but I don't. Instead I get � (character code 65533).

Is the encoding utf8 wrong? If so, what should I use instead? If not, how can I decode a base 64 string in node.js?

Asked Aug 2, 2011 at 19:02 by pimvdb

2 Answers

No, your assumption is wrong. The base64 string encodes only a single byte, and every Unicode code point above U+007F needs at least two bytes in UTF-8.

I can't decode base64 in my head, but try ISO-8859-1 instead.

The point is that base64 decoding turns a character string into a byte string. You assumed it produces a character string, but it does not. You still need to decode the byte string into a character string, and in your case the correct encoding is ISO-8859-1.
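
To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes a current Node.js, where `Buffer.from` replaces the deprecated `new Buffer` constructor and `'latin1'` is the name Node.js uses for ISO-8859-1. The variable names are illustrative only.

```javascript
// Base64 decoding yields bytes, not characters.
const bytes = Buffer.from("1w==", "base64");

// The payload is a single byte, 0xD7 (215).
const byteValues = Array.from(bytes);

// Read as ISO-8859-1 ('latin1' in Node.js), 0xD7 is the × character.
const asLatin1 = bytes.toString("latin1");

// Read as UTF-8, a lone 0xD7 is an incomplete sequence, so Node.js
// substitutes U+FFFD (character code 65533), the replacement character.
const asUtf8 = bytes.toString("utf8");

console.log(byteValues, asLatin1, asUtf8.charCodeAt(0));
```

The same buffer gives two different strings depending only on the encoding passed to `toString`, which is why the encoding has to match whatever the sender used.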

echo -n x | base64

gives

eA==

The given code would return the expected result if the input had been encoded correctly. The problem is most likely on the encoding side: 1w== decodes to the single byte 0xD7, which in UTF-8 is only the start of a multi-byte character.
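
If the sender is under your control, one way to fix the encoding side is to convert the text to UTF-8 bytes before base64-encoding it, so that `.toString('utf8')` on the receiving end is the right call. A sketch in Node.js with hypothetical variable names follows; a browser client would need its own way of producing UTF-8 bytes, which is not shown here.

```javascript
// × is U+00D7; in UTF-8 it takes two bytes, 0xC3 0x97.
const encoded = Buffer.from("\u00d7", "utf8").toString("base64"); // "w5c="

// Decoding those two bytes as UTF-8 round-trips the character.
const decoded = Buffer.from(encoded, "base64").toString("utf8");

console.log(encoded, decoded === "\u00d7");
```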
