Double-Escaped Unicode Javascript Issue - Stack Overflow

I am having a problem displaying a JavaScript string with embedded Unicode character escape sequences (\uXXXX) where the initial "\" character is itself escaped as "&#92;". What do I need to do to transform the string so that it properly evaluates the escape sequences and produces output with the correct Unicode characters?

For example, I am dealing with input such as:

"this is a &#92;u201ctest&#92;u201d";

attempting to decode the "&#92;" using a regular expression, e.g.:

var out  = text.replace('/&#92;/g','\\');

results in the output text:

"this is a \u201ctest\u201d";

that is, the Unicode escape sequences are displayed as actual escape sequences, not the double quote characters I would like.

asked Nov 8, 2008 at 18:17 by Jeffrey Winter; edited May 9, 2010 at 4:31 by Jon Seigel

5 Answers

As it turns out, it's unescape() we want, but with '%uXXXX' rather than '\uXXXX':

unescape(yourteststringhere.replace(/&#92;/g,'%'))
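
For readers who would rather not depend on unescape(), which current references list as deprecated, the same idea can be sketched as a single regular-expression pass. This is a minimal sketch and not part of the original answer: the function name decodeEscapedUnicode is hypothetical, and it assumes the only damage is "&#92;" standing in for the backslash of a four-digit \uXXXX escape.

```javascript
// Hypothetical helper (not from the original answer): turns "&#92;uXXXX"
// sequences into the characters they name, without unescape() or eval().
function decodeEscapedUnicode(text) {
  return text.replace(/&#92;u([0-9a-fA-F]{4})/g, function (match, hex) {
    // parseInt(hex, 16) yields the UTF-16 code unit named by the escape.
    return String.fromCharCode(parseInt(hex, 16));
  });
}

var sample = 'this is a &#92;u201ctest&#92;u201d';
console.log(decodeEscapedUnicode(sample));
```

Because each escape is converted one code unit at a time, a surrogate pair written as two consecutive escapes should still come out as a single character.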

This is a terrible solution, but you can do this:

var x = "this is a &#92;u201ctest&#92;u201d".replace(/&#92;/g,'\\')
// x is now "this is a \u201ctest\u201d"
eval('x = "' + x + '"')
// x is now "this is a “test”"

It's terrible because:

  • eval can be dangerous, if you don't know what's in the string

  • the string quoting in the eval statement will break if you have actual quotation marks in your string
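
Both drawbacks can be reduced by handing the string to JSON.parse instead of eval. The following is a minimal sketch under stated assumptions, not the answerer's method: the name decodeWithJson is hypothetical, every "&#92;" is assumed to begin a \uXXXX escape, and the text is assumed to contain no raw backslashes, line breaks, or other control characters. JSON.parse only parses data and never runs code, and literal double quotes are escaped first so they cannot end the string early.

```javascript
// Hypothetical helper: evaluates \uXXXX escapes via JSON.parse, not eval().
function decodeWithJson(text) {
  var quoted = text
    .replace(/"/g, '\\"')       // protect literal double quotes first
    .replace(/&#92;/g, '\\');   // then restore the escaped backslashes
  // JSON string syntax understands \uXXXX, so parsing evaluates the escapes.
  return JSON.parse('"' + quoted + '"');
}

console.log(decodeWithJson('this is a &#92;u201ctest&#92;u201d'));
console.log(decodeWithJson('say "hi" &#92;u00e9'));
```

Input that breaks the assumptions makes JSON.parse throw a SyntaxError, which is arguably easier to handle than whatever eval might do with the same text.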

Are you sure '\' is the only character that might get HTML-escaped? Are you sure '\uXXXX' is the only kind of string escape in use?

If not, you'll need a general-purpose HTML-character/entity-reference-decoder and JS-string-literal-decoder. Unfortunately JavaScript has no built-in methods for this and it's quite tedious to do manually with a load of regexps.

It is possible to take advantage of the browser's HTML decoder by assigning the string to an element's innerHTML property, and then asking JavaScript to decode the string as above:

function decodeEscaped(s) {
    var el= document.createElement('div');
    el.innerHTML= s;  // let the browser decode the HTML character references
    return eval('"'+el.firstChild.data+'"');
}

However, this is an incredibly ugly hack and a security hole if the string comes from a source that isn't 100% trusted.
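
Where the input is not trusted, the HTML-decoding half can be done without touching innerHTML. The following is a rough sketch, not the answerer's code: the name decodeNumericReferences is hypothetical, and it assumes only numeric character references (decimal such as "&#92;" or hexadecimal such as "&#x5c;") need decoding. Named entities such as "&amp;" are left alone.

```javascript
// Hypothetical helper: decodes numeric character references without the DOM.
function decodeNumericReferences(text) {
  return text.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/gi, function (match, body) {
    var isHex = body.charAt(0).toLowerCase() === 'x';
    var codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    // Leave out-of-range references untouched rather than throwing.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericReferences('this is a &#92;u201ctest&#92;u201d'));
```

The result still contains literal \uXXXX sequences, which need a separate string-literal decoding step.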

Where are the strings coming from? It would be nicer, if possible, to deal with the problem at the server end, where you may have more powerful text-handling features available. And if you could fix whatever is unnecessarily HTML-escaping your backslashes, you might find the problem fixes itself.

I'm not sure if this is it, but the answer might have something to do with eval(), if you can trust your input.

I was thinking along the same lines, but using eval() in every way I could imagine resulted in the same escaped output; e.g.,

eval(new String("this is a &#92;u201ctest&#92;u201d"));

or even

eval(new String("this is a &#92;u201ctest&#92;u201d".replace('/&#92;/g','\\')));

all result in the same thing:

"this is a \u201ctest\u201d";

It's as if I need to get the JavaScript engine to somehow re-evaluate or re-parse the string, but I don't know what would do it. I thought perhaps eval() or just creating a new string from the properly escaped input would do it, but no luck.

The fundamental question is - what do I have to do to turn the given string:

"this is a &#92;u201ctest&#92;u201d"

into a string that uses the proper Unicode characters?
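
Two details of the attempts shown in this answer may explain the unchanged output. The sketch below relies on standard JavaScript behaviour rather than anything stated in the thread: eval() hands back a non-primitive argument such as a String object untouched, and replace() treats a quoted '/.../g' as literal text to search for, not as a regular expression. The variable names are illustrative only.

```javascript
var input = 'this is a &#92;u201ctest&#92;u201d';

// A String object is not a primitive string, so eval() returns it as-is
// instead of parsing its contents.
var wrapped = new String(input);
var sameObject = eval(wrapped) === wrapped;

// The quoted pattern is searched for literally; the text "/&#92;/g" never
// occurs in the input, so nothing is replaced.
var stringPattern = input.replace('/&#92;/g', '\\');

// A regex literal (no quotes) does match every "&#92;".
var regexPattern = input.replace(/&#92;/g, '\\');

console.log(sameObject);
console.log(stringPattern === input);
console.log(regexPattern);
```

Even with the regex literal, the result holds a literal backslash followed by u201c. Escape sequences are only interpreted when source code or JSON text is parsed, which is why a further decoding step is still needed.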
