javascript - Replacing UTF-8 characters

I am working with the open-source library jsPDF, which does not support UTF-8 characters. Is there any way to remove all the UTF-8 quote characters from my HTML string, using a regex or any other method?

Pseudocode:

$(htmlstring).replace("utf-8 quotes character" , "")

asked Jul 30, 2014 at 17:16 user3861247

You seriously have a JavaScript library that does not support UTF-8? – adeneo Commented Jul 30, 2014 at 17:17
Yes, it's the jsPDF library; you can search for it. – user3861247 Commented Jul 30, 2014 at 17:20
Would you please provide a solution so that I can remove UTF-8 characters from my HTML string without affecting it much? – user3861247 Commented Jul 30, 2014 at 17:22
Is your trouble the same as this one? stackoverflow.com/questions/2145988/… – Miranda Commented Jul 30, 2014 at 17:22
Well that sucks, you would think UTF-8 support is a minimum requirement, good thing I never used jsPDF, that simply doesn't cut it for most websites. – adeneo Commented Jul 30, 2014 at 17:24

2 Answers

First off: I urge you to stop using jsPDF if it doesn't support Unicode. It's mid 2014, and the lack of support should have meant the death of the project two years ago. But that's just my personal conviction and not part of the answer you're looking for.

If jsPDF only supports ANSI (a 256-character block, rather than ASCII's 128-character block), then you can simply do a regex replace for everything above \xFF:

"lolテスト".replace(/[\u0100-\uFFFF]/g,'');
// gives us "lol"

If you only want to get rid of quotation marks (but leave in potentially jsPDF-breaking Unicode), you can use the pattern for "just quotation marks" based on where they live in the Unicode map:

string.replace(/[\u2018-\u201F\u275B-\u275E]/g, '')

will catch ['‘','’','‚','‛','“','”','„','‟','❛','❜','❝','❞'], although of course what you probably want to do is replace them with the corresponding safe character instead. Good news: just make a replacement array for the list just presented, and work with that.
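A minimal sketch of that replacement table follows. The names `QUOTE_REPLACEMENTS`, `QUOTE_PATTERN` and `asciiQuotes` are made up for illustration, and the choice of which ASCII quote stands in for each character is an assumption you may want to adjust:

```javascript
// Hypothetical lookup table: each typographic quote maps to an ASCII stand-in.
const QUOTE_REPLACEMENTS = {
  '\u2018': "'", '\u2019': "'", '\u201A': "'", '\u201B': "'",
  '\u201C': '"', '\u201D': '"', '\u201E': '"', '\u201F': '"',
  '\u275B': "'", '\u275C': "'", '\u275D': '"', '\u275E': '"',
};

// Same character class as the "just quotation marks" pattern above.
const QUOTE_PATTERN = /[\u2018-\u201F\u275B-\u275E]/g;

// Swap every matched quote for its ASCII counterpart instead of deleting it.
function asciiQuotes(text) {
  return text.replace(QUOTE_PATTERN, (ch) => QUOTE_REPLACEMENTS[ch]);
}

console.log(asciiQuotes('\u201CHello\u201D, it\u2019s \u2018fine\u2019'));
// "Hello", it's 'fine'
```

Passing a function as the second argument to `replace` keeps the lookup in one place, so adding another quote character only means extending the table and the character class.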

2017 edit:

ES6 introduced the \u{...} escape, which accepts any number of hex digits inside the curly braces and so can address code points above \uFFFF. Combined with the regexp `u` flag, a pattern covering everything up to the last Unicode code point (U+10FFFF) would now be:

// the `u` flag makes the regexp match whole code points rather than UTF-16 code units
const start = `\u{100}`;
const end = `\u{10FFFF}`;
const searchPattern = new RegExp(`[${start}-${end}]`, `gu`);
const c = `lolテスト`.replace(searchPattern, ``);
// gives us "lol"
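To illustrate why the regexp `u` flag matters for characters above \uFFFF, here is a small sketch. The sample string is made up; it contains one BMP character (テ) and one astral character (U+1F600), and each match is replaced with `?` so the number of matches is visible:

```javascript
// Hypothetical sample: "a", テ (U+30C6), an emoji (U+1F600), "b".
const sample = 'a\u30C6\u{1F600}b';

// Without the `u` flag the regexp works on UTF-16 code units,
// so the emoji's surrogate pair is matched as two separate units.
const byCodeUnit = sample.replace(/[\u0100-\uFFFF]/g, '?');

// With the `u` flag the regexp works on whole code points,
// so the emoji is matched once.
const byCodePoint = sample.replace(/[\u{100}-\u{10FFFF}]/gu, '?');

console.log(byCodeUnit);  // a???b
console.log(byCodePoint); // a??b
```

When the replacement is the empty string both patterns leave "ab" for this sample; the difference only shows up once you substitute something for each removed character.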

Use

htmlstring.replace(/[^\x00-\x7F]/g, '')

to remove all non-ASCII characters.

(via regex-any-ascii-character)
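As a rough comparison, the ASCII-only pattern is stricter than the "everything above \xFF" pattern from the other answer: it also drops accented Latin-1 letters. The sample string below is made up for illustration:

```javascript
// Hypothetical sample: an accented Latin-1 letter, curly quotes, and katakana.
const sample = 'caf\u00E9 \u201C\u30C6\u30B9\u30C8\u201D';

// Keep only ASCII (U+0000 to U+007F).
const asciiOnly = sample.replace(/[^\x00-\x7F]/g, '');

// Keep everything up to U+00FF, so "é" survives.
const latin1Only = sample.replace(/[\u0100-\uFFFF]/g, '');

console.log(JSON.stringify(asciiOnly));  // "caf "
console.log(JSON.stringify(latin1Only)); // "café "
```

Which one fits depends on what jsPDF actually accepts, which is worth checking against your own input before picking a pattern.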
