I spent some time looking for the best way to escape an HTML string and found some discussions on it: discussion 1, discussion 2. They led me to the replaceAll function. Then I ran performance tests and tried to find a solution with similar speed, with no success :(
Here is my final set of test cases. I found it on the net and expanded it with my own attempts (the 4 cases at the bottom), and I still cannot reach replaceAll() performance.
What is the secret that makes the replaceAll() solution so fast?
Greets!
Code snippets:
String.prototype.replaceAll = function(str1, str2, ignore)
{
return this.replace(new RegExp(str1.replace(/([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g,"\\$&"),(ignore?"gi":"g")),(typeof(str2)=="string")?str2.replace(/\$/g,"$$$$"):str2);
};
Credits to qwerty.
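The helper above does two escaping steps before it calls the native replace. The sketch below pulls them out as standalone functions so each step can be checked on its own. The names escapeRegExp, escapeReplacement and replaceAllLiteral are hypothetical, introduced only for illustration; the character class is copied from qwerty's helper.

```javascript
// Step 1: escape regex metacharacters in the search string.
// In a replacement string "$&" means "the whole match", so "\\$&"
// emits a backslash followed by the matched character.
function escapeRegExp(str) {
  return str.replace(/([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g, "\\$&");
}

// Step 2: make "$" literal in the replacement string.
// "$$" in a replacement means one literal "$", so "$$$$" turns each "$" into "$$".
function escapeReplacement(str) {
  return str.replace(/\$/g, "$$$$");
}

// Hypothetical standalone equivalent of the prototype helper (without the ignore flag).
function replaceAllLiteral(haystack, needle, replacement) {
  return haystack.replace(new RegExp(escapeRegExp(needle), "g"), escapeReplacement(replacement));
}

console.log(escapeRegExp("a.b*c"));                    // a\.b\*c
console.log(replaceAllLiteral("1.5 + 2.5", ".", ","));  // 1,5 + 2,5
console.log(replaceAllLiteral("price", "price", "$&")); // $&
```

Without step 2, the last call would insert the matched text ("price") instead of the literal "$&".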
Fastest case so far:
html.replaceAll('&', '&amp;').replaceAll('"', '&quot;').replaceAll("'", '&#39;').replaceAll('<', '&lt;').replaceAll('>', '&gt;');
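For comparison: engines that implement ES2021 ship a native String.prototype.replaceAll that treats a string pattern literally, so no regex escaping is needed. A minimal sketch assuming such an engine (escapeHtmlNative is a hypothetical name). Note that assigning String.prototype.replaceAll as in the question overwrites the native method in those engines.

```javascript
// Assumes an ES2021+ engine with a native String.prototype.replaceAll.
// "&" is replaced first; otherwise the "&" inside "&lt;" etc. would be escaped again.
function escapeHtmlNative(html) {
  return html
    .replaceAll("&", "&amp;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&#39;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;");
}

console.log(escapeHtmlNative(`<p class="x">Tom & Jerry's</p>`));
```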
Asked Jul 3, 2013 at 7:28 by Saram; edited May 23, 2017 at 10:27 by Community bot.
- Many built-in methods are implemented in native code and pre-optimized (regexes being one); emulating them in JavaScript in a faster way is just plain hard to do. (Joachim Isaksson, Jul 3, 2013 at 7:32)
- Sure, but why is the "replace new RegExp" case so slow? It uses RegExp too. (Saram, Jul 3, 2013 at 7:34)
- Still, replace without a regex seems to be faster: jsperf.com/replaceallvssplitjoin (Mr_Green, Jul 3, 2013 at 7:54)
- @Mr_Green The multiple replace is wrong, because it only replaces the first occurrence :) (Ja͢ck, Jul 3, 2013 at 7:59)
- Always compile your regexes: jsperf.com/htmlencoderegex/32 (Joachim Isaksson, Jul 3, 2013 at 8:21)
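Ja͢ck's point about the multiple replace is easy to check: when the first argument to replace is a string, only the first occurrence is replaced, while a regex with the g flag replaces all of them. A small sketch:

```javascript
const s = "a & b & c";

// String pattern: only the first "&" is replaced.
const firstOnly = s.replace("&", "&amp;");

// Global regex: every "&" is replaced.
const all = s.replace(/&/g, "&amp;");

console.log(firstOnly); // a &amp; b & c
console.log(all);       // a &amp; b &amp; c
```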
3 Answers
Finally I found it! Thanks, Jack, for pointing me to this jsperf quirk:
"I should note that the test results are strange; when .replaceAll() is defined inside Benchmark.prototype.setup it runs twice as fast compared to when it's defined globally (i.e. inside a <script> tag). I'm still not sure why that is, but it definitely must be related to how jsperf itself works."
The answer is:
- replaceAll: this hit a jsperf limitation/bug caused by the special sequence "\\$&", so the results were wrong.
- compile(): when called with no argument, it changes the regexp definition to /(?:)/. I don't know whether that is a bug or not, but the performance results were unreliable after it was called.
Here are my result-safe tests. With them, I finally prepared proper test cases.
The result is that, for HTML escaping, the best way is to use a native DOM-based solution, like:
document.createElement('div').appendChild(document.createTextNode(html)).parentNode.innerHTML
or, if you repeat it many times, you can do it with variables prepared once:
//prepare variables
var DOMtext = document.createTextNode("test");
var DOMnative = document.createElement("span");
DOMnative.appendChild(DOMtext);
//main work for each case
function HTMLescape(html){
DOMtext.nodeValue = html;
return DOMnative.innerHTML;
}
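One caveat about the DOM approach: when innerHTML serializes a text node inside an ordinary element, the HTML serialization algorithm escapes only "&", "<", ">" and the no-break space (U+00A0, emitted as &nbsp;). Quotes are left as-is, so the result is fine for element content but not for attribute values. A minimal sketch that mimics that behaviour without a DOM (e.g. in Node); the function name is hypothetical, and this is not a general-purpose escaper.

```javascript
// Mimics how innerHTML serializes a text node inside an ordinary element
// (the non-attribute escaping of the HTML serialization algorithm):
// "&", U+00A0, "<" and ">" are escaped; quotes are not.
function serializeTextLikeInnerHTML(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(serializeTextLikeInnerHTML('a < b & "c"')); // a &lt; b &amp; "c"
```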
Thank you all for collaborating and for posting comments and directions.
jsperf bug description
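In the jsperf test, String.prototype.replaceAll ended up defined as follows (compared with the original, "\\$&" became "\\#{setup}" and "$$$$" became "$$"):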
function (str1, str2, ignore) {
return this.replace(new RegExp(str1.replace(repAll, "\\#{setup}"), (ignore ? "gi" : "g")), (typeof(str2) == "string") ? str2.replace(/\$/g, "$$") : str2);
}
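A sketch reproducing the effect of that substitution with plain functions. Assumptions: repAll is taken to be the same character class as in qwerty's original, the function names are hypothetical, and the str2 escaping is left out because it does not matter here.

```javascript
// Assumption: repAll is the metacharacter class from qwerty's original helper.
const repAll = /([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g;

// Original: each metacharacter becomes "\" followed by the character itself.
function replaceAllOriginal(s, str1, str2) {
  return s.replace(new RegExp(str1.replace(repAll, "\\$&"), "g"), str2);
}

// As mangled by jsperf: each metacharacter becomes the literal text "\#{setup}",
// so the resulting regex looks for "#{setup}" instead of str1.
function replaceAllMangled(s, str1, str2) {
  return s.replace(new RegExp(str1.replace(repAll, "\\#{setup}"), "g"), str2);
}

console.log(replaceAllOriginal("a&b", "&", "&amp;")); // a&amp;b
console.log(replaceAllMangled("a&b", "&", "&amp;"));  // a&b (nothing replaced)
```

Since the mangled version never matched, it did almost no work, which is why it looked so fast.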
As far as performance goes, I find that the below function is as good as it gets:
String.prototype.htmlEscape = function() {
var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;
return function() {
return this
.replace(amp_re, '&amp;')
.replace(sq_re, '&#39;')
.replace(quot_re, '&quot;')
.replace(lt_re, '&lt;')
.replace(gt_re, '&gt;');
}
}();
It initializes the regular expressions and returns a closure that actually performs the replacement.
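A common alternative, not one of the variants benchmarked in this thread, is a single pass with one character class and a lookup table, so the string is scanned once instead of five times. A minimal sketch as a plain function (escapeHtmlOnePass, HTML_ESCAPES and HTML_ESCAPE_RE are hypothetical names); whether it beats the chained version depends on the engine and the input, so it is worth measuring before switching.

```javascript
// Hypothetical lookup table: character -> entity.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Created once and reused; replace() resets lastIndex for global regexes.
const HTML_ESCAPE_RE = /[&<>"']/g;

function escapeHtmlOnePass(str) {
  // Each matched character is replaced by the callback's return value.
  // Everything is escaped in one pass, so the "&" ordering issue does not arise.
  return str.replace(HTML_ESCAPE_RE, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtmlOnePass(`<a title="Tom & Jerry's">`));
```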
Performance test
I should note that the test results are strange; when .replaceAll() is defined inside Benchmark.prototype.setup it runs twice as fast compared to when it's defined globally (i.e. inside a <script> tag). I'm still not sure why that is, but it definitely must be related to how jsperf itself works.
Using RegExp.compile()
I wanted to avoid using a deprecated function, mostly because this kind of optimization should be done automatically by modern browsers. Here's a version with compiled expressions:
String.prototype.htmlEscape2 = function() {
var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;
// Note: compile() with no arguments resets each pattern (see the explanation below)
if (RegExp.prototype.compile) {
amp_re.compile();
sq_re.compile();
quot_re.compile();
lt_re.compile();
gt_re.compile();
}
return function() {
return this
.replace(amp_re, '&amp;')
.replace(sq_re, '&#39;')
.replace(quot_re, '&quot;')
.replace(lt_re, '&lt;')
.replace(gt_re, '&gt;');
}
}();
Doing so blows everything else out of the water!
Performance test
The reason .compile() gives such a performance boost is that when you call it with no arguments on a global expression, e.g. /a/g, it gets converted to /(?:)/ (on Chrome), which renders it useless.
If compilation can't be done, a browser should throw an error instead of silently destroying the expression.
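The effect can be reproduced in engines that still provide the legacy RegExp.prototype.compile (an Annex B feature of the ECMAScript spec, present in V8 and therefore Node). A minimal sketch: calling it with no arguments resets both the pattern and the flags.

```javascript
const ampRe = /&/g;
console.log(ampRe.source, ampRe.flags); // & g

// No arguments: pattern and flags are both reset to empty.
ampRe.compile();
console.log(ampRe.source);  // (?:)
console.log(ampRe.global);  // false

// The "escaper" no longer escapes anything: the empty pattern matches
// at index 0, so the replacement is just prepended once.
console.log("a&b".replace(ampRe, "&amp;")); // &amp;a&b
```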
Actually, there are faster ways to do this.
If you do an inline split and join, you will get better performance.
//example below
var test = "This is a test string";
var test2 = test.split("a").join("A");
Try this and run the performance test.
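Applied to HTML escaping, the split/join idea needs one split/join pair per character, with "&" handled first so the ampersands introduced by the other entities are not escaped again. A minimal sketch (escapeHtmlSplitJoin is a hypothetical name; it was not one of the measured test cases):

```javascript
function escapeHtmlSplitJoin(str) {
  return str
    .split("&").join("&amp;") // must come first
    .split("<").join("&lt;")
    .split(">").join("&gt;")
    .split('"').join("&quot;")
    .split("'").join("&#39;");
}

console.log(escapeHtmlSplitJoin("1 < 2 && 3 > 2")); // 1 &lt; 2 &amp;&amp; 3 &gt; 2
```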