javascript - Replace words in a string, but ignore HTML

I'm trying to write a highlight plugin, and would like to preserve HTML formatting. Is it possible to ignore all the characters between < and > in a string when doing a replace using javascript?

Using the following as an example:

var string = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

I would like to be able to achieve the following (replace 'dolor' with 'FOO'):

var string = "Lorem ipsum FOO span sit amet, consectetuer <span class='dolor'>FOO</span> adipiscing elit.";

Or perhaps even this (replace 'span' with 'BAR'):

var string = "Lorem ipsum dolor BAR sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

I came very close with an answer given by tambler here: "Can you ignore HTML in a string while doing a Replace with jQuery?", but for some reason I just can't get the accepted answer to work.

I'm completely new to regex, so any help would be greatly appreciated.

Asked Dec 14, 2011 at 10:42 by Jon; edited May 23, 2017 at 10:33

1 stackoverflow.com/questions/2289552/… – ggzone Commented Dec 14, 2011 at 10:47
Jon, trying to parse HTML with regex is notoriously difficult. stackoverflow.com/questions/1732348/… – graphicdivine Commented Dec 14, 2011 at 10:48
2 You should parse the HTML and then iterate recursively over each text node. – Felix Kling Commented Dec 14, 2011 at 10:50
@graphicdivine he's not trying to parse it, he's just trying to change a word without modifying anything within elements – Prisoner Commented Dec 14, 2011 at 10:50
2 " Is it possible to ignore all the characters between < and > in a string" - What if the string contains something like "No html tags here even though 4 < 5 Lorem ipsum dolor span 5 > 4." – nnnnnn Commented Dec 14, 2011 at 11:17

3 Answers

Parsing the HTML with the browser's built-in parser via innerHTML, followed by DOM traversal, is the sensible way to do this. Here's a solution loosely based on another answer:

Live demo: http://jsfiddle.net/FwGuq/1/

Code:

// Reusable generic function
function traverseElement(el, regex, textReplacerFunc) {
    // script and style elements are left alone; tagName is upper case in
    // HTML documents, so the test has to be case-insensitive
    if (!/^(script|style)$/i.test(el.tagName)) {
        var child = el.lastChild;
        while (child) {
            if (child.nodeType == 1) {
                traverseElement(child, regex, textReplacerFunc);
            } else if (child.nodeType == 3) {
                textReplacerFunc(child, regex);
            }
            child = child.previousSibling;
        }
    }
}

// This function does the replacing within each text node
// and can be customized to do what you like
function textReplacerFunc(textNode, regex) {
    textNode.data = textNode.data.replace(regex, "FOO");
}

// The main function
function replaceWords(html, words) {
    var container = document.createElement("div");
    container.innerHTML = html;

    // Replace the words one at a time to ensure each one gets matched
    for (var i = 0, len = words.length; i < len; ++i) {
        traverseElement(container, new RegExp(words[i], "g"), textReplacerFunc);
    }
    return container.innerHTML;
}


var html = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
alert( replaceWords(html, ["dolor"]) );
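One caveat with `new RegExp(words[i], "g")` in the code above: each word is treated as a regex pattern, so a word containing characters such as `.` or `(` would be misread, and "dolor" would also match inside a longer word such as "dolorous". A minimal sketch of one way to build a safer pattern follows. The helper names `escapeRegExp` and `wordRegex` are hypothetical and not part of the answer.

```javascript
// Hypothetical helpers, shown only as a sketch; not from the original answer.
function escapeRegExp(text) {
    // Backslash-escape every character that has a special meaning in a regex
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function wordRegex(word) {
    // \b anchors restrict matches to whole words
    return new RegExp("\\b" + escapeRegExp(word) + "\\b", "g");
}

var sample = "dolor dolorous a.b axb";
var replacedWord = sample.replace(wordRegex("dolor"), "FOO"); // "FOO dolorous a.b axb"
var replacedDot = sample.replace(wordRegex("a.b"), "FOO");    // "dolor dolorous FOO axb"
```

Passing `wordRegex(words[i])` to `traverseElement` instead of the raw `RegExp` would be one way to use it. Note that `\b` only behaves as a word boundary when the word itself starts and ends with a letter, digit, or underscore.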

This solution works in Perl, and the pattern should also work in JavaScript, since it only uses syntax that ECMA-262 regular expressions support:

s,\bdolor\b(?=[^"'][^>]*>),FOO,g

Basically, replace the word only if it is followed by one character that is not a quote, then any run of characters that are not >, and then the closing > itself.
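For reference, here is a sketch of the same pattern written as a JavaScript `replace` call, applied to the string from the question. The variable names are illustrative only. The extra inputs show two limits of this lookahead heuristic: a word after the last tag has no `>` ahead of it and is skipped, and a word in the middle of a multi-word attribute value is still replaced.

```javascript
// JavaScript form of the Perl substitution s,\bdolor\b(?=[^"'][^>]*>),FOO,g
var pattern = /\bdolor\b(?=[^"'][^>]*>)/g;

var input = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
var output = input.replace(pattern, "FOO");
// "Lorem ipsum FOO span sit amet, consectetuer <span class='dolor'>FOO</span> adipiscing elit."

// Limit 1: no ">" follows the last "dolor", so it is left unchanged
var trailing = "<b>dolor</b> sit dolor".replace(pattern, "FOO");
// "<b>FOO</b> sit dolor"

// Limit 2: inside a multi-word attribute value the word is not followed
// directly by a quote, so it is replaced even though it is inside a tag
var insideAttr = '<i class="x dolor y">t</i>'.replace(pattern, "FOO");
// '<i class="x FOO y">t</i>'
```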

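Tim Down's function works well. If you want the replacement text to contain HTML, make the small change below. The regex has to contain a capture group "()" for $1 to work, for example: let regex = new RegExp('(' + textToReplace + ')', 'gi'); Note that assigning to parentNode.innerHTML replaces everything inside the parent, so this is only safe when the text node is the parent's only child.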

const textReplacerFunc = function(textNode, regex) {
    textNode.parentNode.innerHTML = textNode.data.replace(regex, '<span class="highlight">$1</span>');
};
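A text node's `data` holds decoded text, so a literal `<` or `&` in it would be re-parsed as markup when written back through `innerHTML`. The sketch below is one way to guard against that, assuming the highlighted markup is built as a string before any DOM assignment. `escapeHtml`, `escapeRegExp` and `highlightToHtml` are hypothetical names, not part of the answers above.

```javascript
// Hypothetical helpers, shown only as a sketch.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

function escapeRegExp(text) {
    // Backslash-escape every character that has a special meaning in a regex
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function highlightToHtml(text, term) {
    // An empty term would match everywhere, so just escape the text
    if (term === "") return escapeHtml(text);
    var regex = new RegExp("(" + escapeRegExp(term) + ")", "gi");
    // split() with a capture group puts the matches at the odd indices
    return text.split(regex).map(function (part, i) {
        var safe = escapeHtml(part);
        return i % 2 === 1 ? '<span class="highlight">' + safe + "</span>" : safe;
    }).join("");
}

var highlighted = highlightToHtml("4 < 5 dolor", "dolor");
// '4 &lt; 5 <span class="highlight">dolor</span>'
```

The returned string could then be used in place of the plain `replace` result. Escaping each piece separately keeps the inserted `<span>` tags as the only markup in the output.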
