html - Javascript find all text except those in <a> tag

I have a div that may or may not have HTML elements as children. With JavaScript, I need to find all the occurrences of a word inside this div, except for those inside an <a> tag.

For example:

<div id="dictionable">
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
    <br/><br/>
    <a href="#lorem">lorem</a>
    <br/><br/>
    <p>lorem</p>
</div>

I tried with my ultra low capabilities to build a regex, failing miserably. So I googled and found this:

var pattern = new RegExp('(lorem)(?![^<]*>|[^<>]*</)', 'gim');

This regex finds every occurrence of "lorem" that is not inside any tag at all. I only need to exclude the <a> tag.

Could anyone help me?

Asked Dec 16, 2014 at 10:56 by Valerio; edited Dec 16, 2014 at 10:59 by JLRishe.

3 don't parse html with regex blog.codinghorror./parsing-html-the-cthulhu-way – AlexanderBrevig Commented Dec 16, 2014 at 10:59
1 Is jQuery an option? – JLRishe Commented Dec 16, 2014 at 11:00
ok for jquery. I love Jeff Atwood's humor :D – Valerio Commented Dec 16, 2014 at 11:03
jQuery or plain JS, the point is the same - regex is not suitable for querying the DOM. There are built-in functions that let you traverse the DOM safely and accurately, as in Niet's answer. – Boaz Commented Dec 16, 2014 at 11:05

3 Answers

No regex. Absolutely no regex. Nuh-uh. Nope.

var copy = document.getElementById('dictionable').cloneNode(true),
    links = copy.getElementsByTagName('a'), l = links.length, i;
for( i=l-1; i>=0; i--) {
    // always work in reverse order when deleting stuff, it's safer!
    links[i].parentNode.removeChild(links[i]);
}

var result = copy.textContent || copy.innerText;

Boom!
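The snippet above yields the div's text with the link contents removed, but the question asks for the occurrences of a word. A minimal sketch of that last step follows, assuming whole-word, case-insensitive matching. The helper name countOccurrences is hypothetical and not part of the original answer. Using a regex here is fine, because it runs on plain text rather than on HTML.

```javascript
// Hypothetical helper: counts whole-word, case-insensitive occurrences
// of `word` in plain text (not HTML).
function countOccurrences(text, word) {
    // Escape regex metacharacters so the word is matched literally.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var matches = text.match(new RegExp('\\b' + escaped + '\\b', 'gi'));
    return matches ? matches.length : 0;
}

// Stand-in for `result`: the question's text once the link is removed.
var sample = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. lorem';
console.log(countOccurrences(sample, 'lorem')); // prints 2
```

In a browser you would pass the result variable from the snippet above instead of the sample string.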

Using jQuery, it's quite simple:

var $dictionable = $("#dictionable").clone();
$dictionable.find('a').remove(); // remove every <a> element from the clone
var result = $dictionable.text(); // text of the clone, without the link text

Since everything inside an element is a node in its own right, you can simply iterate through the div's child nodes.

Granted, it's not the shortest solution because of the node-type checks, but it should be relatively fast.

var d = document.getElementById('dictionable');
var textcontent = '';
// index-based loop: for...in would also visit non-node properties such as "length"
for (var i = 0; i < d.childNodes.length; i++) {
    var node = d.childNodes[i];
    // accept only text nodes (3) and elements (1) other than links
    if ((node.nodeType != 1 && node.nodeType != 3) ||
        node.nodeName == 'A')
        continue;

    // note: only direct <a> children are skipped, not links nested deeper
    textcontent = textcontent + node.textContent;
}

This way you can even run the search inside the loop and narrow the results down to a single element.
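The loop above only looks at the div's direct children, so a link nested inside, say, a <p> would still contribute its text. A minimal recursive sketch that skips <a> elements at any depth and records which element each match was found in is shown here. The function name findOutsideLinks and the shape of the returned objects are assumptions for illustration, and it does a simple case-insensitive substring search rather than whole-word matching.

```javascript
// Hypothetical helper: recursively collects matches of `word` in text nodes,
// skipping every <a> element together with its descendants.
// Works on DOM nodes, or on any object exposing nodeType, nodeName,
// nodeValue and childNodes.
function findOutsideLinks(node, word, found) {
    found = found || [];
    var needle = word.toLowerCase();
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === 3) { // text node
            var text = child.nodeValue.toLowerCase();
            var pos = text.indexOf(needle);
            while (pos !== -1) {
                // record the containing element and the offset in the text node
                found.push({ parent: node.nodeName, index: pos });
                pos = text.indexOf(needle, pos + needle.length);
            }
        } else if (child.nodeType === 1 &&
                   child.nodeName.toUpperCase() !== 'A') {
            // element that is not a link: descend into it
            findOutsideLinks(child, word, found);
        }
    }
    return found;
}
```

In a browser, calling findOutsideLinks(document.getElementById('dictionable'), 'lorem') on the question's markup should report one match in the DIV and one in the P, and nothing for the link.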
