html - Javascript find all text except those in <a> tag - Stack Overflow

I have a div, which may or may not have HTML elements as children. With my JavaScript, I need to find all occurrences of a word inside this div, except for those inside an <a> tag.

For example:

<div id="dictionable">
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
    <br/><br/>
    <a href="#lorem">lorem</a>
    <br/><br/>
    <p>lorem</p>
</div>

I tried with my ultra low capabilities to build a regex, failing miserably. So I googled and found this:

var pattern = new RegExp('(lorem)(?![^<]*>|[^<>]*</)', 'gim');

This regex finds every occurrence of "lorem" except those inside ANY tag. I need to exclude only the <a> tag.

Could anyone help me?
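
To make the problem concrete, here is a minimal sketch of what that pattern does on markup like the example. The `html` string is an assumption: a hand-written stand-in for the div's contents, not something read from a real page.

```javascript
// Assumed stand-in for the contents of the "dictionable" div.
var html = [
    'Lorem ipsum dolor sit amet.',
    '<br/><br/>',
    '<a href="#lorem">lorem</a>',
    '<br/><br/>',
    '<p>lorem</p>'
].join('\n');

// The pattern from the question, unchanged.
var pattern = new RegExp('(lorem)(?![^<]*>|[^<>]*</)', 'gim');

// With the "g" flag, match() returns every full match (or null if none).
var matches = html.match(pattern) || [];
```

Only the "Lorem" in the bare text should match here. The lookahead rejects any occurrence that is followed directly by a closing tag, so the one in <p> is dropped along with the one in <a>.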

asked Dec 16, 2014 at 10:56 by Valerio, edited Dec 16, 2014 at 10:59 by JLRishe
  • 3 don't parse html with regex blog.codinghorror.com/parsing-html-the-cthulhu-way – AlexanderBrevig Commented Dec 16, 2014 at 10:59
  • 1 Is jQuery an option? – JLRishe Commented Dec 16, 2014 at 11:00
  • ok for jquery. I love Jeff Atwood's humor :D – Valerio Commented Dec 16, 2014 at 11:03
  • jQuery or plain JS, the point is the same - regex is not suitable for querying the DOM. There are built-in functions that let you traverse the DOM safely and accurately, as in Niet's answer. – Boaz Commented Dec 16, 2014 at 11:05
3 Answers

No regex. Absolutely no regex. Nuh-uh. Nope.

var copy = document.getElementById('dictionable').cloneNode(true),
    links = copy.getElementsByTagName('a'), l = links.length, i;
for( i=l-1; i>=0; i--) {
    // always work in reverse order when deleting stuff, it's safer!
    links[i].parentNode.removeChild(links[i]);
}

var result = copy.textContent || copy.innerText;

Boom!
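
Once the link text is gone, finding the word is an ordinary string search. A minimal sketch, assuming a hypothetical helper named `countOccurrences` (not part of the answer above) and a case-insensitive, whole-word match; `sampleText` stands in for `result`:

```javascript
// Hypothetical helper: counts case-insensitive whole-word matches in plain text.
function countOccurrences(text, word) {
    // Escape regex metacharacters so the word is matched literally.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var matches = text.match(new RegExp('\\b' + escaped + '\\b', 'gi'));
    return matches ? matches.length : 0;
}

// Assumed sample string standing in for `result` (text left after removing links).
var sampleText = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. lorem';
var count = countOccurrences(sampleText, 'lorem');
```

Here the regex only ever sees plain text, never markup, so it is safe to use. Note that `\b` treats only ASCII letters, digits and underscore as word characters, so this sketch fits words like "lorem" but not words with accented letters.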

Using jQuery, it's simple:

var $dictionable = $("#dictionable").clone();
$dictionable.find('a').remove(); // removes every <a> element from the clone
var result = $dictionable.text(); // text of the clone, without the links

Since everything inside an element is itself a node, you can simply iterate through the div's child nodes.

Granted, it's not the shortest solution because of the node-type checks, but it should be relatively fast.

var d = document.getElementById('dictionable');
var textcontent = '';
// Use an index loop: for...in on a NodeList also visits "length", "item", etc.
for (var i = 0; i < d.childNodes.length; i++) {
    var node = d.childNodes[i];
    // accept only element (1) and text (3) nodes, and skip links
    // (only direct children are checked; links nested deeper are not skipped)
    if ((node.nodeType != 1 && node.nodeType != 3) ||
        node.nodeName == 'A')
        continue;

    textcontent = textcontent + node.textContent;
}

This way you can even run the search inside the loop and narrow the results down to a single element.
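
The loop above only looks at the div's direct children, so a link nested deeper (for example inside the <p>) would still contribute its text. A minimal recursive sketch that skips links at any depth, assuming a hypothetical helper named `collectTextOutsideLinks`; it relies only on the standard `nodeType`, `nodeName`, `nodeValue` and `childNodes` properties:

```javascript
// Hypothetical helper: gathers text from every descendant text node,
// skipping any <a> element together with everything inside it.
function collectTextOutsideLinks(node) {
    var ELEMENT_NODE = 1, TEXT_NODE = 3;
    if (node.nodeType === TEXT_NODE) {
        return node.nodeValue;
    }
    // Ignore comments and other node types, and do not descend into links.
    if (node.nodeType !== ELEMENT_NODE || node.nodeName.toUpperCase() === 'A') {
        return '';
    }
    var text = '';
    for (var i = 0; i < node.childNodes.length; i++) {
        text += collectTextOutsideLinks(node.childNodes[i]);
    }
    return text;
}

// In a browser, assuming the question's markup is on the page:
// var text = collectTextOutsideLinks(document.getElementById('dictionable'));
```

The trade-off against the clone-and-remove answers is that nothing is copied or modified, at the cost of a few more lines.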
