javascript - Checking whether an HTML element contains primitive text? - Stack Overflow

Take this HTML:

<div id="el1">
  <div id="el2">
    <div id="el3">
      Hello
      <div id="el4">
        World
      </div>
    </div>
  </div>
</div>

Note that el3 and el4 contain primitive text; namely "Hello" and "World". The other elements (el1 and el2) only contain other elements.

And yet, using pure JavaScript, all of their innerHTML properties indicate they contain some form of text.

How can one use pure JavaScript to ascertain whether a particular element contains primitive text as a child? In this instance, the method would also recognise el3 as containing primitive text (even though it also contains another element thereafter).

Something like this:

var els = document.getElementsByTagName("*");

for(var i = 0; i < els.length; i++){

  if( /* element contains text */ ){

    // do something

  }
}

Is this really just a job for RegEx? With all the properties of an HTMLElement, you'd think there would be a better way.

No jQuery, thanks.

Asked Jan 17, 2014 at 17:54 by shennan; edited Feb 1, 2015 at 22:03 by Deduplicator

  • Define “primitive text”. All the elements in the example contain text nodes. Instead of using an invented expression like “primitive text” without a definition, you should define the test you wish to perform. Perhaps you wish to test whether an element contains text nodes that have content other than whitespace characters? Then you would just need to define which characters should be treated as whitespace characters, and the rest is simple coding. – Jukka K. Korpela, Jan 17, 2014 at 19:09
  • You're a bit late to the party. Considering three answers, all competent, have been put forward suggests that most people understood my "invented expression". If I had known the ins and outs of the DOM, then perhaps I would have been able to articulate "text nodes with content other than whitespace". But, I decided to try and describe what I was after and hope that someone more imaginative might be able to understand what I meant. Thankfully, they did. – shennan, Jan 17, 2014 at 19:16

3 Answers

innerHTML returns the markup inside an element, so for every element except the innermost one it contains HTML, because the elements are nested.

For instance, the innerHTML of #el2 would be

  <div id="el3">
      Hello
      <div id="el4">
          World
      </div>
  </div>

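To get just the text, use textContent, which all modern browsers support, or innerText, which older versions of Firefox did not support. Both include the text of nested elements, so the child elements have to be removed first.
The text also contains whitespace, so you should probably trim() it as well. Something like this: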

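// Assumes the question's markup is wrapped in an element with id "wrapper"
var els = document.querySelectorAll("#wrapper *");

for(var i = 0; i < els.length; i++){
    // Work on a deep clone so the real document is left untouched
    var el = els[i].cloneNode(true);
    var children = el.children;

    // Remove the child elements, iterating backwards because children is a live collection
    for (var j=children.length; j--;) el.removeChild(children[j]);
    // Only the element's own text is left at this point
    var content = el.innerText ? el.innerText  : el.textContent;

    if( content.trim().length ){
        // do something
        console.log(els[i].getAttribute('id') + ' has text');
    }
}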

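Or check the nodeType and nodeValue of the child text nodes:

// Assumes the question's markup is wrapped in an element with id "wrapper"
var els = document.querySelectorAll("#wrapper *");

for(var i = 0; i < els.length; i++){
    var el = els[i];
    var children = el.childNodes;

    for (var j=children.length; j--;) {
        // nodeType 3 is a text node; skip those that hold only whitespace
        if( children[j].nodeType === 3 && children[j].nodeValue.trim().length) {
            // do something
            console.log(els[i].getAttribute('id') + ' has text');
            break; // report each element once, even if it has several text nodes
        }
    }
}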

Here's an example of how you can use the nodeType to help you get your answer:

var els = document.getElementsByTagName("*");

for (var i = 0; i < els.length; i++) {
    var hasTextNode = false;
    var currChildren = els[i].childNodes;

    for (var j = 0; j < currChildren.length; j++) {
        if ((currChildren[j].nodeType === Node.TEXT_NODE) &&
            (!(/^\s*$/.test(currChildren[j].textContent)))) {
                hasTextNode = true;
                break;
        }
    }

    window.console.log(els[i].id + ((hasTextNode) ? " has" : " does not have") + " a Text Node");
}

Applying that to the HTML that you provided results in this in the console:

el1 does not have a Text Node
el2 does not have a Text Node
el3 has a Text Node
el4 has a Text Node

Note: it is important to check the found Text Nodes for "space only" content, because the DOM will consider all of the indenting and line breaks in the source code as a "Text Node". Obviously, you would want to ignore those.
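
The whitespace check described in the note can be wrapped in a small helper. This is only a sketch: hasDirectText, text and element are hypothetical names that do not appear in the answers, and the plain objects at the bottom merely stand in for DOM nodes so the snippet can run outside a browser. In a page you would pass a real element instead.

```javascript
// Hypothetical helper (not from the answers above): returns true when `el`
// has at least one direct child text node with a non-whitespace character.
var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

function hasDirectText(el) {
    var children = el.childNodes;
    for (var i = 0; i < children.length; i++) {
        if (children[i].nodeType === TEXT_NODE && /\S/.test(children[i].nodeValue)) {
            return true;
        }
    }
    return false;
}

// Plain objects standing in for DOM nodes (an assumption made for this demo only).
function text(value) { return { nodeType: 3, nodeValue: value, childNodes: [] }; }
function element(id, childNodes) { return { nodeType: 1, id: id, childNodes: childNodes }; }

// Mirrors el2, el3 and el4 from the question, including the indentation whitespace.
var el4 = element("el4", [text("\n        World\n      ")]);
var el3 = element("el3", [text("\n      Hello\n      "), el4, text("\n    ")]);
var el2 = element("el2", [text("\n    "), el3, text("\n  ")]);

console.log(hasDirectText(el3)); // true
console.log(hasDirectText(el2)); // false
```

Testing with /\S/ instead of calling trim() avoids building a new string for every text node, but either approach should give the same answer here.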

You tell the difference between element nodes and text nodes via the nodeType property: myelementnode.nodeType will return 1, mytextnode.nodeType will return 3.

As the name suggests, getElementsByTagName will only give you element nodes. What you want to do is use the childNodes property of your root node, which will get you all immediate children of that node as a NodeList. So, for el1 you will get one element child, el2, along with the whitespace-only text nodes created by the line breaks and indentation around it.

You then have to recursively go through each child node to get its children until you hit a node with type 3, a text node.

So for el3, it will return three child nodes. The first will be your text, the second will be your el4 element, and the third will be a whitespace-only text node after el4. You'd then need to go into el4 to get its child node.
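
The recursive walk described above might look roughly like the following. It is a sketch rather than a definitive implementation: collectTextParents, text and element are hypothetical names, and the plain objects stand in for DOM nodes so the snippet runs outside a browser. With real elements you would call it on document.body or any other root element.

```javascript
// Hypothetical helper: collects the ids of every element at or below `node`
// that has a non-whitespace text node (nodeType 3) as a direct child.
function collectTextParents(node, found) {
    found = found || [];
    var children = node.childNodes;
    var i;

    // First pass: does this element itself hold real text?
    for (i = 0; i < children.length; i++) {
        if (children[i].nodeType === 3 && /\S/.test(children[i].nodeValue)) {
            found.push(node.id);
            break;
        }
    }

    // Second pass: descend into child elements (nodeType 1).
    for (i = 0; i < children.length; i++) {
        if (children[i].nodeType === 1) {
            collectTextParents(children[i], found);
        }
    }
    return found;
}

// Plain objects standing in for DOM nodes (an assumption made for this demo only).
function text(value) { return { nodeType: 3, nodeValue: value, childNodes: [] }; }
function element(id, childNodes) { return { nodeType: 1, id: id, childNodes: childNodes }; }

// Mirrors the markup from the question, including whitespace-only text nodes.
var el4 = element("el4", [text("\n        World\n      ")]);
var el3 = element("el3", [text("\n      Hello\n      "), el4, text("\n    ")]);
var el2 = element("el2", [text("\n    "), el3, text("\n  ")]);
var el1 = element("el1", [text("\n  "), el2, text("\n")]);

console.log(collectTextParents(el1).join(", ")); // el3, el4
```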

innerHTML returns a string (a chunk of HTML serialized to a string), not nodes. You could use that and a regular expression to discard everything that sits between < and >, but that is a bit crude, and with large chunks of HTML it will be an expensive process.
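
To illustrate why the regular-expression route is crude, here is a minimal sketch. stripTags is a hypothetical name, and the string is a compacted version of what el2's innerHTML would contain. Once the tags are stripped, the text of nested elements is still there, so the result cannot tell you whether the text belongs to the element itself or to one of its descendants.

```javascript
// Hypothetical helper: removes anything that sits between "<" and ">".
function stripTags(html) {
    return html.replace(/<[^>]*>/g, "");
}

// Compacted stand-in for el2.innerHTML from the question.
var el2InnerHTML = '<div id="el3">Hello<div id="el4">World</div></div>';

var remaining = stripTags(el2InnerHTML).trim();

console.log(remaining);            // HelloWorld
console.log(remaining.length > 0); // true, although el2 has no text of its own
```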
