javascript - Get all the descendant nodes (also the leaves) of a certain node

I have an HTML document that consists of a <div id="main">. Inside this div there may be several levels of nodes, with no fixed structure, because the user creates the document content. I want a JavaScript function that returns all nodes within div id="main", whatever their tag, taking into account that there may be several levels of children.

For example, if I have this document:

...

<div id="main">

    <h1>bla bla</h1>

    <p>
        <b>fruits</b> apple<i>text</i>.
        <img src="..">image</img>
    </p>

    <div>
        <p></p>
        <p></p>
    </div>

    <p>..</p>

</div>
...

The function getNodes should return an array of node objects (I don't know how to represent it, so I list them):

[h1, #text (= bla bla), p, b, #text (= fruits), #text (= _apple), i, #text (= text), img, #text (= image), div, p, p, p, #text (= ..)]

As the example shows, it must return all nodes, even the leaf nodes (i.e. the #text nodes).

For now I have this function, which returns all nodes except the leaves:

function getNodes() {
    var all = document.querySelectorAll("#main *");
    for (var elem = 0; elem < all.length; elem++) {
        //do something..
    }
}

In fact, applied to the example above, this function returns:

[H1, P, B, I, IMG, DIV, P, P, P]

There are no #text nodes. Also, if I test the elements returned by that method in this way:

all[elem].children.length

I find that <b> (I tested on fruits) is a leaf node. But if I build the DOM tree, it is clear that it is not a leaf node, and that in this example the leaf nodes are the #text nodes...

Thank you


asked Nov 4, 2015 at 18:31 by user5346990

4 Answers

Classic case for recursion into the DOM.

function getDescendants(node, accum) {
    var i;
    accum = accum || [];
    for (i = 0; i < node.childNodes.length; i++) {
        accum.push(node.childNodes[i]);
        getDescendants(node.childNodes[i], accum);
    }
    return accum;
}

and then call it like this:

getDescendants( document.querySelector("#main") );

Aside from the already existing and perfectly functional answer, I find it worth mentioning that one can do away with the recursion and the many resulting function calls by simply navigating via the firstChild, nextSibling, and parentNode properties:

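function getDescendants(node) {
    var list = [], desc = node, checked = false, i = 0;
    // A childless node has no descendants; without this guard the walk
    // would wander off into the node's siblings.
    if (!node.firstChild) return list;
    do {
        // Record each node on first visit, skipping the root itself.
        checked || desc === node || (list[i++] = desc);
        desc =
            (!checked && desc.firstChild) ||
            (checked = false, desc.nextSibling) ||
            (checked = true, desc.parentNode);
    } while (desc !== node);
    return list;
}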

(Whenever we encounter a new node, we add it to the list, then try going to its first child node. If such does not exist, get the next sibling instead. Whenever no child node or following sibling is found, we go back up to the parent, while setting the checked flag to avoid adding that to the list again or reentering its descendant tree.)

This will, in virtually every case, improve performance greatly. Not that there is nothing left to optimize here, e.g. one could cache the nodes where we descend further into the hierarchy so as to later get rid of the parentNode lookup when coming back up. I leave implementing this as an exercise for the reader.

Keep in mind though that iterating through the DOM like this will rarely be the bottleneck in a script. Unless you are going through a large DOM tree many tens/hundreds of times a second, that is — in which case you probably ought to think about avoiding that if at all possible, rather than simply optimizing it.
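For what it's worth, here is a minimal sketch of one way to read the caching idea mentioned above. Instead of remembering the parent, it remembers the next sibling on a stack before descending, so it never needs parentNode to climb back up. The names getDescendantsWithStack and makeNode are made up for this illustration; makeNode only builds plain objects that mimic the firstChild / nextSibling links, so the snippet can be tried outside a browser. This is a sketch under those assumptions, not a measured optimization.

```javascript
// Hypothetical helper: builds plain objects that mimic the
// firstChild / nextSibling links of real DOM nodes.
function makeNode(name, children) {
    var node = { nodeName: name, firstChild: null, nextSibling: null };
    (children || []).forEach(function (child, index, all) {
        if (index === 0) node.firstChild = child;
        child.nextSibling = all[index + 1] || null;
    });
    return node;
}

// Iterative traversal in document order. Pending siblings are kept on a
// stack, so there is no need to climb back up through parentNode.
function getDescendantsWithStack(root) {
    var list = [];
    var stack = [];
    var node = root.firstChild;
    while (node) {
        list.push(node);
        // Remember where to continue once this subtree is finished.
        if (node.nextSibling) stack.push(node.nextSibling);
        // Go down if possible, otherwise resume at the last pending sibling.
        node = node.firstChild || stack.pop();
    }
    return list;
}

// A small tree loosely modelled on part of the question's example.
var main = makeNode("DIV", [
    makeNode("H1", [makeNode("#text")]),
    makeNode("P", [makeNode("B", [makeNode("#text")]), makeNode("#text")])
]);
var names = getDescendantsWithStack(main).map(function (n) {
    return n.nodeName;
});
console.log(names.join(", ")); // H1, #text, P, B, #text, #text
```

On a real page the same function should work when given document.querySelector("#main"), since DOM nodes expose the same firstChild and nextSibling properties.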

The children property only returns element nodes. If you want all children, I would suggest using the childNodes property. Then you can loop through this NodeList and either skip nodes whose nodeType is Node.ELEMENT_NODE or pick whichever other node types you are interested in.

So try something like:

var i, j, nodes;
var result = [];
// Every element below #main, in document order.
var all = document.querySelectorAll("#main *");
for (var elem = 0; elem < all.length; elem++) {
    result.push(all[elem].nodeName);

    // childNodes also contains the text nodes that children leaves out.
    nodes = all[elem].childNodes;
    for (i = 0, j = nodes.length; i < j; i++) {
        if (nodes[i].nodeType == Node.TEXT_NODE) {
            result.push(nodes[i].nodeValue);
        }
    }
}
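If you would rather get the node objects themselves, and also catch text that sits directly inside #main, the nodeType check can be folded into a recursive walk over childNodes. The following is only a sketch: getDescendantsOfType, el and text are names invented for this example, and el / text build plain stand-in objects so the snippet runs without a browser. The constants 1 and 3 are the standard values of Node.ELEMENT_NODE and Node.TEXT_NODE.

```javascript
// Standard numeric values of Node.ELEMENT_NODE and Node.TEXT_NODE.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Collects every descendant whose nodeType is listed in `wanted`.
// When skipBlankText is true, whitespace-only text nodes (typically the
// indentation between tags) are left out.
function getDescendantsOfType(node, wanted, skipBlankText, accum) {
    accum = accum || [];
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        var isBlankText = child.nodeType === TEXT_NODE && !/\S/.test(child.nodeValue);
        if (wanted.indexOf(child.nodeType) !== -1 && !(skipBlankText && isBlankText)) {
            accum.push(child);
        }
        getDescendantsOfType(child, wanted, skipBlankText, accum);
    }
    return accum;
}

// Hypothetical stand-ins for DOM nodes, for trying this outside a browser.
function el(name, childNodes) {
    return { nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes: childNodes || [] };
}
function text(value) {
    return { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: value, childNodes: [] };
}

var main = el("DIV", [
    text("\n    "),
    el("H1", [text("bla bla")]),
    text("\n    "),
    el("P", [el("B", [text("fruits")]), text(" apple")])
]);

var found = getDescendantsOfType(main, [ELEMENT_NODE, TEXT_NODE], true);
var labels = found.map(function (n) {
    return n.nodeType === TEXT_NODE ? n.nodeValue : n.nodeName;
});
console.log(labels); // [ 'H1', 'bla bla', 'P', 'B', 'fruits', ' apple' ]
```

Passing false for skipBlankText keeps the whitespace-only text nodes as well, which is what the browser's DOM tree actually contains for indented markup.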

If you only need the HTML elements and not the #text nodes, you can simply use: <elem>.querySelectorAll("*");
