I have an HTML document that contains a <div id="main">. Inside this div there may be several levels of nodes, with no fixed structure, because the user creates the document content.
I want a JavaScript function that returns all nodes within div id="main", whatever their tag, taking into account that there may be several levels of children.
For example, if I have this document:
...
<div id="main">
  <h1>bla bla</h1>
  <p>
    <b>fruits</b> apple<i>text</i>.
    <img src="..">image</img>
  </p>
  <div>
    <p></p>
    <p></p>
  </div>
  <p>..</p>
</div>
...
The function getNodes would return an array of node objects (I don't know how to represent it, so I list them):
[h1, #text (= bla bla), p, b, #text (= fruits), #text (= _apple), i, #text (= text), img, #text (= image), div, p, p, p, #text (= ..)]
As the example shows, the function must return all nodes, including the leaf nodes (i.e. #text nodes).
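For reference, a list in that notation can be printed from real node objects with a small formatter. This is a sketch; describeNodes is a hypothetical name, and it assumes the input is an array or NodeList of DOM nodes.

```javascript
// Hypothetical formatter: renders element nodes by their lower-cased tag
// name and text nodes as "#text (= value)", like the list above.
// Array.prototype.map.call works on both arrays and NodeLists.
function describeNodes(nodes) {
  return "[" + Array.prototype.map.call(nodes, function (n) {
    return n.nodeType === 3 // 3 is Node.TEXT_NODE
      ? "#text (= " + n.nodeValue + ")"
      : n.nodeName.toLowerCase();
  }).join(", ") + "]";
}
```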
For now I have this function, which returns all nodes except the leaves:
function getNodes() {
  var all = document.querySelectorAll("#main *");
  for (var elem = 0; elem < all.length; elem++) {
    // do something..
  }
}
In fact, applied to the example above, this function returns:
[H1, P, B, I, IMG, DIV, P, P, P]
There are no #text nodes. Also, if I check the elements returned by that method like this:
all[elem].children.length
I find (testing on <p>fruits</p>) that <p> is a leaf node.
But if I build the DOM tree, it is clear that it is not a leaf node, and that in this example the leaf nodes are the #text nodes
...
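The mismatch described above comes from children counting only element nodes, while childNodes counts nodes of every type, including #text. A minimal sketch of the two tests side by side, where isLeaf and hasNoElementChildren are hypothetical helper names:

```javascript
// A node is a leaf of the DOM tree when it has no child nodes of any type.
function isLeaf(node) {
  return node.childNodes.length === 0;
}

// The test used above: it only looks at element children, so a <p> that
// contains nothing but text still passes it.
function hasNoElementChildren(node) {
  return node.children.length === 0;
}
```

In a browser, for a <p> created with document.createElement("p") and given the textContent "fruits", isLeaf returns false while hasNoElementChildren returns true; isLeaf(p.firstChild), i.e. the #text node, returns true.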
Thank you
asked Nov 4, 2015 at 18:31 by user5346990
4 Answers
Classic case for recursion into the DOM.
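// Appends every descendant of node to accum in pre-order (document order)
// and returns accum; node itself is not included.
function getDescendants(node, accum) {
  var i;
  accum = accum || [];
  for (i = 0; i < node.childNodes.length; i++) {
    accum.push(node.childNodes[i]);            // the child itself...
    getDescendants(node.childNodes[i], accum); // ...then its whole subtree
  }
  return accum;
}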
and call it like this:
getDescendants(document.querySelector("#main"));
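The same recursion can also be written as an ES2015 generator, so the caller can consume nodes lazily and stop early without building the whole array. This is a variant, not the answer's code; descendants is a hypothetical name.

```javascript
// Yields every descendant of node in pre-order (document order).
// The starting node itself is not yielded, as in getDescendants above.
function* descendants(node) {
  for (let i = 0; i < node.childNodes.length; i++) {
    const child = node.childNodes[i];
    yield child;               // visit the child first...
    yield* descendants(child); // ...then everything below it
  }
}
```

In a browser, Array.from(descendants(document.querySelector("#main"))) gives the same array as getDescendants, and a for...of loop over descendants(...) can break out as soon as it finds what it needs.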
Aside from the already existing and perfectly functional answer, I find it worth mentioning that one can do away with the recursion and the many resulting function calls by simply navigating via the firstChild, nextSibling, and parentNode properties:
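function getDescendants(node) {
  // Start at the first child so that node itself is not included (as in the
  // recursive version) and a node without children never walks off to its siblings.
  var list = [], desc = node.firstChild, checked = false, i = 0;
  while (desc && desc !== node) {
    checked || (list[i++] = desc);
    desc =
      (!checked && desc.firstChild) ||
      (checked = false, desc.nextSibling) ||
      (checked = true, desc.parentNode);
  }
  return list;
}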
(Whenever we encounter a new node, we add it to the list, then try going to its first child node. If it has none, we go to its next sibling instead. Whenever neither a child node nor a following sibling is found, we go back up to the parent, setting the checked flag so that the parent is not added to the list again and its subtree is not re-entered.)
This will, in virtually every case, improve performance greatly. That is not to say there is nothing left to optimize here: e.g. one could cache the nodes at which we descend further into the hierarchy, so as to avoid the parentNode lookups when going back up. I leave implementing this as an exercise for the reader.
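For anyone who wants to try that exercise, here is one possible sketch (getDescendantsWithStack is a hypothetical name). An explicit stack records the nodes whose children we entered, so climbing back up pops the stack instead of reading parentNode. Like the other versions, it leaves out the starting node itself.

```javascript
// Iterative pre-order walk that never reads parentNode.
function getDescendantsWithStack(root) {
  const list = [];
  const stack = []; // ancestors of desc that we descended into (root excluded)
  let desc = root.firstChild;
  while (desc) {
    list.push(desc);
    if (desc.firstChild) {
      stack.push(desc); // remember where we went down
      desc = desc.firstChild;
      continue;
    }
    // No children: climb back up (via the stack) until a next sibling exists.
    while (desc && !desc.nextSibling) {
      desc = stack.pop(); // undefined once we are back at root level
    }
    desc = desc ? desc.nextSibling : null;
  }
  return list;
}
```

The trade-off is a small amount of extra memory (one stack entry per level of depth) in exchange for skipping the parentNode reads and the checked flag.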
Keep in mind, though, that iterating through the DOM like this will rarely be the bottleneck in a script, unless you are going through a large DOM tree many tens or hundreds of times a second. In that case you should probably look for a way to avoid doing that at all, rather than simply optimizing it.
The children property only returns element nodes. If you want all children, I would suggest using the childNodes property instead. Then you can loop through this NodeList and filter on nodeType: skip the nodes whose nodeType is Node.ELEMENT_NODE (querySelectorAll already gives you those), or pick whichever other node types you are interested in.
So try something like:
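var i, j, nodes;
var result = [];
var all = document.querySelectorAll("#main *");
for (var elem = 0; elem < all.length; elem++) {
  // Record the element's tag name...
  result.push(all[elem].nodeName);
  // ...then the values of its direct #text children.
  // (Text directly inside #main itself is not visited, because "#main *"
  // matches only descendants of #main.)
  nodes = all[elem].childNodes;
  for (i = 0, j = nodes.length; i < j; i++) {
    if (nodes[i].nodeType === Node.TEXT_NODE) {
      result.push(nodes[i].nodeValue);
    }
  }
}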
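The same childNodes idea can be taken one step further so that the result keeps document order and contains the node objects themselves rather than tag names and strings. A minimal sketch, where collectByType is a hypothetical name and the wanted nodeType values are passed in by the caller:

```javascript
// Recursively walks childNodes below root (so #text children of root itself
// are included too) and keeps only nodes whose nodeType is in wantedTypes.
// Standard values: 1 = ELEMENT_NODE, 3 = TEXT_NODE, 8 = COMMENT_NODE.
function collectByType(root, wantedTypes, accum) {
  accum = accum || [];
  for (let i = 0; i < root.childNodes.length; i++) {
    const child = root.childNodes[i];
    if (wantedTypes.includes(child.nodeType)) {
      accum.push(child);
    }
    collectByType(child, wantedTypes, accum); // descend regardless of type
  }
  return accum;
}
```

In a browser this might be called as collectByType(document.querySelector("#main"), [Node.ELEMENT_NODE, Node.TEXT_NODE]). The line breaks and indentation between tags in the question's markup become whitespace-only #text nodes, which the question's expected list leaves out; appending .filter(function (n) { return n.nodeType !== Node.TEXT_NODE || n.nodeValue.trim() !== ""; }) would drop them.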
If you only need the HTML tags and not the #text nodes, you can simply use this:
<elem>.querySelectorAll("*");