Say I have
var string =
"<h1>Header</h1>
<p>this is a small paragraph</p>
<ul>
<li>list element 1.</li>
<li>list element 2.</li>
<li>list element 3. With a small update.</li>
</ul>"
//newlines for clarity only
How can I split this string using JavaScript, so that I get
var array = string.split(/*...something here*/)
array = [
"<h1>Header</h1>",
"<p>this is a small paragraph</p>",
"<ul><li>list element 1.</li><li>list element 2.</li><li>list element 3. With a small update.</li></ul>"
]
I only want to split at the top-level HTML elements, not at their children.
Asked Apr 18, 2013 at 19:48 by Eoin Murray; edited Apr 18, 2013 at 20:09 by Alex Shesterov.
3 Answers
You could do something like this:
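var string = '<h1>Header</h1><p>this is a small paragraph</p><ul><li>list element 1.</li><li>list element 2.</li><li>list element 3. With a small update.</li></ul>';
var elements = $(string).map(function() {
return $('<div>').append(this).html(); // Effectively the element's outerHTML
}).get(); // .get() turns the jQuery collection into a plain array
And the result:
["<h1>Header</h1>", "<p>this is a small paragraph</p>", "<ul><li>list element 1.</li><li>list element 2.</li><li>list element 3. With a small update.</li></ul>"]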
A performant solution (http://jsperf.com/spliting-html):
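var splitter = document.createElement('div');
// Let the browser parse the markup (line breaks are for clarity only)
splitter.innerHTML = "<h1>Header</h1>\
<p>this is a small paragraph</p>\
<ul>\
<li>list element 1.</li>\
<li>list element 2.</li>\
<li>list element 3. With a small update.</li>\
</ul>";
// splitter.children holds only the top-level elements; map each to its markup
var parts = Array.prototype.map.call(splitter.children, function (el) {
return el.outerHTML;
});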
You can't do this with regular expressions. Your regular expression will fail if you have several nested elements of the same type, e.g.
<div>
<div>
<div>
</div>
</div>
</div>
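This is because regular expressions can only process regular languages, while HTML is a genuinely context-free language (and context-free is "more complex" than regular): matching arbitrarily deep nesting requires counting open and close tags, which a regular expression cannot do.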
See also: https://stackoverflow.com/a/1732454/2170192
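To make the counting argument concrete, here is a minimal sketch of a scanner that tracks tag depth and cuts the string whenever the depth returns to zero. The name `splitTopLevel` is hypothetical and not part of any answer here, and the sketch assumes well-formed markup with no comments, no void or self-closing elements (such as `<br>`), and no `<` or `>` inside attribute values. Text outside the top-level elements is dropped.

```javascript
// Hypothetical helper: split an HTML string into its top-level elements by
// counting tag depth. Assumes well-formed markup without comments, void
// elements, or angle brackets inside attribute values.
function splitTopLevel(html) {
  var tagPattern = /<(\/?)\w+[^>]*>/g; // group 1 is "/" for a closing tag
  var parts = [];
  var depth = 0;
  var start = -1;
  var match;
  while ((match = tagPattern.exec(html)) !== null) {
    if (match[1] === '/') {
      depth -= 1;
      if (depth === 0) {
        // Back at the top level: the element that began at `start` is complete.
        parts.push(html.slice(start, tagPattern.lastIndex));
      }
    } else {
      if (depth === 0) {
        start = match.index; // a new top-level element begins here
      }
      depth += 1;
    }
  }
  return parts;
}

var nested = splitTopLevel('<div><div><div></div></div></div><p>x</p>');
console.log(nested.length); // 2
```

The regular expression here only finds individual tags; the nesting itself is handled by the `depth` counter, which is exactly the part a single regular expression cannot express.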
If, however, your HTML has no nested elements of the same type, you can split the string by taking all matches returned by the following regular expression (which uses a backreference):
/<(\w+).*<\/\1\s*>/igsm
<(\w+)
matches a less-than sign followed by one or more word characters (letters, digits, underscores), capturing the word characters in parentheses (the first capturing group).
.*
matches the contents of the element.
<\/
matches the opening of the end tag.
\1
is the backreference, which matches exactly the sequence of characters captured by the first capturing group.
\s*>
matches optional whitespace and the greater-than sign.
igsm
are the modifiers: case-insensitive, global, dot-matches-all-symbols and multi-line.
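As a quick check, the pattern can be applied with `String.prototype.match` to the string from the question. This is only a sketch: the variable names are made up for illustration, and the `s` (dotAll) flag requires an engine that supports ES2018 or later. The second half shows one caveat worth knowing: because `.*` is greedy, two elements with the same tag name are merged into a single match.

```javascript
// The pattern from the answer above, unchanged.
// Note: the `s` (dotAll) flag needs an ES2018+ JavaScript engine.
var topLevelPattern = /<(\w+).*<\/\1\s*>/igsm;

var html = '<h1>Header</h1><p>this is a small paragraph</p>' +
  '<ul><li>list element 1.</li><li>list element 2.</li>' +
  '<li>list element 3. With a small update.</li></ul>';

// With the global flag, match() returns every match as a plain array.
var pieces = html.match(topLevelPattern);
console.log(pieces.length); // 3

// Caveat: the greedy `.*` runs to the LAST matching end tag, so a tag name
// that appears twice swallows everything in between.
var repeated = '<p>a</p><h1>b</h1><p>c</p>';
var merged = repeated.match(topLevelPattern);
console.log(merged.length); // 1
```

For input where tag names can repeat, the DOM-based approaches in the other answers are the safer choice.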