Let's say I have a bunch of HTML like below:
bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>
Is there an easy way with JavaScript to convert it to properly semantic <p> tags? E.g.:
<p>
bla bla bla long paragraph here
</p>
<p>
bla bla bla more paragraph text
</p>
Output spacing is not important; ideally it will work with any input spacing.
I'm thinking I might try to cook up a regex, but before I do that I wanted to make sure that a) I was avoiding a world of hurt and b) there wasn't something else out there - I'd tried a Google search but haven't yet come up with anything.
Thanks for any advice!
asked Aug 13, 2009 at 23:31 by Rufo Sanchez
Damn. Awesome stuff - I figured I'd get a couple of pointers in the right direction - I certainly wasn't expecting two separate coded solutions. It'll be a day or two before I need to implement this, but I'll be sure to report back with what I've wound up doing. – Rufo Sanchez, commented Aug 14, 2009 at 6:09
4 Answers
I got bored. I'm sure there are optimizations / tweaks needed. It uses a little bit of jQuery to do its magic. Worked in FF3. And the answer to your question is that there isn't a very "simple" way :)
$.fn.pmaker = function() {
    var brs = 0;    // number of consecutive BRs seen so far
    var nodes = []; // nodes waiting to be wrapped in a P

    function makeP() {
        // only bother doing this if we have nodes to stick into a P
        if (nodes.length) {
            var p = $("<p/>");
            p.insertBefore(nodes[0]); // insert a new P before the content
            p.append(nodes); // move the pending nodes into it
            nodes = [];
        }
        brs = 0;
    }

    this.contents().each(function() {
        if (this.nodeType == 3) { // text node
            // if the text has non-whitespace, keep it and reset the BR counter
            if (/\S/.test(this.data)) {
                nodes.push(this);
                brs = 0;
            }
        } else if (this.nodeType == 1) { // element node
            if (/^br$/i.test(this.tagName)) { // anchored so tags such as ABBR do not match
                if (++brs == 2) {
                    $(this).remove(); // remove this BR from the DOM
                    $(nodes.pop()).remove(); // remove the previous BR from the array and the DOM
                    makeP();
                } else {
                    nodes.push(this);
                }
            } else if (/^(?:p)$/i.test(this.tagName)) {
                // these tags force a P break, but we don't scan within them
                makeP();
            } else if (/^(?:div)$/i.test(this.tagName)) {
                // force a P break and scan within
                makeP();
                $(this).pmaker();
            } else {
                brs = 0; // some other tag - reset brs
                nodes.push(this); // add the node
                // specific nodes to not peek inside of - inline tags
                if (!(/^(?:b|i|strong|em|span|u)$/i.test(this.tagName))) {
                    $(this).pmaker(); // peek inside for P needs
                }
            }
        }
    });

    while ((brs--) > 0) { // remove any extra BRs at the end
        $(nodes.pop()).remove();
    }
    makeP();
    return this;
};

// run it against something:
$(function() {
    $("#worker").pmaker();
});
And this was the html portion I tested against:
<div id="worker">
bla bla bla long <b>paragraph</b> here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>
this text should end up in a P
<div class='test'>
and so should this
<br/>
<br/>
and this<br/>without breaking at the single BR
</div>
and then we have the a "buggy" clause
<p>
fear the real P!
</p>
and a trailing br<br/>
</div>
And the result:
<div id="worker"><p>
bla bla bla long <b>paragraph</b> here
</p>
<p>
bla bla bla more paragraph text
</p>
<p>
this text should end up in a P
</p><div class="test"><p>
and so should this
</p>
<p>
and this<br/>without breaking at the single BR
</p></div><p>
and then we have the a "buggy" clause
</p><p>
fear the real P!
</p><p>
and a trailing br</p>
</div>
Scan each of the child elements + text of the enclosing element. Each time you encounter a "br" element, create a "p" element, and append all pending stuff to it. Lather, rinse, repeat.
Don't forget to remove the stuff which you are relocating to a new "p" element.
I have found this library (prototype.js) to be useful for this sort of thing.
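For what it's worth, here is a minimal sketch of that scan. It works on an HTML string rather than live DOM nodes so it can run outside a browser, and it does not use prototype.js. The function name wrapPendingOnBr and the token-splitting approach are assumptions made for illustration, not something from the answer above.

```javascript
// Hypothetical helper: scans the input token by token and, at every <br>,
// wraps whatever content is pending into a <p>.
function wrapPendingOnBr(html) {
  // Splitting on a regex with a capture group keeps the <br> tags as tokens.
  const tokens = html.split(/(<br\b[^>]*>)/i);
  const paragraphs = [];
  let pending = [];

  // Move the pending content into a new paragraph, skipping empty runs.
  const flush = () => {
    const content = pending.join("").trim();
    if (content) {
      paragraphs.push("<p>" + content + "</p>");
    }
    pending = [];
  };

  for (const token of tokens) {
    if (/^<br\b[^>]*>$/i.test(token)) {
      flush(); // a <br> closes the current paragraph
    } else {
      pending.push(token); // anything else waits for the next <br>
    }
  }
  flush(); // wrap whatever is left after the last <br>
  return paragraphs.join("\n");
}
```

Note that, as described, this breaks at every <br>, so a single <br> inside a paragraph also starts a new one.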
I'm assuming you're not really allowing any other
Sometimes you need to preserve single line-breaks (not all <br /> elements are bad), and you only want to turn double instances of <br /> into paragraph breaks.
In doing so I would:
- Remove all line breaks
- Wrap the whole lot in a paragraph
- Replace <br /><br /> with </p>\n<p>
- Lastly, remove any empty <p></p> elements that might have been generated
So the code could look something like:
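var ConvertToParagraphs = function(text) {
    // Remove the source line breaks so that adjacent <br> tags sit next to each other
    var lineBreaksRemoved = text.replace(/\r?\n/g, "");
    var wrappedInParagraphs = "<p>" + lineBreaksRemoved + "</p>";
    // Turn each double <br> (optional whitespace in between) into a paragraph break
    var brsRemoved = wrappedInParagraphs.replace(/<br[^>]*>\s*<br[^>]*>/gi, "</p>\n<p>");
    // Drop empty paragraphs, including whitespace-only ones
    var emptyParagraphsRemoved = brsRemoved.replace(/<p>\s*<\/p>/g, "");
    return emptyParagraphsRemoved;
};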
Note: I've been exceedingly verbose to show the processes, you'd simplify it of course.
This turns your sample:
bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>
Into:
<p>bla bla bla long paragraph here</p>
<p>bla bla bla more paragraph text</p>
But it does so without removing any <br /> elements that you may actually want.
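As a rough illustration of the "you'd simplify it" remark, the same four steps can be chained into one expression. This is only a sketch: the name toParagraphs is made up here, and the final trim() and the whitespace-tolerant empty-paragraph pattern are small additions that are not in the verbose version.

```javascript
// Hypothetical condensed variant of the steps listed above.
function toParagraphs(text) {
  return ("<p>" + text.replace(/\r?\n/g, "") + "</p>")   // remove line breaks, wrap in one <p>
    .replace(/<br[^>]*>\s*<br[^>]*>/gi, "</p>\n<p>")     // double <br> becomes a paragraph break
    .replace(/<p>\s*<\/p>/g, "")                         // drop empty paragraphs
    .trim();                                             // assumption: tidy the trailing newline
}

// A single <br> inside a paragraph is left alone:
console.log(toParagraphs("one<br/>still one\n<br/>\n<br/>\ntwo<br/><br/>"));
```

Because line breaks are removed outright rather than replaced with a space, words that are separated only by a newline in the input will be joined together; swap the first replacement string for " " if that matters for your content.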
I'd do it in several stages:
- RegExp: Convert all br-tags to line-breaks.
- RegExp: Strip out all the white-space.
- RegExp: Convert the multiple line-breaks to single ones.
- Use String.prototype.split('\n') on the result.
That should give an array with all the 'real' paragraphs (in theory.) Then you can just iterate through it and wrap each line in p-tags.
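The stages above could be sketched roughly like this. The function name splitIntoParagraphs is hypothetical, and reading "strip out all the white-space" as "collapse runs of spaces and tabs into one space" is an assumption on my part, since stripping every whitespace character would also glue the words together.

```javascript
// Hypothetical sketch of the staged approach; returns an array of <p> strings.
function splitIntoParagraphs(html) {
  return html
    .replace(/<br\b[^>]*>/gi, "\n")   // 1. convert all br tags to line breaks
    .replace(/[ \t\r]+/g, " ")        // 2. collapse spaces and tabs (assumed meaning)
    .replace(/\s*\n\s*/g, "\n")       // 3. collapse multiple line breaks into one
    .split("\n")                      // 4. one array entry per line
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip the empty entries at either end
    .map((line) => "<p>" + line + "</p>");
}

console.log(splitIntoParagraphs("first  paragraph\n<br/>\n<br/>\nsecond paragraph<br/><br/>"));
```

One consequence of this approach is that a single <br>, or a plain newline in the source, also ends up as a paragraph boundary, so it fits best when every break in the input is meant to separate paragraphs.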