Is there an easy way to convert HTML with multiple <br> tags into proper surrounding <p> tags in JavaScript?

Let's say I have a bunch of HTML like below:

bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>

Is there an easy way with JavaScript to convert it to proper semantic <p> tags? E.g.:

<p>
  bla bla bla long paragraph here
</p>
<p>
  bla bla bla more paragraph text
</p>

Output spacing is not important, ideally it will work with any input spacing.

I'm thinking I might try to cook up a regex, but before I do that I wanted to make sure that a) I was avoiding a world of hurt and b) there wasn't something already out there. I tried a Google search but haven't come up with anything yet.

Thanks for any advice!


asked Aug 13, 2009 at 23:31 by Rufo Sanchez

Damn. Awesome stuff - I figured I'd get a couple of pointers in the right direction - I certainly wasn't expecting two separate coded solutions. It'll be a day or two before I need to implement this, but I'll be sure to report back with what I've wound up doing. – Rufo Sanchez Commented Aug 14, 2009 at 6:09

4 Answers

I got bored. I'm sure there are optimizations/tweaks needed. It uses a little bit of jQuery to do its magic. Worked in FF3. And the answer to your question is that there isn't a very "simple" way :)

$(function() {
  $.fn.pmaker = function() {
    var brs = 0;
    var nodes = [];

    function makeP()
    {
      // only bother doing this if we have nodes to stick into a P
      if (nodes.length) {
        var p = $("<p/>");
        p.insertBefore(nodes[0]);  // insert a new P before the content
        p.append(nodes); // add the children        
        nodes = [];
      }
      brs=0;
    }

    this.contents().each(function() {    
      if (this.nodeType == 3) // text node 
      {
        // if the text has non whitespace - reset the BR counter
        if (/\S+/.test(this.data)) {
          nodes.push(this);
          brs = 0;
        }
      } else if (this.nodeType == 1) {
        if (/^br$/i.test(this.tagName)) { // exact match, so e.g. <abbr> isn't treated as a BR
          if (++brs == 2) {
            $(this).remove(); // remove this BR from the dom
            $(nodes.pop()).remove(); // delete the previous BR from the array and the DOM
            makeP();
          } else {
            nodes.push(this);
          }
        } else if (/^(?:p)$/i.test(this.tagName)) {
          // these tags force a P break but we don't scan inside them
          makeP();
        } else if (/^(?:div)$/i.test(this.tagName)) {
          // force a P break and scan within
          makeP();
          $(this).pmaker();
        } else {
          brs = 0; // some other tag - reset brs.
          nodes.push(this); // add the node 
          // specific nodes to not peek inside of - inline tags
          if (!(/^(?:b|i|strong|em|span|u)$/i.test(this.tagName))) {
            $(this).pmaker(); // peek inside for P needs            
          }
        } 
      } 
    });
    while ((brs--)>0) { // remove any extra BRs at the end
      $(nodes.pop()).remove();
    }
    makeP();
    return this;
  };

  // run it against something (we're already inside a DOM-ready handler):
  $("#worker").pmaker();
});

And this was the HTML I tested against:

<div id="worker">
bla bla bla long <b>paragraph</b> here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>
this text should end up in a P
<div class='test'>
  and so should this
  <br/>
  <br/>
  and this<br/>without breaking at the single BR
</div>
and then we have the a "buggy" clause
<p>
  fear the real P!
</p>
and a trailing br<br/>
</div>

And the result:

<div id="worker"><p>
bla bla bla long <b>paragraph</b> here
</p>
<p>
bla bla bla more paragraph text
</p>
<p>
this text should end up in a P
</p><div class="test"><p>
  and so should this
  </p>
  <p>
  and this<br/>without breaking at the single BR
</p></div><p>
and then we have the a "buggy" clause
</p><p>
  fear the real P!
</p><p>
and a trailing br</p>
</div>

Scan each child element and text node of the enclosing element. Each time you encounter a "br" element, create a "p" element and append all pending content to it. Lather, rinse, repeat.

Don't forget to remove the stuff which you are relocating to a new "p" element.

I have found this library (prototype.js) to be useful for this sort of thing.
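A DOM-free sketch of the scan described above might look like the following. It assumes, purely for illustration, that the container's children have already been flattened into an array of tokens shaped like { type: "text", value } or { type: "br" }; that token shape and the name groupIntoParagraphs are hypothetical, and in a browser you would walk element.childNodes and move the nodes instead of joining strings. As in the answer, every <br> closes the current paragraph; groups that contain only whitespace (such as the gap between two consecutive <br>s) are skipped.

```javascript
// Sketch of "scan the children, flush pending content into a <p> on each <br>".
// Assumption: children are given as hypothetical tokens, not real DOM nodes:
//   { type: "text", value: "..." }  or  { type: "br" }
function groupIntoParagraphs(tokens) {
  const paragraphs = [];
  let pending = [];

  // Turn everything collected so far into a paragraph, unless it is
  // whitespace only (e.g. the newline between two consecutive <br>s).
  function flush() {
    const text = pending.join("").trim();
    if (text.length > 0) {
      paragraphs.push("<p>" + text + "</p>");
    }
    pending = []; // the relocated content is no longer pending
  }

  for (const token of tokens) {
    if (token.type === "br") {
      flush(); // each <br> closes the paragraph collected so far
    } else {
      pending.push(token.value);
    }
  }
  flush(); // content after the last <br>
  return paragraphs;
}
```

With the question's sample, the text between the two <br>s of each pair is only a newline, so the pair yields exactly one paragraph boundary. Note that this literal version also breaks on a single <br>; counting consecutive <br>s before flushing would preserve single line breaks.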

I'm assuming you're not really allowing any other … Sometimes you need to preserve single line breaks (not all <br /> elements are bad), and you only want to turn double instances of <br /> into paragraph breaks.

To do so, I would:

Remove all newline characters
Wrap the whole lot in a paragraph
Replace each pair of consecutive <br /> elements with </p>\n<p>
Lastly, remove any empty <p> elements that might have been generated

So the code could look something like:

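var ConvertToParagraphs = function(text) {
    // drop source newlines so each <br> pair sits next to the text around it
    var lineBreaksRemoved = text.replace(/\r?\n/g, "");
    var wrappedInParagraphs = "<p>" + lineBreaksRemoved + "</p>";
    // each pair of <br> tags (optionally separated by whitespace) becomes a paragraph boundary
    var brsRemoved = wrappedInParagraphs.replace(/<br[^>]*>\s*<br[^>]*>/gi, "</p>\n<p>");
    // drop paragraphs that are empty or contain only whitespace
    var emptyParagraphsRemoved = brsRemoved.replace(/<p>\s*<\/p>/g, "");
    return emptyParagraphsRemoved;
};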

Note: I've been exceedingly verbose to show the process; you'd simplify it, of course.

This turns your sample:

bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>

Into:

<p>bla bla bla long paragraph here</p>
<p>bla bla bla more paragraph text</p>

But it does so without removing any single <br /> elements that you may actually want.

I'd do it in several stages:

RegExp: Convert all br tags to line breaks.
RegExp: Strip out the whitespace around the line breaks.
RegExp: Convert multiple consecutive line breaks to single ones.
Call split('\n') on the resulting string.

That should give an array with all the "real" paragraphs (in theory). Then you can just iterate through it and wrap each line in p tags.
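A minimal sketch of those stages follows, under a few stated assumptions: the input is an HTML string whose only structure comes from <br> tags; source newlines and indentation are insignificant, so the whitespace step is read as collapsing whitespace runs and trimming around the breaks rather than deleting every space; and brsToParagraphs is just an illustrative name. Like the stages it follows, it treats a single <br> as a paragraph break too, and it does not escape or validate the text between the tags.

```javascript
// Staged regex approach: <br> -> "\n", tidy whitespace, collapse, split, wrap.
// Assumption: newlines/indentation in the source are insignificant.
function brsToParagraphs(html) {
  const lines = html
    // collapse every whitespace run (including source newlines) to one space
    .replace(/\s+/g, " ")
    // stage 1: convert <br>, <br/>, <br /> (any case) to a line break
    .replace(/<br\s*\/?>/gi, "\n")
    // stage 2: strip the spaces surrounding each line break
    .replace(/ *\n */g, "\n")
    // stage 3: collapse multiple consecutive line breaks into one
    .replace(/\n{2,}/g, "\n")
    // stage 4: split into candidate paragraphs
    .split("\n");

  return lines
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // drop empty leftovers (e.g. trailing <br>s)
    .map((line) => "<p>" + line + "</p>")
    .join("\n");
}
```

For the question's sample this yields the two <p> elements shown earlier, joined by a newline.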
