sanitization - Strip tags with javascript and handle line breaks - Stack Overflow

I want to strip the tags from an HTML string but preserve its line breaks.

I want the same behaviour as copying text in a browser and pasting it into Notepad.

For example, code that converts:

  • <div>x1</div><div>x2</div> to x1\nx2
  • <p>x1</p><p>x2</p> to x1\nx2
  • <b>x1</b><i>x2</i> to x1x2
  • x1<br>x2 to x1\nx2

Removing all tags with /<.*?>/g does not work. Creating a dummy <div>, setting its innerHTML, and reading its textContent also removes the line breaks.

Any help?
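
To make the failure concrete, here is a small check of the naive strip against the four examples above. The names `naiveStrip` and `cases` are only illustrative and are not part of the original question.

```javascript
// Illustrative only: the naive "remove every tag" approach from the question.
function naiveStrip(html) {
    return html.replace(/<.*?>/g, "");
}

// [input, wanted output] pairs taken from the examples in the question.
const cases = [
    ["<div>x1</div><div>x2</div>", "x1\nx2"],
    ["<p>x1</p><p>x2</p>", "x1\nx2"],
    ["<b>x1</b><i>x2</i>", "x1x2"],
    ["x1<br>x2", "x1\nx2"],
];

for (const [input, wanted] of cases) {
    const got = naiveStrip(input);
    console.log(JSON.stringify(got), got === wanted ? "ok" : "line break lost");
}
// Every input comes out as "x1x2", so only the <b>/<i> case is correct.
```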

asked Jul 27, 2011 at 16:02 by Taha Jahangir, edited Apr 14, 2012 at 14:35

4 Answers

How does this work for you? It replaces every occurrence of <br>, </div>, and </p> with \n and then strips the remaining tags. It's goofy, but it's at least a start.

// text_to_fix holds the HTML string to clean up
const fixed = text_to_fix.replace(/<(?:br|\/div|\/p)>/g, "\n")
                         .replace(/<.*?>/g, "");

This doesn't work for all HTML, however. Just the tags you mentioned.
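
As a rough sketch of how the same two-step idea could be made a little more tolerant, the version below also accepts `<br/>`, `<br />`, attributes on `<br>`, and uppercase tag names. The function name `stripKeepBreaks` is hypothetical, and it still only covers the tags mentioned in the question.

```javascript
// Hypothetical helper: same two-step approach with a more tolerant first pattern.
function stripKeepBreaks(html) {
    return html
        // <br>, <br/>, <br class="x">, </div>, </p> (in any case) become newlines
        .replace(/<(?:br\b[^>]*|\/(?:div|p)\s*)>/gi, "\n")
        // every remaining tag is dropped
        .replace(/<[^>]*>/g, "");
}

console.log(JSON.stringify(stripKeepBreaks("x1<BR />x2")));          // "x1\nx2"
console.log(JSON.stringify(stripKeepBreaks("<b>x1</b><i>x2</i>")));  // "x1x2"
```

Because each closing </div> or </p> turns into a newline, the result for the div and p examples ends with a trailing "\n"; calling .trim() on the result removes it if that matters.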

Try:

function strip_tags(str){
    return str
             .replace(/(<(br[^>]*)>)/ig, '\n')
             .replace(/(<([^>]+)>)/ig,'');
}

var str = '<div>x1</div><div>x2</div><br>'+'<p>x1</p><p>x2</p>'+'<b>x1</b><i>x2</i>';

This strips the tags and replaces <br /> or <br> with newlines, but adding newlines for block elements would take quite some time to come up with a solution for.
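
One possible way to cover block elements, sketched under some assumptions: keep a list of block-level tag names, turn every opening or closing block tag (and every <br>) into a newline, strip the rest, then collapse runs of newlines. The names `BLOCK_TAGS`, `BREAK_RE`, and `htmlToLines` are made up for this example, and the tag list is deliberately incomplete.

```javascript
// Assumed, incomplete list of block-level tag names; extend as needed.
const BLOCK_TAGS = ["div", "p", "h[1-6]", "li", "ul", "ol", "blockquote", "pre", "table", "tr", "hr"];

// Matches <br ...> plus the opening or closing form of any listed block tag.
// \b stops short names from matching longer ones (e.g. "p" vs "param").
const BREAK_RE = new RegExp("<(?:br|\\/?(?:" + BLOCK_TAGS.join("|") + "))\\b[^>]*>", "gi");

function htmlToLines(html) {
    return html
        .replace(BREAK_RE, "\n")     // block boundaries and <br> become newlines
        .replace(/<[^>]+>/g, "")     // remaining (inline) tags are dropped
        .replace(/\n{2,}/g, "\n")    // adjacent boundaries collapse to one break
        .trim();                     // drop leading and trailing whitespace
}

console.log(JSON.stringify(htmlToLines("<div>x1</div><div>x2</div>")));  // "x1\nx2"
console.log(JSON.stringify(htmlToLines("<b>x1</b><i>x2</i>")));          // "x1x2"
```

The collapsing step is a trade-off: it keeps </div><div> from producing a blank line, but it also flattens an intentional <br><br> into a single break.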

This is as far as I got before I got bored...

const strip_tags = (html) => {
    // Opening tags that should start a new line. The \b keeps short names
    // from matching longer ones (e.g. "p" vs "param", "li" vs "link").
    const breakTags = /<(?:br|p|div|h[1-6]|li|ul|ol|blockquote|pre|hr|table|tr|td|th|caption|dl|dt|dd|address|section|article|aside)\b[^>]*>/ig;
    const tmp = document.createElement("div");
    // The browser drops the remaining tags when textContent is read.
    tmp.innerHTML = html.replace(breakTags, '\n');
    return tmp.textContent || tmp.innerText || "";
};
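
Reading textContent in the snippet above also decodes character references such as &amp;amp; as a side effect of the DOM round-trip. Where no DOM is available (outside a browser, for instance), a regex-only approach has to do that itself. The following is a minimal sketch, assuming only a handful of named entities matter; `NAMED` and `decodeBasicEntities` are hypothetical names, and this is not a full HTML entity decoder.

```javascript
// Hypothetical, deliberately small table of named entities.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00A0" };

function decodeBasicEntities(text) {
    // Handles &name;, decimal &#65; and hexadecimal &#x41; references in one pass.
    return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
        if (body[0] === "#") {
            const code = body[1].toLowerCase() === "x"
                ? parseInt(body.slice(2), 16)
                : parseInt(body.slice(1), 10);
            // Leave out-of-range references untouched instead of throwing.
            return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
        }
        // Unknown names are left as they are.
        return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
    });
}

console.log(decodeBasicEntities("a &amp; b &lt;c&gt; &#65;&#x42;"));  // a & b <c> AB
```

Decoding should run after the tags have been stripped, otherwise a decoded &amp;lt; would be mistaken for the start of a tag.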

You can use this

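function stripTags(html) {
    return html
        .replace(/<br\s*\/?>/gi, '\n')      // <br> and <br/> become line breaks
        .replace(/<\/(?:div|p)>/gi, '\n')   // closing block tags become line breaks
        .replace(/<[^>]+>/g, '');           // every remaining tag is removed
}

The order matters: <br> and the closing </div> and </p> tags are converted to line breaks first, and only then are the remaining tags stripped. If the catch-all replacement ran first, the later ones would have nothing left to match. The result can end with a trailing newline, which .trim() removes.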
