
javascript - Get text from DOM excluding script tags - Stack Overflow


I want to get only the text from the following HTML document, excluding the contents of the <script> tag:

<html>
  <body>
    <script>
      a = 0;
    </script>
   <div>TEST</div>
   <p>test</p>
  </body>
</html>

I have the following code:

$('body').text()

This currently returns:

a = 0; TEST test

But I want to get:

TEST test

Asked Sep 28, 2017 at 13:40 by Caleb Park; edited Sep 28, 2017 at 17:38 by Mr. Alien.
  • Mr. Alien (Sep 28, 2017 at 13:42): I have no idea what you are trying to explain here.
  • musefan (Sep 28, 2017 at 13:44): I edited quite a lot, but I think it clears up your question. Feel free to edit it if I got anything wrong.
  • Patrick Evans (Sep 28, 2017 at 13:45): You could remove all the scripts beforehand; they are all loaded into memory already. The only potential problem is if any code uses some of them for templates or a similar purpose.

4 Answers


OK, based on your edit: if you want to extract the text from the page but not from the script tags, you can write something like:

let cloneBody = $('body').clone().find('script').remove().end();
                
console.log(cloneBody.text().trim());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
  var a = 1;
</script>
<p>Hello World</p>
<div>This is a test run</div>
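Note that `.trim()` only removes whitespace at the two ends of the string; the newlines and indentation that sit between the tags are still part of what `.text()` returns. If you want exactly the `TEST test` form from the question, one option is to collapse whitespace runs afterwards. This is a minimal sketch; `collapseWhitespace` is a hypothetical helper, not part of jQuery, and the sample string below is only shaped like the text of the question's markup:

```javascript
// Hypothetical helper (not part of jQuery): replace every run of
// whitespace (spaces, tabs, newlines) with one space, then trim the ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// A string shaped like the body text of the question's markup once the
// <script> is gone: the indentation between tags is still there.
const raw = "\n    \n   TEST\n   test\n  ";
console.log(collapseWhitespace(raw)); // prints: TEST test
```

With the answer's code this would be used as `collapseWhitespace(cloneBody.text())`.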

You can do this in plain JavaScript, as shown in an answer to a previous question, "Removing all script tags from html with JS Regular Expression":

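function stripScripts(s) {
  // Parse the HTML string into a detached <div>.
  var div = document.createElement('div');
  div.innerHTML = s;
  // getElementsByTagName returns a live collection, so iterate backwards
  // while removing elements.
  var scripts = div.getElementsByTagName('script');
  var i = scripts.length;
  while (i--) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  // Returns the cleaned HTML; read div.textContent instead to get plain text.
  return div.innerHTML;
}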

alert(
 stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
);
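Instead of copying markup into a temporary element and deleting the scripts, another plain-JavaScript option is to walk the DOM and skip `<script>` subtrees while collecting text. This is a sketch under stated assumptions, not code from the linked answer: `textWithoutScripts` is a hypothetical name, it skips only `<script>` (text inside `<style>`, for example, would still be included), and it relies only on the standard `nodeType`, `nodeName`, `nodeValue` and `childNodes` properties. The node type values are written as plain numbers so the sketch also runs outside a browser.

```javascript
// DOM node type values (same as Node.ELEMENT_NODE and Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: concatenate all text under `node`, skipping the
// contents of <script> elements. Comments and other node types add nothing,
// so call it on an element such as document.body.
function textWithoutScripts(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  // nodeName is upper-case for HTML elements in HTML documents;
  // toUpperCase() also covers XHTML, where it can be lower-case.
  if (node.nodeType !== ELEMENT_NODE || node.nodeName.toUpperCase() === "SCRIPT") {
    return "";
  }
  let text = "";
  for (const child of node.childNodes) {
    text += textWithoutScripts(child);
  }
  return text;
}

// In a browser: textWithoutScripts(document.body)
```

The design choice here is to skip scripts while reading rather than cloning the body, so nothing is copied and the live page is left untouched. The result still contains the whitespace from the markup, just as `.text()` does.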

This is probably not a perfect solution, but it should be good enough for simple HTML pages:

$('<div>').html($('body').html()).find('script').remove().end().text()

Explanation: it creates a div element, copies the HTML content of the body into it, removes all script tags from the div, uses .end() to step back from the removed scripts to the div, and finally gets the div's text content.

First, you can get all the non-script elements with the following code:

var elements = $('body').children().not('script');

Now you could just do the following to get all the text:

var text = elements.text();

However, this puts no spaces between the elements' text, so you get TESTtest. If that is what you want, you can stop here.

But if you want the spaces, you can loop over the elements and build a string:

var text = "";
elements.each(function(){
    text += $(this).text() + " ";
});
text = text.trim();

Note that this solution does not preserve line breaks, which I have assumed is what you want based on your question.
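Roughly the same loop can be written without jQuery. This is a sketch, not the answerer's code: `childTextWithoutScripts` is a hypothetical name, it uses the standard `children`, `tagName` and `textContent` properties, and, like the jQuery version, it filters only the direct children of the element, so a `<script>` nested inside a `<div>` would still contribute its text.

```javascript
// Hypothetical helper: text of each direct child element except <script>,
// joined with single spaces. Unlike the .each() loop, it also trims each
// piece and skips children whose text is empty.
function childTextWithoutScripts(parent) {
  return Array.from(parent.children)
    .filter((child) => child.tagName.toUpperCase() !== "SCRIPT")
    .map((child) => child.textContent.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

// In a browser: childTextWithoutScripts(document.body)
```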
