javascript - Get text from DOM excluding script tags

I want to get the TEXT ONLY from the following HTML document, without the contents of the <script> tag.

<html>
  <body>
    <script>
      a = 0;
    </script>
    <div>TEST</div>
    <p>test</p>
  </body>
</html>

I have the following code:

$('body').text()

This currently gets the result:

a = 0; TEST test

But I am trying to get the result:

TEST test

Asked Sep 28, 2017 at 13:40 by Caleb Park; edited Sep 28, 2017 at 17:38 by Mr. Alien.

I have no idea what you are trying to explain here. – Mr. Alien, Sep 28, 2017 at 13:42
I edited quite a lot, but I think it clears up your question. Feel free to edit it if I got anything wrong. – musefan, Sep 28, 2017 at 13:44
You could remove all the scripts beforehand; they are all loaded into memory already. The only potential problem is if any code uses some of them for templates or a similar purpose. – Patrick Evans, Sep 28, 2017 at 13:45
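The suggestion in the last comment (delete the scripts from the page, then read the text) can be sketched in plain JavaScript without jQuery. This is a minimal sketch, assuming a modern browser (NodeList.forEach, Element.remove) and that nothing still needs the script elements once they have run; the helper name removeScriptsAndGetText is hypothetical. Collapsing whitespace is an addition so that the indentation between tags turns into single spaces.

```javascript
// Hypothetical helper illustrating the comment above: it deletes every
// <script> element under `root` from the live page, then reads the text.
// Assumption: the scripts have already run and no code needs them later
// (for example as templates), because the removal is permanent.
function removeScriptsAndGetText(root) {
  // querySelectorAll returns a static list, so removing while iterating is safe.
  root.querySelectorAll('script').forEach((script) => script.remove());
  // Collapse the whitespace left by the markup's indentation and line breaks.
  return root.textContent.replace(/\s+/g, ' ').trim();
}
```

Called as removeScriptsAndGetText(document.body) on the question's markup, this should give "TEST test". Because the removal cannot be undone, copying the body first is the safer route if other code might look for those elements later.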


4 Answers


OK, since you edited your question: if you want to extract the text from the page but not the contents of the script tags, you can write something like this:

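// Clone <body> so the live page keeps its scripts, remove the <script>
// elements from the clone, and use .end() to get back to the cloned body.
let cloneBody = $('body').clone().find('script').remove().end();

console.log(cloneBody.text().trim());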

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
  var a = 1;
</script>
<p>Hello World</p>
<div>This is a test run</div>
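For comparison, the same clone-and-strip idea can be written without jQuery. This is a sketch, not part of the original answer; it assumes a browser with Node.cloneNode, querySelectorAll and NodeList.forEach, and the function name textWithoutScripts is made up. It also collapses runs of whitespace, which the jQuery version does not do.

```javascript
// Hypothetical jQuery-free version of the clone-and-strip approach.
function textWithoutScripts(root) {
  // Deep-copy the subtree so the live page keeps its <script> elements.
  const copy = root.cloneNode(true);
  // Remove the scripts from the copy only.
  copy.querySelectorAll('script').forEach((script) => script.remove());
  // Normalise whitespace coming from the markup's indentation.
  return copy.textContent.replace(/\s+/g, ' ').trim();
}
```

For example, textWithoutScripts(document.body) reads the page's text while leaving the real <script> elements in place.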

You can do this in plain JavaScript, as shown in an answer to a previous question, "Removing all script tags from html with JS Regular Expression":

function stripScripts(s) {
  var div = document.createElement('div');
  div.innerHTML = s;
  var scripts = div.getElementsByTagName('script');
  var i = scripts.length;
  // Iterate backwards: getElementsByTagName returns a live collection,
  // which shrinks as scripts are removed.
  while (i--) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  return div.innerHTML;
}

alert(
 stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
);
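Note that stripScripts returns the cleaned HTML, not the text the question asks for. A sketch of a variant that returns text follows; it swaps the detached div for DOMParser (a standard browser API), which parses the string into a separate document that is never rendered and whose scripts are not executed. The function name stripScriptsToText is hypothetical.

```javascript
// Hypothetical variant of stripScripts that returns text instead of HTML.
// DOMParser builds a separate document that is never rendered; <script>
// elements in it are not executed.
function stripScriptsToText(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  doc.querySelectorAll('script').forEach((script) => script.remove());
  // The parser places fragments inside <body>, so read the body's text.
  return doc.body.textContent.replace(/\s+/g, ' ').trim();
}
```

For an input such as <span>hi<script>alert(1)</script></span> it should return "hi".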

This is probably not a perfect solution, but it should be good enough for simple HTML pages:

$('<div>').html($('body').html()).find('script').remove().end().text()

Explanation: it creates a div element, copies the HTML content of the body into it, removes all script tags from the div, and finally gets the text content.
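Another option, not taken from the answers here, is to skip copying HTML entirely and walk the DOM, ignoring any text that sits inside a <script> element. The recursive sketch below relies only on the standard Node properties nodeType, nodeName, nodeValue and childNodes; collectText and visibleText are hypothetical names.

```javascript
// Hypothetical walker: concatenates text nodes under `node`, skipping
// anything inside a <script> element. No copy of the page is made.
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  // Skip <script> subtrees entirely (nodeName is upper-case in HTML documents).
  if (String(node.nodeName).toUpperCase() === 'SCRIPT') {
    return '';
  }
  // Comments and other non-text leaves have no children, so they add nothing.
  let text = '';
  for (const child of node.childNodes) {
    text += collectText(child);
  }
  return text;
}

function visibleText(root) {
  // Collapse markup whitespace so "TEST\n   test" becomes "TEST test".
  return collectText(root).replace(/\s+/g, ' ').trim();
}
```

visibleText(document.body) leaves the page untouched. If style elements should be ignored as well, the nodeName check is the place to extend.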

First of all, you can get all the non-script elements with the following code:

var elements = $('body').children().not('script');

Now you could just do the following to get all the text:

var text = elements.text();

However, this will result in no spaces between the elements' text, i.e. TESTtest. If that is what you want, great, stop here.

But if you want the spaces, you can loop over the elements and build a string:

var text = "";
elements.each(function(){
    text += $(this).text() + " ";
});
text = text.trim();

Note that this solution does not preserve line breaks; based on your question, I have assumed you do not need them.
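The loop can also be written with array methods and no jQuery. In this sketch the name childTextWithoutScripts and the separator parameter are assumptions, not from the answer; like .children(), it only looks at direct element children, and it trims each child's text before joining, which the loop does not do.

```javascript
// Hypothetical jQuery-free equivalent of the loop: take the direct element
// children of `root`, drop <script> elements, and join their text.
function childTextWithoutScripts(root, separator = ' ') {
  return Array.from(root.children)
    .filter((child) => child.nodeName.toUpperCase() !== 'SCRIPT')
    .map((child) => child.textContent.trim())
    .join(separator);
}
```

Passing '\n' as the separator, e.g. childTextWithoutScripts(document.body, '\n'), keeps one element per line for the case where line breaks do matter.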


javascript - Get text from DOM excluding script tags - Stack Overflow
