How can I get only the text from the following HTML document, without the contents of the <script> tag?
<html>
<body>
<script>
a = 0;
</script>
<div>TEST</div>
<p>test</p>
</body>
</html>
I have the following code:
$('body').text()
This currently gets the result:
a = 0; TEST test
But I am trying to get the result:
TEST test
asked Sep 28, 2017 at 13:40 by Caleb Park; edited Sep 28, 2017 at 17:38 by Mr. Alien
- 2 I have no idea what you are trying to explain here – Mr. Alien Commented Sep 28, 2017 at 13:42
- I edited quite a lot but I think it clears up your question, feel free to edit it if I got anything wrong – musefan Commented Sep 28, 2017 at 13:44
- You could remove all the scripts beforehand; they are already loaded into memory. The only potential problem is if any code uses script tags for templates or a similar purpose – Patrick Evans Commented Sep 28, 2017 at 13:45
4 Answers
OK, since you edited your question: if you want to extract the text from the page but not the contents of script tags, you can write something like this:
let cloneBody = $('body').clone().find('script').remove().end();
console.log(cloneBody.text().trim());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
var a = 1;
</script>
<p>Hello World</p>
<div>This is a test run</div>
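You can do this using plain JavaScript, as shown in a previous answer, "Removing all script tags from html with JS Regular Expression". Note that this function returns the remaining HTML; read div.textContent instead of div.innerHTML if you want only the text.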
function stripScripts(s) {
  // Parse the markup in a detached element so it never touches the live page.
  var div = document.createElement('div');
  div.innerHTML = s;
  var scripts = div.getElementsByTagName('script');
  // Iterate backwards: the collection is live and shrinks as nodes are removed.
  var i = scripts.length;
  while (i--) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  return div.innerHTML;
}
alert(
  stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
);
This is probably not a perfect solution, but it should be good enough for simple HTML pages:
$('<div>').html($('body').html()).find('script').remove().end().text()
Explanation: it creates a div element, copies the HTML content of the body into it, removes all script tags from the div, and finally gets the text content.
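The text returned by a chain like this keeps whatever line breaks and indentation sat between the tags in the markup. If a single-line result such as the one in the question is wanted, the whitespace can be collapsed afterwards. A minimal sketch, where normalizeWhitespace is a hypothetical helper name and not part of jQuery:

```javascript
// Hypothetical helper (not a jQuery API): collapse every run of whitespace,
// including newlines and tabs, into one space and trim both ends.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Example with a string shaped like the output of .text() on simple markup.
console.log(normalizeWhitespace('\n\n  TEST\n  test\n'));
```

In the browser this would wrap the jQuery expression, for example normalizeWhitespace($('<div>').html($('body').html()).find('script').remove().end().text()).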
First of all, you can get all the non-script elements that are direct children of the body with the following code:
var elements = $('body').children().not('script');
Now you could just do the following to get all the text:
var text = elements.text();
However, this will result in no spaces between text nodes, i.e. TESTtest. If this is what you want then great, stop here.
But if you want the spaces, you can loop the elements and build a string:
var text = "";
elements.each(function(){
text += $(this).text() + " ";
});
text = text.trim();
Note that this solution does not maintain any line breaks, which is what I have assumed based on your question.
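The loop above only filters the direct children of the body, so a script tag nested inside a div would still contribute its text. One possible alternative, sketched here without jQuery, is to walk the tree and skip script elements wherever they occur. The name collectText and the skipTags parameter are assumptions made for this illustration; the function relies only on the standard DOM node properties nodeType, nodeName, nodeValue and childNodes.

```javascript
// Hypothetical helper: gather text from a DOM subtree, skipping <script>
// elements at any depth and joining the remaining pieces with single spaces.
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

function collectText(root, skipTags = ['SCRIPT']) {
  const parts = [];

  function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      // Ignore text nodes that contain only whitespace.
      const value = node.nodeValue.trim();
      if (value) parts.push(value);
      return;
    }
    // Skip an excluded element together with everything inside it.
    if (node.nodeType === ELEMENT_NODE &&
        skipTags.includes(node.nodeName.toUpperCase())) {
      return;
    }
    // childNodes is array-like in browsers, so convert it before iterating.
    Array.from(node.childNodes || []).forEach(walk);
  }

  walk(root);
  return parts.join(' ');
}
```

In a browser this would be called as collectText(document.body). Like the loop above, it does not preserve line breaks, and it never modifies the page because nothing is removed or cloned.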