I compose a large HTML file out of a huge unformatted text file. Now my fear is that the text file might contain some malicious JavaScript code. To avoid any damage I scan the text and replace any < or > with &lt; and &gt;. That is quite effective, but it's not really good for the performance.
Is there some tag or attribute or whatever that allows me to turn JavaScript off within the HTML file? In the header perhaps?
asked Oct 28, 2011 at 10:35 by Fotis MC, edited Aug 6, 2024 at 15:45 by informatik01
- Where does the HTML come from? And how do you take it? You should tell us more so that we can help, because there are probably better solutions when you input the HTML code. – JMax Commented Oct 28, 2011 at 10:46
- I am creating the HTML myself. Actually it's a big table whose columns are filled with the data I extract from a text file. Therefore I do have control over the basic HTML file, just not what is within the columns. – Fotis MC Commented Oct 28, 2011 at 11:46
5 Answers
Since you've considered replacing all < and > by the HTML entities, a good option would consist of sending the Content-Type: text/plain header.
If you want to show the contents of the file inside an HTML page, replacing every & by &amp; and every < by &lt; is sufficient to display the contents correctly. Example:
Input: Huge wall of text 1<a2 &>1
Output: Huge wall of text 1&lt;a2 &amp;>1
Unmodified output, as displayed in a browser: Huge wall of text 11 (<..> is interpreted as HTML)
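The two replacements described above can be sketched in a few lines of Python. This is only an illustration, assuming the file is generated by a script. The names escape_text and table_cell are hypothetical helpers, not something from the question.

```python
def escape_text(text: str) -> str:
    """Escape the two characters that can start markup in HTML text content."""
    # Replace "&" first; otherwise the "&" of a freshly inserted "&lt;"
    # would itself be escaped to "&amp;lt;".
    return text.replace("&", "&amp;").replace("<", "&lt;")


def table_cell(text: str) -> str:
    """Wrap escaped text in a table cell (hypothetical helper)."""
    return "<td>" + escape_text(text) + "</td>"


print(escape_text("Huge wall of text 1<a2 &>1"))
# prints: Huge wall of text 1&lt;a2 &amp;>1
```

Python's standard library also offers html.escape, which additionally escapes > and, by default, quote characters. That matters if the text ever ends up inside an attribute value rather than between tags.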
If you cannot modify code at the back-end (server-side), you need an HTML parser which sanitises your code. JavaScript is not the only threat; embedded content (<object>, <iframe>, ...) can also be very malicious. Have a look at the following answer for a very detailed HTML parser and sanitizer:
Can I load an entire HTML document into a document fragment in Internet Explorer?
When you have control of the back-end, you can serve the file with the header
Content-type: text/plain;
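As a rough sketch of what sending that header could look like, here is a minimal WSGI application in Python. It assumes a Python back-end, which the question does not state. The names make_app and serve, the port, and the file path are placeholders. Note that this serves the raw text file itself, so the browser shows it as plain text and renders no table markup.

```python
from wsgiref.simple_server import make_server


def make_app(path):
    """Build a WSGI app that serves the file at `path` as plain text."""
    def app(environ, start_response):
        with open(path, "rb") as f:
            body = f.read()
        headers = [
            # text/plain asks the browser to display the bytes as text
            # instead of parsing them as HTML.
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]
    return app


def serve(path, port=8000):
    """Serve the file on localhost until interrupted (placeholder port)."""
    with make_server("127.0.0.1", port, make_app(path)) as server:
        server.serve_forever()
```

Calling serve("data.txt") would start a local server for a file named data.txt, a made-up name used here only for illustration.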
No, you can't disable JavaScript from inside a webpage. Rather, you should sanitize any and all input from your users to make sure no malicious scripts get through your script.
Whether it's by removing all script tags or replacing < and >, you need to make sure your input is clean.
Do a search for <script and replace it with <!--<script, then search for </script> and replace it with </script>-->.
This should comment out all scripts in the file.
You need a sandbox or clean HTML code. Have a look at PHPIDS or HTML Purifier.