I want to ensure that the user input doesn't contain characters like <, > or &#, whether it is a text input or a textarea. My pattern:
var pattern = /^((?!&#|<|>).)*$/m;
The problem is that it still matches multi-line strings from a textarea, such as:
this text matches
though this should not, because of this character <
EDIT:
To be more clear, I need to exclude the &# combination only, not & or # on their own.
Please suggest a solution. I'd be very grateful.
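A small sketch of what goes wrong (the variable names are made up for this example). With `m`, `^` and `$` match at every line break, so the first line alone satisfies the pattern. Simply dropping `m` is not enough either, because `.` never matches a newline, so every multi-line input is then rejected:

```javascript
// The pattern from the question, with the m (multiline) flag.
const withM = /^((?!&#|<|>).)*$/m;
// The same pattern without m: ^ and $ now anchor the whole string only.
const withoutM = /^((?!&#|<|>).)*$/;

const textarea = "this text matches\nthough this should not, because of this character <";
const harmless = "line one\nline two";

// With m, ^...$ can match one line, so the clean first line is enough.
console.log(withM.test(textarea));       // true
console.log(textarea.match(withM)[0]);   // this text matches

// Without m, the "<" on the second line is caught...
console.log(withoutM.test(textarea));    // false
// ...but "." does not match "\n", so harmless multi-line input fails too.
console.log(withoutM.test(harmless));    // false
```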
Asked Jul 29, 2013 at 15:34 by Zabavsky; edited Jun 20, 2020 at 9:12.
ridgerunner (Jul 29, 2013 at 16:22): It appears you are looking to exclude all special HTML chars, including entities such as &#160;, &#xA0;, etc. (ergo the exclusion of &#). If so, then you probably want to also exclude the other syntax of HTML entities, e.g. &amp;, &lt;, etc. (which do NOT have the # hash following the &). Yes?
Zabavsky (Jul 29, 2013 at 16:56): @ridgerunner, no, just those three. Thanks.
3 Answers
You're probably not looking for the m (multiline) switch but the s (DOTALL) switch. Unfortunately, s did not exist in JavaScript at the time of writing (it was added in ES2018).
The good news is that DOTALL can be simulated using [\s\S]. Try the following regex:
/^(?![\s\S]*?(&#|<|>))[\s\S]*$/
OR:
/^((?!&#|<|>)[\s\S])*$/
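As a quick check (the sample strings and variable names are illustrative only), both regexes accept and reject the same inputs, and so does the `s` (dotAll) flag that JavaScript gained in ES2018, which lets a plain `.` match newlines:

```javascript
// The two regexes from the answer: [\s\S] matches any character, including "\n".
const lookaheadOnce = /^(?![\s\S]*?(&#|<|>))[\s\S]*$/;
const lookaheadPerChar = /^((?!&#|<|>)[\s\S])*$/;
// ES2018+: with the s flag, "." also matches line terminators.
const dotAll = /^((?!&#|<|>).)*$/s;

const samples = [
  "this text matches\nthough this should not, because of this character <", // rejected
  "line one\nline two",                                                       // accepted
  "Tom & Jerry #1",                                                           // accepted: lone & and # are fine
  "a &#60; b",                                                                // rejected: contains &#
];

for (const s of samples) {
  // true means the input contains none of &#, < or >.
  console.log(lookaheadOnce.test(s), lookaheadPerChar.test(s), dotAll.test(s));
}
```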
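I don't think you need a lookaround assertion in this case. Simply use a negated character class, without the m flag, so that ^ and $ anchor the whole input rather than a single line:
var pattern = /^[^<>&#]*$/;
Note that a character class excludes & and # individually, so this also rejects a lone & or #, not just the &# pair.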
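A short sketch of the negated-class idea (variable names are illustrative). It contrasts the class with and without `m`: a negated class does match `"\n"`, but with `m` the `$` can still stop at the end of the first line:

```javascript
// Every character must be something other than < > & #.
const noSpecial = /^[^<>&#]*$/;
// The same class with the m flag, for comparison.
const noSpecialM = /^[^<>&#]*$/m;

const textarea = "this text matches\nthough this should not, because of this character <";

console.log(noSpecial.test(textarea));             // false: the "<" on line 2 is caught
console.log(noSpecialM.test(textarea));            // true: $ matches just before "\n"
console.log(noSpecial.test("line one\nline two")); // true: [^...] also matches "\n"
console.log(noSpecial.test("Tom & Jerry"));        // false: a lone & is rejected too
```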
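If you're also disallowing the characters -, [ and ], make sure to escape them or put them in the proper order. In JavaScript, ] must be escaped even as the first character of the class, because [^] on its own means "any character":
var pattern = /^[^\][<>&#-]*$/;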
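The "put ] first" trick from PCRE and POSIX does not carry over to JavaScript, where `[^]` is a complete class matching any single character. A small sketch contrasting the unescaped and escaped forms (variable names are illustrative):

```javascript
// JavaScript parses this as ^ [^] [<>&#-]* $ : exactly one arbitrary
// character, followed by zero or more of < > & # -
const unescaped = /^[^][<>&#-]*$/;
// Escaping "]" gives the intended class: anything except ] [ < > & # -
const escaped = /^[^\][<>&#-]*$/;

console.log(unescaped.test("a"));  // true
console.log(unescaped.test("ab")); // false: "b" is not in [<>&#-]
console.log(escaped.test("ab"));   // true
console.log(escaped.test("a[b"));  // false
console.log(escaped.test("a-b"));  // false
```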
Alternate answer to the specific question:
anubhava's solution works correctly, but it is slower than it needs to be, because it must perform a negative lookahead at every character position in the string. A simpler approach is to reverse the logic: instead of verifying that /^((?!&#|<|>)[\s\S])*$/ does match, verify that /[<>]|&#/ does NOT match. To illustrate this, let's create a function, hasSpecial(), which tests whether a string contains one of the special chars. Here are two versions; the first uses anubhava's second regex:
function hasSpecial_1(text) {
// If regex matches, then string does NOT contain special chars.
return !/^((?!&#|<|>)[\s\S])*$/.test(text);
}
function hasSpecial_2(text) {
// If regex matches, then string contains (at least) one special char.
return /[<>]|&#/.test(text);
}
These two functions are functionally equivalent, but the second one is probably quite a bit faster.
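A quick sanity check that the two versions agree, restated here with `!` instead of a ternary (the sample inputs are illustrative only):

```javascript
// Version 1: the "positive" regex must match for the string to be clean.
function hasSpecial_1(text) {
  return !/^((?!&#|<|>)[\s\S])*$/.test(text);
}

// Version 2: a single search for any forbidden sequence.
function hasSpecial_2(text) {
  return /[<>]|&#/.test(text);
}

const inputs = [
  "plain text",
  "Tom & Jerry #1",
  "a &#60; b",
  "this text matches\nthough this should not, because of this character <",
  "line one\nline two",
];

for (const s of inputs) {
  // Both versions report the same result for every input.
  console.log(JSON.stringify(s), hasSpecial_1(s), hasSpecial_2(s));
}
```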
Note that when I originally read this question, I misinterpreted it as wanting to exclude all HTML special chars (including HTML entities). If that were the case, then the following solution would do just that.
Test if a string contains HTML special chars:
It appears that the OP wants to ensure a string does not contain any special HTML characters, including < and >, as well as decimal and hex HTML entities such as &#160;, &#xA0;, etc. If this is the case, then the solution should probably also exclude the other (named) type of HTML entities, such as &amp;, &lt;, etc. The solution below excludes all three forms of HTML entities as well as the < and > tag delimiters.
Here are two approaches. (Note that both approaches do allow the sequence &# if it is not part of a valid HTML entity.)
FALSE test using positive regex:
function hasHtmlSpecial_1(text) {
/* Commented regex:
# Match string having no special HTML chars.
^ # Anchor to start of string.
[^<>&]* # Zero or more non-[<>&] (normal*).
(?: # Unroll the loop. ((special normal*)*)
& # Allow a & but only if
(?! # not an HTML entity (3 valid types).
(?: # One from 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
) # End negative lookahead.
[^<>&]* # More (normal*).
)* # End unroll the loop.
$ # Anchor to end of string.
*/
var re = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;
// If regex matches, then string does NOT contain HTML special chars.
return !re.test(text);
}
Note that the above regex uses Jeffrey Friedl's "Unrolling-the-Loop" efficiency technique and will run very quickly for both matching and non-matching cases. (See his regex masterpiece, Mastering Regular Expressions, 3rd Edition.)
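To illustrate what unrolling the loop means here, the sketch below compares the answer's regex with a straightforward `(normal|special)*` form of the same pattern. The `naive` regex is written only for this comparison and is not from the answer. Both give the same verdicts; the unrolled one consumes runs of ordinary characters with a single `[^<>&]*` instead of entering the alternation at every character:

```javascript
// Naive form: each character is either "normal" or an "&" that does not
// start a named, decimal or hex entity.
const naive = /^(?:[^<>&]|&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);))*$/i;
// Unrolled form from the answer: normal* (special normal*)*
const unrolled = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;

const inputs = ["AT&T", "a &amp; b", "&#", "x &#x41; y", "5 > 3", "line one\nline two"];

for (const s of inputs) {
  // true means the string contains no HTML special chars.
  console.log(JSON.stringify(s), naive.test(s), unrolled.test(s));
}
```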
TRUE test using negative regex:
function hasHtmlSpecial_2(text) {
/* Commented regex:
# Match string having one special HTML char.
[<>] # Either a tag delimiter
| & # or a & if start of
(?: # one of 3 types of HTML entities.
[a-z\d]+ # either a named entity,
| \#\d+ # or a decimal entity,
| \#x[a-f\d]+ # or a hex entity.
) # End group of HTML entity types.
; # All entities end with ";".
*/
var re = /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i;
// If regex matches, then string contains (at least) one special HTML char.
return re.test(text);
}
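A small usage sketch checking that the FALSE and TRUE tests agree. The functions are restated with `!` instead of a ternary, and the test strings and expected values are illustrative, chosen to cover each entity form, a bare &, and a &# that is not a complete entity:

```javascript
function hasHtmlSpecial_1(text) {
  // Matches only strings with no < or > and no complete HTML entity.
  var re = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;
  return !re.test(text);
}

function hasHtmlSpecial_2(text) {
  // Finds a < or >, or a complete named, decimal or hex entity.
  var re = /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i;
  return re.test(text);
}

const cases = {
  "AT&T": false,               // bare & is not an entity
  "a &amp; b": true,           // named entity
  "&#160;": true,              // decimal entity
  "x &#xA0; y": true,          // hex entity
  "&#": false,                 // &# without digits and ; is allowed
  "5 > 3": true,               // tag delimiter
  "line one\nline two": false, // multi-line input works without any flag
};

for (const [text, expected] of Object.entries(cases)) {
  console.log(JSON.stringify(text), hasHtmlSpecial_1(text), hasHtmlSpecial_2(text), expected);
}
```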
Note also that I have included a commented version of each of these (non-trivial) regexes in the form of a JavaScript comment.