I already wrote the following Regex that allows all international characters (Latin, Asian, ...)
'Düsseldorf, Köln, Москва, 北京市, إسرائيل !@#$'.match(/[\p{L}-]+/ug)
But I would like it to reject all special characters such as !?})%....
- Do you mean you only allow letters and hyphens in the string? /^[\p{L}-]+$/u ? – Wiktor Stribiżew Commented Jun 24, 2021 at 20:00
- I would suggest that you use character ranges (ex [a-z][A-Z][...]) – ControlAltDel Commented Jun 24, 2021 at 20:01
- the other thing you could do is make a negative pattern just with the characters you don't want and negate that – ControlAltDel Commented Jun 24, 2021 at 20:02
- @WiktorStribiżew , Yes the regex also allows hyphens – Manu Commented Jun 24, 2021 at 20:05
- Does /^[\p{L}-]+$/u answer the question? – Wiktor Stribiżew Commented Jun 24, 2021 at 20:06
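The negation idea from the comments could be sketched roughly as follows. The helper name `findDisallowed` is hypothetical, and the set of allowed characters (letters and hyphens only) is an assumption taken from the pattern in the question.

```javascript
// Hypothetical helper: collect every character that is neither a
// Unicode letter nor a hyphen. An empty result means the input is clean.
function findDisallowed(input) {
  // [^...] negates the class; the u flag enables \p{L}, g finds all matches.
  return input.match(/[^\p{L}-]/gu) || [];
}

console.log(findDisallowed('Köln-Москва')); // []
console.log(findDisallowed('Köln!?'));      // [ '!', '?' ]
```

Returning the offending characters instead of a boolean makes it easier to tell the user what to remove.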
3 Answers
Matching a string containing only letters, numbers, dashes, dots, commas and whitespace:
console.log(
/^[\p{L},.0-9\s-]+$/u.test('Düsseldorf, Köln, Москва, 北京市, إسرائيل !@#$')
)
console.log(
/^[\p{L},.0-9\s-]+$/u.test('Düsseldorf, Köln, Москва, 北京市, إسرائيل')
)
Results: false and true.
EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  [\p{L},.0-9\s-]+         any character of: letter, ',', '.',
                           '0' to '9', whitespace (\n, \r, \t, \f,
                           " ", and other Unicode spaces), '-' (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        the end of the string
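As a minimal sketch, the pattern explained above could be wrapped in a small reusable check. The names `ALLOWED` and `isAllowedText` are hypothetical and only serve the illustration.

```javascript
// Same pattern as in the answer: letters of any script, commas, dots,
// ASCII digits, whitespace and hyphens, anchored at both ends.
const ALLOWED = /^[\p{L},.0-9\s-]+$/u;

// Hypothetical helper: true only if every character is in the allowed set.
function isAllowedText(input) {
  return ALLOWED.test(input);
}

console.log(isAllowedText('Москва, 北京市')); // true
console.log(isAllowedText('Köln !@#$'));      // false
console.log(isAllowedText(''));               // false: + needs at least one character
```

Because the pattern has no `g` flag, the shared `ALLOWED` object carries no `lastIndex` state between calls.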
Sadly, JavaScript regular expressions (compared to other programming languages) still have poor support for UTF-8/UTF-16 characters, even though it is a planned feature.
Currently, there is no other option (that I know of) than to add ranges, which would look like:
new RegExp(/^[ \-.a-zšđčćžÀ-ÖØ-öø-ÿ]+$/i).test('St. Petersburg')
From your examples, it looks like you are looking for full UTF-16 support, so you will have to add some ranges yourself. You can use https://www.fileformat.info/info/charset/UTF-16/list.htm as a reference. It includes a description of each character that lets you identify which are letters and which are not.
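To illustrate what adding ranges by hand involves, here is a rough sketch that extends the pattern above with one more script. The choice of the Cyrillic block (U+0400 to U+04FF) is an assumption made for this example, and a whole block may also contain a few code points that are not letters.

```javascript
// Pattern from the answer: space, hyphen, dot, a-z and some accented Latin letters.
const latinOnly = /^[ \-.a-zšđčćžÀ-ÖØ-öø-ÿ]+$/i;

// Assumption for illustration: the Cyrillic block U+0400-U+04FF added by hand.
const latinAndCyrillic = /^[ \-.a-zšđčćžÀ-ÖØ-öø-ÿ\u0400-\u04FF]+$/i;

console.log(latinOnly.test('St. Petersburg'));  // true
console.log(latinOnly.test('Москва'));          // false: Cyrillic is not listed
console.log(latinAndCyrillic.test('Москва'));   // true
console.log(latinAndCyrillic.test('北京市'));    // false: CJK would need its own range
```

Each additional script needs its own range, which is the main maintenance cost of this approach.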
There's a book called "JavaScript: The Good Parts" that provides some good examples of this. In short, you can do something like:
/^[a-zA-Z0-9 \u00C0-\u1FFF\u2800-\uFFFD]+$/
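A quick sketch of how this broad-range pattern behaves on a few inputs. The variable name `broadRange` is hypothetical. The last case shows that such wide ranges also admit characters that are not letters.

```javascript
// Pattern from the answer: ASCII letters, digits, space, plus two wide Unicode ranges.
const broadRange = /^[a-zA-Z0-9 \u00C0-\u1FFF\u2800-\uFFFD]+$/;

console.log(broadRange.test('Москва 北京市')); // true
console.log(broadRange.test('Köln!'));         // false: '!' is outside the ranges
console.log(broadRange.test('2 \u00D7 3'));    // true: U+00D7 (multiplication sign) is in range
```

If only letters should pass, the ranges would have to be narrowed, or a Unicode property escape such as `\p{L}` with the `u` flag used instead.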