
javascript - Regex that allows all international characters but no symbols - Stack Overflow


I already wrote the following Regex that allows all international characters (Latin, Asian, ...)

 'Düsseldorf, Köln, Москва, 北京市, إسرائيل !@#$'.match(/[\p{L}-]+/ug)

But I would like to make it not allowing all special characters like !?})%....


asked Jun 24, 2021 at 19:57 by Manu
  • Do you mean you only allow letters and hyphens in the string? /^[\p{L}-]+$/u? – Wiktor Stribiżew Commented Jun 24, 2021 at 20:00
  • I would suggest that you use character ranges (ex [a-z][A-Z][...]) – ControlAltDel Commented Jun 24, 2021 at 20:01
  • the other thing you could do is make a negative pattern just with the characters you don't want and negate that – ControlAltDel Commented Jun 24, 2021 at 20:02
  • @WiktorStribiżew , Yes the regex also allows hyphens - – Manu Commented Jun 24, 2021 at 20:05
  • Does /^[\p{L}-]+$/u answer the question? – Wiktor Stribiżew Commented Jun 24, 2021 at 20:06
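The suggestions in these comments can be sketched roughly as follows. The helper names `onlyLettersAndHyphens` and `findDisallowed` and the sample strings are assumptions made for illustration; they do not come from the thread.

```javascript
// Unanchored global match (as in the question): it only extracts runs of
// letters and hyphens, silently skipping everything else.
console.log('Москва !@#$'.match(/[\p{L}-]+/ug)); // [ 'Москва' ]

// Anchored pattern from the comments: the WHOLE string must consist of
// letters (\p{L}) and hyphens. The `u` flag is required for \p{...}.
const onlyLettersAndHyphens = (s) => /^[\p{L}-]+$/u.test(s);

// Negated class: list every character that is NOT a letter or a hyphen.
const findDisallowed = (s) => s.match(/[^\p{L}-]/gu) || [];

console.log(onlyLettersAndHyphens('Москва'));  // true
console.log(onlyLettersAndHyphens('Москва!')); // false
console.log(findDisallowed('Москва!?'));       // [ '!', '?' ]
```

The anchored test answers "is this string clean?", while the negated class is handy when the caller needs to report which characters were rejected.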

3 Answers


Matching a string that contains only letters, digits, dashes, dots, commas and whitespace:

console.log(
  /^[\p{L},.0-9\s-]+$/u.test('Düsseldorf, Köln, Москва, 北京市, إسرائيل !@#$')
)
console.log(
  /^[\p{L},.0-9\s-]+$/u.test('Düsseldorf, Köln, Москва, 北京市, إسرائيل')
)

Results: false and true.

EXPLANATION

-------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  [\p{L},.0-9\s-]+         any character of: letter, ',', '.',
                           '0' to '9', whitespace (\n, \r, \t, \f, \v,
                           " " and other Unicode spaces), '-' (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        the end of the string (in JavaScript,
                           without the m flag, $ does not match
                           before a trailing \n)
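To make the pieces of the explanation concrete, here is a small sketch that exercises each part of the character class and reuses it, negated, to strip unwanted characters. The names `allowed` and `stripDisallowed` and the sample inputs are assumptions for illustration only.

```javascript
// Pattern from the answer above: letters, ',', '.', ASCII digits,
// whitespace and '-', one or more times, anchored at both ends.
const allowed = /^[\p{L},.0-9\s-]+$/u;

console.log(allowed.test('北京市 2021'));       // true: letters, space, digits
console.log(allowed.test('St. Petersburg-1')); // true: dot and hyphen are listed
console.log(allowed.test('line1\nline2'));     // true: \s also admits newlines
console.log(allowed.test(''));                 // false: + needs at least one character
console.log(allowed.test('50%'));              // false: % is not in the class

// Hypothetical helper: remove everything outside the same class,
// then trim whitespace left behind at the edges.
const stripDisallowed = (s) => s.replace(/[^\p{L},.0-9\s-]+/gu, '').trim();

console.log(stripDisallowed('Москва, 北京市 !@#$')); // 'Москва, 北京市'
```

If newlines and tabs should be rejected, replacing `\s` with a literal space in the class is one option.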

Sadly, JavaScript regular expressions (compared to other programming languages) still have poor support for UTF-8/UTF-16 characters, even though it is a planned feature.

Currently, there is no other option (that I know of) than to add ranges, which would look like:

new RegExp(/^[ \-.a-zšđčćžÀ-ÖØ-öø-ÿ]+$/i).test('St. Petersburg')

From your examples, it looks like you are looking for full UTF-16 support, so you will have to add some ranges yourself. You can use https://www.fileformat.info/info/charset/UTF-16/list.htm as a reference. It includes a description to identify which chars are letters and which not.
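As a rough illustration of what "adding ranges yourself" means in practice, the sketch below writes the pattern above with `\u` escapes and then extends it with the Cyrillic block. The names `latinRanges` and `withCyrillic`, and the choice of `\u0400-\u04FF`, are assumptions for this example rather than part of the answer.

```javascript
// Same character class as the pattern above, written with \u escapes:
// š đ č ć ž, then À-Ö, Ø-ö, ø-ÿ. A regex literal needs no RegExp wrapper.
const latinRanges =
  /^[ \-.a-z\u0161\u0111\u010D\u0107\u017E\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+$/i;

console.log(latinRanges.test('St. Petersburg')); // true
console.log(latinRanges.test('Москва'));         // false: Cyrillic is not listed
console.log(latinRanges.test('北京市'));          // false: CJK is not listed

// Every additional script needs its own range; here the basic
// Cyrillic block U+0400-U+04FF is added by hand.
const withCyrillic =
  /^[ \-.a-z\u0161\u0111\u010D\u0107\u017E\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0400-\u04FF]+$/i;

console.log(withCyrillic.test('Москва'));        // true
```

Each new script means another hand-maintained range, which is the main cost of this approach.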

There's a book called "JavaScript: The Good Parts" that provides some good examples of this. In short, you can do something like:

/^[a-zA-Z0-9 \u00C0-\u1FFF\u2800-\uFFFD]+$/
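One thing to keep in mind with wide ranges like this: they cover whole stretches of code points, not only letters. The sketch below compares the pattern with a `\p{L}`-based class on a string containing the multiplication sign (U+00D7); the names `broadRanges` and `letterClass` and the sample strings are assumptions for illustration.

```javascript
// Range-based pattern from the answer above.
const broadRanges = /^[a-zA-Z0-9 \u00C0-\u1FFF\u2800-\uFFFD]+$/;

// Comparison pattern: letters via \p{L}, plus ASCII digits and space.
const letterClass = /^[\p{L}0-9 ]+$/u;

console.log(broadRanges.test('北京市'));      // true
console.log(letterClass.test('北京市'));      // true

// U+00D7 (multiplication sign) falls inside \u00C0-\u1FFF,
// so the range-based pattern lets this symbol through.
console.log(broadRanges.test('3 \u00D7 4')); // true
console.log(letterClass.test('3 \u00D7 4')); // false: it is a symbol, not a letter
```

Whether that matters depends on how strictly "no symbols" has to be enforced.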