I would like some help with a regular expression in JavaScript.
I am trying to match any string that contains either only Basic Latin (ASCII) characters or only Greek Unicode characters. Strings that mix characters from these two sets should not match.
I have this regular expression, which matches the exact opposite (all strings that contain at least one Greek and one Latin character), but I cannot find a way to negate it:
https://regex101./r/JHzmhc/1
Thanks in advance.
asked Jun 22, 2017 at 13:55 by ktsangop, edited Jun 22, 2017 at 14:11
- Do you mean any ASCII and Greek? – Wiktor Stribiżew Commented Jun 22, 2017 at 13:57
- Yes, Basic Latin == ASCII, right? – ktsangop Commented Jun 22, 2017 at 13:58
- When you say "Latin", it sounds as if you want to match (or not match) letters. When you use \x00-\x7F, you match the whole ASCII table chars, thus, it is more appropriate to name those chars ASCII chars. – Wiktor Stribiżew Commented Jun 22, 2017 at 14:11
- Edited the question based on your suggestions, thank you. – ktsangop Commented Jun 22, 2017 at 14:13
- @WiktorStribiżew "The C0 Controls and Basic Latin block" would be the most formal and precise name, though Unicode documentation is littered with references to ASCII and 0-9 are called the ASCII Digits. – Tom Blodget Commented Jun 22, 2017 at 16:53
2 Answers
You may use
^(?:[\u0000-\u007F]+|[\u0370-\u03FF]+)$
See the regex demo
Details:
^ - start of string
(?: - start of a non-capturing group (so that the anchors could be applied to both the alternatives):
[\u0000-\u007F]+ - 1+ ASCII chars
| - or
[\u0370-\u03FF]+ - 1+ Greek chars
) - end of group
$ - end of string.
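As a quick check of the pattern above, here is a minimal sketch that runs it against a few sample strings. The helper name isSingleScript and the sample inputs are made up for this illustration and are not part of the original answer.

```javascript
// Pattern from the answer above: the whole string must be ASCII-only
// or consist only of characters from the Greek and Coptic block.
const asciiOrGreek = /^(?:[\u0000-\u007F]+|[\u0370-\u03FF]+)$/;

// Hypothetical helper name, used only for this illustration.
function isSingleScript(input) {
  return asciiOrGreek.test(input);
}

console.log(isSingleScript("hello world")); // true  (ASCII only, space included)
console.log(isSingleScript("αβγδ"));        // true  (Greek block only)
console.log(isSingleScript("hello αβγδ"));  // false (mixed)
console.log(isSingleScript("αβγ δ"));       // false (the space is an ASCII char)
console.log(isSingleScript(""));            // false (+ requires at least one char)
```

Note that Greek text containing spaces, digits or ASCII punctuation counts as mixed under this pattern, because those characters belong to the ASCII range.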
Wiktor’s solution has the correct general format. Unfortunately, matching Greek symbols isn’t as simple as [\u0370-\u03FF]: that range misses many Greek symbols outside that block, such as the polytonic letters at U+1F00 and above.
With Unicode property escapes in regular expressions, you’d do:
/^(?:[\0-\x7F]+|\p{Script_Extensions=Greek}+)$/u
Until Unicode property escapes are officially supported in ECMAScript and implemented everywhere, we can transpile this to:
/^(?:[\0-\x7F]+|(?:[\u0342\u0345\u0370-\u0373\u0375-\u0377\u037A-\u037D\u037F\u0384\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF-\u1DC1\u1F00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FC4\u1FC6-\u1FD3\u1FD6-\u1FDB\u1FDD-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFE\u2126\uAB65]|\uD800[\uDD40-\uDD8E\uDDA0]|\uD834[\uDE00-\uDE45])+)$/
Here’s the demo: https://regex101./r/cmNTLA/1
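To illustrate the difference between the two answers, here is a small sketch comparing the block-range pattern with the property-escape pattern. It assumes a JavaScript engine that supports Unicode property escapes with the u flag (they were standardized in ES2018); the variable names and the sample word are chosen only for this example.

```javascript
// Block-range pattern from the first answer.
const blockOnly = /^(?:[\u0000-\u007F]+|[\u0370-\u03FF]+)$/;

// Property-escape pattern from the second answer.
// Assumption: the engine supports \p{...} together with the u flag.
const scriptAware = /^(?:[\0-\x7F]+|\p{Script_Extensions=Greek}+)$/u;

// A polytonic Greek word; its first letter is U+1F00,
// which lies in the Greek Extended block, outside U+0370-U+03FF.
const polytonic = "\u1F00\u03BD\u03AE\u03C1";

console.log(blockOnly.test(polytonic));   // false
console.log(scriptAware.test(polytonic)); // true

// Both patterns still reject a string that mixes ASCII and Greek.
console.log(blockOnly.test("a\u03B1"));   // false
console.log(scriptAware.test("a\u03B1")); // false
```

On engines without property-escape support, constructing the second pattern throws a SyntaxError, which is where the transpiled form above comes in.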