Which regular expression can I use to match (allow) any kind of letter from any language?
I need to match any letter including any diacritics (e.g., á, ü, ñ) and exclude any kind of symbol (math symbols, currency signs, dingbats, box-drawing characters, etc.) and punctuation characters.
I'm using ASP.NET MVC 2 with .NET 4. I’ve tried this annotation in my view model
[RegularExpression(@"\p{L}*", ...
and this one
[RegularExpression(@"\p{L}\p{M}*", ...
but client-side validation rejects accented characters.
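For reference, here is a minimal .NET sketch of how the two attempted patterns behave on the server. The sample strings are illustrative. An accented letter such as ñ can be stored as one code point (U+00F1, a letter) or as n followed by U+0303 COMBINING TILDE (a mark), and that is why the \p{M}* part matters. The anchors and the outer repetition are my additions for checking a whole input.

```csharp
using System;
using System.Text.RegularExpressions;

// "niño" written two ways: precomposed ñ (U+00F1) and n + combining tilde (U+0303).
string composed = "ni\u00F1o";
string decomposed = "nin\u0303o";

// Letters only: rejects the decomposed form because U+0303 is a mark (\p{M}), not a letter.
const string lettersOnly = @"^\p{L}*$";
// Each letter may be followed by any number of combining marks.
const string lettersAndMarks = @"^(?:\p{L}\p{M}*)*$";

Console.WriteLine(Regex.IsMatch(composed, lettersOnly));       // True
Console.WriteLine(Regex.IsMatch(decomposed, lettersOnly));     // False
Console.WriteLine(Regex.IsMatch(decomposed, lettersAndMarks)); // True
```

Another option is to normalize the input with string.Normalize(NormalizationForm.FormC) before matching. This folds many letter + mark pairs into single precomposed letters.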
UPDATE: Thank you for all your answers. Your suggestions work, but only in .NET. The problem is that MVC also uses the same regex for client-side validation in JavaScript.
I had to go with
[^0-9_\|°¬!#\$%/\\\(\)\?¡¿\+\{\}\[\]:\.\,;@ª^\*<>=&]
which is very ugly and does not cover all scenarios but is the closest thing to what I need.
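A quick .NET check shows why a blacklist like this does not cover all scenarios: any character that is not listed still passes, including currency signs, arrows and other symbols. The class matches a single character, so this sketch repeats it with + and anchors it. That is an assumption about how it is used. The sample inputs are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// The blacklist from the question, repeated and anchored so the whole input is checked.
const string blacklist = @"^[^0-9_\|°¬!#\$%/\\\(\)\?¡¿\+\{\}\[\]:\.\,;@ª^\*<>=&]+$";

// José, 100%, €, →, © and an apostrophe.
foreach (string input in new[] { "Jos\u00E9", "100%", "\u20AC", "\u2192", "\u00A9", "'" })
{
    // True for everything except "100%", even though only "José" consists of letters.
    Console.WriteLine($"{input}: {Regex.IsMatch(input, blacklist)}");
}
```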
Asked Jun 1, 2010 by pedro; edited Apr 2, 2020 by Greg Bacon. 6 Answers
You can use Char.IsLetter:
Indicates whether the specified Unicode character is categorized as a Unicode letter.
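With .NET 4.0:
string onlyLetters = String.Concat(str.Where(Char.IsLetter));
On .NET 3.5 there is no String.Concat overload that takes an IEnumerable<char>, and passing it a char[] just returns "System.Char[]". Build the string from the array instead:
string onlyLetters = new string(str.Where(Char.IsLetter).ToArray());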
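Here is a small, self-contained .NET sketch of the Char.IsLetter approach, covering both stripping non-letters and a whole-string check. The helper name IsAllLetters is hypothetical. Char.IsLetter looks at one UTF-16 char at a time. It therefore rejects a combining mark stored as a separate code point, and also letters outside the Basic Multilingual Plane, which are stored as surrogate pairs.

```csharp
using System;
using System.Linq;

// "Crème brûlée, 2 × €3!"
string str = "Cr\u00E8me br\u00FBl\u00E9e, 2 \u00D7 \u20AC3!";

// .NET 4+: String.Concat<T>(IEnumerable<T>) joins the filtered chars.
string onlyLetters = String.Concat(str.Where(Char.IsLetter));
// Equivalent that also compiles on .NET 3.5.
string onlyLetters35 = new string(str.Where(Char.IsLetter).ToArray());

Console.WriteLine(onlyLetters);   // Crèmebrûlée
Console.WriteLine(onlyLetters35); // Crèmebrûlée

// Hypothetical helper: true when every char is a letter (empty input is treated as invalid).
static bool IsAllLetters(string s) => s.Length > 0 && s.All(Char.IsLetter);

Console.WriteLine(IsAllLetters("Cr\u00E8me"));  // True
Console.WriteLine(IsAllLetters("Cr\u00E8me!")); // False
```

This only runs on the server, so it does not help with the JavaScript side of the validation.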
Your problem is more likely that an unanchored regex only has to match part of the input. \p{L}\p{M}* succeeds on any string that contains a single letter, and \p{L}* succeeds on every string, because it can match the empty string.
By adding ^ as a prefix and $ as a suffix, the whole string has to comply with your regex. So this probably works:
^\p{L}*$
RegexBuddy explains:
- ^: Assert position at the beginning of the string
- \p{L}*: A character with the Unicode property "letter" (any kind of letter from any kind of language), between zero and unlimited times, as many times as possible (greedy)
- $: Assert position at the end of the string
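This .NET sketch compares the unanchored and anchored patterns on a few arbitrary inputs.

```csharp
using System;
using System.Text.RegularExpressions;

string[] inputs = { "Ol\u00E1", "abc123", "$$$", "" };

foreach (string input in inputs)
{
    bool unanchored = Regex.IsMatch(input, @"\p{L}*"); // can match an empty substring anywhere
    bool anchored = Regex.IsMatch(input, @"^\p{L}*$"); // the whole input must be letters
    Console.WriteLine($"\"{input}\": unanchored={unanchored}, anchored={anchored}");
}
```

The unanchored pattern returns True for all four inputs. The anchored one rejects "abc123" and "$$$". Note that ^\p{L}*$ still accepts the empty string, so use + if at least one letter is required. Also, in .NET, $ matches before a trailing \n as well as at the very end.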
\p{L}*
should match "any kind of letter from any language". It should work; I used it in an i18n-proof uppercase/lowercase recognition regex in .NET.
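The answer does not show the uppercase/lowercase regex itself. What follows is a hypothetical .NET sketch of the same idea, using the Unicode subcategories \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter). The function name Classify and its labels are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical classification based on Unicode letter subcategories.
static string Classify(string word)
{
    if (Regex.IsMatch(word, @"^\p{Lu}+$")) return "upper";
    if (Regex.IsMatch(word, @"^\p{Ll}+$")) return "lower";
    if (Regex.IsMatch(word, @"^\p{Lu}\p{Ll}*$")) return "capitalized";
    return "other";
}

Console.WriteLine(Classify("\u00C9T\u00C9"));                   // ÉTÉ -> upper
Console.WriteLine(Classify("\u00E9t\u00E9"));                   // été -> lower
Console.WriteLine(Classify("\u00C9t\u00E9"));                   // Été -> capitalized
Console.WriteLine(Classify("\u0394\u03AD\u03BB\u03C4\u03B1")); // Δέλτα -> capitalized
Console.WriteLine(Classify("abc1"));                            // other
```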
I’ve just had to validate a URL, and I chose this regular expression in .NET:
^[\p{L}\p{M}-]*$
From beginning to end, the string may contain only letters from any language, combining marks (accents and other diacritics), and hyphens. The empty string also matches.
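Inside square brackets, (, ) and ? are ordinary characters, not grouping or quantifiers. A pattern written as ^[(\p{L})?(\p{M})?-]*$ therefore also accepts them. This .NET sketch compares that form with ^[\p{L}\p{M}-]*$ on a few made-up slugs.

```csharp
using System;
using System.Text.RegularExpressions;

// Inside [...], "(", ")" and "?" are literal characters.
const string bracketed = @"^[(\p{L})?(\p{M})?-]*$";
// Letters, combining marks and a literal hyphen (last in the class, so not a range).
const string lettersMarksHyphens = @"^[\p{L}\p{M}-]*$";

// café-crème, what?, (draft)
foreach (string slug in new[] { "caf\u00E9-cr\u00E8me", "what?", "(draft)" })
{
    Console.WriteLine($"{slug}: bracketed={Regex.IsMatch(slug, bracketed)}, fixed={Regex.IsMatch(slug, lettersMarksHyphens)}");
}
```

The bracketed form accepts all three slugs. The version without parentheses accepts only "café-crème".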
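One thing to watch out for is client-side validation. The same pattern is run by the JavaScript regex engine in the browser and by the .NET regex engine on the server. JavaScript regexes did not support Unicode property escapes such as \p{L} at the time (ES2018 added them, and only with the u flag), so this approach will not work on the client.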
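One possible workaround, sketched here under assumptions, is to generate the letter class ahead of time as explicit \uXXXX ranges. Both the .NET and JavaScript regex engines understand these, so the generated string can be pasted into the attribute as a literal (attribute arguments must be compile-time constants). This sketch only covers the Basic Multilingual Plane (U+0000 to U+FFFF) and takes the letter/mark classification from the .NET runtime it runs on. The names IsLetterOrMark and BuildLetterClass are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Letters (\p{L}) plus combining marks (\p{M}), judged by this runtime's Unicode data.
static bool IsLetterOrMark(char c)
{
    switch (char.GetUnicodeCategory(c))
    {
        case UnicodeCategory.UppercaseLetter:
        case UnicodeCategory.LowercaseLetter:
        case UnicodeCategory.TitlecaseLetter:
        case UnicodeCategory.ModifierLetter:
        case UnicodeCategory.OtherLetter:
        case UnicodeCategory.NonSpacingMark:
        case UnicodeCategory.SpacingCombiningMark:
        case UnicodeCategory.EnclosingMark:
            return true;
        default:
            return false;
    }
}

// Collapses runs of consecutive matching code units into \uXXXX-\uXXXX ranges (BMP only).
static string BuildLetterClass()
{
    var sb = new StringBuilder("[");
    int c = 0;
    while (c <= 0xFFFF)
    {
        if (!IsLetterOrMark((char)c)) { c++; continue; }
        int start = c;
        while (c < 0xFFFF && IsLetterOrMark((char)(c + 1))) c++;
        sb.AppendFormat("\\u{0:X4}", start);
        if (c > start) sb.AppendFormat("-\\u{0:X4}", c);
        c++;
    }
    return sb.Append(']').ToString();
}

string pattern = "^" + BuildLetterClass() + "+$";
Console.WriteLine(pattern); // long; copy it into the [RegularExpression(...)] attribute

Console.WriteLine(Regex.IsMatch("\u00D1and\u00FA", pattern)); // Ñandú -> True
Console.WriteLine(Regex.IsMatch("e\u0301", pattern));          // e + combining acute -> True
Console.WriteLine(Regex.IsMatch("100\u20AC", pattern));        // 100€ -> False
```

The generated pattern is long, and it has to be regenerated if you want to pick up newer Unicode data. Libraries that add \p-style support to JavaScript take a similar range-table approach.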
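\w matches any word character. In .NET that means any letter, decimal digit, non-spacing mark or connector punctuation character, so it also matches numbers and the underscore. In JavaScript, \w only matches the ASCII set [A-Za-z0-9_].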
In my tests it has matched:
- ã
- à
- ç
- 8
- z
and hasn't matched:
- ;
- ,
- \
- :
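This .NET sketch reproduces those tests. It uses RegexOptions.ECMAScript, which restricts \w to [a-zA-Z_0-9], as a stand-in for what a browser would do with the same pattern. It also shows that \w accepts the underscore and digits from other scripts, both of which the question wanted to exclude.

```csharp
using System;
using System.Text.RegularExpressions;

// The characters from the lists above, plus "_" and U+0663 (ARABIC-INDIC DIGIT THREE).
string[] samples = { "\u00E3", "\u00E0", "\u00E7", "8", "z", ";", ",", "\\", ":", "_", "\u0663" };

foreach (string s in samples)
{
    bool dotNet = Regex.IsMatch(s, @"^\w$");
    // ECMAScript mode limits \w to ASCII word characters, as in JavaScript.
    bool ecma = Regex.IsMatch(s, @"^\w$", RegexOptions.ECMAScript);
    Console.WriteLine($"{s}: .NET={dotNet}, ECMAScript={ecma}");
}
```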
If you know exactly what you want to exclude (say, a short list), you can do the following:
[^;,\\`.]
which matches a single character that is not one of:
- ;
- ,
- \
- `
- .
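Here is a short .NET sketch. It assumes the backslash should be excluded, as the list says, so it is written \\ inside the class. Because the class matches one character, it is anchored and repeated to check a whole input. The sample inputs are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// One character that is not ; , \ ` or . (in a verbatim string, \\ is a regex-escaped backslash).
const string oneAllowedChar = @"[^;,\\`.]";
// Anchored and repeated so the whole input is checked.
const string wholeInput = "^" + oneAllowedChar + "+$";

Console.WriteLine(Regex.IsMatch("a;b", oneAllowedChar));           // True: "a" alone is enough
Console.WriteLine(Regex.IsMatch("a;b", wholeInput));               // False
Console.WriteLine(Regex.IsMatch("S\u00E3o Paulo", wholeInput));    // True
Console.WriteLine(Regex.IsMatch("C:\\temp", wholeInput));          // False: contains a backslash
Console.WriteLine(Regex.IsMatch("100 \u20AC!", wholeInput));       // True: digits and symbols are not excluded
```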
Hope it helps!