I have an input field in a form. Upon pushing submit, I want to validate to make sure the user entered non-latin characters only, so any foreign language characters, like Chinese among many others. Or at the very least test to make sure it does not contain any latin characters.
Could I use a regular expression for this? What would be the best approach for this?
I am validating in both JavaScript and PHP. What solutions can I use to check for foreign characters in the input field in both languages?
asked Mar 23, 2010 at 11:34 by zeckdude
- What exactly do you mean by "non-latin"? Any characters not in the latin-1 set? Or anything except unaccented A-Z/a-z characters? Should punctuation characters be allowed, or only actual letters? Does it have to work in both PHP and JavaScript, or is a solution for one of them sufficient? – Tim Pietzcker Commented Mar 23, 2010 at 11:51
- By non-Latin, I mean any foreign-language characters that use completely different characters, like Asian characters. The field is for a person's professional title, so I would imagine that punctuation might be needed for cases like John T. Smith, Ph.D. Thanks for asking. I hadn't even considered that! – zeckdude Commented Mar 23, 2010 at 12:00
- I don't necessarily need a solution that works for both. I will take two different solutions for both PHP and JavaScript as long as they both do what I need. Can't be picky. – zeckdude Commented Mar 23, 2010 at 12:03
- I don't get this at all... You said that you DON'T want any Latin characters. Is that correct? I'd add ^[^a-zA-Z]+$ or ^[^a-zA-Z,.]+$ as an answer, but I don't understand which one you want... Can you add examples of valid and invalid inputs? – Kobi Commented Mar 23, 2010 at 15:12
- "汉字 漢字" is valid and "John Smith" is invalid – zeckdude Commented Mar 23, 2010 at 23:55
3 Answers
In PHP, you can check the Unicode script property Latin. That's probably closest to what you want.
So if preg_match('/\p{Latin}/u', $subject) returns 1, then there is at least one Latin character in your $subject. See also this reference.
JavaScript did not support this when this answer was written (Unicode property escapes such as \p{Script=Latin} arrived later, in ES2018, and need the u flag). On engines without them, you'd have to construct the valid Unicode ranges manually.
In JavaScript, at least, you can use \u escapes inside character classes:
var rlatins = /[\u0000-\u007f]/;
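You can then test whether a string contains any character in that range like this (note that the range covers all of ASCII, so digits, spaces, and punctuation match as well as Latin letters):
if (rlatins.test(someString)) {
  alert("ROMANI ITE DOMUM");
}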
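Because \u0000-\u007f spans every ASCII character, a string such as "汉字 漢字" is flagged on account of its space alone. A narrower, hand-built class restricted to Latin letters might look like the sketch below. The block boundaries are my reading of the Unicode charts, the list is deliberately incomplete (later Latin Extended blocks are left out), and rlatinLetters is an illustrative name.

```javascript
// Hand-built class of Latin letters (a sketch under assumptions, not exhaustive):
//   A-Z, a-z                      Basic Latin letters
//   \u00C0-\u00D6, \u00D8-\u00F6,
//   \u00F8-\u00FF                 Latin-1 Supplement letters, skipping × (U+00D7) and ÷ (U+00F7)
//   \u0100-\u024F                 Latin Extended-A and Latin Extended-B
var rlatinLetters = /[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u024F]/;

var rascii = /[\u0000-\u007f]/; // the all-ASCII range, for comparison

console.log(rascii.test("汉字 漢字"));        // true: the space is ASCII
console.log(rlatinLetters.test("汉字 漢字")); // false: no Latin letters
console.log(rlatinLetters.test("Ph.D."));     // true
console.log(rlatinLetters.test("\u00e9"));    // true (é, U+00E9)
```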
You're trying to check that the string contains no Latin letters, while still accepting accented letters.
A simple solution is to validate the string using the regex (this is useful if you have a validation plugin):
/^[^a-z]+$/i
^...$ - match from start to end
[^...] - characters that are not
a-z - A through Z,
+ - with at least one character
/i - ignoring case (could also be done with /^[^a-zA-Z]+$/)
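To see how the pieces behave together, here is a quick check of the pattern against the sample inputs given in the comments. The isNonLatin name is only for illustration.

```javascript
// Hypothetical helper wrapping the regex from the answer.
function isNonLatin(text) {
  // Every character, start to end, must be something other than a-z / A-Z.
  return /^[^a-z]+$/i.test(text);
}

console.log(isNonLatin("汉字 漢字"));   // true  (valid)
console.log(isNonLatin("John Smith")); // false (invalid)
console.log(isNonLatin("汉字, 漢字.")); // true: punctuation is allowed
console.log(isNonLatin("\u00e9"));     // true: é is outside a-z, so accented letters pass
```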
Another option is simply to look for a letter:
/[a-z]/i
This regex matches if the string contains a Latin letter, so you can reject the input whenever it matches.
In JavaScript you can check that easily with an if statement:
var s = "שלום עולם";
if (s.match(/^[^a-z]+$/i)) {
  // valid: no Latin letters
}
or
if (!s.match(/[a-z]/i)) {
  // valid: no Latin letters
}
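The two checks agree on any non-empty string but differ on an empty one: the anchored pattern needs at least one character, while the negated search accepts an empty string. A small sketch to illustrate, with made-up function names:

```javascript
// Both helper names are hypothetical, used only to compare the two forms.
function allNonLatin(s) {
  return /^[^a-z]+$/i.test(s); // whole string must consist of non-a-z characters
}
function noLatinLetter(s) {
  return !/[a-z]/i.test(s);    // no a-z letter anywhere in the string
}

var samples = ["שלום עולם", "John Smith", "汉字 abc", ""];
samples.forEach(function (s) {
  console.log(JSON.stringify(s), allNonLatin(s), noLatinLetter(s));
});
// Only the empty string gives different answers: false vs. true.
```

If the field is required, the anchored form saves a separate emptiness check; if it is optional, the negated search may be the better fit.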
PHP has a different syntax, and server-side validation can't be bypassed the way client-side JavaScript can, but the regular expressions are the same.