I'm not sure whether a solution already exists, but I was not able to find one, so I am asking again.
I am writing an email validator. It should accept every well-formed email address (this is just one level of validation, checking only that the address is well formed). Because my code is used internationally, it should also support non-Latin characters. How do I write the most efficient regex for that?
International Emails: http://en.wikipedia.org/wiki/International_email
Some example emails:
- 伊昭傑@郵件.商務
- राम@मोहन.ईन्फो
- юзер@екзампл.ком
- θσερ@εχαμπλε.ψομ
It should be able to validate all of the above formats.
Asked May 9, 2013 at 5:23 by KD.; edited May 9, 2013 at 7:27.
4 Answers
Email validation through regexp is usually lax because strict validation is not efficient. There is a spec for email address syntax, but the regexp needed to check it is so long that it's impractical. Besides, email providers are stricter in their implementation of the syntax than the actual spec. An email address might be valid according to the spec but invalid according to the provider.
That's also the reason why activation emails exist: the only way to check that an email address is valid, existing and currently in use is to send it something, usually a unique activation code or link. Only when that unique activation code or link, sent only to that address, is used will the address be considered valid.
Until then, consider a more lax approach to validating emails: check only that the username, the @ and the domain parts are present. Besides, why would one sign up using a false email anyway? If they did, they wouldn't get the activation link and couldn't proceed with creating the account.
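A minimal Python sketch of such a lax check, assuming "well formed" means only: no whitespace, exactly one @, a non-empty username, and a domain with at least two non-empty dot-separated labels. The function name `looks_like_email` is made up for illustration, and the rules are my reading of the advice above, not a standard.

```python
def looks_like_email(address: str) -> bool:
    """Lax structural check: username, '@', and a dotted domain are present."""
    # Reject any whitespace anywhere in the address.
    if any(ch.isspace() for ch in address):
        return False
    # Split at the last '@'; the username must be non-empty and contain no '@'.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or "@" in local:
        return False
    # The domain needs at least two labels, none of them empty.
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels)


# The sample addresses from the question.
samples = ["伊昭傑@郵件.商務", "राम@मोहन.ईन्फो", "юзер@екзампл.ком", "θσερ@εχαμπλε.ψομ"]
results = [looks_like_email(s) for s in samples]
```

Anything that passes would still need the activation email to be confirmed as a real, working address.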
@Patashu Thanks a lot. I've improved your regex a bit and now it absolutely suits my needs:
^([^@\s\."'\(\)\[\]\{\}\\/,:;]+\.)*[^@\s\."'\(\)\[\]\{\}\\/,:;]+@[^@\s\."'\(\)\[\]\{\}\\/,:;]+(\.[^@\s\."'\(\)\[\]\{\}\\/,:;]+)+$
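For anyone who wants to try this from Python, here is a rough sketch that builds the same pattern from its repeated character class. The names `ATOM`, `PATTERN` and `matches_answer_regex` are mine, and I use `fullmatch` in place of the `^`/`$` anchors, which I expect to behave the same for single-line input.

```python
import re

# One or more characters that are not '@', whitespace, '.', quotes,
# brackets, braces, parentheses, slashes, ',', ':' or ';'.
ATOM = r"""[^@\s\."'\(\)\[\]\{\}\\/,:;]+"""

# atom(.atom)* @ atom(.atom)+  (non-capturing groups, same matching behavior)
PATTERN = re.compile(
    "(?:" + ATOM + r"\.)*" + ATOM + "@" + ATOM + r"(?:\." + ATOM + ")+"
)


def matches_answer_regex(address: str) -> bool:
    """True if the whole string matches the pattern above."""
    return PATTERN.fullmatch(address) is not None


samples = ["伊昭傑@郵件.商務", "राम@मोहन.ईन्फो", "юзер@екзампл.ком", "θσερ@εχαμπλε.ψομ"]
sample_results = [matches_answer_regex(s) for s in samples]
```

Note that the pattern excludes specific punctuation rather than listing allowed letters, which is why it does not care which script the address is written in.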
In the case of Java, this one works quite well for me:
"^[\\p{L}\\p{N}\\._%+-]+@[\\p{L}\\p{N}\\.\\-]+\\.[\\p{L}]{2,}$"
It does not allow IP addresses after the @, but most valid emails of the form [email protected] can be validated with it. \p{L} matches Unicode letters and \p{N} matches Unicode numbers. You can check this doc for more information.
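One thing worth checking before relying on letter and number classes alone: some scripts use combining marks, which belong to the Unicode M* categories rather than L* or N*. Below is a small Python sketch (Python's built-in `re` has no `\p{...}` support, so this inspects categories with `unicodedata` instead). The helper name is made up, and this only looks at the characters themselves; it does not run the Java regex.

```python
import unicodedata


def chars_outside_letters_and_numbers(text):
    """Return (char, category) pairs whose general category is not L* or N*."""
    return [
        (ch, unicodedata.category(ch))
        for ch in text
        if unicodedata.category(ch)[0] not in ("L", "N")
    ]


# Local parts of the sample addresses from the question.
for local_part in ["伊昭傑", "राम", "юзер", "θσερ"]:
    print(local_part, chars_outside_letters_and_numbers(local_part))
```

The Devanagari sample contains a vowel sign in category Mc, so a class built only from letters and numbers would presumably reject it; adding \p{M} to the character classes is one possible adjustment, though I have not tested that against the Java pattern.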
This works for me using Python if you do not have special letters or characters but do have dashes (-) and numbers, in both the domain and the username. It also matches addresses with country extensions.
[a-zA-Z0-9.-]+@[a-zA-Z0-9-]+\.(com|edu|net)(\.[a-z]+)*
[^\s@]+@[^\s@]+\.[^\s@]+ (Confirmed in www.regexpal.com to match all four of those emails exactly.) Don't get any more complex than that - the only way to find out an email address is REALLY correct is to send it an email, anyway, so be permissive. – Patashu Commented May 9, 2013 at 5:25
[email protected], [email protected], x@-y, x@y-, x@a--b, etc... – Déjà vu Commented May 9, 2013 at 5:27
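A quick Python sketch of the permissive pattern from the first comment, for anyone who wants to reproduce the check outside regexpal. The names `PERMISSIVE` and `is_plausible` are made up for illustration, and I use `fullmatch` so the whole string has to match.

```python
import re

# Something without whitespace or '@', then '@', then a domain part
# that contains at least one dot.
PERMISSIVE = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")


def is_plausible(address: str) -> bool:
    """True if the entire string matches the permissive pattern."""
    return PERMISSIVE.fullmatch(address) is not None


samples = ["伊昭傑@郵件.商務", "राम@मोहन.ईन्फो", "юзер@екзампл.ком", "θσερ@εχαμπλε.ψομ"]
plausible = [is_plausible(s) for s in samples]
```

Being this permissive means it also accepts odd domains such as x@-y.com, so it is only a first filter before sending a confirmation email.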