I need to treat this kind of mixed Chinese input as invalid in client-side validation:
The input is invalid when it mixes English letters, Chinese characters, and spaces, and its total length is 10 or more.
For example, "你的a你的a你的a你" and "你的 你的 你的 你" (length 10) are invalid, but "你的a你的a你的a" (length 9) is OK.
I am using JavaScript for client-side validation and Java for server-side validation, so I expect that applying the same regular expression on both sides would work well.
Can anyone give me some hints on how to write this rule as a regular expression?
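The lengths quoted in these examples can be checked directly in JavaScript. Every character involved is in the Basic Multilingual Plane, so `String.prototype.length` (which counts UTF-16 code units) equals the number of characters here. The names `examples` and `lengths` are just for illustration:

```javascript
// The example strings from the question.
const examples = ["你的a你的a你的a你", "你的 你的 你的 你", "你的a你的a你的a"];

// All characters here are in the BMP, so .length (UTF-16 code units)
// equals the number of characters.
const lengths = examples.map((s) => s.length);

console.log(lengths); // [10, 10, 9]
```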
Asked Oct 18, 2016 by jm li; edited Oct 18, 2016 by Mariano.
- What do you mean by "with spaces", and what have you tried? (LF-DevJourney, Oct 18, 2016 at 4:41)
- What about other characters, such as ASCII? (LF-DevJourney, Oct 18, 2016 at 4:46)
1 Answer
From "What's the complete range for Chinese characters in Unicode?", the CJK Unicode ranges are:
Block                                    Range        Comment
---------------------------------------  -----------  ----------------------------------------------------
CJK Unified Ideographs                   4E00-9FFF    Common
CJK Unified Ideographs Extension A       3400-4DBF    Rare
CJK Unified Ideographs Extension B       20000-2A6DF  Rare, historic
CJK Unified Ideographs Extension C       2A700-2B73F  Rare, historic
CJK Unified Ideographs Extension D       2B740-2B81F  Uncommon, some in current use
CJK Unified Ideographs Extension E       2B820-2CEAF  Rare, historic
CJK Compatibility Ideographs             F900-FAFF    Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement  2F800-2FA1F  Unifiable variants
CJK Symbols and Punctuation              3000-303F
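To see which of these blocks a given character falls into, the table can be turned into a small lookup. This is a sketch; `CJK_BLOCKS` and `cjkBlockOf` are hypothetical names, and the ranges are copied from the table above. It uses `codePointAt`, so characters outside the BMP (Extensions B to E) are handled as single code points:

```javascript
// Block ranges copied from the table above (inclusive code points).
const CJK_BLOCKS = [
  { name: "CJK Unified Ideographs", from: 0x4e00, to: 0x9fff },
  { name: "CJK Unified Ideographs Extension A", from: 0x3400, to: 0x4dbf },
  { name: "CJK Unified Ideographs Extension B", from: 0x20000, to: 0x2a6df },
  { name: "CJK Unified Ideographs Extension C", from: 0x2a700, to: 0x2b73f },
  { name: "CJK Unified Ideographs Extension D", from: 0x2b740, to: 0x2b81f },
  { name: "CJK Unified Ideographs Extension E", from: 0x2b820, to: 0x2ceaf },
  { name: "CJK Compatibility Ideographs", from: 0xf900, to: 0xfaff },
  { name: "CJK Compatibility Ideographs Supplement", from: 0x2f800, to: 0x2fa1f },
  { name: "CJK Symbols and Punctuation", from: 0x3000, to: 0x303f },
];

// Returns the name of the block that contains the first code point of `ch`,
// or null if it lies outside every listed block (or `ch` is empty).
function cjkBlockOf(ch) {
  const cp = ch.codePointAt(0); // full code point, even for surrogate pairs
  const block = CJK_BLOCKS.find((b) => cp >= b.from && cp <= b.to);
  return block ? block.name : null;
}

console.log(cjkBlockOf("你")); // "CJK Unified Ideographs"
console.log(cjkBlockOf("\u3000")); // "CJK Symbols and Punctuation"
console.log(cjkBlockOf("a")); // null
```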
You probably want to allow code points from the Unicode blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A.
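This regex matches a whole string of 0 to 9 characters, where each character is an ASCII space, an ASCII letter (A-Z or a-z), a character in CJK Symbols and Punctuation (U+3000-U+303F, which includes the ideographic space U+3000), or a code point in one of those 2 CJK blocks. Any input of 10 or more characters, or containing anything else (digits, for example), does not match.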
/^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/
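A quick way to check the pattern against the examples from the question, plus two edge cases, in a browser console or Node. `VALID_INPUT` and `cases` are illustrative names:

```javascript
// The answer's pattern: 0 to 9 characters, each a space, an ASCII letter,
// or a code point in U+3000-U+303F, U+3400-U+4DBF, or U+4E00-U+9FFF.
const VALID_INPUT = /^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/;

// Input string -> expected result of VALID_INPUT.test(input).
const cases = {
  "你的a你的a你的a你": false, // 10 characters: too long
  "你的 你的 你的 你": false, // 10 characters including spaces: too long
  "你的a你的a你的a": true, // 9 characters: OK
  "你的1": false, // the digit is not in the character class
  "": true, // {0,9} also accepts the empty string
};

for (const [input, expected] of Object.entries(cases)) {
  const actual = VALID_INPUT.test(input);
  console.log(JSON.stringify(input), actual, actual === expected ? "ok" : "MISMATCH");
}
```

Because `{0,9}` allows zero characters, an empty field counts as valid; if the field is required, check that separately.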
The ideographs are listed in:
- part 1
- part 2
- part 3
- part 4
- Extension A
You can also add more blocks from the table above if you need them.
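Blocks outside the Basic Multilingual Plane, such as Extension B (U+20000-U+2A6DF), cannot be written as `\uXXXX` escapes, because each of those characters is a surrogate pair in UTF-16. In JavaScript this needs the `u` flag and the `\u{...}` escape. With `u`, the `{0,9}` quantifier also counts code points rather than UTF-16 code units. This is a sketch that adds only Extension B; `VALID_INPUT_EXT` is a hypothetical name. On the Java side, `java.util.regex.Pattern` writes such code points as `\x{20000}`.

```javascript
// Hypothetical extended pattern: the answer's ranges plus Extension B.
// The `u` flag makes the character class and {0,9} work on code points.
const VALID_INPUT_EXT =
  /^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF\u{20000}-\u{2A6DF}]{0,9}$/u;

const extB = "\u{20000}"; // a single Extension B ideograph

console.log(extB.length); // 2 (UTF-16 code units)
console.log([...extB].length); // 1 (code points)

// Nine Extension B ideographs pass, ten fail.
console.log(VALID_INPUT_EXT.test(extB.repeat(9))); // true
console.log(VALID_INPUT_EXT.test(extB.repeat(10))); // false
```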
Code:
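// True when the whole string is 0 to 9 characters long and every character
// is a space, an ASCII letter, or in U+3000-U+303F, U+3400-U+4DBF, U+4E00-U+9FFF.
function hasFewerThan10Chars(text) {
  return /^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/.test(text);
}

// Writes "Valid" or "Invalid" into the element with id="valid".
function checkValidation(value) {
  var valid = document.getElementById("valid");
  valid.textContent = hasFewerThan10Chars(value) ? "Valid" : "Invalid";
}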
<input type="text"
style="width:100%"
oninput="checkValidation(this.value)"
value="你的a你的a你的a">
<div id="valid">
Valid
</div>
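A single regex only says "valid" or "invalid". If you want to tell the user *why* input was rejected, one option is to split the character check from the length check. This is a sketch; `ALLOWED_CHAR`, `MAX_LENGTH`, and `validationError` are hypothetical names, and the messages are placeholders. Since every allowed character is in the BMP, `text.length` equals the character count here:

```javascript
// The answer's character class, without the {0,9} length limit.
const ALLOWED_CHAR = /^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF]*$/;
const MAX_LENGTH = 9; // lengths of 10 or more are invalid

// Returns null when `text` is valid, otherwise a short reason string.
function validationError(text) {
  if (!ALLOWED_CHAR.test(text)) {
    return "contains characters other than letters, spaces, and CJK";
  }
  if (text.length > MAX_LENGTH) {
    return "must be shorter than 10 characters";
  }
  return null;
}

console.log(validationError("你的a你的a你的a")); // null
console.log(validationError("你的a你的a你的a你")); // "must be shorter than 10 characters"
console.log(validationError("你的1")); // "contains characters other than letters, spaces, and CJK"
```

In the page above, `checkValidation` could then write `validationError(value) ?? "Valid"` into the `#valid` element.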