javascript - How to use regular expression to validate Chinese input?

I need client-side validation to treat the following kind of Chinese input as invalid:

Input made up of English letters, Chinese characters, and spaces is invalid when its total length is >= 10.

For example, "你的a你的a你的a你" or "你的 你的 你的 你" (length 10) is invalid, but "你的a你的a你的a" (length 9) is OK.

I use JavaScript for client-side validation and Java on the server side, so ideally the same regular expression would work in both.

Can anyone give some hints on how to write this rule as a regular expression?

asked Oct 18, 2016 at 4:00 by jm li; edited Oct 18, 2016 at 11:00 by Mariano

What do you mean by "with spaces", and what have you tried? – LF-DevJourney Commented Oct 18, 2016 at 4:41
What about other characters, such as ASCII? – LF-DevJourney Commented Oct 18, 2016 at 4:46

1 Answer

According to "What's the complete range for Chinese characters in Unicode?", the CJK Unicode ranges are:

Block                                   Range       Comment
--------------------------------------- ----------- ----------------------------------------------------
CJK Unified Ideographs                  4E00-9FFF   Common
CJK Unified Ideographs Extension A      3400-4DBF   Rare
CJK Unified Ideographs Extension B      20000-2A6DF Rare, historic
CJK Unified Ideographs Extension C      2A700-2B73F Rare, historic
CJK Unified Ideographs Extension D      2B740-2B81F Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820-2CEAF Rare, historic
CJK Compatibility Ideographs            F900-FAFF   Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800-2FA1F Unifiable variants
CJK Symbols and Punctuation             3000-303F

You probably want to allow code points from the Unicode blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A.
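To decide which blocks to allow, it can help to check where a given character falls. The following is only a sketch: `BLOCKS` and `cjkBlockOf` are hypothetical names introduced for illustration, and the ranges are copied from a few rows of the table above.

```javascript
// A few block boundaries copied from the table above (not the full list).
const BLOCKS = [
  { name: "CJK Unified Ideographs", from: 0x4E00, to: 0x9FFF },
  { name: "CJK Unified Ideographs Extension A", from: 0x3400, to: 0x4DBF },
  { name: "CJK Unified Ideographs Extension B", from: 0x20000, to: 0x2A6DF },
  { name: "CJK Compatibility Ideographs", from: 0xF900, to: 0xFAFF },
  { name: "CJK Symbols and Punctuation", from: 0x3000, to: 0x303F },
];

// Hypothetical helper: returns the block name for the first code point
// of `ch`, or null when it lies outside the blocks listed above.
function cjkBlockOf(ch) {
  const cp = ch.codePointAt(0);
  const hit = BLOCKS.find((b) => cp >= b.from && cp <= b.to);
  return hit ? hit.name : null;
}

console.log(cjkBlockOf("你")); // "CJK Unified Ideographs"
console.log(cjkBlockOf("a"));  // null
```

`codePointAt` is used instead of `charCodeAt` so that characters above U+FFFF, such as those in Extension B, are read as a single code point.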

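This regex matches a string of 0 to 9 characters, each of which is a space, an ASCII letter (A-Z or a-z), a code point in CJK Symbols and Punctuation (U+3000-U+303F, which includes the ideographic space U+3000), or a code point in those 2 CJK blocks.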

/^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/
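As a quick check, the pattern can be run against a few inputs. This is a minimal sketch; the variable names `allowedUpTo9` and `samples` are made up for illustration, and the last two samples are not from the question.

```javascript
// Same pattern as above: at most 9 characters from the allowed set.
const allowedUpTo9 = /^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/;

const samples = [
  "你的a你的a你的a",   // 9 allowed characters
  "你的a你的a你的a你", // 10 allowed characters
  "你的 a",            // space and letters mixed in
  "你的1",             // the digit is outside the character class
];

for (const s of samples) {
  console.log(JSON.stringify(s), allowedUpTo9.test(s));
}
// Logs true, false, true, false in that order.
```

Note that the pattern is an allow-list: it rejects any string containing a character outside the class (digits, punctuation such as "!"), regardless of length.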

The ideographs are listed in:

part 1
part 2
part 3
part 4
Extension A

However, you may as well add more blocks.
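Blocks above U+FFFF (Extension B and later) cannot be written with four-digit `\uXXXX` escapes. One possible way to include them in JavaScript, assuming an engine that supports the ES2015 `u` flag, is sketched below; `extendedUpTo9` is a hypothetical name, and only Extension B is added here as an example.

```javascript
// With the `u` flag the class and the {0,9} quantifier work on code points,
// so a character outside the BMP counts as one character, not two.
const extendedUpTo9 =
  /^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF\u{20000}-\u{2A6DF}]{0,9}$/u;

const rare = String.fromCodePoint(0x20000); // first code point of Extension B

console.log(rare.length);                         // 2 (UTF-16 code units)
console.log(extendedUpTo9.test(rare.repeat(9)));  // true
console.log(extendedUpTo9.test(rare.repeat(10))); // false
```

Without the `u` flag, such a character is seen as two surrogate code units, neither of which is in the class, so the original pattern rejects it.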

Code:

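// Note: despite its name, this returns true only for strings of at most
// 9 characters, all drawn from the allowed set (spaces, ASCII letters,
// U+3000-U+303F, and the two CJK blocks above).
function has10OrLessCJK(text) {
    return /^[ A-Za-z\u3000-\u303F\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/.test(text);
}

// Writes "Valid" or "Invalid" into the element with id "valid".
function checkValidation(value) {
    var valid = document.getElementById("valid");
    if (has10OrLessCJK(value)) {
        valid.innerText = "Valid";
    } else {
        valid.innerText = "Invalid";
    }
}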

<input type="text" 
       style="width:100%"
       oninput="checkValidation(this.value)"
       value="你的a你的a你的a">

<div id="valid">
    Valid
</div>
