regex - JavaScript regular expression to catch kanji

I can't get this JavaScript function to work the way I want:

// matches a String that contains kanji and/or kana character(s)

String.prototype.isKanjiKana = function(){
    return !!this.match(/^[\u4E00-\u9FAF|\u3040-\u3096|\u30A1-\u30FA|\uFF66-\uFF9D|\u31F0-\u31FF]+$/);
}

It returns true if the string consists entirely of kanji and/or kana characters, and false if alphabetic or other characters are present.

I would like it to return true if at least one kanji or kana character is present, rather than only when every character is one.

Thank you in advance for any help!

Asked Sep 8, 2011 at 7:57 by Mikele; edited Apr 4, 2015 at 19:01 by tchrist.

6 Answers

The right answer is not to hardcode ranges. Never ever put magic numbers in your code! That is a maintenance nightmare. It is hard to read, hard to write, hard to debug, hard to maintain. How do you know you got the numbers right? What happens when they add new ones? No, do not use magic numbers. Please.

The right answer is to use named Unicode scripts, which are a fundamental aspect of every Unicode code point:

[\p{Han}\p{Hiragana}\p{Katakana}]

That requires the XRegExp plugin for JavaScript.

The real problem is that JavaScript regexes on their own are too primitive to support Unicode properties, and therefore to support Unicode. Maybe that was an acceptable compromise 15 years ago, but today it is nothing less than intolerably negligent, as you yourself have discovered.

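You will also miss a few Common code points specified as kana in the new Script Extensions property, but that probably does not matter. You could add \p{Common} to the set above, but be aware that the Common script also covers digits, whitespace and ASCII punctuation, so an "at least one character" test would then match almost any string.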

Now that Unicode property escapes are part of the ES2018 spec, the following regex can be used natively if the JS engine supports this feature (expanding on @tchrist's answer):

/[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u

If you want to exclude punctuation from being matched:

/(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u
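A minimal sketch of how the two patterns above might be wrapped for reuse. The helper names `containsKanjiKana` and `containsKanjiKanaNoPunct` are made up for this example, and it assumes an engine that supports ES2018 property escapes (a recent Node.js or browser):

```javascript
// Hypothetical helpers; the names are illustrative, not from any library.
// Assumes the engine supports ES2018 Unicode property escapes (the "u" flag is required).
const KANJI_KANA =
  /[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

// Same set, but the negative lookahead rejects any code point that is punctuation.
const KANJI_KANA_NO_PUNCT =
  /(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

// True if str contains at least one kanji or kana character.
function containsKanjiKana(str) {
  return KANJI_KANA.test(str);
}

// Same check, ignoring characters that are classified as punctuation.
function containsKanjiKanaNoPunct(str) {
  return KANJI_KANA_NO_PUNCT.test(str);
}

console.log(containsKanjiKana("hello"));             // false
console.log(containsKanjiKana("hello 漢字"));         // true
console.log(containsKanjiKanaNoPunct("hello。"));     // false
console.log(containsKanjiKanaNoPunct("hello、かな")); // true
```

Neither regex uses the `g` flag, so `test` keeps no state between calls and the helpers can be called repeatedly.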

/[\u3000-\u303f]|[\u3040-\u309f]|[\u30a0-\u30ff]|[\uff00-\uffef]|[\u4e00-\u9faf]|[\u3400-\u4dbf]/

Japanese style punctuation: [\u3000-\u303f]
Hiragana: [\u3040-\u309f]
Katakana: [\u30a0-\u30ff]
Full-width Roman characters + half-width katakana: [\uff00-\uffef]
Kanji: [\u4e00-\u9faf]|[\u3400-\u4dbf]
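As an illustration, the ranges listed above could be kept in a small lookup table so that code can report which kinds of characters a string contains. This is only a sketch; `JAPANESE_RANGES` and `japaneseRangesIn` are hypothetical names invented here:

```javascript
// Hypothetical lookup table built from the ranges listed above.
const JAPANESE_RANGES = [
  { name: "punctuation", re: /[\u3000-\u303f]/ },
  { name: "hiragana", re: /[\u3040-\u309f]/ },
  { name: "katakana", re: /[\u30a0-\u30ff]/ },
  { name: "fullwidth/halfwidth forms", re: /[\uff00-\uffef]/ },
  { name: "kanji", re: /[\u4e00-\u9faf\u3400-\u4dbf]/ },
];

// Returns the name of every listed range that occurs at least once in str,
// in table order.
function japaneseRangesIn(str) {
  return JAPANESE_RANGES
    .filter(({ re }) => re.test(str))
    .map(({ name }) => name);
}

console.log(japaneseRangesIn("漢字とカナ。"));
// [ 'punctuation', 'hiragana', 'katakana', 'kanji' ]
console.log(japaneseRangesIn("abc"));
// []
```

Keeping the ranges named in one place answers part of the "magic numbers" objection: each range is labelled and only has to be corrected once.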

String.prototype.isKanjiKana = function(){
    return !!this.match(/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/);
}

Don't anchor it to the beginning and end of the string with ^ and $. The + is also unnecessary here, because a single matching character is enough.
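To make the difference concrete, here is a small sketch comparing the anchored and unanchored forms. The constant names are made up for this example. It also shows why the `|` characters were dropped: inside a character class, `|` is a literal pipe, not alternation.

```javascript
// Anchored: every character must be kanji/kana (the question's behaviour).
const ALL_KANJI_KANA =
  /^[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]+$/;

// Unanchored: a single kanji/kana character anywhere is enough.
const ANY_KANJI_KANA =
  /[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/;

const mixed = "abc漢字";
console.log(ALL_KANJI_KANA.test(mixed)); // false
console.log(ANY_KANJI_KANA.test(mixed)); // true

// With "|" left inside the brackets, a plain pipe character also matches.
const WITH_PIPES = /[\u4E00-\u9FAF|\u3040-\u3096]/;
console.log(WITH_PIPES.test("a|b"));     // true
console.log(ANY_KANJI_KANA.test("a|b")); // false
```

Plain constants are used here rather than extending `String.prototype`, which avoids modifying a built-in; the regexes themselves are the same either way.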

Why not just this? It will return true when it contains at least one Kanji.

/[一-龯]/.test(str)

/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/
